glibc/glibc-upstream-2.34-332.patch

commit a6b81f605dfba8650ea1f80122f41eb8e6c73dc7
Author: H.J. Lu <hjl.tools@gmail.com>
Date:   Tue Nov 2 18:33:07 2021 -0700

    Add LLL_MUTEX_READ_LOCK [BZ #28537]
    
    CAS instruction is expensive.  From the x86 CPU's point of view, getting
    a cache line for writing is more expensive than reading.  See Appendix
    A.2 Spinlock in:
    
    https://www.intel.com/content/dam/www/public/us/en/documents/white-papers/xeon-lock-scaling-analysis-paper.pdf
    
    The full compare and swap will grab the cache line exclusive and cause
    excessive cache line bouncing.
    
    Add LLL_MUTEX_READ_LOCK to do an atomic load and skip CAS in spinlock
    loop if compare may fail to reduce cache line bouncing on contended locks.
    
    Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
    (cherry picked from commit d672a98a1af106bd68deb15576710cd61363f7a6)

diff --git a/nptl/pthread_mutex_lock.c b/nptl/pthread_mutex_lock.c
index a04e0158451c8fff..9f40928cc6b9a067 100644
--- a/nptl/pthread_mutex_lock.c
+++ b/nptl/pthread_mutex_lock.c
@@ -65,6 +65,11 @@ lll_mutex_lock_optimized (pthread_mutex_t *mutex)
 # define PTHREAD_MUTEX_VERSIONS 1
 #endif
 
+#ifndef LLL_MUTEX_READ_LOCK
+# define LLL_MUTEX_READ_LOCK(mutex) \
+  atomic_load_relaxed (&(mutex)->__data.__lock)
+#endif
+
 static int __pthread_mutex_lock_full (pthread_mutex_t *mutex)
      __attribute_noinline__;
 
@@ -142,6 +147,8 @@ PTHREAD_MUTEX_LOCK (pthread_mutex_t *mutex)
 		  break;
 		}
 	      atomic_spin_nop ();
+	      if (LLL_MUTEX_READ_LOCK (mutex) != 0)
+		continue;
 	    }
 	  while (LLL_MUTEX_TRYLOCK (mutex) != 0);
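
The second hunk turns the spin loop into a "test, then test-and-set" loop: a relaxed load checks the lock word, and the CAS is retried only when that load sees the lock as free. Below is a minimal standalone sketch of the pattern using C11 `<stdatomic.h>`. It is not the glibc code: `demo_lock_t`, `demo_trylock`, `demo_spin_lock`, `demo_unlock`, and the `max_spins` parameter are hypothetical names introduced for illustration, and the lock word is assumed to use 0 for free and 1 for held.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the mutex lock word: 0 = free, 1 = held. */
typedef struct { atomic_int lock; } demo_lock_t;

/* One CAS attempt (plays the role of LLL_MUTEX_TRYLOCK in the patch).
   Returns true if the lock was acquired. */
static bool demo_trylock(demo_lock_t *l)
{
    int expected = 0;
    return atomic_compare_exchange_strong_explicit(
        &l->lock, &expected, 1,
        memory_order_acquire, memory_order_relaxed);
}

/* Spin for at most max_spins iterations.  Returns true if acquired,
   false if the spin budget ran out (a real mutex would then block). */
static bool demo_spin_lock(demo_lock_t *l, int max_spins)
{
    if (demo_trylock(l))
        return true;
    for (int i = 0; i < max_spins; i++) {
        /* Read first (plays the role of LLL_MUTEX_READ_LOCK): a plain
           load does not request the cache line for writing. */
        if (atomic_load_explicit(&l->lock, memory_order_relaxed) != 0)
            continue;           /* still held: skip the CAS */
        if (demo_trylock(l))
            return true;
    }
    return false;
}

/* Release the lock with a release store. */
static void demo_unlock(demo_lock_t *l)
{
    atomic_store_explicit(&l->lock, 0, memory_order_release);
}
```

The relaxed load is only a hint. Another thread may take the lock between the load and the CAS, so the CAS result remains the sole authority on whether the lock was acquired.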
Import glibc-2.34-48.fc35 from f35

* Thu Oct 13 2022 Arjun Shankar <arjun@redhat.com> - 2.34-48
- Handle non-hostname CNAME aliases during name resolution (#2129005)
- Sync with upstream branch release/2.34/master,
  commit e3976287b22422787f3cc6fc9adda58304b55bd9:
- nscd: Drop local address tuple variable [BZ #29607]
- x86-64: Require BMI1/BMI2 for AVX2 strrchr and wcsrchr implementations
- x86-64: Require BMI2 and LZCNT for AVX2 memrchr implementation
- x86-64: Require BMI2 for AVX2 (raw|w)memchr implementations
- x86-64: Require BMI2 for AVX2 wcs(n)cmp implementations
- x86-64: Require BMI2 for AVX2 strncmp implementation
- x86-64: Require BMI2 for AVX2 strcmp implementation
- x86-64: Require BMI2 for AVX2 str(n)casecmp implementations
- x86: include BMI1 and BMI2 in x86-64-v3 level
- nptl: Add backoff mechanism to spinlock loop
- sysdeps: Add 'get_fast_jitter' interace in fast-jitter.h
- nptl: Effectively skip CAS in spinlock loop
- Move assignment out of the CAS condition
- Add LLL_MUTEX_READ_LOCK [BZ #28537]
- Avoid extra load with CAS in __pthread_mutex_clocklock_common [BZ #28537]
- Avoid extra load with CAS in __pthread_mutex_lock_full [BZ #28537]
- resolv: Fix building tst-resolv-invalid-cname for earlier C standards
- nss_dns: Rewrite _nss_dns_gethostbyname4_r using current interfaces
- resolv: Add new tst-resolv-invalid-cname
- nss_dns: In gaih_getanswer_slice, skip strange aliases (bug 12154) (#2129005)
- nss_dns: Rewrite getanswer_r to match getanswer_ptr (bug 12154, bug 29305)
- nss_dns: Remove remnants of IPv6 address mapping
- nss_dns: Rewrite _nss_dns_gethostbyaddr2_r and getanswer_ptr
- nss_dns: Split getanswer_ptr from getanswer_r
- resolv: Add DNS packet parsing helpers geared towards wire format
- resolv: Add internal __ns_name_length_uncompressed function
- resolv: Add the __ns_samebinaryname function
- resolv: Add internal __res_binary_hnok function
- resolv: Add tst-resolv-aliases
- resolv: Add tst-resolv-byaddr for testing reverse lookup
- gconv: Use 64-bit interfaces in gconv_parseconfdir (bug 29583)
- elf: Fix hwcaps string size overestimation
- nscd: Fix netlink cache invalidation if epoll is used [BZ #29415]
- Apply asm redirections in wchar.h before first use
- Apply asm redirections in stdio.h before first use [BZ #27087]
- elf: Call __libc_early_init for reused namespaces (bug 29528)

Resolves: #2129005
Resolves: #2116960

2022-10-14 12:18:43 +00:00