From d672a98a1af106bd68deb15576710cd61363f7a6 Mon Sep 17 00:00:00 2001
From: "H.J. Lu" <hjl.tools@gmail.com>
Date: Tue, 2 Nov 2021 18:33:07 -0700
Subject: [PATCH] Add LLL_MUTEX_READ_LOCK [BZ #28537]
Content-type: text/plain; charset=UTF-8

The CAS instruction is expensive.  From the x86 CPU's point of view,
getting a cache line for writing is more expensive than reading.  See
Appendix A.2 Spinlock in:

https://www.intel.com/content/dam/www/public/us/en/documents/white-papers/xeon-lock-scaling-analysis-paper.pdf

The full compare and swap will grab the cache line in exclusive state
and cause excessive cache line bouncing.

Add LLL_MUTEX_READ_LOCK to do an atomic load and skip the CAS in the
spinlock loop when the compare is likely to fail, reducing cache line
bouncing on contended locks.

Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
---
 nptl/pthread_mutex_lock.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/nptl/pthread_mutex_lock.c b/nptl/pthread_mutex_lock.c
index 60ada70d..eb4d8baa 100644
--- a/nptl/pthread_mutex_lock.c
+++ b/nptl/pthread_mutex_lock.c
@@ -56,6 +56,11 @@
 #define FORCE_ELISION(m, s)
 #endif
 
+#ifndef LLL_MUTEX_READ_LOCK
+# define LLL_MUTEX_READ_LOCK(mutex) \
+  atomic_load_relaxed (&(mutex)->__data.__lock)
+#endif
+
 static int __pthread_mutex_lock_full (pthread_mutex_t *mutex)
      __attribute_noinline__;
 
@@ -136,6 +141,8 @@ __pthread_mutex_lock (pthread_mutex_t *mutex)
 		  break;
 		}
 	      atomic_spin_nop ();
+	      if (LLL_MUTEX_READ_LOCK (mutex) != 0)
+		continue;
 	    }
 	  while (LLL_MUTEX_TRYLOCK (mutex) != 0);
 
-- 
GitLab
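
[Editor's note] For readers outside glibc internals, the following
standalone sketch illustrates the test-and-test-and-set idea behind
this patch: spin on a relaxed atomic load, which lets the cache line
stay in shared state across CPUs, and attempt the CAS (which needs the
line in exclusive state) only once the lock looks free.  It uses C11
stdatomic rather than glibc's internal macros; all names below are
illustrative and not part of the patch.

/* Sketch of the optimization in this patch, using C11 atomics.  */
#include <stdatomic.h>

static atomic_int lock = 0;

/* Naive spinlock: every iteration issues a CAS, so each spinning CPU
   keeps pulling the cache line into exclusive state even while the
   lock is held, causing the bouncing described above.  */
static void
spin_lock_cas_only (void)
{
  int expected;
  do
    /* A failed CAS overwrites EXPECTED with the current value,
       so reset it before each retry.  */
    expected = 0;
  while (!atomic_compare_exchange_weak_explicit
	   (&lock, &expected, 1,
	    memory_order_acquire, memory_order_relaxed));
}

/* Test-and-test-and-set: spin on a plain relaxed load and retry the
   CAS only when the load says the lock may be free, mirroring what
   LLL_MUTEX_READ_LOCK does ahead of LLL_MUTEX_TRYLOCK.  */
static void
spin_lock_read_then_cas (void)
{
  for (;;)
    {
      /* Read-only spin: the cache line can stay shared.  */
      while (atomic_load_explicit (&lock, memory_order_relaxed) != 0)
	;
      int expected = 0;
      if (atomic_compare_exchange_weak_explicit
	    (&lock, &expected, 1,
	     memory_order_acquire, memory_order_relaxed))
	return;
    }
}

static void
spin_unlock (void)
{
  atomic_store_explicit (&lock, 0, memory_order_release);
}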