diff --git a/glibc-RHEL-2419-1.patch b/glibc-RHEL-2419-1.patch new file mode 100644 index 0000000..932bd01 --- /dev/null +++ b/glibc-RHEL-2419-1.patch @@ -0,0 +1,447 @@ +commit 1db84775f831a1494993ce9c118deaf9537cc50a +Author: Frank Barrus +Date: Wed Dec 4 07:55:02 2024 -0500 + + pthreads NPTL: lost wakeup fix 2 + + This fixes the lost wakeup (from a bug in signal stealing) with a change + in the usage of g_signals[] in the condition variable internal state. + It also completely eliminates the concept and handling of signal stealing, + as well as the need for signalers to block to wait for waiters to wake + up every time there is a G1/G2 switch. This greatly reduces the average + and maximum latency for pthread_cond_signal. + + The g_signals[] field now contains a signal count that is relative to + the current g1_start value. Since it is a 32-bit field, and the LSB is + still reserved (though not currently used anymore), it has a 31-bit value + that corresponds to the low 31 bits of the sequence number in g1_start. + (since g1_start also has an LSB flag, this means bits 31:1 in g_signals + correspond to bits 31:1 in g1_start, plus the current signal count) + + By making the signal count relative to g1_start, there is no longer + any ambiguity or A/B/A issue, and thus any checks before blocking, + including the futex call itself, are guaranteed not to block if the G1/G2 + switch occurs, even if the signal count remains the same. This allows + initially safely blocking in G2 until the switch to G1 occurs, and + then transitioning from G1 to a new G1 or G2, and always being able to + distinguish the state change. This removes the race condition and A/B/A + problems that otherwise ocurred if a late (pre-empted) waiter were to + resume just as the futex call attempted to block on g_signal since + otherwise there was no last opportunity to re-check things like whether + the current G1 group was already closed. + + By fixing these issues, the signal stealing code can be eliminated, + since there is no concept of signal stealing anymore. The code to block + for all waiters to exit g_refs can also be removed, since any waiters + that are still in the g_refs region can be guaranteed to safely wake + up and exit. If there are still any left at this time, they are all + sent one final futex wakeup to ensure that they are not blocked any + longer, but there is no need for the signaller to block and wait for + them to wake up and exit the g_refs region. + + The signal count is then effectively "zeroed" but since it is now + relative to g1_start, this is done by advancing it to a new value that + can be observed by any pending blocking waiters. Any late waiters can + always tell the difference, and can thus just cleanly exit if they are + in a stale G1 or G2. They can never steal a signal from the current + G1 if they are not in the current G1, since the signal value that has + to match in the cmpxchg has the low 31 bits of the g1_start value + contained in it, and that's first checked, and then it won't match if + there's a G1/G2 change. + + Note: the 31-bit sequence number used in g_signals is designed to + handle wrap-around when checking the signal count, but if the entire + 31-bit wraparound (2 billion signals) occurs while there is still a + late waiter that has not yet resumed, and it happens to then match + the current g1_start low bits, and the pre-emption occurs after the + normal "closed group" checks (which are 64-bit) but then hits the + futex syscall and signal consuming code, then an A/B/A issue could + still result and cause an incorrect assumption about whether it + should block. This particular scenario seems unlikely in practice. + Note that once awake from the futex, the waiter would notice the + closed group before consuming the signal (since that's still a 64-bit + check that would not be aliased in the wrap-around in g_signals), + so the biggest impact would be blocking on the futex until the next + full wakeup from a G1/G2 switch. + + Signed-off-by: Frank Barrus + Reviewed-by: Carlos O'Donell + +# Conflicts: +# nptl/pthread_cond_common.c (Missing spelling fixes) +# nptl/pthread_cond_wait.c (Likewise) + +diff --git a/nptl/pthread_cond_common.c b/nptl/pthread_cond_common.c +index c35b9ef03afd2c64..b1565b780d175d3a 100644 +--- a/nptl/pthread_cond_common.c ++++ b/nptl/pthread_cond_common.c +@@ -341,7 +341,6 @@ static bool __attribute__ ((unused)) + __condvar_quiesce_and_switch_g1 (pthread_cond_t *cond, uint64_t wseq, + unsigned int *g1index, int private) + { +- const unsigned int maxspin = 0; + unsigned int g1 = *g1index; + + /* If there is no waiter in G2, we don't do anything. The expression may +@@ -362,84 +361,46 @@ __condvar_quiesce_and_switch_g1 (pthread_cond_t *cond, uint64_t wseq, + * New waiters arriving concurrently with the group switching will all go + into G2 until we atomically make the switch. Waiters existing in G2 + are not affected. +- * Waiters in G1 will be closed out immediately by setting a flag in +- __g_signals, which will prevent waiters from blocking using a futex on +- __g_signals and also notifies them that the group is closed. As a +- result, they will eventually remove their group reference, allowing us +- to close switch group roles. */ +- +- /* First, set the closed flag on __g_signals. This tells waiters that are +- about to wait that they shouldn't do that anymore. This basically +- serves as an advance notificaton of the upcoming change to __g1_start; +- waiters interpret it as if __g1_start was larger than their waiter +- sequence position. This allows us to change __g1_start after waiting +- for all existing waiters with group references to leave, which in turn +- makes recovery after stealing a signal simpler because it then can be +- skipped if __g1_start indicates that the group is closed (otherwise, +- we would have to recover always because waiters don't know how big their +- groups are). Relaxed MO is fine. */ +- atomic_fetch_or_relaxed (cond->__data.__g_signals + g1, 1); +- +- /* Wait until there are no group references anymore. The fetch-or operation +- injects us into the modification order of __g_refs; release MO ensures +- that waiters incrementing __g_refs after our fetch-or see the previous +- changes to __g_signals and to __g1_start that had to happen before we can +- switch this G1 and alias with an older group (we have two groups, so +- aliasing requires switching group roles twice). Note that nobody else +- can have set the wake-request flag, so we do not have to act upon it. +- +- Also note that it is harmless if older waiters or waiters from this G1 +- get a group reference after we have quiesced the group because it will +- remain closed for them either because of the closed flag in __g_signals +- or the later update to __g1_start. New waiters will never arrive here +- but instead continue to go into the still current G2. */ +- unsigned r = atomic_fetch_or_release (cond->__data.__g_refs + g1, 0); +- while ((r >> 1) > 0) +- { +- for (unsigned int spin = maxspin; ((r >> 1) > 0) && (spin > 0); spin--) +- { +- /* TODO Back off. */ +- r = atomic_load_relaxed (cond->__data.__g_refs + g1); +- } +- if ((r >> 1) > 0) +- { +- /* There is still a waiter after spinning. Set the wake-request +- flag and block. Relaxed MO is fine because this is just about +- this futex word. +- +- Update r to include the set wake-request flag so that the upcoming +- futex_wait only blocks if the flag is still set (otherwise, we'd +- violate the basic client-side futex protocol). */ +- r = atomic_fetch_or_relaxed (cond->__data.__g_refs + g1, 1) | 1; +- +- if ((r >> 1) > 0) +- futex_wait_simple (cond->__data.__g_refs + g1, r, private); +- /* Reload here so we eventually see the most recent value even if we +- do not spin. */ +- r = atomic_load_relaxed (cond->__data.__g_refs + g1); +- } +- } +- /* Acquire MO so that we synchronize with the release operation that waiters +- use to decrement __g_refs and thus happen after the waiters we waited +- for. */ +- atomic_thread_fence_acquire (); ++ * Waiters in G1 will be closed out immediately by the advancing of ++ __g_signals to the next "lowseq" (low 31 bits of the new g1_start), ++ which will prevent waiters from blocking using a futex on ++ __g_signals since it provides enough signals for all possible ++ remaining waiters. As a result, they can each consume a signal ++ and they will eventually remove their group reference. */ + + /* Update __g1_start, which finishes closing this group. The value we add + will never be negative because old_orig_size can only be zero when we + switch groups the first time after a condvar was initialized, in which +- case G1 will be at index 1 and we will add a value of 1. See above for +- why this takes place after waiting for quiescence of the group. ++ case G1 will be at index 1 and we will add a value of 1. + Relaxed MO is fine because the change comes with no additional + constraints that others would have to observe. */ + __condvar_add_g1_start_relaxed (cond, + (old_orig_size << 1) + (g1 == 1 ? 1 : - 1)); + +- /* Now reopen the group, thus enabling waiters to again block using the +- futex controlled by __g_signals. Release MO so that observers that see +- no signals (and thus can block) also see the write __g1_start and thus +- that this is now a new group (see __pthread_cond_wait_common for the +- matching acquire MO loads). */ +- atomic_store_release (cond->__data.__g_signals + g1, 0); ++ unsigned int lowseq = ((old_g1_start + old_orig_size) << 1) & ~1U; ++ ++ /* If any waiters still hold group references (and thus could be blocked), ++ then wake them all up now and prevent any running ones from blocking. ++ This is effectively a catch-all for any possible current or future ++ bugs that can allow the group size to reach 0 before all G1 waiters ++ have been awakened or at least given signals to consume, or any ++ other case that can leave blocked (or about to block) older waiters.. */ ++ if ((atomic_fetch_or_release (cond->__data.__g_refs + g1, 0) >> 1) > 0) ++ { ++ /* First advance signals to the end of the group (i.e. enough signals ++ for the entire G1 group) to ensure that waiters which have not ++ yet blocked in the futex will not block. ++ Note that in the vast majority of cases, this should never ++ actually be necessary, since __g_signals will have enough ++ signals for the remaining g_refs waiters. As an optimization, ++ we could check this first before proceeding, although that ++ could still leave the potential for futex lost wakeup bugs ++ if the signal count was non-zero but the futex wakeup ++ was somehow lost. */ ++ atomic_store_release (cond->__data.__g_signals + g1, lowseq); ++ ++ futex_wake (cond->__data.__g_signals + g1, INT_MAX, private); ++ } + + /* At this point, the old G1 is now a valid new G2 (but not in use yet). + No old waiter can neither grab a signal nor acquire a reference without +@@ -451,6 +412,10 @@ __condvar_quiesce_and_switch_g1 (pthread_cond_t *cond, uint64_t wseq, + g1 ^= 1; + *g1index ^= 1; + ++ /* Now advance the new G1 g_signals to the new lowseq, giving it ++ an effective signal count of 0 to start. */ ++ atomic_store_release (cond->__data.__g_signals + g1, lowseq); ++ + /* These values are just observed by signalers, and thus protected by the + lock. */ + unsigned int orig_size = wseq - (old_g1_start + old_orig_size); +diff --git a/nptl/pthread_cond_wait.c b/nptl/pthread_cond_wait.c +index dc8c511f1a72517a..c34280c6bc9e80fb 100644 +--- a/nptl/pthread_cond_wait.c ++++ b/nptl/pthread_cond_wait.c +@@ -239,9 +239,7 @@ __condvar_cleanup_waiting (void *arg) + signaled), and a reference count. + + The group reference count is used to maintain the number of waiters that +- are using the group's futex. Before a group can change its role, the +- reference count must show that no waiters are using the futex anymore; this +- prevents ABA issues on the futex word. ++ are using the group's futex. + + To represent which intervals in the waiter sequence the groups cover (and + thus also which group slot contains G1 or G2), we use a 64b counter to +@@ -301,11 +299,12 @@ __condvar_cleanup_waiting (void *arg) + last reference. + * Reference count used by waiters concurrently with signalers that have + acquired the condvar-internal lock. +- __g_signals: The number of signals that can still be consumed. ++ __g_signals: The number of signals that can still be consumed, relative to ++ the current g1_start. (i.e. bits 31 to 1 of __g_signals are bits ++ 31 to 1 of g1_start with the signal count added) + * Used as a futex word by waiters. Used concurrently by waiters and + signalers. +- * LSB is true iff this group has been completely signaled (i.e., it is +- closed). ++ * LSB is currently reserved and 0. + __g_size: Waiters remaining in this group (i.e., which have not been + signaled yet. + * Accessed by signalers and waiters that cancel waiting (both do so only +@@ -329,18 +328,6 @@ __condvar_cleanup_waiting (void *arg) + sufficient because if a waiter can see a sufficiently large value, it could + have also consume a signal in the waiters group. + +- Waiters try to grab a signal from __g_signals without holding a reference +- count, which can lead to stealing a signal from a more recent group after +- their own group was already closed. They cannot always detect whether they +- in fact did because they do not know when they stole, but they can +- conservatively add a signal back to the group they stole from; if they +- did so unnecessarily, all that happens is a spurious wake-up. To make this +- even less likely, __g1_start contains the index of the current g2 too, +- which allows waiters to check if there aliasing on the group slots; if +- there wasn't, they didn't steal from the current G1, which means that the +- G1 they stole from must have been already closed and they do not need to +- fix anything. +- + It is essential that the last field in pthread_cond_t is __g_signals[1]: + The previous condvar used a pointer-sized field in pthread_cond_t, so a + PTHREAD_COND_INITIALIZER from that condvar implementation might only +@@ -436,6 +423,9 @@ __pthread_cond_wait_common (pthread_cond_t *cond, pthread_mutex_t *mutex, + { + while (1) + { ++ uint64_t g1_start = __condvar_load_g1_start_relaxed (cond); ++ unsigned int lowseq = (g1_start & 1) == g ? signals : g1_start & ~1U; ++ + /* Spin-wait first. + Note that spinning first without checking whether a timeout + passed might lead to what looks like a spurious wake-up even +@@ -447,35 +437,45 @@ __pthread_cond_wait_common (pthread_cond_t *cond, pthread_mutex_t *mutex, + having to compare against the current time seems to be the right + choice from a performance perspective for most use cases. */ + unsigned int spin = maxspin; +- while (signals == 0 && spin > 0) ++ while (spin > 0 && ((int)(signals - lowseq) < 2)) + { + /* Check that we are not spinning on a group that's already + closed. */ +- if (seq < (__condvar_load_g1_start_relaxed (cond) >> 1)) +- goto done; ++ if (seq < (g1_start >> 1)) ++ break; + + /* TODO Back off. */ + + /* Reload signals. See above for MO. */ + signals = atomic_load_acquire (cond->__data.__g_signals + g); ++ g1_start = __condvar_load_g1_start_relaxed (cond); ++ lowseq = (g1_start & 1) == g ? signals : g1_start & ~1U; + spin--; + } + +- /* If our group will be closed as indicated by the flag on signals, +- don't bother grabbing a signal. */ +- if (signals & 1) +- goto done; +- +- /* If there is an available signal, don't block. */ +- if (signals != 0) ++ if (seq < (g1_start >> 1)) ++ { ++ /* If the group is closed already, ++ then this waiter originally had enough extra signals to ++ consume, up until the time its group was closed. */ ++ goto done; ++ } ++ ++ /* If there is an available signal, don't block. ++ If __g1_start has advanced at all, then we must be in G1 ++ by now, perhaps in the process of switching back to an older ++ G2, but in either case we're allowed to consume the available ++ signal and should not block anymore. */ ++ if ((int)(signals - lowseq) >= 2) + break; + + /* No signals available after spinning, so prepare to block. + We first acquire a group reference and use acquire MO for that so + that we synchronize with the dummy read-modify-write in + __condvar_quiesce_and_switch_g1 if we read from that. In turn, +- in this case this will make us see the closed flag on __g_signals +- that designates a concurrent attempt to reuse the group's slot. ++ in this case this will make us see the advancement of __g_signals ++ to the upcoming new g1_start that occurs with a concurrent ++ attempt to reuse the group's slot. + We use acquire MO for the __g_signals check to make the + __g1_start check work (see spinning above). + Note that the group reference acquisition will not mask the +@@ -483,15 +483,24 @@ __pthread_cond_wait_common (pthread_cond_t *cond, pthread_mutex_t *mutex, + an atomic read-modify-write operation and thus extend the release + sequence. */ + atomic_fetch_add_acquire (cond->__data.__g_refs + g, 2); +- if (((atomic_load_acquire (cond->__data.__g_signals + g) & 1) != 0) +- || (seq < (__condvar_load_g1_start_relaxed (cond) >> 1))) ++ signals = atomic_load_acquire (cond->__data.__g_signals + g); ++ g1_start = __condvar_load_g1_start_relaxed (cond); ++ lowseq = (g1_start & 1) == g ? signals : g1_start & ~1U; ++ ++ if (seq < (g1_start >> 1)) + { +- /* Our group is closed. Wake up any signalers that might be +- waiting. */ ++ /* group is closed already, so don't block */ + __condvar_dec_grefs (cond, g, private); + goto done; + } + ++ if ((int)(signals - lowseq) >= 2) ++ { ++ /* a signal showed up or G1/G2 switched after we grabbed the refcount */ ++ __condvar_dec_grefs (cond, g, private); ++ break; ++ } ++ + // Now block. + struct _pthread_cleanup_buffer buffer; + struct _condvar_cleanup_buffer cbuffer; +@@ -502,7 +511,7 @@ __pthread_cond_wait_common (pthread_cond_t *cond, pthread_mutex_t *mutex, + __pthread_cleanup_push (&buffer, __condvar_cleanup_waiting, &cbuffer); + + err = __futex_abstimed_wait_cancelable64 ( +- cond->__data.__g_signals + g, 0, clockid, abstime, private); ++ cond->__data.__g_signals + g, signals, clockid, abstime, private); + + __pthread_cleanup_pop (&buffer, 0); + +@@ -525,6 +534,8 @@ __pthread_cond_wait_common (pthread_cond_t *cond, pthread_mutex_t *mutex, + signals = atomic_load_acquire (cond->__data.__g_signals + g); + } + ++ if (seq < (__condvar_load_g1_start_relaxed (cond) >> 1)) ++ goto done; + } + /* Try to grab a signal. Use acquire MO so that we see an up-to-date value + of __g1_start below (see spinning above for a similar case). In +@@ -533,69 +544,6 @@ __pthread_cond_wait_common (pthread_cond_t *cond, pthread_mutex_t *mutex, + while (!atomic_compare_exchange_weak_acquire (cond->__data.__g_signals + g, + &signals, signals - 2)); + +- /* We consumed a signal but we could have consumed from a more recent group +- that aliased with ours due to being in the same group slot. If this +- might be the case our group must be closed as visible through +- __g1_start. */ +- uint64_t g1_start = __condvar_load_g1_start_relaxed (cond); +- if (seq < (g1_start >> 1)) +- { +- /* We potentially stole a signal from a more recent group but we do not +- know which group we really consumed from. +- We do not care about groups older than current G1 because they are +- closed; we could have stolen from these, but then we just add a +- spurious wake-up for the current groups. +- We will never steal a signal from current G2 that was really intended +- for G2 because G2 never receives signals (until it becomes G1). We +- could have stolen a signal from G2 that was conservatively added by a +- previous waiter that also thought it stole a signal -- but given that +- that signal was added unnecessarily, it's not a problem if we steal +- it. +- Thus, the remaining case is that we could have stolen from the current +- G1, where "current" means the __g1_start value we observed. However, +- if the current G1 does not have the same slot index as we do, we did +- not steal from it and do not need to undo that. This is the reason +- for putting a bit with G2's index into__g1_start as well. */ +- if (((g1_start & 1) ^ 1) == g) +- { +- /* We have to conservatively undo our potential mistake of stealing +- a signal. We can stop trying to do that when the current G1 +- changes because other spinning waiters will notice this too and +- __condvar_quiesce_and_switch_g1 has checked that there are no +- futex waiters anymore before switching G1. +- Relaxed MO is fine for the __g1_start load because we need to +- merely be able to observe this fact and not have to observe +- something else as well. +- ??? Would it help to spin for a little while to see whether the +- current G1 gets closed? This might be worthwhile if the group is +- small or close to being closed. */ +- unsigned int s = atomic_load_relaxed (cond->__data.__g_signals + g); +- while (__condvar_load_g1_start_relaxed (cond) == g1_start) +- { +- /* Try to add a signal. We don't need to acquire the lock +- because at worst we can cause a spurious wake-up. If the +- group is in the process of being closed (LSB is true), this +- has an effect similar to us adding a signal. */ +- if (((s & 1) != 0) +- || atomic_compare_exchange_weak_relaxed +- (cond->__data.__g_signals + g, &s, s + 2)) +- { +- /* If we added a signal, we also need to add a wake-up on +- the futex. We also need to do that if we skipped adding +- a signal because the group is being closed because +- while __condvar_quiesce_and_switch_g1 could have closed +- the group, it might stil be waiting for futex waiters to +- leave (and one of those waiters might be the one we stole +- the signal from, which cause it to block using the +- futex). */ +- futex_wake (cond->__data.__g_signals + g, 1, private); +- break; +- } +- /* TODO Back off. */ +- } +- } +- } +- + done: + + /* Confirm that we have been woken. We do that before acquiring the mutex diff --git a/glibc-RHEL-2419-10.patch b/glibc-RHEL-2419-10.patch new file mode 100644 index 0000000..e0481ce --- /dev/null +++ b/glibc-RHEL-2419-10.patch @@ -0,0 +1,39 @@ +Partial revert of commit c36fc50781995e6758cae2b6927839d0157f213c +to restore the layout of pthread_cond_t and avoid a downstream +rpminspect and abidiff (libabigail tooling) spurious warning +about internal ABI changes. Without this change all RHEL developers +using pthread_cond_t would have to audit and waive the warning. +The alternative is to update the supression lists used in abidiff, +propagate that to the rpminspect service, and wait for that to +complete before doing the update. The more conservative position +is the partial revert of the layout change. + +This is a downstream-only change and is not required upstream. + +diff --git a/sysdeps/nptl/bits/thread-shared-types.h b/sysdeps/nptl/bits/thread-shared-types.h +index 5cd33b765d9689eb..5644472323fe5424 100644 +--- a/sysdeps/nptl/bits/thread-shared-types.h ++++ b/sysdeps/nptl/bits/thread-shared-types.h +@@ -109,7 +109,8 @@ struct __pthread_cond_s + unsigned int __high; + } __g1_start32; + }; +- unsigned int __g_size[2] __LOCK_ALIGNMENT; ++ unsigned int __glibc_unused___g_refs[2] __LOCK_ALIGNMENT; ++ unsigned int __g_size[2]; + unsigned int __g1_orig_size; + unsigned int __wrefs; + unsigned int __g_signals[2]; +diff --git a/sysdeps/nptl/pthread.h b/sysdeps/nptl/pthread.h +index 7ea6001784783371..43146e91c9d9579b 100644 +--- a/sysdeps/nptl/pthread.h ++++ b/sysdeps/nptl/pthread.h +@@ -152,7 +152,7 @@ enum + + + /* Conditional variable handling. */ +-#define PTHREAD_COND_INITIALIZER { { {0}, {0}, {0, 0}, 0, 0, {0, 0} } } ++#define PTHREAD_COND_INITIALIZER { { {0}, {0}, {0, 0}, {0, 0}, 0, 0, {0, 0} } } + + + /* Cleanup buffers */ diff --git a/glibc-RHEL-2419-2.patch b/glibc-RHEL-2419-2.patch new file mode 100644 index 0000000..99e7873 --- /dev/null +++ b/glibc-RHEL-2419-2.patch @@ -0,0 +1,133 @@ +commit 0cc973160c23bb67f895bc887dd6942d29f8fee3 +Author: Malte Skarupke +Date: Wed Dec 4 07:55:22 2024 -0500 + + nptl: Update comments and indentation for new condvar implementation + + Some comments were wrong after the most recent commit. This fixes that. + + Also fixing indentation where it was using spaces instead of tabs. + + Signed-off-by: Malte Skarupke + Reviewed-by: Carlos O'Donell + +diff --git a/nptl/pthread_cond_common.c b/nptl/pthread_cond_common.c +index b1565b780d175d3a..b355e38fb57862b1 100644 +--- a/nptl/pthread_cond_common.c ++++ b/nptl/pthread_cond_common.c +@@ -361,8 +361,9 @@ __condvar_quiesce_and_switch_g1 (pthread_cond_t *cond, uint64_t wseq, + * New waiters arriving concurrently with the group switching will all go + into G2 until we atomically make the switch. Waiters existing in G2 + are not affected. +- * Waiters in G1 will be closed out immediately by the advancing of +- __g_signals to the next "lowseq" (low 31 bits of the new g1_start), ++ * Waiters in G1 have already received a signal and been woken. If they ++ haven't woken yet, they will be closed out immediately by the advancing ++ of __g_signals to the next "lowseq" (low 31 bits of the new g1_start), + which will prevent waiters from blocking using a futex on + __g_signals since it provides enough signals for all possible + remaining waiters. As a result, they can each consume a signal +diff --git a/nptl/pthread_cond_wait.c b/nptl/pthread_cond_wait.c +index c34280c6bc9e80fb..7dabcb15d2d818e7 100644 +--- a/nptl/pthread_cond_wait.c ++++ b/nptl/pthread_cond_wait.c +@@ -250,7 +250,7 @@ __condvar_cleanup_waiting (void *arg) + figure out whether they are in a group that has already been completely + signaled (i.e., if the current G1 starts at a later position that the + waiter's position). Waiters cannot determine whether they are currently +- in G2 or G1 -- but they do not have too because all they are interested in ++ in G2 or G1 -- but they do not have to because all they are interested in + is whether there are available signals, and they always start in G2 (whose + group slot they know because of the bit in the waiter sequence. Signalers + will simply fill the right group until it is completely signaled and can +@@ -413,7 +413,7 @@ __pthread_cond_wait_common (pthread_cond_t *cond, pthread_mutex_t *mutex, + } + + /* Now wait until a signal is available in our group or it is closed. +- Acquire MO so that if we observe a value of zero written after group ++ Acquire MO so that if we observe (signals == lowseq) after group + switching in __condvar_quiesce_and_switch_g1, we synchronize with that + store and will see the prior update of __g1_start done while switching + groups too. */ +@@ -423,8 +423,8 @@ __pthread_cond_wait_common (pthread_cond_t *cond, pthread_mutex_t *mutex, + { + while (1) + { +- uint64_t g1_start = __condvar_load_g1_start_relaxed (cond); +- unsigned int lowseq = (g1_start & 1) == g ? signals : g1_start & ~1U; ++ uint64_t g1_start = __condvar_load_g1_start_relaxed (cond); ++ unsigned int lowseq = (g1_start & 1) == g ? signals : g1_start & ~1U; + + /* Spin-wait first. + Note that spinning first without checking whether a timeout +@@ -448,21 +448,21 @@ __pthread_cond_wait_common (pthread_cond_t *cond, pthread_mutex_t *mutex, + + /* Reload signals. See above for MO. */ + signals = atomic_load_acquire (cond->__data.__g_signals + g); +- g1_start = __condvar_load_g1_start_relaxed (cond); +- lowseq = (g1_start & 1) == g ? signals : g1_start & ~1U; ++ g1_start = __condvar_load_g1_start_relaxed (cond); ++ lowseq = (g1_start & 1) == g ? signals : g1_start & ~1U; + spin--; + } + +- if (seq < (g1_start >> 1)) ++ if (seq < (g1_start >> 1)) + { +- /* If the group is closed already, ++ /* If the group is closed already, + then this waiter originally had enough extra signals to + consume, up until the time its group was closed. */ + goto done; +- } ++ } + + /* If there is an available signal, don't block. +- If __g1_start has advanced at all, then we must be in G1 ++ If __g1_start has advanced at all, then we must be in G1 + by now, perhaps in the process of switching back to an older + G2, but in either case we're allowed to consume the available + signal and should not block anymore. */ +@@ -484,22 +484,23 @@ __pthread_cond_wait_common (pthread_cond_t *cond, pthread_mutex_t *mutex, + sequence. */ + atomic_fetch_add_acquire (cond->__data.__g_refs + g, 2); + signals = atomic_load_acquire (cond->__data.__g_signals + g); +- g1_start = __condvar_load_g1_start_relaxed (cond); +- lowseq = (g1_start & 1) == g ? signals : g1_start & ~1U; ++ g1_start = __condvar_load_g1_start_relaxed (cond); ++ lowseq = (g1_start & 1) == g ? signals : g1_start & ~1U; + +- if (seq < (g1_start >> 1)) ++ if (seq < (g1_start >> 1)) + { +- /* group is closed already, so don't block */ ++ /* group is closed already, so don't block */ + __condvar_dec_grefs (cond, g, private); + goto done; + } + + if ((int)(signals - lowseq) >= 2) + { +- /* a signal showed up or G1/G2 switched after we grabbed the refcount */ ++ /* a signal showed up or G1/G2 switched after we grabbed the ++ refcount */ + __condvar_dec_grefs (cond, g, private); + break; +- } ++ } + + // Now block. + struct _pthread_cleanup_buffer buffer; +@@ -537,10 +538,8 @@ __pthread_cond_wait_common (pthread_cond_t *cond, pthread_mutex_t *mutex, + if (seq < (__condvar_load_g1_start_relaxed (cond) >> 1)) + goto done; + } +- /* Try to grab a signal. Use acquire MO so that we see an up-to-date value +- of __g1_start below (see spinning above for a similar case). In +- particular, if we steal from a more recent group, we will also see a +- more recent __g1_start below. */ ++ /* Try to grab a signal. See above for MO. (if we do another loop ++ iteration we need to see the correct value of g1_start) */ + while (!atomic_compare_exchange_weak_acquire (cond->__data.__g_signals + g, + &signals, signals - 2)); + diff --git a/glibc-RHEL-2419-3.patch b/glibc-RHEL-2419-3.patch new file mode 100644 index 0000000..00456d2 --- /dev/null +++ b/glibc-RHEL-2419-3.patch @@ -0,0 +1,67 @@ +commit b42cc6af11062c260c7dfa91f1c89891366fed3e +Author: Malte Skarupke +Date: Wed Dec 4 07:55:50 2024 -0500 + + nptl: Remove unnecessary catch-all-wake in condvar group switch + + This wake is unnecessary. We only switch groups after every sleeper in a group + has been woken. Sure, they may take a while to actually wake up and may still + hold a reference, but waking them a second time doesn't speed that up. Instead + this just makes the code more complicated and may hide problems. + + In particular this safety wake wouldn't even have helped with the bug that was + fixed by Barrus' patch: The bug there was that pthread_cond_signal would not + switch g1 when it should, so we wouldn't even have entered this code path. + + Signed-off-by: Malte Skarupke + Reviewed-by: Carlos O'Donell + +diff --git a/nptl/pthread_cond_common.c b/nptl/pthread_cond_common.c +index b355e38fb57862b1..517ad52077829552 100644 +--- a/nptl/pthread_cond_common.c ++++ b/nptl/pthread_cond_common.c +@@ -361,13 +361,7 @@ __condvar_quiesce_and_switch_g1 (pthread_cond_t *cond, uint64_t wseq, + * New waiters arriving concurrently with the group switching will all go + into G2 until we atomically make the switch. Waiters existing in G2 + are not affected. +- * Waiters in G1 have already received a signal and been woken. If they +- haven't woken yet, they will be closed out immediately by the advancing +- of __g_signals to the next "lowseq" (low 31 bits of the new g1_start), +- which will prevent waiters from blocking using a futex on +- __g_signals since it provides enough signals for all possible +- remaining waiters. As a result, they can each consume a signal +- and they will eventually remove their group reference. */ ++ * Waiters in G1 have already received a signal and been woken. */ + + /* Update __g1_start, which finishes closing this group. The value we add + will never be negative because old_orig_size can only be zero when we +@@ -380,29 +374,6 @@ __condvar_quiesce_and_switch_g1 (pthread_cond_t *cond, uint64_t wseq, + + unsigned int lowseq = ((old_g1_start + old_orig_size) << 1) & ~1U; + +- /* If any waiters still hold group references (and thus could be blocked), +- then wake them all up now and prevent any running ones from blocking. +- This is effectively a catch-all for any possible current or future +- bugs that can allow the group size to reach 0 before all G1 waiters +- have been awakened or at least given signals to consume, or any +- other case that can leave blocked (or about to block) older waiters.. */ +- if ((atomic_fetch_or_release (cond->__data.__g_refs + g1, 0) >> 1) > 0) +- { +- /* First advance signals to the end of the group (i.e. enough signals +- for the entire G1 group) to ensure that waiters which have not +- yet blocked in the futex will not block. +- Note that in the vast majority of cases, this should never +- actually be necessary, since __g_signals will have enough +- signals for the remaining g_refs waiters. As an optimization, +- we could check this first before proceeding, although that +- could still leave the potential for futex lost wakeup bugs +- if the signal count was non-zero but the futex wakeup +- was somehow lost. */ +- atomic_store_release (cond->__data.__g_signals + g1, lowseq); +- +- futex_wake (cond->__data.__g_signals + g1, INT_MAX, private); +- } +- + /* At this point, the old G1 is now a valid new G2 (but not in use yet). + No old waiter can neither grab a signal nor acquire a reference without + noticing that __g1_start is larger. diff --git a/glibc-RHEL-2419-4.patch b/glibc-RHEL-2419-4.patch new file mode 100644 index 0000000..486903c --- /dev/null +++ b/glibc-RHEL-2419-4.patch @@ -0,0 +1,107 @@ +commit 4f7b051f8ee3feff1b53b27a906f245afaa9cee1 +Author: Malte Skarupke +Date: Wed Dec 4 07:56:13 2024 -0500 + + nptl: Remove unnecessary quadruple check in pthread_cond_wait + + pthread_cond_wait was checking whether it was in a closed group no less than + four times. Checking once is enough. Here are the four checks: + + 1. While spin-waiting. This was dead code: maxspin is set to 0 and has been + for years. + 2. Before deciding to go to sleep, and before incrementing grefs: I kept this + 3. After incrementing grefs. There is no reason to think that the group would + close while we do an atomic increment. Obviously it could close at any + point, but that doesn't mean we have to recheck after every step. This + check was equally good as check 2, except it has to do more work. + 4. When we find ourselves in a group that has a signal. We only get here after + we check that we're not in a closed group. There is no need to check again. + The check would only have helped in cases where the compare_exchange in the + next line would also have failed. Relying on the compare_exchange is fine. + + Removing the duplicate checks clarifies the code. + + Signed-off-by: Malte Skarupke + Reviewed-by: Carlos O'Donell + +diff --git a/nptl/pthread_cond_wait.c b/nptl/pthread_cond_wait.c +index 7dabcb15d2d818e7..ba9a19bedc2c176f 100644 +--- a/nptl/pthread_cond_wait.c ++++ b/nptl/pthread_cond_wait.c +@@ -367,7 +367,6 @@ static __always_inline int + __pthread_cond_wait_common (pthread_cond_t *cond, pthread_mutex_t *mutex, + clockid_t clockid, const struct __timespec64 *abstime) + { +- const int maxspin = 0; + int err; + int result = 0; + +@@ -426,33 +425,6 @@ __pthread_cond_wait_common (pthread_cond_t *cond, pthread_mutex_t *mutex, + uint64_t g1_start = __condvar_load_g1_start_relaxed (cond); + unsigned int lowseq = (g1_start & 1) == g ? signals : g1_start & ~1U; + +- /* Spin-wait first. +- Note that spinning first without checking whether a timeout +- passed might lead to what looks like a spurious wake-up even +- though we should return ETIMEDOUT (e.g., if the caller provides +- an absolute timeout that is clearly in the past). However, +- (1) spurious wake-ups are allowed, (2) it seems unlikely that a +- user will (ab)use pthread_cond_wait as a check for whether a +- point in time is in the past, and (3) spinning first without +- having to compare against the current time seems to be the right +- choice from a performance perspective for most use cases. */ +- unsigned int spin = maxspin; +- while (spin > 0 && ((int)(signals - lowseq) < 2)) +- { +- /* Check that we are not spinning on a group that's already +- closed. */ +- if (seq < (g1_start >> 1)) +- break; +- +- /* TODO Back off. */ +- +- /* Reload signals. See above for MO. */ +- signals = atomic_load_acquire (cond->__data.__g_signals + g); +- g1_start = __condvar_load_g1_start_relaxed (cond); +- lowseq = (g1_start & 1) == g ? signals : g1_start & ~1U; +- spin--; +- } +- + if (seq < (g1_start >> 1)) + { + /* If the group is closed already, +@@ -483,24 +455,6 @@ __pthread_cond_wait_common (pthread_cond_t *cond, pthread_mutex_t *mutex, + an atomic read-modify-write operation and thus extend the release + sequence. */ + atomic_fetch_add_acquire (cond->__data.__g_refs + g, 2); +- signals = atomic_load_acquire (cond->__data.__g_signals + g); +- g1_start = __condvar_load_g1_start_relaxed (cond); +- lowseq = (g1_start & 1) == g ? signals : g1_start & ~1U; +- +- if (seq < (g1_start >> 1)) +- { +- /* group is closed already, so don't block */ +- __condvar_dec_grefs (cond, g, private); +- goto done; +- } +- +- if ((int)(signals - lowseq) >= 2) +- { +- /* a signal showed up or G1/G2 switched after we grabbed the +- refcount */ +- __condvar_dec_grefs (cond, g, private); +- break; +- } + + // Now block. + struct _pthread_cleanup_buffer buffer; +@@ -534,9 +488,6 @@ __pthread_cond_wait_common (pthread_cond_t *cond, pthread_mutex_t *mutex, + /* Reload signals. See above for MO. */ + signals = atomic_load_acquire (cond->__data.__g_signals + g); + } +- +- if (seq < (__condvar_load_g1_start_relaxed (cond) >> 1)) +- goto done; + } + /* Try to grab a signal. See above for MO. (if we do another loop + iteration we need to see the correct value of g1_start) */ diff --git a/glibc-RHEL-2419-5.patch b/glibc-RHEL-2419-5.patch new file mode 100644 index 0000000..3df4fa8 --- /dev/null +++ b/glibc-RHEL-2419-5.patch @@ -0,0 +1,172 @@ +commit c36fc50781995e6758cae2b6927839d0157f213c +Author: Malte Skarupke +Date: Wed Dec 4 07:56:38 2024 -0500 + + nptl: Remove g_refs from condition variables + + This variable used to be needed to wait in group switching until all sleepers + have confirmed that they have woken. This is no longer needed. Nothing waits + on this variable so there is no need to track how many threads are currently + asleep in each group. + + Signed-off-by: Malte Skarupke + Reviewed-by: Carlos O'Donell + +# Conflicts: +# nptl/tst-cond22.c (No atomic wide counter refactor) +# sysdeps/nptl/bits/thread-shared-types.h (Likewise) + +diff --git a/nptl/pthread_cond_wait.c b/nptl/pthread_cond_wait.c +index ba9a19bedc2c176f..9652dbafe08dfde1 100644 +--- a/nptl/pthread_cond_wait.c ++++ b/nptl/pthread_cond_wait.c +@@ -144,23 +144,6 @@ __condvar_cancel_waiting (pthread_cond_t *cond, uint64_t seq, unsigned int g, + } + } + +-/* Wake up any signalers that might be waiting. */ +-static void +-__condvar_dec_grefs (pthread_cond_t *cond, unsigned int g, int private) +-{ +- /* Release MO to synchronize-with the acquire load in +- __condvar_quiesce_and_switch_g1. */ +- if (atomic_fetch_add_release (cond->__data.__g_refs + g, -2) == 3) +- { +- /* Clear the wake-up request flag before waking up. We do not need more +- than relaxed MO and it doesn't matter if we apply this for an aliased +- group because we wake all futex waiters right after clearing the +- flag. */ +- atomic_fetch_and_relaxed (cond->__data.__g_refs + g, ~(unsigned int) 1); +- futex_wake (cond->__data.__g_refs + g, INT_MAX, private); +- } +-} +- + /* Clean-up for cancellation of waiters waiting for normal signals. We cancel + our registration as a waiter, confirm we have woken up, and re-acquire the + mutex. */ +@@ -172,8 +155,6 @@ __condvar_cleanup_waiting (void *arg) + pthread_cond_t *cond = cbuffer->cond; + unsigned g = cbuffer->wseq & 1; + +- __condvar_dec_grefs (cond, g, cbuffer->private); +- + __condvar_cancel_waiting (cond, cbuffer->wseq >> 1, g, cbuffer->private); + /* FIXME With the current cancellation implementation, it is possible that + a thread is cancelled after it has returned from a syscall. This could +@@ -328,15 +309,6 @@ __condvar_cleanup_waiting (void *arg) + sufficient because if a waiter can see a sufficiently large value, it could + have also consume a signal in the waiters group. + +- It is essential that the last field in pthread_cond_t is __g_signals[1]: +- The previous condvar used a pointer-sized field in pthread_cond_t, so a +- PTHREAD_COND_INITIALIZER from that condvar implementation might only +- initialize 4 bytes to zero instead of the 8 bytes we need (i.e., 44 bytes +- in total instead of the 48 we need). __g_signals[1] is not accessed before +- the first group switch (G2 starts at index 0), which will set its value to +- zero after a harmless fetch-or whose return value is ignored. This +- effectively completes initialization. +- + + Limitations: + * This condvar isn't designed to allow for more than +@@ -441,21 +413,6 @@ __pthread_cond_wait_common (pthread_cond_t *cond, pthread_mutex_t *mutex, + if ((int)(signals - lowseq) >= 2) + break; + +- /* No signals available after spinning, so prepare to block. +- We first acquire a group reference and use acquire MO for that so +- that we synchronize with the dummy read-modify-write in +- __condvar_quiesce_and_switch_g1 if we read from that. In turn, +- in this case this will make us see the advancement of __g_signals +- to the upcoming new g1_start that occurs with a concurrent +- attempt to reuse the group's slot. +- We use acquire MO for the __g_signals check to make the +- __g1_start check work (see spinning above). +- Note that the group reference acquisition will not mask the +- release MO when decrementing the reference count because we use +- an atomic read-modify-write operation and thus extend the release +- sequence. */ +- atomic_fetch_add_acquire (cond->__data.__g_refs + g, 2); +- + // Now block. + struct _pthread_cleanup_buffer buffer; + struct _condvar_cleanup_buffer cbuffer; +@@ -472,18 +429,11 @@ __pthread_cond_wait_common (pthread_cond_t *cond, pthread_mutex_t *mutex, + + if (__glibc_unlikely (err == ETIMEDOUT || err == EOVERFLOW)) + { +- __condvar_dec_grefs (cond, g, private); +- /* If we timed out, we effectively cancel waiting. Note that +- we have decremented __g_refs before cancellation, so that a +- deadlock between waiting for quiescence of our group in +- __condvar_quiesce_and_switch_g1 and us trying to acquire +- the lock during cancellation is not possible. */ ++ /* If we timed out, we effectively cancel waiting. */ + __condvar_cancel_waiting (cond, seq, g, private); + result = err; + goto done; + } +- else +- __condvar_dec_grefs (cond, g, private); + + /* Reload signals. See above for MO. */ + signals = atomic_load_acquire (cond->__data.__g_signals + g); +diff --git a/nptl/tst-cond22.c b/nptl/tst-cond22.c +index 64f19ea0a55af057..ebeeeaf666070076 100644 +--- a/nptl/tst-cond22.c ++++ b/nptl/tst-cond22.c +@@ -106,10 +106,10 @@ do_test (void) + status = 1; + } + +- printf ("cond = { %llu, %llu, %u/%u/%u, %u/%u/%u, %u, %u }\n", ++ printf ("cond = { %llu, %llu, %u/%u, %u/%u, %u, %u }\n", + c.__data.__wseq, c.__data.__g1_start, +- c.__data.__g_signals[0], c.__data.__g_refs[0], c.__data.__g_size[0], +- c.__data.__g_signals[1], c.__data.__g_refs[1], c.__data.__g_size[1], ++ c.__data.__g_signals[0], c.__data.__g_size[0], ++ c.__data.__g_signals[1], c.__data.__g_size[1], + c.__data.__g1_orig_size, c.__data.__wrefs); + + if (pthread_create (&th, NULL, tf, (void *) 1l) != 0) +@@ -149,10 +149,10 @@ do_test (void) + status = 1; + } + +- printf ("cond = { %llu, %llu, %u/%u/%u, %u/%u/%u, %u, %u }\n", ++ printf ("cond = { %llu, %llu, %u/%u, %u/%u, %u, %u }\n", + c.__data.__wseq, c.__data.__g1_start, +- c.__data.__g_signals[0], c.__data.__g_refs[0], c.__data.__g_size[0], +- c.__data.__g_signals[1], c.__data.__g_refs[1], c.__data.__g_size[1], ++ c.__data.__g_signals[0], c.__data.__g_size[0], ++ c.__data.__g_signals[1], c.__data.__g_size[1], + c.__data.__g1_orig_size, c.__data.__wrefs); + + return status; +diff --git a/sysdeps/nptl/bits/thread-shared-types.h b/sysdeps/nptl/bits/thread-shared-types.h +index 44bf1e358dbdaaff..5cd33b765d9689eb 100644 +--- a/sysdeps/nptl/bits/thread-shared-types.h ++++ b/sysdeps/nptl/bits/thread-shared-types.h +@@ -109,8 +109,7 @@ struct __pthread_cond_s + unsigned int __high; + } __g1_start32; + }; +- unsigned int __g_refs[2] __LOCK_ALIGNMENT; +- unsigned int __g_size[2]; ++ unsigned int __g_size[2] __LOCK_ALIGNMENT; + unsigned int __g1_orig_size; + unsigned int __wrefs; + unsigned int __g_signals[2]; +diff --git a/sysdeps/nptl/pthread.h b/sysdeps/nptl/pthread.h +index 43146e91c9d9579b..7ea6001784783371 100644 +--- a/sysdeps/nptl/pthread.h ++++ b/sysdeps/nptl/pthread.h +@@ -152,7 +152,7 @@ enum + + + /* Conditional variable handling. */ +-#define PTHREAD_COND_INITIALIZER { { {0}, {0}, {0, 0}, {0, 0}, 0, 0, {0, 0} } } ++#define PTHREAD_COND_INITIALIZER { { {0}, {0}, {0, 0}, 0, 0, {0, 0} } } + + + /* Cleanup buffers */ diff --git a/glibc-RHEL-2419-6.patch b/glibc-RHEL-2419-6.patch new file mode 100644 index 0000000..7e966e5 --- /dev/null +++ b/glibc-RHEL-2419-6.patch @@ -0,0 +1,91 @@ +commit 929a4764ac90382616b6a21f099192b2475da674 +Author: Malte Skarupke +Date: Wed Dec 4 08:03:44 2024 -0500 + + nptl: Use a single loop in pthread_cond_wait instaed of a nested loop + + The loop was a little more complicated than necessary. There was only one + break statement out of the inner loop, and the outer loop was nearly empty. + So just remove the outer loop, moving its code to the one break statement in + the inner loop. This allows us to replace all gotos with break statements. + + Signed-off-by: Malte Skarupke + Reviewed-by: Carlos O'Donell + +diff --git a/nptl/pthread_cond_wait.c b/nptl/pthread_cond_wait.c +index 9652dbafe08dfde1..4886056d136db138 100644 +--- a/nptl/pthread_cond_wait.c ++++ b/nptl/pthread_cond_wait.c +@@ -383,17 +383,15 @@ __pthread_cond_wait_common (pthread_cond_t *cond, pthread_mutex_t *mutex, + return err; + } + +- /* Now wait until a signal is available in our group or it is closed. +- Acquire MO so that if we observe (signals == lowseq) after group +- switching in __condvar_quiesce_and_switch_g1, we synchronize with that +- store and will see the prior update of __g1_start done while switching +- groups too. */ +- unsigned int signals = atomic_load_acquire (cond->__data.__g_signals + g); +- +- do +- { ++ + while (1) + { ++ /* Now wait until a signal is available in our group or it is closed. ++ Acquire MO so that if we observe (signals == lowseq) after group ++ switching in __condvar_quiesce_and_switch_g1, we synchronize with that ++ store and will see the prior update of __g1_start done while switching ++ groups too. */ ++ unsigned int signals = atomic_load_acquire (cond->__data.__g_signals + g); + uint64_t g1_start = __condvar_load_g1_start_relaxed (cond); + unsigned int lowseq = (g1_start & 1) == g ? signals : g1_start & ~1U; + +@@ -402,7 +400,7 @@ __pthread_cond_wait_common (pthread_cond_t *cond, pthread_mutex_t *mutex, + /* If the group is closed already, + then this waiter originally had enough extra signals to + consume, up until the time its group was closed. */ +- goto done; ++ break; + } + + /* If there is an available signal, don't block. +@@ -411,7 +409,16 @@ __pthread_cond_wait_common (pthread_cond_t *cond, pthread_mutex_t *mutex, + G2, but in either case we're allowed to consume the available + signal and should not block anymore. */ + if ((int)(signals - lowseq) >= 2) +- break; ++ { ++ /* Try to grab a signal. See above for MO. (if we do another loop ++ iteration we need to see the correct value of g1_start) */ ++ if (atomic_compare_exchange_weak_acquire ( ++ cond->__data.__g_signals + g, ++ &signals, signals - 2)) ++ break; ++ else ++ continue; ++ } + + // Now block. + struct _pthread_cleanup_buffer buffer; +@@ -432,19 +439,9 @@ __pthread_cond_wait_common (pthread_cond_t *cond, pthread_mutex_t *mutex, + /* If we timed out, we effectively cancel waiting. */ + __condvar_cancel_waiting (cond, seq, g, private); + result = err; +- goto done; ++ break; + } +- +- /* Reload signals. See above for MO. */ +- signals = atomic_load_acquire (cond->__data.__g_signals + g); + } +- } +- /* Try to grab a signal. See above for MO. (if we do another loop +- iteration we need to see the correct value of g1_start) */ +- while (!atomic_compare_exchange_weak_acquire (cond->__data.__g_signals + g, +- &signals, signals - 2)); +- +- done: + + /* Confirm that we have been woken. We do that before acquiring the mutex + to allow for execution of pthread_cond_destroy while having acquired the diff --git a/glibc-RHEL-2419-7.patch b/glibc-RHEL-2419-7.patch new file mode 100644 index 0000000..87c54f2 --- /dev/null +++ b/glibc-RHEL-2419-7.patch @@ -0,0 +1,138 @@ +commit ee6c14ed59d480720721aaacc5fb03213dc153da +Author: Malte Skarupke +Date: Wed Dec 4 08:04:10 2024 -0500 + + nptl: Fix indentation + + In my previous change I turned a nested loop into a simple loop. I'm doing + the resulting indentation changes in a separate commit to make the diff on + the previous commit easier to review. + + Signed-off-by: Malte Skarupke + Reviewed-by: Carlos O'Donell + +diff --git a/nptl/pthread_cond_wait.c b/nptl/pthread_cond_wait.c +index 4886056d136db138..6c130436b016977a 100644 +--- a/nptl/pthread_cond_wait.c ++++ b/nptl/pthread_cond_wait.c +@@ -384,65 +384,65 @@ __pthread_cond_wait_common (pthread_cond_t *cond, pthread_mutex_t *mutex, + } + + +- while (1) +- { +- /* Now wait until a signal is available in our group or it is closed. +- Acquire MO so that if we observe (signals == lowseq) after group +- switching in __condvar_quiesce_and_switch_g1, we synchronize with that +- store and will see the prior update of __g1_start done while switching +- groups too. */ +- unsigned int signals = atomic_load_acquire (cond->__data.__g_signals + g); +- uint64_t g1_start = __condvar_load_g1_start_relaxed (cond); +- unsigned int lowseq = (g1_start & 1) == g ? signals : g1_start & ~1U; +- +- if (seq < (g1_start >> 1)) +- { +- /* If the group is closed already, +- then this waiter originally had enough extra signals to +- consume, up until the time its group was closed. */ +- break; +- } +- +- /* If there is an available signal, don't block. +- If __g1_start has advanced at all, then we must be in G1 +- by now, perhaps in the process of switching back to an older +- G2, but in either case we're allowed to consume the available +- signal and should not block anymore. */ +- if ((int)(signals - lowseq) >= 2) +- { +- /* Try to grab a signal. See above for MO. (if we do another loop +- iteration we need to see the correct value of g1_start) */ +- if (atomic_compare_exchange_weak_acquire ( +- cond->__data.__g_signals + g, ++ while (1) ++ { ++ /* Now wait until a signal is available in our group or it is closed. ++ Acquire MO so that if we observe (signals == lowseq) after group ++ switching in __condvar_quiesce_and_switch_g1, we synchronize with that ++ store and will see the prior update of __g1_start done while switching ++ groups too. */ ++ unsigned int signals = atomic_load_acquire (cond->__data.__g_signals + g); ++ uint64_t g1_start = __condvar_load_g1_start_relaxed (cond); ++ unsigned int lowseq = (g1_start & 1) == g ? signals : g1_start & ~1U; ++ ++ if (seq < (g1_start >> 1)) ++ { ++ /* If the group is closed already, ++ then this waiter originally had enough extra signals to ++ consume, up until the time its group was closed. */ ++ break; ++ } ++ ++ /* If there is an available signal, don't block. ++ If __g1_start has advanced at all, then we must be in G1 ++ by now, perhaps in the process of switching back to an older ++ G2, but in either case we're allowed to consume the available ++ signal and should not block anymore. */ ++ if ((int)(signals - lowseq) >= 2) ++ { ++ /* Try to grab a signal. See above for MO. (if we do another loop ++ iteration we need to see the correct value of g1_start) */ ++ if (atomic_compare_exchange_weak_acquire ( ++ cond->__data.__g_signals + g, + &signals, signals - 2)) +- break; +- else +- continue; +- } +- +- // Now block. +- struct _pthread_cleanup_buffer buffer; +- struct _condvar_cleanup_buffer cbuffer; +- cbuffer.wseq = wseq; +- cbuffer.cond = cond; +- cbuffer.mutex = mutex; +- cbuffer.private = private; +- __pthread_cleanup_push (&buffer, __condvar_cleanup_waiting, &cbuffer); +- +- err = __futex_abstimed_wait_cancelable64 ( +- cond->__data.__g_signals + g, signals, clockid, abstime, private); +- +- __pthread_cleanup_pop (&buffer, 0); +- +- if (__glibc_unlikely (err == ETIMEDOUT || err == EOVERFLOW)) +- { +- /* If we timed out, we effectively cancel waiting. */ +- __condvar_cancel_waiting (cond, seq, g, private); +- result = err; + break; +- } ++ else ++ continue; + } + ++ // Now block. ++ struct _pthread_cleanup_buffer buffer; ++ struct _condvar_cleanup_buffer cbuffer; ++ cbuffer.wseq = wseq; ++ cbuffer.cond = cond; ++ cbuffer.mutex = mutex; ++ cbuffer.private = private; ++ __pthread_cleanup_push (&buffer, __condvar_cleanup_waiting, &cbuffer); ++ ++ err = __futex_abstimed_wait_cancelable64 ( ++ cond->__data.__g_signals + g, signals, clockid, abstime, private); ++ ++ __pthread_cleanup_pop (&buffer, 0); ++ ++ if (__glibc_unlikely (err == ETIMEDOUT || err == EOVERFLOW)) ++ { ++ /* If we timed out, we effectively cancel waiting. */ ++ __condvar_cancel_waiting (cond, seq, g, private); ++ result = err; ++ break; ++ } ++ } ++ + /* Confirm that we have been woken. We do that before acquiring the mutex + to allow for execution of pthread_cond_destroy while having acquired the + mutex. */ diff --git a/glibc-RHEL-2419-8.patch b/glibc-RHEL-2419-8.patch new file mode 100644 index 0000000..eff1b1a --- /dev/null +++ b/glibc-RHEL-2419-8.patch @@ -0,0 +1,147 @@ +commit 4b79e27a5073c02f6bff9aa8f4791230a0ab1867 +Author: Malte Skarupke +Date: Wed Dec 4 08:04:54 2024 -0500 + + nptl: rename __condvar_quiesce_and_switch_g1 + + This function no longer waits for threads to leave g1, so rename it to + __condvar_switch_g1 + + Signed-off-by: Malte Skarupke + Reviewed-by: Carlos O'Donell + +diff --git a/nptl/pthread_cond_broadcast.c b/nptl/pthread_cond_broadcast.c +index f1275b2f15817788..3dff819952718892 100644 +--- a/nptl/pthread_cond_broadcast.c ++++ b/nptl/pthread_cond_broadcast.c +@@ -61,7 +61,7 @@ ___pthread_cond_broadcast (pthread_cond_t *cond) + cond->__data.__g_size[g1] << 1); + cond->__data.__g_size[g1] = 0; + +- /* We need to wake G1 waiters before we quiesce G1 below. */ ++ /* We need to wake G1 waiters before we switch G1 below. */ + /* TODO Only set it if there are indeed futex waiters. We could + also try to move this out of the critical section in cases when + G2 is empty (and we don't need to quiesce). */ +@@ -70,7 +70,7 @@ ___pthread_cond_broadcast (pthread_cond_t *cond) + + /* G1 is complete. Step (2) is next unless there are no waiters in G2, in + which case we can stop. */ +- if (__condvar_quiesce_and_switch_g1 (cond, wseq, &g1, private)) ++ if (__condvar_switch_g1 (cond, wseq, &g1, private)) + { + /* Step (3): Send signals to all waiters in the old G2 / new G1. */ + atomic_fetch_add_relaxed (cond->__data.__g_signals + g1, +diff --git a/nptl/pthread_cond_common.c b/nptl/pthread_cond_common.c +index 517ad52077829552..7b2b1e4605f163e7 100644 +--- a/nptl/pthread_cond_common.c ++++ b/nptl/pthread_cond_common.c +@@ -329,16 +329,15 @@ __condvar_get_private (int flags) + return FUTEX_SHARED; + } + +-/* This closes G1 (whose index is in G1INDEX), waits for all futex waiters to +- leave G1, converts G1 into a fresh G2, and then switches group roles so that +- the former G2 becomes the new G1 ending at the current __wseq value when we +- eventually make the switch (WSEQ is just an observation of __wseq by the +- signaler). ++/* This closes G1 (whose index is in G1INDEX), converts G1 into a fresh G2, ++ and then switches group roles so that the former G2 becomes the new G1 ++ ending at the current __wseq value when we eventually make the switch ++ (WSEQ is just an observation of __wseq by the signaler). + If G2 is empty, it will not switch groups because then it would create an + empty G1 which would require switching groups again on the next signal. + Returns false iff groups were not switched because G2 was empty. */ + static bool __attribute__ ((unused)) +-__condvar_quiesce_and_switch_g1 (pthread_cond_t *cond, uint64_t wseq, ++__condvar_switch_g1 (pthread_cond_t *cond, uint64_t wseq, + unsigned int *g1index, int private) + { + unsigned int g1 = *g1index; +@@ -354,8 +353,7 @@ __condvar_quiesce_and_switch_g1 (pthread_cond_t *cond, uint64_t wseq, + + cond->__data.__g_size[g1 ^ 1]) == 0) + return false; + +- /* Now try to close and quiesce G1. We have to consider the following kinds +- of waiters: ++ /* We have to consider the following kinds of waiters: + * Waiters from less recent groups than G1 are not affected because + nothing will change for them apart from __g1_start getting larger. + * New waiters arriving concurrently with the group switching will all go +@@ -363,12 +361,12 @@ __condvar_quiesce_and_switch_g1 (pthread_cond_t *cond, uint64_t wseq, + are not affected. + * Waiters in G1 have already received a signal and been woken. */ + +- /* Update __g1_start, which finishes closing this group. The value we add +- will never be negative because old_orig_size can only be zero when we +- switch groups the first time after a condvar was initialized, in which +- case G1 will be at index 1 and we will add a value of 1. +- Relaxed MO is fine because the change comes with no additional +- constraints that others would have to observe. */ ++ /* Update __g1_start, which closes this group. The value we add will never ++ be negative because old_orig_size can only be zero when we switch groups ++ the first time after a condvar was initialized, in which case G1 will be ++ at index 1 and we will add a value of 1. Relaxed MO is fine because the ++ change comes with no additional constraints that others would have to ++ observe. */ + __condvar_add_g1_start_relaxed (cond, + (old_orig_size << 1) + (g1 == 1 ? 1 : - 1)); + +diff --git a/nptl/pthread_cond_signal.c b/nptl/pthread_cond_signal.c +index 171193b13e203290..4f7639e386fc207a 100644 +--- a/nptl/pthread_cond_signal.c ++++ b/nptl/pthread_cond_signal.c +@@ -70,18 +70,17 @@ ___pthread_cond_signal (pthread_cond_t *cond) + bool do_futex_wake = false; + + /* If G1 is still receiving signals, we put the signal there. If not, we +- check if G2 has waiters, and if so, quiesce and switch G1 to the former +- G2; if this results in a new G1 with waiters (G2 might have cancellations +- already, see __condvar_quiesce_and_switch_g1), we put the signal in the +- new G1. */ ++ check if G2 has waiters, and if so, switch G1 to the former G2; if this ++ results in a new G1 with waiters (G2 might have cancellations already, ++ see __condvar_switch_g1), we put the signal in the new G1. */ + if ((cond->__data.__g_size[g1] != 0) +- || __condvar_quiesce_and_switch_g1 (cond, wseq, &g1, private)) ++ || __condvar_switch_g1 (cond, wseq, &g1, private)) + { + /* Add a signal. Relaxed MO is fine because signaling does not need to +- establish a happens-before relation (see above). We do not mask the +- release-MO store when initializing a group in +- __condvar_quiesce_and_switch_g1 because we use an atomic +- read-modify-write and thus extend that store's release sequence. */ ++ establish a happens-before relation (see above). We do not mask the ++ release-MO store when initializing a group in __condvar_switch_g1 ++ because we use an atomic read-modify-write and thus extend that ++ store's release sequence. */ + atomic_fetch_add_relaxed (cond->__data.__g_signals + g1, 2); + cond->__data.__g_size[g1]--; + /* TODO Only set it if there are indeed futex waiters. */ +diff --git a/nptl/pthread_cond_wait.c b/nptl/pthread_cond_wait.c +index 6c130436b016977a..173bd134164eed44 100644 +--- a/nptl/pthread_cond_wait.c ++++ b/nptl/pthread_cond_wait.c +@@ -355,8 +355,7 @@ __pthread_cond_wait_common (pthread_cond_t *cond, pthread_mutex_t *mutex, + because we do not need to establish any happens-before relation with + signalers (see __pthread_cond_signal); modification order alone + establishes a total order of waiters/signals. We do need acquire MO +- to synchronize with group reinitialization in +- __condvar_quiesce_and_switch_g1. */ ++ to synchronize with group reinitialization in __condvar_switch_g1. */ + uint64_t wseq = __condvar_fetch_add_wseq_acquire (cond, 2); + /* Find our group's index. We always go into what was G2 when we acquired + our position. */ +@@ -388,9 +387,9 @@ __pthread_cond_wait_common (pthread_cond_t *cond, pthread_mutex_t *mutex, + { + /* Now wait until a signal is available in our group or it is closed. + Acquire MO so that if we observe (signals == lowseq) after group +- switching in __condvar_quiesce_and_switch_g1, we synchronize with that +- store and will see the prior update of __g1_start done while switching +- groups too. */ ++ switching in __condvar_switch_g1, we synchronize with that store and ++ will see the prior update of __g1_start done while switching groups ++ too. */ + unsigned int signals = atomic_load_acquire (cond->__data.__g_signals + g); + uint64_t g1_start = __condvar_load_g1_start_relaxed (cond); + unsigned int lowseq = (g1_start & 1) == g ? signals : g1_start & ~1U; diff --git a/glibc-RHEL-2419-9.patch b/glibc-RHEL-2419-9.patch new file mode 100644 index 0000000..4f94d43 --- /dev/null +++ b/glibc-RHEL-2419-9.patch @@ -0,0 +1,179 @@ +commit 91bb902f58264a2fd50fbce8f39a9a290dd23706 +Author: Malte Skarupke +Date: Wed Dec 4 08:05:40 2024 -0500 + + nptl: Use all of g1_start and g_signals + + The LSB of g_signals was unused. The LSB of g1_start was used to indicate + which group is G2. This was used to always go to sleep in pthread_cond_wait + if a waiter is in G2. A comment earlier in the file says that this is not + correct to do: + + "Waiters cannot determine whether they are currently in G2 or G1 -- but they + do not have to because all they are interested in is whether there are + available signals" + + I either would have had to update the comment, or get rid of the check. I + chose to get rid of the check. In fact I don't quite know why it was there. + There will never be available signals for group G2, so we didn't need the + special case. Even if there were, this would just be a spurious wake. This + might have caught some cases where the count has wrapped around, but it + wouldn't reliably do that, (and even if it did, why would you want to force a + sleep in that case?) and we don't support that many concurrent waiters + anyway. Getting rid of it allows us to use one more bit, making us more + robust to wraparound. + + Signed-off-by: Malte Skarupke + Reviewed-by: Carlos O'Donell + +diff --git a/nptl/pthread_cond_broadcast.c b/nptl/pthread_cond_broadcast.c +index 3dff819952718892..6fd6cfe9d002c5d5 100644 +--- a/nptl/pthread_cond_broadcast.c ++++ b/nptl/pthread_cond_broadcast.c +@@ -58,7 +58,7 @@ ___pthread_cond_broadcast (pthread_cond_t *cond) + { + /* Add as many signals as the remaining size of the group. */ + atomic_fetch_add_relaxed (cond->__data.__g_signals + g1, +- cond->__data.__g_size[g1] << 1); ++ cond->__data.__g_size[g1]); + cond->__data.__g_size[g1] = 0; + + /* We need to wake G1 waiters before we switch G1 below. */ +@@ -74,7 +74,7 @@ ___pthread_cond_broadcast (pthread_cond_t *cond) + { + /* Step (3): Send signals to all waiters in the old G2 / new G1. */ + atomic_fetch_add_relaxed (cond->__data.__g_signals + g1, +- cond->__data.__g_size[g1] << 1); ++ cond->__data.__g_size[g1]); + cond->__data.__g_size[g1] = 0; + /* TODO Only set it if there are indeed futex waiters. */ + do_futex_wake = true; +diff --git a/nptl/pthread_cond_common.c b/nptl/pthread_cond_common.c +index 7b2b1e4605f163e7..485aca4076a372d7 100644 +--- a/nptl/pthread_cond_common.c ++++ b/nptl/pthread_cond_common.c +@@ -348,9 +348,9 @@ __condvar_switch_g1 (pthread_cond_t *cond, uint64_t wseq, + behavior. + Note that this works correctly for a zero-initialized condvar too. */ + unsigned int old_orig_size = __condvar_get_orig_size (cond); +- uint64_t old_g1_start = __condvar_load_g1_start_relaxed (cond) >> 1; +- if (((unsigned) (wseq - old_g1_start - old_orig_size) +- + cond->__data.__g_size[g1 ^ 1]) == 0) ++ uint64_t old_g1_start = __condvar_load_g1_start_relaxed (cond); ++ uint64_t new_g1_start = old_g1_start + old_orig_size; ++ if (((unsigned) (wseq - new_g1_start) + cond->__data.__g_size[g1 ^ 1]) == 0) + return false; + + /* We have to consider the following kinds of waiters: +@@ -361,16 +361,10 @@ __condvar_switch_g1 (pthread_cond_t *cond, uint64_t wseq, + are not affected. + * Waiters in G1 have already received a signal and been woken. */ + +- /* Update __g1_start, which closes this group. The value we add will never +- be negative because old_orig_size can only be zero when we switch groups +- the first time after a condvar was initialized, in which case G1 will be +- at index 1 and we will add a value of 1. Relaxed MO is fine because the +- change comes with no additional constraints that others would have to +- observe. */ +- __condvar_add_g1_start_relaxed (cond, +- (old_orig_size << 1) + (g1 == 1 ? 1 : - 1)); +- +- unsigned int lowseq = ((old_g1_start + old_orig_size) << 1) & ~1U; ++ /* Update __g1_start, which closes this group. Relaxed MO is fine because ++ the change comes with no additional constraints that others would have ++ to observe. */ ++ __condvar_add_g1_start_relaxed (cond, old_orig_size); + + /* At this point, the old G1 is now a valid new G2 (but not in use yet). + No old waiter can neither grab a signal nor acquire a reference without +@@ -382,13 +376,13 @@ __condvar_switch_g1 (pthread_cond_t *cond, uint64_t wseq, + g1 ^= 1; + *g1index ^= 1; + +- /* Now advance the new G1 g_signals to the new lowseq, giving it ++ /* Now advance the new G1 g_signals to the new g1_start, giving it + an effective signal count of 0 to start. */ +- atomic_store_release (cond->__data.__g_signals + g1, lowseq); ++ atomic_store_release (cond->__data.__g_signals + g1, (unsigned)new_g1_start); + + /* These values are just observed by signalers, and thus protected by the + lock. */ +- unsigned int orig_size = wseq - (old_g1_start + old_orig_size); ++ unsigned int orig_size = wseq - new_g1_start; + __condvar_set_orig_size (cond, orig_size); + /* Use and addition to not loose track of cancellations in what was + previously G2. */ +diff --git a/nptl/pthread_cond_signal.c b/nptl/pthread_cond_signal.c +index 4f7639e386fc207a..9a5bac92fe8fc246 100644 +--- a/nptl/pthread_cond_signal.c ++++ b/nptl/pthread_cond_signal.c +@@ -81,7 +81,7 @@ ___pthread_cond_signal (pthread_cond_t *cond) + release-MO store when initializing a group in __condvar_switch_g1 + because we use an atomic read-modify-write and thus extend that + store's release sequence. */ +- atomic_fetch_add_relaxed (cond->__data.__g_signals + g1, 2); ++ atomic_fetch_add_relaxed (cond->__data.__g_signals + g1, 1); + cond->__data.__g_size[g1]--; + /* TODO Only set it if there are indeed futex waiters. */ + do_futex_wake = true; +diff --git a/nptl/pthread_cond_wait.c b/nptl/pthread_cond_wait.c +index 173bd134164eed44..944b241ec26b9b32 100644 +--- a/nptl/pthread_cond_wait.c ++++ b/nptl/pthread_cond_wait.c +@@ -85,7 +85,7 @@ __condvar_cancel_waiting (pthread_cond_t *cond, uint64_t seq, unsigned int g, + not hold a reference on the group. */ + __condvar_acquire_lock (cond, private); + +- uint64_t g1_start = __condvar_load_g1_start_relaxed (cond) >> 1; ++ uint64_t g1_start = __condvar_load_g1_start_relaxed (cond); + if (g1_start > seq) + { + /* Our group is closed, so someone provided enough signals for it. +@@ -260,7 +260,6 @@ __condvar_cleanup_waiting (void *arg) + * Waiters fetch-add while having acquire the mutex associated with the + condvar. Signalers load it and fetch-xor it concurrently. + __g1_start: Starting position of G1 (inclusive) +- * LSB is index of current G2. + * Modified by signalers while having acquired the condvar-internal lock + and observed concurrently by waiters. + __g1_orig_size: Initial size of G1 +@@ -281,11 +280,9 @@ __condvar_cleanup_waiting (void *arg) + * Reference count used by waiters concurrently with signalers that have + acquired the condvar-internal lock. + __g_signals: The number of signals that can still be consumed, relative to +- the current g1_start. (i.e. bits 31 to 1 of __g_signals are bits +- 31 to 1 of g1_start with the signal count added) ++ the current g1_start. (i.e. g1_start with the signal count added) + * Used as a futex word by waiters. Used concurrently by waiters and + signalers. +- * LSB is currently reserved and 0. + __g_size: Waiters remaining in this group (i.e., which have not been + signaled yet. + * Accessed by signalers and waiters that cancel waiting (both do so only +@@ -392,9 +389,8 @@ __pthread_cond_wait_common (pthread_cond_t *cond, pthread_mutex_t *mutex, + too. */ + unsigned int signals = atomic_load_acquire (cond->__data.__g_signals + g); + uint64_t g1_start = __condvar_load_g1_start_relaxed (cond); +- unsigned int lowseq = (g1_start & 1) == g ? signals : g1_start & ~1U; + +- if (seq < (g1_start >> 1)) ++ if (seq < g1_start) + { + /* If the group is closed already, + then this waiter originally had enough extra signals to +@@ -407,13 +403,13 @@ __pthread_cond_wait_common (pthread_cond_t *cond, pthread_mutex_t *mutex, + by now, perhaps in the process of switching back to an older + G2, but in either case we're allowed to consume the available + signal and should not block anymore. */ +- if ((int)(signals - lowseq) >= 2) ++ if ((int)(signals - (unsigned int)g1_start) > 0) + { + /* Try to grab a signal. See above for MO. (if we do another loop + iteration we need to see the correct value of g1_start) */ + if (atomic_compare_exchange_weak_acquire ( + cond->__data.__g_signals + g, +- &signals, signals - 2)) ++ &signals, signals - 1)) + break; + else + continue; diff --git a/glibc.spec b/glibc.spec index 0ebae00..4f988bc 100644 --- a/glibc.spec +++ b/glibc.spec @@ -157,7 +157,7 @@ end \ Summary: The GNU libc libraries Name: glibc Version: %{glibcversion} -Release: 163%{?dist} +Release: 164%{?dist} # In general, GPLv2+ is used by programs, LGPLv2+ is used for # libraries. @@ -1098,6 +1098,16 @@ Patch790: glibc-RHEL-67592-1.patch Patch791: glibc-RHEL-67592-2.patch Patch792: glibc-RHEL-67592-3.patch Patch793: glibc-RHEL-67592-4.patch +Patch794: glibc-RHEL-2419-1.patch +Patch795: glibc-RHEL-2419-2.patch +Patch796: glibc-RHEL-2419-3.patch +Patch797: glibc-RHEL-2419-4.patch +Patch798: glibc-RHEL-2419-5.patch +Patch799: glibc-RHEL-2419-6.patch +Patch800: glibc-RHEL-2419-7.patch +Patch801: glibc-RHEL-2419-8.patch +Patch802: glibc-RHEL-2419-9.patch +Patch803: glibc-RHEL-2419-10.patch ############################################################################## # Continued list of core "glibc" package information: @@ -3091,6 +3101,9 @@ update_gconv_modules_cache () %endif %changelog +* Fri Feb 7 2025 Carlos O'Donell - 2.34-164 +- Fix missed wakeup in POSIX thread condition variables (RHEL-2419) + * Tue Feb 4 2025 DJ Delorie - 2.34-163 - manual: sigaction's sa_flags field and SA_SIGINFO (RHEL-67592)