glibc/glibc-upstream-2.39-20.patch

commit 6484a92698039c4a7a510f0214e22d067b0d78b3
Author: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Date:   Thu Feb 8 10:08:39 2024 -0300

    x86: Do not prefer ERMS for memset on Zen3+
    
    For AMD Zen3+ architecture, the performance of the vectorized loop is
    slightly better than ERMS.
    
    Checked on x86_64-linux-gnu on Zen3.
    Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
    
    (cherry picked from commit 272708884cb750f12f5c74a00e6620c19dc6d567)

diff --git a/sysdeps/x86/dl-cacheinfo.h b/sysdeps/x86/dl-cacheinfo.h
index f34d12846caf9422..5a98f70364220da4 100644
--- a/sysdeps/x86/dl-cacheinfo.h
+++ b/sysdeps/x86/dl-cacheinfo.h
@@ -1021,6 +1021,11 @@ dl_init_cacheinfo (struct cpu_features *cpu_features)
      minimum value is fixed.  */
   rep_stosb_threshold = TUNABLE_GET (x86_rep_stosb_threshold,
 				     long int, NULL);
+  if (cpu_features->basic.kind == arch_kind_amd
+      && !TUNABLE_IS_INITIALIZED (x86_rep_stosb_threshold))
+    /* For AMD Zen3+ architecture, the performance of the vectorized loop is
+       slightly better than ERMS.  */
+    rep_stosb_threshold = SIZE_MAX;
 
   TUNABLE_SET_WITH_BOUNDS (x86_data_cache_size, data, 0, SIZE_MAX);
   TUNABLE_SET_WITH_BOUNDS (x86_shared_cache_size, shared, 0, SIZE_MAX);

Sync with upstream branch release/2.39/master

Upstream commit: 5d070d12b3a52bc44dd1b71743abc4b6243862ae

- x86: Expand the comment on when REP STOSB is used on memset
- x86: Do not prefer ERMS for memset on Zen3+
- x86: Fix Zen3/Zen4 ERMS selection (BZ 30994)
- Add tst-gnu2-tls2mod1 to test-internal-extras
- elf: Enable TLS descriptor tests on aarch64
- arm: Update _dl_tlsdesc_dynamic to preserve caller-saved registers (BZ 31372)
- Ignore undefined symbols for -mtls-dialect=gnu2
- x86-64: Allocate state buffer space for RDI, RSI and RBX
- x86-64: Update _dl_tlsdesc_dynamic to preserve AMX registers
- x86: Update _dl_tlsdesc_dynamic to preserve caller-saved registers
- x86-64: Save APX registers in ld.so trampoline
- LoongArch: Correct {__ieee754, _}_scalb -> {__ieee754, _}_scalbf
- powerpc: Placeholder and infrastructure/build support to add Power11 related changes.
- powerpc: Add HWCAP3/HWCAP4 data to TCB for Power Architecture.

2024-04-04 15:10:36 +00:00
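
The selection logic added by the hunk above can be read in isolation as the following minimal C sketch. It is not glibc code: `enum cpu_kind`, `struct tunable`, and `pick_rep_stosb_threshold` are hypothetical stand-ins for `cpu_features->basic.kind`, the tunable state queried through `TUNABLE_GET` / `TUNABLE_IS_INITIALIZED`, and the assignment inside `dl_init_cacheinfo`. The sketch assumes that a threshold of `SIZE_MAX` means no memset size ever reaches the `rep stosb` path.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for cpu_features->basic.kind.  */
enum cpu_kind { KIND_INTEL, KIND_AMD, KIND_OTHER };

/* Hypothetical stand-in for a tunable: its current value and whether
   the user set it explicitly (what TUNABLE_IS_INITIALIZED reports).  */
struct tunable
{
  size_t value;
  bool initialized;
};

/* Mirrors the patched logic: start from the tunable's value, then on
   AMD raise the threshold to SIZE_MAX unless the user supplied one.
   As in the hunk, the test is on the vendor only, not on a specific
   Zen generation.  */
static size_t
pick_rep_stosb_threshold (enum cpu_kind kind, const struct tunable *t)
{
  assert (t != NULL);
  size_t threshold = t->value;
  if (kind == KIND_AMD && !t->initialized)
    threshold = SIZE_MAX;
  return threshold;
}
```

Under these assumptions, an AMD machine with an untouched tunable gets `SIZE_MAX`, while an explicit user setting (for example through `GLIBC_TUNABLES=glibc.cpu.x86_rep_stosb_threshold=...`) is kept as given, and non-AMD machines are unaffected.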