1a997221e3
Upstream commit: 5d070d12b3a52bc44dd1b71743abc4b6243862ae
Related: RHEL-25850
- x86: Expand the comment on when REP STOSB is used on memset
- x86: Do not prefer ERMS for memset on Zen3+
- x86: Fix Zen3/Zen4 ERMS selection (BZ 30994)
Resolves: RHEL-25530
- Add tst-gnu2-tls2mod1 to test-internal-extras
- elf: Enable TLS descriptor tests on aarch64
- arm: Update _dl_tlsdesc_dynamic to preserve caller-saved registers (BZ 31372)
- Ignore undefined symbols for -mtls-dialect=gnu2
- x86-64: Allocate state buffer space for RDI, RSI and RBX
- x86-64: Update _dl_tlsdesc_dynamic to preserve AMX registers
- x86: Update _dl_tlsdesc_dynamic to preserve caller-saved registers
Resolves: RHEL-29179
- x86-64: Save APX registers in ld.so trampoline
Resolves: RHEL-25045
- LoongArch: Correct {__ieee754, _}_scalb -> {__ieee754, _}_scalbf
- powerpc: Placeholder and infrastructure/build support to add Power11 related changes.
- powerpc: Add HWCAP3/HWCAP4 data to TCB for Power Architecture.
Resolves: RHEL-24761
Fedora 40 commit: 24af28d49b
31 lines
1.2 KiB
Diff
commit 6484a92698039c4a7a510f0214e22d067b0d78b3
Author: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Date:   Thu Feb 8 10:08:39 2024 -0300

    x86: Do not prefer ERMS for memset on Zen3+

    For AMD Zen3+ architecture, the performance of the vectorized loop is
    slightly better than ERMS.

    Checked on x86_64-linux-gnu on Zen3.
    Reviewed-by: H.J. Lu <hjl.tools@gmail.com>

    (cherry picked from commit 272708884cb750f12f5c74a00e6620c19dc6d567)

diff --git a/sysdeps/x86/dl-cacheinfo.h b/sysdeps/x86/dl-cacheinfo.h
index f34d12846caf9422..5a98f70364220da4 100644
--- a/sysdeps/x86/dl-cacheinfo.h
+++ b/sysdeps/x86/dl-cacheinfo.h
@@ -1021,6 +1021,11 @@ dl_init_cacheinfo (struct cpu_features *cpu_features)
      minimum value is fixed. */
   rep_stosb_threshold = TUNABLE_GET (x86_rep_stosb_threshold,
                                      long int, NULL);
+  if (cpu_features->basic.kind == arch_kind_amd
+      && !TUNABLE_IS_INITIALIZED (x86_rep_stosb_threshold))
+    /* For AMD Zen3+ architecture, the performance of the vectorized loop is
+       slightly better than ERMS. */
+    rep_stosb_threshold = SIZE_MAX;
 
   TUNABLE_SET_WITH_BOUNDS (x86_data_cache_size, data, 0, SIZE_MAX);
   TUNABLE_SET_WITH_BOUNDS (x86_shared_cache_size, shared, 0, SIZE_MAX);