24af28d49b
Upstream commit: 5d070d12b3a52bc44dd1b71743abc4b6243862ae - x86: Expand the comment on when REP STOSB is used on memset - x86: Do not prefer ERMS for memset on Zen3+ - x86: Fix Zen3/Zen4 ERMS selection (BZ 30994) - Add tst-gnu2-tls2mod1 to test-internal-extras - elf: Enable TLS descriptor tests on aarch64 - arm: Update _dl_tlsdesc_dynamic to preserve caller-saved registers (BZ 31372) - Ignore undefined symbols for -mtls-dialect=gnu2 - x86-64: Allocate state buffer space for RDI, RSI and RBX - x86-64: Update _dl_tlsdesc_dynamic to preserve AMX registers - x86: Update _dl_tlsdesc_dynamic to preserve caller-saved registers - x86-64: Save APX registers in ld.so trampoline - LoongArch: Correct {__ieee754, _}_scalb -> {__ieee754, _}_scalbf - powerpc: Placeholder and infrastructure/build support to add Power11 related changes. - powerpc: Add HWCAP3/HWCAP4 data to TCB for Power Architecture.
25 lines
1.2 KiB
Diff
25 lines
1.2 KiB
Diff
commit 5d070d12b3a52bc44dd1b71743abc4b6243862ae
|
|
Author: Adhemerval Zanella <adhemerval.zanella@linaro.org>
|
|
Date: Thu Feb 8 10:08:40 2024 -0300
|
|
|
|
x86: Expand the comment on when REP STOSB is used on memset
|
|
|
|
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
|
|
(cherry picked from commit 491e55beab7457ed310a4a47496f4a333c5d1032)
|
|
|
|
diff --git a/sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S b/sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S
|
|
index 9984c3ca0fafab6a..97839a22483b0613 100644
|
|
--- a/sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S
|
|
+++ b/sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S
|
|
@@ -21,7 +21,9 @@
|
|
2. If size is less than VEC, use integer register stores.
|
|
3. If size is from VEC_SIZE to 2 * VEC_SIZE, use 2 VEC stores.
|
|
4. If size is from 2 * VEC_SIZE to 4 * VEC_SIZE, use 4 VEC stores.
|
|
- 5. If size is more to 4 * VEC_SIZE, align to 4 * VEC_SIZE with
|
|
+ 5. On machines ERMS feature, if size is greater or equal than
|
|
+ __x86_rep_stosb_threshold then REP STOSB will be used.
|
|
+ 6. If size is more to 4 * VEC_SIZE, align to 4 * VEC_SIZE with
|
|
4 VEC stores and store 4 * VEC at a time until done. */
|
|
|
|
#include <sysdep.h>
|