Import glibc-2.34-35.fc35 from f35

* Tue May 31 2022 Arjun Shankar <arjun@redhat.com> - 2.34-35
- Sync with upstream branch release/2.34/master,
  commit ff450cdbdee0b8cb6b9d653d6d2fa892de29be31:
- Fix deadlock when pthread_atfork handler calls pthread_atfork or dlclose
- x86: Fallback {str|wcs}cmp RTM in the ncmp overflow case [BZ #29127]
- string.h: fix __fortified_attr_access macro call [BZ #29162]
- linux: Add a getauxval test [BZ #23293]
- rtld: Use generic argv adjustment in ld.so [BZ #23293]
- S390: Enable static PIE

* Thu May 19 2022 Florian Weimer <fweimer@redhat.com> - 2.34-34
- Sync with upstream branch release/2.34/master,
  commit ede8d94d154157d269b18f3601440ac576c1f96a:
- csu: Implement and use _dl_early_allocate during static startup
- Linux: Introduce __brk_call for invoking the brk system call
- Linux: Implement a useful version of _startup_fatal
- ia64: Always define IA64_USE_NEW_STUB as a flag macro
- Linux: Define MMAP_CALL_INTERNAL
- i386: Honor I386_USE_SYSENTER for 6-argument Linux system calls
- i386: Remove OPTIMIZE_FOR_GCC_5 from Linux libc-do-syscall.S
- elf: Remove __libc_init_secure
- Linux: Consolidate auxiliary vector parsing (redo)
- Linux: Include <dl-auxv.h> in dl-sysdep.c only for SHARED
- Revert "Linux: Consolidate auxiliary vector parsing"
- Linux: Consolidate auxiliary vector parsing
- Linux: Assume that NEED_DL_SYSINFO_DSO is always defined
- Linux: Remove DL_FIND_ARG_COMPONENTS
- Linux: Remove HAVE_AUX_SECURE, HAVE_AUX_XID, HAVE_AUX_PAGESIZE
- elf: Merge dl-sysdep.c into the Linux version
- elf: Remove unused NEED_DL_BASE_ADDR and _dl_base_addr
- x86: Optimize {str|wcs}rchr-evex
- x86: Optimize {str|wcs}rchr-avx2
- x86: Optimize {str|wcs}rchr-sse2
- x86: Cleanup page cross code in memcmp-avx2-movbe.S
- x86: Remove memcmp-sse4.S
- x86: Small improvements for wcslen
- x86: Remove AVX str{n}casecmp
- x86: Add EVEX optimized str{n}casecmp
- x86: Add AVX2 optimized str{n}casecmp
- x86: Optimize str{n}casecmp TOLOWER logic in strcmp-sse42.S
- x86: Optimize str{n}casecmp TOLOWER logic in strcmp.S
- x86: Remove strspn-sse2.S and use the generic implementation
- x86: Remove strpbrk-sse2.S and use the generic implementation
- x86: Remove strcspn-sse2.S and use the generic implementation
- x86: Optimize strspn in strspn-c.c
- x86: Optimize strcspn and strpbrk in strcspn-c.c
- x86: Code cleanup in strchr-evex and comment justifying branch
- x86: Code cleanup in strchr-avx2 and comment justifying branch
- x86_64: Remove bcopy optimizations
- x86-64: Remove bzero weak alias in SS2 memset
- x86_64/multiarch: Sort sysdep_routines and put one entry per line
- x86: Improve L to support L(XXX_SYMBOL (YYY, ZZZ))
- fortify: Ensure that __glibc_fortify condition is a constant [BZ #29141]

* Thu May 12 2022 Florian Weimer <fweimer@redhat.com> - 2.34-33
- Sync with upstream branch release/2.34/master,
  commit 91c2e6c3db44297bf4cb3a2e3c40236c5b6a0b23:
- dlfcn: Implement the RTLD_DI_PHDR request type for dlinfo
- manual: Document the dlinfo function
- x86: Fix fallback for wcsncmp_avx2 in strcmp-avx2.S [BZ #28896]
- x86: Fix bug in strncmp-evex and strncmp-avx2 [BZ #28895]
- x86: Set .text section in memset-vec-unaligned-erms
- x86-64: Optimize bzero
- x86: Remove SSSE3 instruction for broadcast in memset.S (SSE2 Only)
- x86: Improve vec generation in memset-vec-unaligned-erms.S
- x86-64: Fix strcmp-evex.S
- x86-64: Fix strcmp-avx2.S
- x86: Optimize strcmp-evex.S
- x86: Optimize strcmp-avx2.S
- manual: Clarify that abbreviations of long options are allowed
- Add HWCAP2_AFP, HWCAP2_RPRES from Linux 5.17 to AArch64 bits/hwcap.h
- aarch64: Add HWCAP2_ECV from Linux 5.16
- Add SOL_MPTCP, SOL_MCTP from Linux 5.16 to bits/socket.h
- Update kernel version to 5.17 in tst-mman-consts.py
- Update kernel version to 5.16 in tst-mman-consts.py
- Update syscall lists for Linux 5.17
- Add ARPHRD_CAN, ARPHRD_MCTP to net/if_arp.h
- Update kernel version to 5.15 in tst-mman-consts.py
- Add PF_MCTP, AF_MCTP from Linux 5.15 to bits/socket.h

Resolves: #2091541
parent 73667d0be6
commit 601650f878
35
glibc-upstream-2.34-191.patch
Normal file
@@ -0,0 +1,35 @@
commit bc6fba3c8048b11c9f73db03339c97a2fec3f0cf
Author: Joseph Myers <joseph@codesourcery.com>
Date: Wed Nov 17 14:25:16 2021 +0000

Add PF_MCTP, AF_MCTP from Linux 5.15 to bits/socket.h

Linux 5.15 adds a new address / protocol family PF_MCTP / AF_MCTP; add
these constants to bits/socket.h.

Tested for x86_64.

(cherry picked from commit bdeb7a8fa9989d18dab6310753d04d908125dc1d)

diff --git a/sysdeps/unix/sysv/linux/bits/socket.h b/sysdeps/unix/sysv/linux/bits/socket.h
index a011a8c0959b9970..7bb9e863d7329da9 100644
--- a/sysdeps/unix/sysv/linux/bits/socket.h
+++ b/sysdeps/unix/sysv/linux/bits/socket.h
@@ -86,7 +86,8 @@ typedef __socklen_t socklen_t;
#define PF_QIPCRTR 42 /* Qualcomm IPC Router. */
#define PF_SMC 43 /* SMC sockets. */
#define PF_XDP 44 /* XDP sockets. */
-#define PF_MAX 45 /* For now.. */
+#define PF_MCTP 45 /* Management component transport protocol. */
+#define PF_MAX 46 /* For now.. */

/* Address families. */
#define AF_UNSPEC PF_UNSPEC
@@ -137,6 +138,7 @@ typedef __socklen_t socklen_t;
#define AF_QIPCRTR PF_QIPCRTR
#define AF_SMC PF_SMC
#define AF_XDP PF_XDP
+#define AF_MCTP PF_MCTP
#define AF_MAX PF_MAX

/* Socket level values. Others are defined in the appropriate headers.
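The new family number can be probed at run time. The following is a minimal sketch in Python, not part of the patch: it assumes the value 45 from the hunk above whenever the `socket` module does not export `AF_MCTP`, and the helper name `mctp_supported` is made up for illustration.

```python
import socket

# Value taken from the patch: PF_MCTP / AF_MCTP is 45 on Linux.
# Falling back to the literal is an assumption for Python builds whose
# socket module does not export the name.
AF_MCTP = getattr(socket, "AF_MCTP", 45)


def mctp_supported() -> bool:
    """Return True if the running kernel accepts an AF_MCTP datagram socket."""
    try:
        sock = socket.socket(AF_MCTP, socket.SOCK_DGRAM)
    except OSError:
        # Typically EAFNOSUPPORT when the kernel lacks MCTP support.
        return False
    sock.close()
    return True


if __name__ == "__main__":
    print("AF_MCTP =", int(AF_MCTP), "supported:", mctp_supported())
```

A `False` result only says that the running kernel refused the socket; the header constant is available either way.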
27
glibc-upstream-2.34-192.patch
Normal file
@@ -0,0 +1,27 @@
commit fd5dbfd1cd98cb2f12f9e9f7004a4d25ab0c977f
Author: Joseph Myers <joseph@codesourcery.com>
Date: Mon Nov 22 15:30:12 2021 +0000

Update kernel version to 5.15 in tst-mman-consts.py

This patch updates the kernel version in the test tst-mman-consts.py
to 5.15. (There are no new MAP_* constants covered by this test in
5.15 that need any other header changes.)

Tested with build-many-glibcs.py.

(cherry picked from commit 5c3ece451d46a7d8721311609bfcb6faafacb39e)

diff --git a/sysdeps/unix/sysv/linux/tst-mman-consts.py b/sysdeps/unix/sysv/linux/tst-mman-consts.py
index 810433c238f31c25..eeccdfd04dae57ab 100644
--- a/sysdeps/unix/sysv/linux/tst-mman-consts.py
+++ b/sysdeps/unix/sysv/linux/tst-mman-consts.py
@@ -33,7 +33,7 @@ def main():
help='C compiler (including options) to use')
args = parser.parse_args()
linux_version_headers = glibcsyscalls.linux_kernel_version(args.cc)
- linux_version_glibc = (5, 14)
+ linux_version_glibc = (5, 15)
sys.exit(glibcextract.compare_macro_consts(
'#define _GNU_SOURCE 1\n'
'#include <sys/mman.h>\n',
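The test keeps the kernel version as a `(major, minor)` tuple. How the script goes on to use `linux_version_headers` and `linux_version_glibc` is not visible in this hunk, so the sketch below only illustrates the tuple comparison itself. The function name and the three labels are hypothetical.

```python
def classify_headers(headers_version, glibc_version):
    """Compare two (major, minor) kernel versions as tuples.

    Tuples compare element by element, so (5, 9) sorts before (5, 15),
    which a string or float comparison would get wrong.
    """
    if headers_version > glibc_version:
        return "headers-newer"
    if headers_version < glibc_version:
        return "headers-older"
    return "same"


if __name__ == "__main__":
    print(classify_headers((5, 17), (5, 15)))
```

Bumping `linux_version_glibc` in step with each kernel release, as this series of patches does, moves the point at which installed headers count as newer than what glibc has been checked against.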
28
glibc-upstream-2.34-193.patch
Normal file
@@ -0,0 +1,28 @@
commit 5146b73d72ced9bab125e986aa99ef5fe2f88475
Author: Joseph Myers <joseph@codesourcery.com>
Date: Mon Dec 20 15:38:32 2021 +0000

Add ARPHRD_CAN, ARPHRD_MCTP to net/if_arp.h

Add the constant ARPHRD_MCTP, from Linux 5.15, to net/if_arp.h, along
with ARPHRD_CAN which was added to Linux in version 2.6.25 (commit
cd05acfe65ed2cf2db683fa9a6adb8d35635263b, "[CAN]: Allocate protocol
numbers for PF_CAN") but apparently missed for glibc at the time.

Tested for x86_64.

(cherry picked from commit a94d9659cd69dbc70d3494b1cbbbb5a1551675c5)

diff --git a/sysdeps/unix/sysv/linux/net/if_arp.h b/sysdeps/unix/sysv/linux/net/if_arp.h
index 2a8933cde7cf236d..42910b776660def1 100644
--- a/sysdeps/unix/sysv/linux/net/if_arp.h
+++ b/sysdeps/unix/sysv/linux/net/if_arp.h
@@ -95,6 +95,8 @@ struct arphdr
#define ARPHRD_ROSE 270
#define ARPHRD_X25 271 /* CCITT X.25. */
#define ARPHRD_HWX25 272 /* Boards with X.25 in firmware. */
+#define ARPHRD_CAN 280 /* Controller Area Network. */
+#define ARPHRD_MCTP 290
#define ARPHRD_PPP 512
#define ARPHRD_CISCO 513 /* Cisco HDLC. */
#define ARPHRD_HDLC ARPHRD_CISCO
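Linux reports an interface's ARPHRD_* value as a decimal number in `/sys/class/net/<interface>/type`, so the two new names can be used to label CAN and MCTP devices. A small sketch follows: the table holds only the values visible in the hunk above, and the helper names are illustrative rather than part of any glibc interface.

```python
from pathlib import Path

# Values copied from the hunk; 280 and 290 are the ones this patch adds.
ARPHRD_NAMES = {
    270: "ARPHRD_ROSE",
    271: "ARPHRD_X25",
    272: "ARPHRD_HWX25",
    280: "ARPHRD_CAN",
    290: "ARPHRD_MCTP",
    512: "ARPHRD_PPP",
    513: "ARPHRD_CISCO",
}


def arphrd_name(value: int) -> str:
    """Map a numeric hardware type to its macro name, if it is in the table."""
    return ARPHRD_NAMES.get(value, f"unknown ({value})")


def interface_type(ifname: str):
    """Read the hardware type from sysfs, or return None if unavailable."""
    path = Path("/sys/class/net") / ifname / "type"
    try:
        return int(path.read_text().strip())
    except (OSError, ValueError):
        return None


if __name__ == "__main__":
    value = interface_type("can0")  # "can0" is only an example name
    print("can0:", "absent" if value is None else arphrd_name(value))
```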
337
glibc-upstream-2.34-194.patch
Normal file
@@ -0,0 +1,337 @@
commit 6af165658d0999ac2c4e9ce88bee020fbc2ee49f
Author: Joseph Myers <joseph@codesourcery.com>
Date: Wed Mar 23 17:11:56 2022 +0000

Update syscall lists for Linux 5.17

Linux 5.17 has one new syscall, set_mempolicy_home_node. Update
syscall-names.list and regenerate the arch-syscall.h headers with
build-many-glibcs.py update-syscalls.

Tested with build-many-glibcs.py.

(cherry picked from commit 8ef9196b26793830515402ea95aca2629f7721ec)

diff --git a/sysdeps/unix/sysv/linux/aarch64/arch-syscall.h b/sysdeps/unix/sysv/linux/aarch64/arch-syscall.h
index 9905ebedf298954c..4fcb6da80af37e9e 100644
--- a/sysdeps/unix/sysv/linux/aarch64/arch-syscall.h
+++ b/sysdeps/unix/sysv/linux/aarch64/arch-syscall.h
@@ -236,6 +236,7 @@
#define __NR_sendmsg 211
#define __NR_sendto 206
#define __NR_set_mempolicy 237
+#define __NR_set_mempolicy_home_node 450
#define __NR_set_robust_list 99
#define __NR_set_tid_address 96
#define __NR_setdomainname 162
diff --git a/sysdeps/unix/sysv/linux/alpha/arch-syscall.h b/sysdeps/unix/sysv/linux/alpha/arch-syscall.h
index ee8085be69958b25..0cf74c1a96bb1235 100644
--- a/sysdeps/unix/sysv/linux/alpha/arch-syscall.h
+++ b/sysdeps/unix/sysv/linux/alpha/arch-syscall.h
@@ -391,6 +391,7 @@
#define __NR_sendmsg 114
#define __NR_sendto 133
#define __NR_set_mempolicy 431
+#define __NR_set_mempolicy_home_node 560
#define __NR_set_robust_list 466
#define __NR_set_tid_address 411
#define __NR_setdomainname 166
diff --git a/sysdeps/unix/sysv/linux/arc/arch-syscall.h b/sysdeps/unix/sysv/linux/arc/arch-syscall.h
index 1b626d97705d545a..c1207aaa12be6a51 100644
--- a/sysdeps/unix/sysv/linux/arc/arch-syscall.h
+++ b/sysdeps/unix/sysv/linux/arc/arch-syscall.h
@@ -238,6 +238,7 @@
#define __NR_sendmsg 211
#define __NR_sendto 206
#define __NR_set_mempolicy 237
+#define __NR_set_mempolicy_home_node 450
#define __NR_set_robust_list 99
#define __NR_set_tid_address 96
#define __NR_setdomainname 162
diff --git a/sysdeps/unix/sysv/linux/arm/arch-syscall.h b/sysdeps/unix/sysv/linux/arm/arch-syscall.h
index 96ef8db9368e7de4..e7ba04c106d8af7d 100644
--- a/sysdeps/unix/sysv/linux/arm/arch-syscall.h
+++ b/sysdeps/unix/sysv/linux/arm/arch-syscall.h
@@ -302,6 +302,7 @@
#define __NR_sendmsg 296
#define __NR_sendto 290
#define __NR_set_mempolicy 321
+#define __NR_set_mempolicy_home_node 450
#define __NR_set_robust_list 338
#define __NR_set_tid_address 256
#define __NR_set_tls 983045
diff --git a/sysdeps/unix/sysv/linux/csky/arch-syscall.h b/sysdeps/unix/sysv/linux/csky/arch-syscall.h
index 96910154ed6a5c1b..dc9383758ebc641b 100644
--- a/sysdeps/unix/sysv/linux/csky/arch-syscall.h
+++ b/sysdeps/unix/sysv/linux/csky/arch-syscall.h
@@ -250,6 +250,7 @@
#define __NR_sendmsg 211
#define __NR_sendto 206
#define __NR_set_mempolicy 237
+#define __NR_set_mempolicy_home_node 450
#define __NR_set_robust_list 99
#define __NR_set_thread_area 244
#define __NR_set_tid_address 96
diff --git a/sysdeps/unix/sysv/linux/hppa/arch-syscall.h b/sysdeps/unix/sysv/linux/hppa/arch-syscall.h
index 36675fd48e6f50c5..767f1287a30b473e 100644
--- a/sysdeps/unix/sysv/linux/hppa/arch-syscall.h
+++ b/sysdeps/unix/sysv/linux/hppa/arch-syscall.h
@@ -289,6 +289,7 @@
#define __NR_sendmsg 183
#define __NR_sendto 82
#define __NR_set_mempolicy 262
+#define __NR_set_mempolicy_home_node 450
#define __NR_set_robust_list 289
#define __NR_set_tid_address 237
#define __NR_setdomainname 121
diff --git a/sysdeps/unix/sysv/linux/i386/arch-syscall.h b/sysdeps/unix/sysv/linux/i386/arch-syscall.h
index c86ccbda4681066c..1998f0d76a444cac 100644
--- a/sysdeps/unix/sysv/linux/i386/arch-syscall.h
+++ b/sysdeps/unix/sysv/linux/i386/arch-syscall.h
@@ -323,6 +323,7 @@
#define __NR_sendmsg 370
#define __NR_sendto 369
#define __NR_set_mempolicy 276
+#define __NR_set_mempolicy_home_node 450
#define __NR_set_robust_list 311
#define __NR_set_thread_area 243
#define __NR_set_tid_address 258
diff --git a/sysdeps/unix/sysv/linux/ia64/arch-syscall.h b/sysdeps/unix/sysv/linux/ia64/arch-syscall.h
index d898bce404955ef0..b2eab1b93d70b9de 100644
--- a/sysdeps/unix/sysv/linux/ia64/arch-syscall.h
+++ b/sysdeps/unix/sysv/linux/ia64/arch-syscall.h
@@ -272,6 +272,7 @@
#define __NR_sendmsg 1205
#define __NR_sendto 1199
#define __NR_set_mempolicy 1261
+#define __NR_set_mempolicy_home_node 1474
#define __NR_set_robust_list 1298
#define __NR_set_tid_address 1233
#define __NR_setdomainname 1129
diff --git a/sysdeps/unix/sysv/linux/m68k/arch-syscall.h b/sysdeps/unix/sysv/linux/m68k/arch-syscall.h
index fe721b809076abeb..5fc3723772f92516 100644
--- a/sysdeps/unix/sysv/linux/m68k/arch-syscall.h
+++ b/sysdeps/unix/sysv/linux/m68k/arch-syscall.h
@@ -310,6 +310,7 @@
#define __NR_sendmsg 367
#define __NR_sendto 366
#define __NR_set_mempolicy 270
+#define __NR_set_mempolicy_home_node 450
#define __NR_set_robust_list 304
#define __NR_set_thread_area 334
#define __NR_set_tid_address 253
diff --git a/sysdeps/unix/sysv/linux/microblaze/arch-syscall.h b/sysdeps/unix/sysv/linux/microblaze/arch-syscall.h
index 6e10c3661db96a1e..b6e9b007e496cd80 100644
--- a/sysdeps/unix/sysv/linux/microblaze/arch-syscall.h
+++ b/sysdeps/unix/sysv/linux/microblaze/arch-syscall.h
@@ -326,6 +326,7 @@
#define __NR_sendmsg 360
#define __NR_sendto 353
#define __NR_set_mempolicy 276
+#define __NR_set_mempolicy_home_node 450
#define __NR_set_robust_list 311
#define __NR_set_thread_area 243
#define __NR_set_tid_address 258
diff --git a/sysdeps/unix/sysv/linux/mips/mips32/arch-syscall.h b/sysdeps/unix/sysv/linux/mips/mips32/arch-syscall.h
index 26a6d594a2222f15..b3a3871f8ab8a23e 100644
--- a/sysdeps/unix/sysv/linux/mips/mips32/arch-syscall.h
+++ b/sysdeps/unix/sysv/linux/mips/mips32/arch-syscall.h
@@ -308,6 +308,7 @@
#define __NR_sendmsg 4179
#define __NR_sendto 4180
#define __NR_set_mempolicy 4270
+#define __NR_set_mempolicy_home_node 4450
#define __NR_set_robust_list 4309
#define __NR_set_thread_area 4283
#define __NR_set_tid_address 4252
diff --git a/sysdeps/unix/sysv/linux/mips/mips64/n32/arch-syscall.h b/sysdeps/unix/sysv/linux/mips/mips64/n32/arch-syscall.h
index 83e0d49c5e3ca1bc..b462182723aff286 100644
--- a/sysdeps/unix/sysv/linux/mips/mips64/n32/arch-syscall.h
+++ b/sysdeps/unix/sysv/linux/mips/mips64/n32/arch-syscall.h
@@ -288,6 +288,7 @@
#define __NR_sendmsg 6045
#define __NR_sendto 6043
#define __NR_set_mempolicy 6233
+#define __NR_set_mempolicy_home_node 6450
#define __NR_set_robust_list 6272
#define __NR_set_thread_area 6246
#define __NR_set_tid_address 6213
diff --git a/sysdeps/unix/sysv/linux/mips/mips64/n64/arch-syscall.h b/sysdeps/unix/sysv/linux/mips/mips64/n64/arch-syscall.h
index d6747c542f63202b..a9d6b94572e93001 100644
--- a/sysdeps/unix/sysv/linux/mips/mips64/n64/arch-syscall.h
+++ b/sysdeps/unix/sysv/linux/mips/mips64/n64/arch-syscall.h
@@ -270,6 +270,7 @@
#define __NR_sendmsg 5045
#define __NR_sendto 5043
#define __NR_set_mempolicy 5229
+#define __NR_set_mempolicy_home_node 5450
#define __NR_set_robust_list 5268
#define __NR_set_thread_area 5242
#define __NR_set_tid_address 5212
diff --git a/sysdeps/unix/sysv/linux/nios2/arch-syscall.h b/sysdeps/unix/sysv/linux/nios2/arch-syscall.h
index 4ee209bc4475ea7d..809a219ef32a45ef 100644
--- a/sysdeps/unix/sysv/linux/nios2/arch-syscall.h
+++ b/sysdeps/unix/sysv/linux/nios2/arch-syscall.h
@@ -250,6 +250,7 @@
#define __NR_sendmsg 211
#define __NR_sendto 206
#define __NR_set_mempolicy 237
+#define __NR_set_mempolicy_home_node 450
#define __NR_set_robust_list 99
#define __NR_set_tid_address 96
#define __NR_setdomainname 162
diff --git a/sysdeps/unix/sysv/linux/powerpc/powerpc32/arch-syscall.h b/sysdeps/unix/sysv/linux/powerpc/powerpc32/arch-syscall.h
index 497299fbc47a708c..627831ebae1b9e90 100644
--- a/sysdeps/unix/sysv/linux/powerpc/powerpc32/arch-syscall.h
+++ b/sysdeps/unix/sysv/linux/powerpc/powerpc32/arch-syscall.h
@@ -319,6 +319,7 @@
#define __NR_sendmsg 341
#define __NR_sendto 335
#define __NR_set_mempolicy 261
+#define __NR_set_mempolicy_home_node 450
#define __NR_set_robust_list 300
#define __NR_set_tid_address 232
#define __NR_setdomainname 121
diff --git a/sysdeps/unix/sysv/linux/powerpc/powerpc64/arch-syscall.h b/sysdeps/unix/sysv/linux/powerpc/powerpc64/arch-syscall.h
index e840279f171b10b9..bae597199d79eaad 100644
--- a/sysdeps/unix/sysv/linux/powerpc/powerpc64/arch-syscall.h
+++ b/sysdeps/unix/sysv/linux/powerpc/powerpc64/arch-syscall.h
@@ -298,6 +298,7 @@
#define __NR_sendmsg 341
#define __NR_sendto 335
#define __NR_set_mempolicy 261
+#define __NR_set_mempolicy_home_node 450
#define __NR_set_robust_list 300
#define __NR_set_tid_address 232
#define __NR_setdomainname 121
diff --git a/sysdeps/unix/sysv/linux/riscv/rv32/arch-syscall.h b/sysdeps/unix/sysv/linux/riscv/rv32/arch-syscall.h
index 73ef74c005e5a2bb..bf4be80f8d380963 100644
--- a/sysdeps/unix/sysv/linux/riscv/rv32/arch-syscall.h
+++ b/sysdeps/unix/sysv/linux/riscv/rv32/arch-syscall.h
@@ -228,6 +228,7 @@
#define __NR_sendmsg 211
#define __NR_sendto 206
#define __NR_set_mempolicy 237
+#define __NR_set_mempolicy_home_node 450
#define __NR_set_robust_list 99
#define __NR_set_tid_address 96
#define __NR_setdomainname 162
diff --git a/sysdeps/unix/sysv/linux/riscv/rv64/arch-syscall.h b/sysdeps/unix/sysv/linux/riscv/rv64/arch-syscall.h
index 919a79ee91177459..d656aedcc2be6009 100644
--- a/sysdeps/unix/sysv/linux/riscv/rv64/arch-syscall.h
+++ b/sysdeps/unix/sysv/linux/riscv/rv64/arch-syscall.h
@@ -235,6 +235,7 @@
#define __NR_sendmsg 211
#define __NR_sendto 206
#define __NR_set_mempolicy 237
+#define __NR_set_mempolicy_home_node 450
#define __NR_set_robust_list 99
#define __NR_set_tid_address 96
#define __NR_setdomainname 162
diff --git a/sysdeps/unix/sysv/linux/s390/s390-32/arch-syscall.h b/sysdeps/unix/sysv/linux/s390/s390-32/arch-syscall.h
index 005c0ada7aab85a1..57025107e82c9439 100644
--- a/sysdeps/unix/sysv/linux/s390/s390-32/arch-syscall.h
+++ b/sysdeps/unix/sysv/linux/s390/s390-32/arch-syscall.h
@@ -311,6 +311,7 @@
#define __NR_sendmsg 370
#define __NR_sendto 369
#define __NR_set_mempolicy 270
+#define __NR_set_mempolicy_home_node 450
#define __NR_set_robust_list 304
#define __NR_set_tid_address 252
#define __NR_setdomainname 121
diff --git a/sysdeps/unix/sysv/linux/s390/s390-64/arch-syscall.h b/sysdeps/unix/sysv/linux/s390/s390-64/arch-syscall.h
index 9131fddcc16116e4..72e19c6d569fbf9b 100644
--- a/sysdeps/unix/sysv/linux/s390/s390-64/arch-syscall.h
+++ b/sysdeps/unix/sysv/linux/s390/s390-64/arch-syscall.h
@@ -278,6 +278,7 @@
#define __NR_sendmsg 370
#define __NR_sendto 369
#define __NR_set_mempolicy 270
+#define __NR_set_mempolicy_home_node 450
#define __NR_set_robust_list 304
#define __NR_set_tid_address 252
#define __NR_setdomainname 121
diff --git a/sysdeps/unix/sysv/linux/sh/arch-syscall.h b/sysdeps/unix/sysv/linux/sh/arch-syscall.h
index d8fb041568ecb4da..d52b522d9cac87ef 100644
--- a/sysdeps/unix/sysv/linux/sh/arch-syscall.h
+++ b/sysdeps/unix/sysv/linux/sh/arch-syscall.h
@@ -303,6 +303,7 @@
#define __NR_sendmsg 355
#define __NR_sendto 349
#define __NR_set_mempolicy 276
+#define __NR_set_mempolicy_home_node 450
#define __NR_set_robust_list 311
#define __NR_set_tid_address 258
#define __NR_setdomainname 121
diff --git a/sysdeps/unix/sysv/linux/sparc/sparc32/arch-syscall.h b/sysdeps/unix/sysv/linux/sparc/sparc32/arch-syscall.h
index 2bc014fe6a1a1f4a..d3f4d8aa3edb4795 100644
--- a/sysdeps/unix/sysv/linux/sparc/sparc32/arch-syscall.h
+++ b/sysdeps/unix/sysv/linux/sparc/sparc32/arch-syscall.h
@@ -310,6 +310,7 @@
#define __NR_sendmsg 114
#define __NR_sendto 133
#define __NR_set_mempolicy 305
+#define __NR_set_mempolicy_home_node 450
#define __NR_set_robust_list 300
#define __NR_set_tid_address 166
#define __NR_setdomainname 163
diff --git a/sysdeps/unix/sysv/linux/sparc/sparc64/arch-syscall.h b/sysdeps/unix/sysv/linux/sparc/sparc64/arch-syscall.h
index 76dbbe595ffe868f..2cc03d7a24453335 100644
--- a/sysdeps/unix/sysv/linux/sparc/sparc64/arch-syscall.h
+++ b/sysdeps/unix/sysv/linux/sparc/sparc64/arch-syscall.h
@@ -286,6 +286,7 @@
#define __NR_sendmsg 114
#define __NR_sendto 133
#define __NR_set_mempolicy 305
+#define __NR_set_mempolicy_home_node 450
#define __NR_set_robust_list 300
#define __NR_set_tid_address 166
#define __NR_setdomainname 163
diff --git a/sysdeps/unix/sysv/linux/syscall-names.list b/sysdeps/unix/sysv/linux/syscall-names.list
index 0bc2af37dfa1eeb5..e2743c649586d97a 100644
--- a/sysdeps/unix/sysv/linux/syscall-names.list
+++ b/sysdeps/unix/sysv/linux/syscall-names.list
@@ -21,8 +21,8 @@
# This file can list all potential system calls. The names are only
# used if the installed kernel headers also provide them.

-# The list of system calls is current as of Linux 5.16.
-kernel 5.16
+# The list of system calls is current as of Linux 5.17.
+kernel 5.17

FAST_atomic_update
FAST_cmpxchg
@@ -523,6 +523,7 @@ sendmmsg
sendmsg
sendto
set_mempolicy
+set_mempolicy_home_node
set_robust_list
set_thread_area
set_tid_address
diff --git a/sysdeps/unix/sysv/linux/x86_64/64/arch-syscall.h b/sysdeps/unix/sysv/linux/x86_64/64/arch-syscall.h
index 28558279b48a1ef4..b4ab892ec183e32d 100644
--- a/sysdeps/unix/sysv/linux/x86_64/64/arch-syscall.h
+++ b/sysdeps/unix/sysv/linux/x86_64/64/arch-syscall.h
@@ -278,6 +278,7 @@
#define __NR_sendmsg 46
#define __NR_sendto 44
#define __NR_set_mempolicy 238
+#define __NR_set_mempolicy_home_node 450
#define __NR_set_robust_list 273
#define __NR_set_thread_area 205
#define __NR_set_tid_address 218
diff --git a/sysdeps/unix/sysv/linux/x86_64/x32/arch-syscall.h b/sysdeps/unix/sysv/linux/x86_64/x32/arch-syscall.h
index c1ab8ec45e8b8fd3..772559c87b3625b8 100644
--- a/sysdeps/unix/sysv/linux/x86_64/x32/arch-syscall.h
+++ b/sysdeps/unix/sysv/linux/x86_64/x32/arch-syscall.h
@@ -270,6 +270,7 @@
#define __NR_sendmsg 1073742342
#define __NR_sendto 1073741868
#define __NR_set_mempolicy 1073742062
+#define __NR_set_mempolicy_home_node 1073742274
#define __NR_set_robust_list 1073742354
#define __NR_set_thread_area 1073742029
#define __NR_set_tid_address 1073742042
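Most architectures in this patch use 450 for the new syscall; the others add a fixed per-ABI base to it. The sketch below reproduces the numbers from the diff. The offsets are inferred from those values (the x32 one corresponds to the kernel's `__X32_SYSCALL_BIT`), and the table and function names are illustrative only.

```python
GENERIC_NR = 450  # __NR_set_mempolicy_home_node on most architectures above

# Per-ABI base added to the generic number, inferred from the diff values.
ARCH_OFFSET = {
    "x86_64/64": 0,
    "x86_64/x32": 0x40000000,  # __X32_SYSCALL_BIT marks x32 syscalls
    "mips/mips32": 4000,
    "mips/mips64/n64": 5000,
    "mips/mips64/n32": 6000,
    "ia64": 1024,
    "alpha": 110,              # simply the difference observed in this diff
}


def syscall_number(arch: str, generic_nr: int = GENERIC_NR) -> int:
    """Return the architecture-specific number for a generic syscall number."""
    return ARCH_OFFSET[arch] + generic_nr


if __name__ == "__main__":
    for arch in ARCH_OFFSET:
        print(f"{arch:18} {syscall_number(arch)}")
```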
27
glibc-upstream-2.34-195.patch
Normal file
@@ -0,0 +1,27 @@
commit 81181ba5d916fc49bd737f603e28a3c2dc8430b4
Author: Joseph Myers <joseph@codesourcery.com>
Date: Wed Feb 16 14:19:24 2022 +0000

Update kernel version to 5.16 in tst-mman-consts.py

This patch updates the kernel version in the test tst-mman-consts.py
to 5.16. (There are no new MAP_* constants covered by this test in
5.16 that need any other header changes.)

Tested with build-many-glibcs.py.

(cherry picked from commit 790a607e234aa10d4b977a1b80aebe8a2acac970)

diff --git a/sysdeps/unix/sysv/linux/tst-mman-consts.py b/sysdeps/unix/sysv/linux/tst-mman-consts.py
index eeccdfd04dae57ab..8102d80b6660e523 100644
--- a/sysdeps/unix/sysv/linux/tst-mman-consts.py
+++ b/sysdeps/unix/sysv/linux/tst-mman-consts.py
@@ -33,7 +33,7 @@ def main():
help='C compiler (including options) to use')
args = parser.parse_args()
linux_version_headers = glibcsyscalls.linux_kernel_version(args.cc)
- linux_version_glibc = (5, 15)
+ linux_version_glibc = (5, 16)
sys.exit(glibcextract.compare_macro_consts(
'#define _GNU_SOURCE 1\n'
'#include <sys/mman.h>\n',
27
glibc-upstream-2.34-196.patch
Normal file
@@ -0,0 +1,27 @@
commit 0499c3a95fb864284fef36d3e9c5a54f6646b2db
Author: Joseph Myers <joseph@codesourcery.com>
Date: Thu Mar 24 15:35:27 2022 +0000

Update kernel version to 5.17 in tst-mman-consts.py

This patch updates the kernel version in the test tst-mman-consts.py
to 5.17. (There are no new MAP_* constants covered by this test in
5.17 that need any other header changes.)

Tested with build-many-glibcs.py.

(cherry picked from commit 23808a422e6036accaba7236fd3b9a0d7ab7e8ee)

diff --git a/sysdeps/unix/sysv/linux/tst-mman-consts.py b/sysdeps/unix/sysv/linux/tst-mman-consts.py
index 8102d80b6660e523..724c7375c3a1623b 100644
--- a/sysdeps/unix/sysv/linux/tst-mman-consts.py
+++ b/sysdeps/unix/sysv/linux/tst-mman-consts.py
@@ -33,7 +33,7 @@ def main():
help='C compiler (including options) to use')
args = parser.parse_args()
linux_version_headers = glibcsyscalls.linux_kernel_version(args.cc)
- linux_version_glibc = (5, 16)
+ linux_version_glibc = (5, 17)
sys.exit(glibcextract.compare_macro_consts(
'#define _GNU_SOURCE 1\n'
'#include <sys/mman.h>\n',
26
glibc-upstream-2.34-197.patch
Normal file
@@ -0,0 +1,26 @@
commit f858bc309315a03ff6b1a048f59405c159d23430
Author: Joseph Myers <joseph@codesourcery.com>
Date: Mon Feb 21 22:49:36 2022 +0000

Add SOL_MPTCP, SOL_MCTP from Linux 5.16 to bits/socket.h

Linux 5.16 adds constants SOL_MPTCP and SOL_MCTP to the getsockopt /
setsockopt levels; add these constants to bits/socket.h.

Tested for x86_64.

(cherry picked from commit fdc1ae67fef27eea1445bab4bdfe2f0fb3bc7aa1)

diff --git a/sysdeps/unix/sysv/linux/bits/socket.h b/sysdeps/unix/sysv/linux/bits/socket.h
index 7bb9e863d7329da9..c81fab840918924e 100644
--- a/sysdeps/unix/sysv/linux/bits/socket.h
+++ b/sysdeps/unix/sysv/linux/bits/socket.h
@@ -169,6 +169,8 @@ typedef __socklen_t socklen_t;
#define SOL_KCM 281
#define SOL_TLS 282
#define SOL_XDP 283
+#define SOL_MPTCP 284
+#define SOL_MCTP 285

/* Maximum queue length specifiable by listen. */
#define SOMAXCONN 4096
21
glibc-upstream-2.34-198.patch
Normal file
@@ -0,0 +1,21 @@
commit c108e87026d61d6744e3e55704e0bea937243f5a
Author: Szabolcs Nagy <szabolcs.nagy@arm.com>
Date: Tue Dec 14 11:15:07 2021 +0000

aarch64: Add HWCAP2_ECV from Linux 5.16

Indicates the availability of enhanced counter virtualization extension
of armv8.6-a with self-synchronized virtual counter CNTVCTSS_EL0 usable
in userspace.

(cherry picked from commit 5a1be8ebdf6f02d4efec6e5f12ad06db17511f90)

diff --git a/sysdeps/unix/sysv/linux/aarch64/bits/hwcap.h b/sysdeps/unix/sysv/linux/aarch64/bits/hwcap.h
index 30fda0a4a347695e..04cc762015a7230a 100644
--- a/sysdeps/unix/sysv/linux/aarch64/bits/hwcap.h
+++ b/sysdeps/unix/sysv/linux/aarch64/bits/hwcap.h
@@ -74,3 +74,4 @@
#define HWCAP2_RNG (1 << 16)
#define HWCAP2_BTI (1 << 17)
#define HWCAP2_MTE (1 << 18)
+#define HWCAP2_ECV (1 << 19)
21
glibc-upstream-2.34-199.patch
Normal file
@@ -0,0 +1,21 @@
commit 97cb8227b864b8ea0d99a4a50e4163baad3e1c72
Author: Joseph Myers <joseph@codesourcery.com>
Date: Mon Mar 28 13:16:48 2022 +0000

Add HWCAP2_AFP, HWCAP2_RPRES from Linux 5.17 to AArch64 bits/hwcap.h

Add the new HWCAP2_AFP and HWCAP2_RPRES constants from Linux 5.17.
Tested with build-many-glibcs.py for aarch64-linux-gnu.

(cherry picked from commit 866c599182e87f116440b5d854f9e99533c48eb3)

diff --git a/sysdeps/unix/sysv/linux/aarch64/bits/hwcap.h b/sysdeps/unix/sysv/linux/aarch64/bits/hwcap.h
index 04cc762015a7230a..9a5c4116b3fe9903 100644
--- a/sysdeps/unix/sysv/linux/aarch64/bits/hwcap.h
+++ b/sysdeps/unix/sysv/linux/aarch64/bits/hwcap.h
@@ -75,3 +75,5 @@
#define HWCAP2_BTI (1 << 17)
#define HWCAP2_MTE (1 << 18)
#define HWCAP2_ECV (1 << 19)
+#define HWCAP2_AFP (1 << 20)
+#define HWCAP2_RPRES (1 << 21)
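The HWCAP2 bits added by the two patches above are reported by the kernel in the `AT_HWCAP2` entry of the auxiliary vector. C programs would normally call `getauxval`; the sketch below stays in Python and, as an assumption about the environment, reads `/proc/self/auxv` instead. The bit names are only meaningful on AArch64, since other architectures assign different meanings to the same bit positions, and the helper names are illustrative.

```python
import struct

AT_HWCAP2 = 26  # auxiliary-vector key for the second hwcap word on Linux

# Bit positions from the two hwcap.h hunks above (AArch64 only).
HWCAP2_BITS = {
    "HWCAP2_ECV": 1 << 19,
    "HWCAP2_AFP": 1 << 20,
    "HWCAP2_RPRES": 1 << 21,
}


def decode_hwcap2(value: int):
    """Return the names of the known HWCAP2 bits that are set in value."""
    return [name for name, bit in HWCAP2_BITS.items() if value & bit]


def read_auxv_entry(key: int, path: str = "/proc/self/auxv"):
    """Return the auxv value stored under key, or None if it cannot be read."""
    try:
        with open(path, "rb") as f:
            data = f.read()
    except OSError:
        return None
    pair = struct.calcsize("LL")  # two native unsigned longs per entry
    for offset in range(0, len(data) - pair + 1, pair):
        k, v = struct.unpack_from("LL", data, offset)
        if k == key:
            return v
    return None


if __name__ == "__main__":
    hwcap2 = read_auxv_entry(AT_HWCAP2)
    print("AT_HWCAP2:", hwcap2, decode_hwcap2(hwcap2 or 0))
```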
29
glibc-upstream-2.34-200.patch
Normal file
@@ -0,0 +1,29 @@
commit 31af92b9c8cf753992d45c801a855a02060afc08
Author: Siddhesh Poyarekar <siddhesh@sourceware.org>
Date: Wed May 4 15:56:47 2022 +0530

manual: Clarify that abbreviations of long options are allowed

The man page and code comments clearly state that abbreviations of long
option names are recognized correctly as long as they are unique.
Document this fact in the glibc manual as well.

Signed-off-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
Reviewed-by: Florian Weimer <fweimer@redhat.com>
Reviewed-by: Andreas Schwab <schwab@linux-m68k.org>
(cherry picked from commit db1efe02c9f15affc3908d6ae73875b82898a489)

diff --git a/manual/getopt.texi b/manual/getopt.texi
index 5485fc46946631f7..b4c0b15ac2060560 100644
--- a/manual/getopt.texi
+++ b/manual/getopt.texi
@@ -250,7 +250,8 @@ option, and stores the option's argument (if it has one) in @code{optarg}.

When @code{getopt_long} encounters a long option, it takes actions based
on the @code{flag} and @code{val} fields of the definition of that
-option.
+option. The option name may be abbreviated as long as the abbreviation is
+unique.

If @code{flag} is a null pointer, then @code{getopt_long} returns the
contents of @code{val} to indicate which option it found. You should
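The abbreviation rule documented by this patch can be seen with a small sketch. The helper `first_long_option` and the option table (`--verbose`, `--version`) are made up for illustration; only `getopt_long`, `struct option`, `optind`, and `opterr` come from the library. With glibc, a unique prefix such as `--verb` should select `verbose`, while `--ver` matches both names and is reported as `'?'`.

```c
#include <assert.h>
#include <getopt.h>
#include <stddef.h>

/* Hypothetical helper: return the val of the first long option in argv,
   '?' for an unknown or ambiguous option, or -1 if there is none.  */
int
first_long_option (int argc, char **argv)
{
  static const struct option longopts[] = {
    { "verbose", no_argument, NULL, 'v' },
    { "version", no_argument, NULL, 'V' },
    { NULL, 0, NULL, 0 }
  };

  assert (argc >= 1);
  optind = 0;   /* GNU convention: force getopt to reinitialize.  */
  opterr = 0;   /* Keep getopt from printing its own diagnostics.  */
  return getopt_long (argc, argv, "", longopts, NULL);
}
```

Resetting `optind` to 0 is the reinitialization documented for glibc; other C libraries may need a different reset.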
1789
glibc-upstream-2.34-201.patch
Normal file
File diff suppressed because it is too large
1987
glibc-upstream-2.34-202.patch
Normal file
File diff suppressed because it is too large
29
glibc-upstream-2.34-203.patch
Normal file
@@ -0,0 +1,29 @@
commit d299032743e05571ef326c838a5ecf6ef5b3e9c3
Author: H.J. Lu <hjl.tools@gmail.com>
Date: Fri Feb 4 11:09:10 2022 -0800

x86-64: Fix strcmp-avx2.S

Change "movl %edx, %rdx" to "movl %edx, %edx" in:

commit b77b06e0e296f1a2276c27a67e1d44f2cfa38d45
Author: Noah Goldstein <goldstein.w.n@gmail.com>
Date: Mon Jan 10 15:35:38 2022 -0600

x86: Optimize strcmp-avx2.S

(cherry picked from commit c15efd011cea3d8f0494269eb539583215a1feed)

diff --git a/sysdeps/x86_64/multiarch/strcmp-avx2.S b/sysdeps/x86_64/multiarch/strcmp-avx2.S
index a0d1c65db11028bc..cdded412a70bad10 100644
--- a/sysdeps/x86_64/multiarch/strcmp-avx2.S
+++ b/sysdeps/x86_64/multiarch/strcmp-avx2.S
@@ -106,7 +106,7 @@ ENTRY(STRCMP)
# ifdef USE_AS_STRNCMP
# ifdef __ILP32__
/* Clear the upper 32 bits. */
- movl %edx, %rdx
+ movl %edx, %edx
# endif
cmp $1, %RDX_LP
/* Signed comparison intentional. We use this branch to also
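For readers less familiar with x86-64 register semantics: writing a 32-bit register such as `%edx` zero-extends the result into the full 64-bit `%rdx`, which is why `movl %edx, %edx` clears the upper half, whereas `movl %edx, %rdx` mixes operand sizes and is generally rejected by assemblers. The C function below is only a scalar model of the corrected instruction; the name `clear_upper_32` is an assumption and does not appear in glibc.

```c
#include <assert.h>
#include <stdint.h>

/* Models "movl %edx, %edx": keep the low 32 bits, clear bits 32..63.  */
uint64_t
clear_upper_32 (uint64_t rdx)
{
  uint64_t result = (uint64_t) (uint32_t) rdx;
  assert ((result >> 32) == 0);
  return result;
}
```

On the `__ILP32__` (x32) ABI the length argument is 32 bits wide, so the upper half of the incoming register may hold unrelated data until it is cleared this way.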
29
glibc-upstream-2.34-204.patch
Normal file
@@ -0,0 +1,29 @@
commit 53ddafe917a8af17b16beb794c29e5b09b86d534
Author: H.J. Lu <hjl.tools@gmail.com>
Date: Fri Feb 4 11:11:08 2022 -0800

x86-64: Fix strcmp-evex.S

Change "movl %edx, %rdx" to "movl %edx, %edx" in:

commit 8418eb3ff4b781d31c4ed5dc6c0bd7356bc45db9
Author: Noah Goldstein <goldstein.w.n@gmail.com>
Date: Mon Jan 10 15:35:39 2022 -0600

x86: Optimize strcmp-evex.S

(cherry picked from commit 0e0199a9e02ebe42e2b36958964d63f03573c382)

diff --git a/sysdeps/x86_64/multiarch/strcmp-evex.S b/sysdeps/x86_64/multiarch/strcmp-evex.S
index 99d8409af27327ad..ed56af8ecdad48b2 100644
--- a/sysdeps/x86_64/multiarch/strcmp-evex.S
+++ b/sysdeps/x86_64/multiarch/strcmp-evex.S
@@ -116,7 +116,7 @@ ENTRY(STRCMP)
# ifdef USE_AS_STRNCMP
# ifdef __ILP32__
/* Clear the upper 32 bits. */
- movl %edx, %rdx
+ movl %edx, %edx
# endif
cmp $1, %RDX_LP
/* Signed comparison intentional. We use this branch to also
451
glibc-upstream-2.34-205.patch
Normal file
@@ -0,0 +1,451 @@
commit ea19c490a3f5628d55ded271cbb753e66b2f05e8
Author: Noah Goldstein <goldstein.w.n@gmail.com>
Date: Sun Feb 6 00:54:18 2022 -0600

x86: Improve vec generation in memset-vec-unaligned-erms.S

No bug.

Split vec generation into multiple steps. This allows the
broadcast in AVX2 to use 'xmm' registers for the L(less_vec)
case. This saves an expensive lane-cross instruction and removes
the need for 'vzeroupper'.

For SSE2 replace 2x 'punpck' instructions with zero-idiom 'pxor' for
byte broadcast.

Results for memset-avx2 small (geomean of N = 20 benchset runs).

size, New Time, Old Time, New / Old
0, 4.100, 3.831, 0.934
1, 5.074, 4.399, 0.867
2, 4.433, 4.411, 0.995
4, 4.487, 4.415, 0.984
8, 4.454, 4.396, 0.987
16, 4.502, 4.443, 0.987

All relevant string/wcsmbs tests are passing.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>

(cherry picked from commit b62ace2740a106222e124cc86956448fa07abf4d)

diff --git a/sysdeps/x86_64/memset.S b/sysdeps/x86_64/memset.S
index 0137eba4cdd9f830..34ee0bfdcb81fb39 100644
--- a/sysdeps/x86_64/memset.S
+++ b/sysdeps/x86_64/memset.S
@@ -28,17 +28,22 @@
#define VMOVU movups
#define VMOVA movaps

-#define MEMSET_VDUP_TO_VEC0_AND_SET_RETURN(d, r) \
+# define MEMSET_SET_VEC0_AND_SET_RETURN(d, r) \
movd d, %xmm0; \
- movq r, %rax; \
- punpcklbw %xmm0, %xmm0; \
- punpcklwd %xmm0, %xmm0; \
- pshufd $0, %xmm0, %xmm0
+ pxor %xmm1, %xmm1; \
+ pshufb %xmm1, %xmm0; \
+ movq r, %rax

-#define WMEMSET_VDUP_TO_VEC0_AND_SET_RETURN(d, r) \
+# define WMEMSET_SET_VEC0_AND_SET_RETURN(d, r) \
movd d, %xmm0; \
- movq r, %rax; \
- pshufd $0, %xmm0, %xmm0
+ pshufd $0, %xmm0, %xmm0; \
+ movq r, %rax
+
+# define MEMSET_VDUP_TO_VEC0_HIGH()
+# define MEMSET_VDUP_TO_VEC0_LOW()
+
+# define WMEMSET_VDUP_TO_VEC0_HIGH()
+# define WMEMSET_VDUP_TO_VEC0_LOW()

#define SECTION(p) p

diff --git a/sysdeps/x86_64/multiarch/memset-avx2-unaligned-erms.S b/sysdeps/x86_64/multiarch/memset-avx2-unaligned-erms.S
index 1af668af0aeda59e..c0bf2875d03d51ab 100644
--- a/sysdeps/x86_64/multiarch/memset-avx2-unaligned-erms.S
+++ b/sysdeps/x86_64/multiarch/memset-avx2-unaligned-erms.S
@@ -10,15 +10,18 @@
# define VMOVU vmovdqu
# define VMOVA vmovdqa

-# define MEMSET_VDUP_TO_VEC0_AND_SET_RETURN(d, r) \
+# define MEMSET_SET_VEC0_AND_SET_RETURN(d, r) \
vmovd d, %xmm0; \
- movq r, %rax; \
- vpbroadcastb %xmm0, %ymm0
+ movq r, %rax;

-# define WMEMSET_VDUP_TO_VEC0_AND_SET_RETURN(d, r) \
- vmovd d, %xmm0; \
- movq r, %rax; \
- vpbroadcastd %xmm0, %ymm0
+# define WMEMSET_SET_VEC0_AND_SET_RETURN(d, r) \
+ MEMSET_SET_VEC0_AND_SET_RETURN(d, r)
+
+# define MEMSET_VDUP_TO_VEC0_HIGH() vpbroadcastb %xmm0, %ymm0
+# define MEMSET_VDUP_TO_VEC0_LOW() vpbroadcastb %xmm0, %xmm0
+
+# define WMEMSET_VDUP_TO_VEC0_HIGH() vpbroadcastd %xmm0, %ymm0
+# define WMEMSET_VDUP_TO_VEC0_LOW() vpbroadcastd %xmm0, %xmm0

# ifndef SECTION
# define SECTION(p) p##.avx
@@ -30,5 +33,6 @@
# define WMEMSET_SYMBOL(p,s) p##_avx2_##s
# endif

+# define USE_XMM_LESS_VEC
# include "memset-vec-unaligned-erms.S"
#endif
diff --git a/sysdeps/x86_64/multiarch/memset-avx512-unaligned-erms.S b/sysdeps/x86_64/multiarch/memset-avx512-unaligned-erms.S
index f14d6f8493c21a36..5241216a77bf72b7 100644
--- a/sysdeps/x86_64/multiarch/memset-avx512-unaligned-erms.S
+++ b/sysdeps/x86_64/multiarch/memset-avx512-unaligned-erms.S
@@ -15,13 +15,19 @@

# define VZEROUPPER

-# define MEMSET_VDUP_TO_VEC0_AND_SET_RETURN(d, r) \
- movq r, %rax; \
- vpbroadcastb d, %VEC0
+# define MEMSET_SET_VEC0_AND_SET_RETURN(d, r) \
+ vpbroadcastb d, %VEC0; \
+ movq r, %rax

-# define WMEMSET_VDUP_TO_VEC0_AND_SET_RETURN(d, r) \
- movq r, %rax; \
- vpbroadcastd d, %VEC0
+# define WMEMSET_SET_VEC0_AND_SET_RETURN(d, r) \
+ vpbroadcastd d, %VEC0; \
+ movq r, %rax
+
+# define MEMSET_VDUP_TO_VEC0_HIGH()
+# define MEMSET_VDUP_TO_VEC0_LOW()
+
+# define WMEMSET_VDUP_TO_VEC0_HIGH()
+# define WMEMSET_VDUP_TO_VEC0_LOW()

# define SECTION(p) p##.evex512
# define MEMSET_SYMBOL(p,s) p##_avx512_##s
diff --git a/sysdeps/x86_64/multiarch/memset-evex-unaligned-erms.S b/sysdeps/x86_64/multiarch/memset-evex-unaligned-erms.S
index 64b09e77cc20cc42..637002150659123c 100644
--- a/sysdeps/x86_64/multiarch/memset-evex-unaligned-erms.S
+++ b/sysdeps/x86_64/multiarch/memset-evex-unaligned-erms.S
@@ -15,13 +15,19 @@

# define VZEROUPPER

-# define MEMSET_VDUP_TO_VEC0_AND_SET_RETURN(d, r) \
- movq r, %rax; \
- vpbroadcastb d, %VEC0
+# define MEMSET_SET_VEC0_AND_SET_RETURN(d, r) \
+ vpbroadcastb d, %VEC0; \
+ movq r, %rax

-# define WMEMSET_VDUP_TO_VEC0_AND_SET_RETURN(d, r) \
- movq r, %rax; \
- vpbroadcastd d, %VEC0
+# define WMEMSET_SET_VEC0_AND_SET_RETURN(d, r) \
+ vpbroadcastd d, %VEC0; \
+ movq r, %rax
+
+# define MEMSET_VDUP_TO_VEC0_HIGH()
+# define MEMSET_VDUP_TO_VEC0_LOW()
+
+# define WMEMSET_VDUP_TO_VEC0_HIGH()
+# define WMEMSET_VDUP_TO_VEC0_LOW()

# define SECTION(p) p##.evex
# define MEMSET_SYMBOL(p,s) p##_evex_##s
diff --git a/sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S b/sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S
index e723413a664c088f..c8db87dcbf69f0d8 100644
--- a/sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S
+++ b/sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S
@@ -58,8 +58,10 @@
#ifndef MOVQ
# if VEC_SIZE > 16
# define MOVQ vmovq
+# define MOVD vmovd
# else
# define MOVQ movq
+# define MOVD movd
# endif
#endif

@@ -72,9 +74,17 @@
#if defined USE_WITH_EVEX || defined USE_WITH_AVX512
# define END_REG rcx
# define LOOP_REG rdi
+# define LESS_VEC_REG rax
#else
# define END_REG rdi
# define LOOP_REG rdx
+# define LESS_VEC_REG rdi
+#endif
+
+#ifdef USE_XMM_LESS_VEC
+# define XMM_SMALL 1
+#else
+# define XMM_SMALL 0
#endif

#define PAGE_SIZE 4096
@@ -110,8 +120,12 @@ END_CHK (WMEMSET_CHK_SYMBOL (__wmemset_chk, unaligned))

ENTRY (WMEMSET_SYMBOL (__wmemset, unaligned))
shl $2, %RDX_LP
- WMEMSET_VDUP_TO_VEC0_AND_SET_RETURN (%esi, %rdi)
- jmp L(entry_from_bzero)
+ WMEMSET_SET_VEC0_AND_SET_RETURN (%esi, %rdi)
+ WMEMSET_VDUP_TO_VEC0_LOW()
+ cmpq $VEC_SIZE, %rdx
+ jb L(less_vec_no_vdup)
+ WMEMSET_VDUP_TO_VEC0_HIGH()
+ jmp L(entry_from_wmemset)
END (WMEMSET_SYMBOL (__wmemset, unaligned))
#endif

@@ -123,7 +137,7 @@ END_CHK (MEMSET_CHK_SYMBOL (__memset_chk, unaligned))
#endif

ENTRY (MEMSET_SYMBOL (__memset, unaligned))
- MEMSET_VDUP_TO_VEC0_AND_SET_RETURN (%esi, %rdi)
+ MEMSET_SET_VEC0_AND_SET_RETURN (%esi, %rdi)
# ifdef __ILP32__
/* Clear the upper 32 bits. */
mov %edx, %edx
@@ -131,6 +145,8 @@ ENTRY (MEMSET_SYMBOL (__memset, unaligned))
L(entry_from_bzero):
cmpq $VEC_SIZE, %rdx
jb L(less_vec)
+ MEMSET_VDUP_TO_VEC0_HIGH()
+L(entry_from_wmemset):
cmpq $(VEC_SIZE * 2), %rdx
ja L(more_2x_vec)
/* From VEC and to 2 * VEC. No branch when size == VEC_SIZE. */
@@ -179,27 +195,27 @@ END_CHK (MEMSET_CHK_SYMBOL (__memset_chk, unaligned_erms))
# endif

ENTRY_P2ALIGN (MEMSET_SYMBOL (__memset, unaligned_erms), 6)
- MEMSET_VDUP_TO_VEC0_AND_SET_RETURN (%esi, %rdi)
+ MEMSET_SET_VEC0_AND_SET_RETURN (%esi, %rdi)
# ifdef __ILP32__
/* Clear the upper 32 bits. */
mov %edx, %edx
# endif
cmp $VEC_SIZE, %RDX_LP
jb L(less_vec)
+ MEMSET_VDUP_TO_VEC0_HIGH ()
cmp $(VEC_SIZE * 2), %RDX_LP
ja L(stosb_more_2x_vec)
- /* From VEC and to 2 * VEC. No branch when size == VEC_SIZE.
- */
- VMOVU %VEC(0), (%rax)
- VMOVU %VEC(0), -VEC_SIZE(%rax, %rdx)
+ /* From VEC and to 2 * VEC. No branch when size == VEC_SIZE. */
+ VMOVU %VEC(0), (%rdi)
+ VMOVU %VEC(0), (VEC_SIZE * -1)(%rdi, %rdx)
VZEROUPPER_RETURN
#endif

- .p2align 4,, 10
+ .p2align 4,, 4
L(last_2x_vec):
#ifdef USE_LESS_VEC_MASK_STORE
- VMOVU %VEC(0), (VEC_SIZE * 2 + LOOP_4X_OFFSET)(%rcx)
- VMOVU %VEC(0), (VEC_SIZE * 3 + LOOP_4X_OFFSET)(%rcx)
+ VMOVU %VEC(0), (VEC_SIZE * -2)(%rdi, %rdx)
+ VMOVU %VEC(0), (VEC_SIZE * -1)(%rdi, %rdx)
#else
VMOVU %VEC(0), (VEC_SIZE * -2)(%rdi)
VMOVU %VEC(0), (VEC_SIZE * -1)(%rdi)
@@ -212,6 +228,7 @@ L(last_2x_vec):
#ifdef USE_LESS_VEC_MASK_STORE
.p2align 4,, 10
L(less_vec):
+L(less_vec_no_vdup):
/* Less than 1 VEC. */
# if VEC_SIZE != 16 && VEC_SIZE != 32 && VEC_SIZE != 64
# error Unsupported VEC_SIZE!
@@ -262,28 +279,18 @@ L(stosb_more_2x_vec):
/* Fallthrough goes to L(loop_4x_vec). Tests for memset (2x, 4x]
and (4x, 8x] jump to target. */
L(more_2x_vec):
-
- /* Two different methods of setting up pointers / compare. The
- two methods are based on the fact that EVEX/AVX512 mov
- instructions take more bytes then AVX2/SSE2 mov instructions. As
- well that EVEX/AVX512 machines also have fast LEA_BID. Both
- setup and END_REG to avoid complex address mode. For EVEX/AVX512
- this saves code size and keeps a few targets in one fetch block.
- For AVX2/SSE2 this helps prevent AGU bottlenecks. */
-#if defined USE_WITH_EVEX || defined USE_WITH_AVX512
- /* If EVEX/AVX512 compute END_REG - (VEC_SIZE * 4 +
- LOOP_4X_OFFSET) with LEA_BID. */
-
- /* END_REG is rcx for EVEX/AVX512. */
- leaq -(VEC_SIZE * 4 + LOOP_4X_OFFSET)(%rdi, %rdx), %END_REG
-#endif
-
- /* Stores to first 2x VEC before cmp as any path forward will
- require it. */
- VMOVU %VEC(0), (%rax)
- VMOVU %VEC(0), VEC_SIZE(%rax)
+ /* Store next 2x vec regardless. */
+ VMOVU %VEC(0), (%rdi)
+ VMOVU %VEC(0), (VEC_SIZE * 1)(%rdi)


+ /* Two different methods of setting up pointers / compare. The two
+ methods are based on the fact that EVEX/AVX512 mov instructions take
+ more bytes then AVX2/SSE2 mov instructions. As well that EVEX/AVX512
+ machines also have fast LEA_BID. Both setup and END_REG to avoid complex
+ address mode. For EVEX/AVX512 this saves code size and keeps a few
+ targets in one fetch block. For AVX2/SSE2 this helps prevent AGU
+ bottlenecks. */
#if !(defined USE_WITH_EVEX || defined USE_WITH_AVX512)
/* If AVX2/SSE2 compute END_REG (rdi) with ALU. */
addq %rdx, %END_REG
@@ -292,6 +299,15 @@ L(more_2x_vec):
cmpq $(VEC_SIZE * 4), %rdx
jbe L(last_2x_vec)

+
+#if defined USE_WITH_EVEX || defined USE_WITH_AVX512
+ /* If EVEX/AVX512 compute END_REG - (VEC_SIZE * 4 + LOOP_4X_OFFSET) with
+ LEA_BID. */
+
+ /* END_REG is rcx for EVEX/AVX512. */
+ leaq -(VEC_SIZE * 4 + LOOP_4X_OFFSET)(%rdi, %rdx), %END_REG
+#endif
+
/* Store next 2x vec regardless. */
VMOVU %VEC(0), (VEC_SIZE * 2)(%rax)
VMOVU %VEC(0), (VEC_SIZE * 3)(%rax)
@@ -355,65 +371,93 @@ L(stosb_local):
/* Define L(less_vec) only if not otherwise defined. */
.p2align 4
L(less_vec):
+ /* Broadcast esi to partial register (i.e VEC_SIZE == 32 broadcast to
+ xmm). This is only does anything for AVX2. */
+ MEMSET_VDUP_TO_VEC0_LOW ()
+L(less_vec_no_vdup):
#endif
L(cross_page):
#if VEC_SIZE > 32
cmpl $32, %edx
- jae L(between_32_63)
+ jge L(between_32_63)
#endif
#if VEC_SIZE > 16
cmpl $16, %edx
- jae L(between_16_31)
+ jge L(between_16_31)
+#endif
+#ifndef USE_XMM_LESS_VEC
+ MOVQ %XMM0, %rcx
#endif
- MOVQ %XMM0, %rdi
cmpl $8, %edx
- jae L(between_8_15)
+ jge L(between_8_15)
cmpl $4, %edx
- jae L(between_4_7)
+ jge L(between_4_7)
cmpl $1, %edx
- ja L(between_2_3)
- jb L(return)
- movb %sil, (%rax)
- VZEROUPPER_RETURN
+ jg L(between_2_3)
+ jl L(between_0_0)
+ movb %sil, (%LESS_VEC_REG)
+L(between_0_0):
+ ret

- /* Align small targets only if not doing so would cross a fetch
- line. */
+ /* Align small targets only if not doing so would cross a fetch line.
+ */
#if VEC_SIZE > 32
.p2align 4,, SMALL_MEMSET_ALIGN(MOV_SIZE, RET_SIZE)
/* From 32 to 63. No branch when size == 32. */
L(between_32_63):
- VMOVU %YMM0, (%rax)
- VMOVU %YMM0, -32(%rax, %rdx)
+ VMOVU %YMM0, (%LESS_VEC_REG)
+ VMOVU %YMM0, -32(%LESS_VEC_REG, %rdx)
VZEROUPPER_RETURN
#endif

#if VEC_SIZE >= 32
- .p2align 4,, SMALL_MEMSET_ALIGN(MOV_SIZE, RET_SIZE)
+ .p2align 4,, SMALL_MEMSET_ALIGN(MOV_SIZE, 1)
L(between_16_31):
/* From 16 to 31. No branch when size == 16. */
- VMOVU %XMM0, (%rax)
- VMOVU %XMM0, -16(%rax, %rdx)
- VZEROUPPER_RETURN
+ VMOVU %XMM0, (%LESS_VEC_REG)
+ VMOVU %XMM0, -16(%LESS_VEC_REG, %rdx)
+ ret
#endif

- .p2align 4,, SMALL_MEMSET_ALIGN(3, RET_SIZE)
+ /* Move size is 3 for SSE2, EVEX, and AVX512. Move size is 4 for AVX2.
+ */
+ .p2align 4,, SMALL_MEMSET_ALIGN(3 + XMM_SMALL, 1)
L(between_8_15):
/* From 8 to 15. No branch when size == 8. */
- movq %rdi, (%rax)
- movq %rdi, -8(%rax, %rdx)
- VZEROUPPER_RETURN
+#ifdef USE_XMM_LESS_VEC
+ MOVQ %XMM0, (%rdi)
+ MOVQ %XMM0, -8(%rdi, %rdx)
+#else
+ movq %rcx, (%LESS_VEC_REG)
+ movq %rcx, -8(%LESS_VEC_REG, %rdx)
+#endif
+ ret

- .p2align 4,, SMALL_MEMSET_ALIGN(2, RET_SIZE)
+ /* Move size is 2 for SSE2, EVEX, and AVX512. Move size is 4 for AVX2.
+ */
+ .p2align 4,, SMALL_MEMSET_ALIGN(2 << XMM_SMALL, 1)
L(between_4_7):
/* From 4 to 7. No branch when size == 4. */
- movl %edi, (%rax)
- movl %edi, -4(%rax, %rdx)
- VZEROUPPER_RETURN
+#ifdef USE_XMM_LESS_VEC
+ MOVD %XMM0, (%rdi)
+ MOVD %XMM0, -4(%rdi, %rdx)
+#else
+ movl %ecx, (%LESS_VEC_REG)
+ movl %ecx, -4(%LESS_VEC_REG, %rdx)
+#endif
+ ret

- .p2align 4,, SMALL_MEMSET_ALIGN(3, RET_SIZE)
+ /* 4 * XMM_SMALL for the third mov for AVX2. */
+ .p2align 4,, 4 * XMM_SMALL + SMALL_MEMSET_ALIGN(3, 1)
L(between_2_3):
/* From 2 to 3. No branch when size == 2. */
- movw %di, (%rax)
- movb %dil, -1(%rax, %rdx)
- VZEROUPPER_RETURN
+#ifdef USE_XMM_LESS_VEC
+ movb %sil, (%rdi)
+ movb %sil, 1(%rdi)
+ movb %sil, -1(%rdi, %rdx)
+#else
+ movw %cx, (%LESS_VEC_REG)
+ movb %sil, -1(%LESS_VEC_REG, %rdx)
+#endif
+ ret
END (MEMSET_SYMBOL (__memset, unaligned_erms))
35
glibc-upstream-2.34-206.patch
Normal file
@@ -0,0 +1,35 @@
commit 190ea5f7e4e7e98b9b6e3f29835ae8b1f6a5442e
Author: Noah Goldstein <goldstein.w.n@gmail.com>
Date: Mon Feb 7 00:32:23 2022 -0600

x86: Remove SSSE3 instruction for broadcast in memset.S (SSE2 Only)

commit b62ace2740a106222e124cc86956448fa07abf4d
Author: Noah Goldstein <goldstein.w.n@gmail.com>
Date: Sun Feb 6 00:54:18 2022 -0600

x86: Improve vec generation in memset-vec-unaligned-erms.S

Revert usage of 'pshufb' in broadcast logic as it is an SSSE3
instruction and memset.S is restricted to only SSE2 instructions.

(cherry picked from commit 1b0c60f95bbe2eded80b2bb5be75c0e45b11cde1)

diff --git a/sysdeps/x86_64/memset.S b/sysdeps/x86_64/memset.S
index 34ee0bfdcb81fb39..954471e5a5bf225b 100644
--- a/sysdeps/x86_64/memset.S
+++ b/sysdeps/x86_64/memset.S
@@ -30,9 +30,10 @@

# define MEMSET_SET_VEC0_AND_SET_RETURN(d, r) \
movd d, %xmm0; \
- pxor %xmm1, %xmm1; \
- pshufb %xmm1, %xmm0; \
- movq r, %rax
+ movq r, %rax; \
+ punpcklbw %xmm0, %xmm0; \
+ punpcklwd %xmm0, %xmm0; \
+ pshufd $0, %xmm0, %xmm0

# define WMEMSET_SET_VEC0_AND_SET_RETURN(d, r) \
movd d, %xmm0; \
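The restored SSE2 sequence (`movd`, `punpcklbw`, `punpcklwd`, `pshufd $0`) can be read as three doubling steps followed by a lane copy. The portable C sketch below is only a scalar model of that data flow, written under the assumption that a 16-byte array stands in for `%xmm0`; the function name `broadcast_byte_model` is hypothetical and nothing here is taken from glibc itself.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Scalar model of the SSE2 byte broadcast; out receives the 16 bytes
   that %xmm0 would hold after the sequence.  */
void
broadcast_byte_model (int c, unsigned char out[16])
{
  assert (out != NULL);
  uint32_t lane = (uint8_t) c;   /* movd: only the low byte survives the unpacks.  */
  lane |= lane << 8;             /* punpcklbw: low word becomes c,c.  */
  lane |= lane << 16;            /* punpcklwd: low dword becomes c,c,c,c.  */
  for (int i = 0; i < 4; i++)    /* pshufd $0: copy dword 0 into all four lanes.  */
    memcpy (out + 4 * i, &lane, sizeof lane);
}
```

The `pshufb` variant reverted by this patch reaches the same result with a single shuffle against an all-zero index register, but it requires SSSE3, which is why it cannot be used in the baseline `memset.S`.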
719
glibc-upstream-2.34-207.patch
Normal file
@@ -0,0 +1,719 @@
commit 5cb6329652696e79d6d576165ea87e332c9de106
Author: H.J. Lu <hjl.tools@gmail.com>
Date: Mon Feb 7 05:55:15 2022 -0800

x86-64: Optimize bzero

memset with zero as the value to set is by far the majority value (99%+
for Python3 and GCC).

bzero can be slightly more optimized for this case by using a zero-idiom
xor for broadcasting the set value to a register (vector or GPR).

Co-developed-by: Noah Goldstein <goldstein.w.n@gmail.com>
(cherry picked from commit 3d9f171bfb5325bd5f427e9fc386453358c6e840)

diff --git a/sysdeps/x86_64/memset.S b/sysdeps/x86_64/memset.S
index 954471e5a5bf225b..0358210c7ff3a976 100644
--- a/sysdeps/x86_64/memset.S
+++ b/sysdeps/x86_64/memset.S
@@ -35,6 +35,9 @@
punpcklwd %xmm0, %xmm0; \
pshufd $0, %xmm0, %xmm0

+# define BZERO_ZERO_VEC0() \
+ pxor %xmm0, %xmm0
+
# define WMEMSET_SET_VEC0_AND_SET_RETURN(d, r) \
movd d, %xmm0; \
pshufd $0, %xmm0, %xmm0; \
@@ -53,6 +56,10 @@
# define MEMSET_SYMBOL(p,s) memset
#endif

+#ifndef BZERO_SYMBOL
+# define BZERO_SYMBOL(p,s) __bzero
+#endif
+
#ifndef WMEMSET_SYMBOL
# define WMEMSET_CHK_SYMBOL(p,s) p
# define WMEMSET_SYMBOL(p,s) __wmemset
@@ -63,6 +70,7 @@
libc_hidden_builtin_def (memset)

#if IS_IN (libc)
+weak_alias (__bzero, bzero)
libc_hidden_def (__wmemset)
weak_alias (__wmemset, wmemset)
libc_hidden_weak (wmemset)
diff --git a/sysdeps/x86_64/multiarch/Makefile b/sysdeps/x86_64/multiarch/Makefile
index 26be40959ce62895..37d8d6f0bd2d10cc 100644
--- a/sysdeps/x86_64/multiarch/Makefile
+++ b/sysdeps/x86_64/multiarch/Makefile
@@ -1,85 +1,130 @@
ifeq ($(subdir),string)

-sysdep_routines += strncat-c stpncpy-c strncpy-c \
- strcmp-sse2 strcmp-sse2-unaligned strcmp-ssse3 \
- strcmp-sse4_2 strcmp-avx2 \
- strncmp-sse2 strncmp-ssse3 strncmp-sse4_2 strncmp-avx2 \
- memchr-sse2 rawmemchr-sse2 memchr-avx2 rawmemchr-avx2 \
- memrchr-sse2 memrchr-avx2 \
- memcmp-sse2 \
- memcmp-avx2-movbe \
- memcmp-sse4 memcpy-ssse3 \
- memmove-ssse3 \
- memcpy-ssse3-back \
- memmove-ssse3-back \
- memmove-avx512-no-vzeroupper \
- strcasecmp_l-sse2 strcasecmp_l-ssse3 \
- strcasecmp_l-sse4_2 strcasecmp_l-avx \
- strncase_l-sse2 strncase_l-ssse3 \
- strncase_l-sse4_2 strncase_l-avx \
- strchr-sse2 strchrnul-sse2 strchr-avx2 strchrnul-avx2 \
- strrchr-sse2 strrchr-avx2 \
- strlen-sse2 strnlen-sse2 strlen-avx2 strnlen-avx2 \
- strcat-avx2 strncat-avx2 \
- strcat-ssse3 strncat-ssse3\
- strcpy-avx2 strncpy-avx2 \
- strcpy-sse2 stpcpy-sse2 \
- strcpy-ssse3 strncpy-ssse3 stpcpy-ssse3 stpncpy-ssse3 \
- strcpy-sse2-unaligned strncpy-sse2-unaligned \
- stpcpy-sse2-unaligned stpncpy-sse2-unaligned \
- stpcpy-avx2 stpncpy-avx2 \
- strcat-sse2 \
- strcat-sse2-unaligned strncat-sse2-unaligned \
- strchr-sse2-no-bsf memcmp-ssse3 strstr-sse2-unaligned \
- strcspn-sse2 strpbrk-sse2 strspn-sse2 \
- strcspn-c strpbrk-c strspn-c varshift \
- memset-avx512-no-vzeroupper \
- memmove-sse2-unaligned-erms \
- memmove-avx-unaligned-erms \
- memmove-avx512-unaligned-erms \
- memset-sse2-unaligned-erms \
- memset-avx2-unaligned-erms \
- memset-avx512-unaligned-erms \
- memchr-avx2-rtm \
- memcmp-avx2-movbe-rtm \
- memmove-avx-unaligned-erms-rtm \
- memrchr-avx2-rtm \
- memset-avx2-unaligned-erms-rtm \
- rawmemchr-avx2-rtm \
- strchr-avx2-rtm \
- strcmp-avx2-rtm \
- strchrnul-avx2-rtm \
- stpcpy-avx2-rtm \
- stpncpy-avx2-rtm \
- strcat-avx2-rtm \
- strcpy-avx2-rtm \
- strlen-avx2-rtm \
- strncat-avx2-rtm \
- strncmp-avx2-rtm \
- strncpy-avx2-rtm \
- strnlen-avx2-rtm \
- strrchr-avx2-rtm \
- memchr-evex \
- memcmp-evex-movbe \
- memmove-evex-unaligned-erms \
- memrchr-evex \
- memset-evex-unaligned-erms \
- rawmemchr-evex \
- stpcpy-evex \
- stpncpy-evex \
- strcat-evex \
- strchr-evex \
- strchrnul-evex \
- strcmp-evex \
- strcpy-evex \
- strlen-evex \
- strncat-evex \
- strncmp-evex \
- strncpy-evex \
- strnlen-evex \
- strrchr-evex \
- memchr-evex-rtm \
- rawmemchr-evex-rtm
+sysdep_routines += \
+ bzero \
+ memchr-avx2 \
+ memchr-avx2-rtm \
+ memchr-evex \
+ memchr-evex-rtm \
+ memchr-sse2 \
+ memcmp-avx2-movbe \
+ memcmp-avx2-movbe-rtm \
+ memcmp-evex-movbe \
+ memcmp-sse2 \
+ memcmp-sse4 \
+ memcmp-ssse3 \
+ memcpy-ssse3 \
+ memcpy-ssse3-back \
+ memmove-avx-unaligned-erms \
+ memmove-avx-unaligned-erms-rtm \
+ memmove-avx512-no-vzeroupper \
+ memmove-avx512-unaligned-erms \
+ memmove-evex-unaligned-erms \
+ memmove-sse2-unaligned-erms \
+ memmove-ssse3 \
+ memmove-ssse3-back \
+ memrchr-avx2 \
+ memrchr-avx2-rtm \
+ memrchr-evex \
+ memrchr-sse2 \
+ memset-avx2-unaligned-erms \
+ memset-avx2-unaligned-erms-rtm \
+ memset-avx512-no-vzeroupper \
+ memset-avx512-unaligned-erms \
+ memset-evex-unaligned-erms \
+ memset-sse2-unaligned-erms \
+ rawmemchr-avx2 \
+ rawmemchr-avx2-rtm \
+ rawmemchr-evex \
+ rawmemchr-evex-rtm \
+ rawmemchr-sse2 \
+ stpcpy-avx2 \
+ stpcpy-avx2-rtm \
+ stpcpy-evex \
+ stpcpy-sse2 \
+ stpcpy-sse2-unaligned \
+ stpcpy-ssse3 \
+ stpncpy-avx2 \
+ stpncpy-avx2-rtm \
+ stpncpy-c \
+ stpncpy-evex \
+ stpncpy-sse2-unaligned \
+ stpncpy-ssse3 \
+ strcasecmp_l-avx \
+ strcasecmp_l-sse2 \
+ strcasecmp_l-sse4_2 \
+ strcasecmp_l-ssse3 \
+ strcat-avx2 \
+ strcat-avx2-rtm \
+ strcat-evex \
+ strcat-sse2 \
+ strcat-sse2-unaligned \
+ strcat-ssse3 \
+ strchr-avx2 \
+ strchr-avx2-rtm \
+ strchr-evex \
+ strchr-sse2 \
+ strchr-sse2-no-bsf \
+ strchrnul-avx2 \
+ strchrnul-avx2-rtm \
+ strchrnul-evex \
+ strchrnul-sse2 \
+ strcmp-avx2 \
+ strcmp-avx2-rtm \
+ strcmp-evex \
+ strcmp-sse2 \
+ strcmp-sse2-unaligned \
+ strcmp-sse4_2 \
+ strcmp-ssse3 \
+ strcpy-avx2 \
+ strcpy-avx2-rtm \
+ strcpy-evex \
+ strcpy-sse2 \
+ strcpy-sse2-unaligned \
+ strcpy-ssse3 \
+ strcspn-c \
+ strcspn-sse2 \
+ strlen-avx2 \
+ strlen-avx2-rtm \
+ strlen-evex \
+ strlen-sse2 \
+ strncase_l-avx \
+ strncase_l-sse2 \
+ strncase_l-sse4_2 \
+ strncase_l-ssse3 \
+ strncat-avx2 \
+ strncat-avx2-rtm \
+ strncat-c \
+ strncat-evex \
+ strncat-sse2-unaligned \
+ strncat-ssse3 \
+ strncmp-avx2 \
+ strncmp-avx2-rtm \
+ strncmp-evex \
+ strncmp-sse2 \
+ strncmp-sse4_2 \
+ strncmp-ssse3 \
+ strncpy-avx2 \
+ strncpy-avx2-rtm \
+ strncpy-c \
+ strncpy-evex \
+ strncpy-sse2-unaligned \
+ strncpy-ssse3 \
+ strnlen-avx2 \
+ strnlen-avx2-rtm \
+ strnlen-evex \
+ strnlen-sse2 \
+ strpbrk-c \
+ strpbrk-sse2 \
+ strrchr-avx2 \
+ strrchr-avx2-rtm \
+ strrchr-evex \
+ strrchr-sse2 \
+ strspn-c \
+ strspn-sse2 \
+ strstr-sse2-unaligned \
+ varshift \
+# sysdep_routines
CFLAGS-varshift.c += -msse4
CFLAGS-strcspn-c.c += -msse4
CFLAGS-strpbrk-c.c += -msse4
diff --git a/sysdeps/x86_64/multiarch/bzero.c b/sysdeps/x86_64/multiarch/bzero.c
new file mode 100644
index 0000000000000000..13e399a9a1fbdeb2
--- /dev/null
+++ b/sysdeps/x86_64/multiarch/bzero.c
@@ -0,0 +1,108 @@
+/* Multiple versions of bzero.
+ All versions must be listed in ifunc-impl-list.c.
+ Copyright (C) 2022 Free Software Foundation, Inc.
+ This file is part of the GNU C Library.
+
+ The GNU C Library is free software; you can redistribute it and/or
+ modify it under the terms of the GNU Lesser General Public
+ License as published by the Free Software Foundation; either
+ version 2.1 of the License, or (at your option) any later version.
+
+ The GNU C Library is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ Lesser General Public License for more details.
+
+ You should have received a copy of the GNU Lesser General Public
+ License along with the GNU C Library; if not, see
+ <https://www.gnu.org/licenses/>. */
+
+/* Define multiple versions only for the definition in libc. */
+#if IS_IN (libc)
+# define __bzero __redirect___bzero
+# include <string.h>
+# undef __bzero
+
+/* OPTIMIZE1 definition required for bzero patch. */
+# define OPTIMIZE1(name) EVALUATOR1 (SYMBOL_NAME, name)
+# define SYMBOL_NAME __bzero
+# include <init-arch.h>
+
+extern __typeof (REDIRECT_NAME) OPTIMIZE1 (sse2_unaligned)
+ attribute_hidden;
+extern __typeof (REDIRECT_NAME) OPTIMIZE1 (sse2_unaligned_erms)
+ attribute_hidden;
+extern __typeof (REDIRECT_NAME) OPTIMIZE1 (avx2_unaligned) attribute_hidden;
+extern __typeof (REDIRECT_NAME) OPTIMIZE1 (avx2_unaligned_erms)
+ attribute_hidden;
+extern __typeof (REDIRECT_NAME) OPTIMIZE1 (avx2_unaligned_rtm)
+ attribute_hidden;
+extern __typeof (REDIRECT_NAME) OPTIMIZE1 (avx2_unaligned_erms_rtm)
+ attribute_hidden;
+extern __typeof (REDIRECT_NAME) OPTIMIZE1 (evex_unaligned)
+ attribute_hidden;
+extern __typeof (REDIRECT_NAME) OPTIMIZE1 (evex_unaligned_erms)
+ attribute_hidden;
+extern __typeof (REDIRECT_NAME) OPTIMIZE1 (avx512_unaligned)
+ attribute_hidden;
+extern __typeof (REDIRECT_NAME) OPTIMIZE1 (avx512_unaligned_erms)
+ attribute_hidden;
+
+static inline void *
+IFUNC_SELECTOR (void)
+{
+ const struct cpu_features* cpu_features = __get_cpu_features ();
+
+ if (CPU_FEATURE_USABLE_P (cpu_features, AVX512F)
+ && !CPU_FEATURES_ARCH_P (cpu_features, Prefer_No_AVX512))
+ {
+ if (CPU_FEATURE_USABLE_P (cpu_features, AVX512VL)
+ && CPU_FEATURE_USABLE_P (cpu_features, AVX512BW)
+ && CPU_FEATURE_USABLE_P (cpu_features, BMI2))
+ {
+ if (CPU_FEATURE_USABLE_P (cpu_features, ERMS))
+ return OPTIMIZE1 (avx512_unaligned_erms);
+
+ return OPTIMIZE1 (avx512_unaligned);
+ }
+ }
+
+ if (CPU_FEATURE_USABLE_P (cpu_features, AVX2))
+ {
+ if (CPU_FEATURE_USABLE_P (cpu_features, AVX512VL)
+ && CPU_FEATURE_USABLE_P (cpu_features, AVX512BW)
+ && CPU_FEATURE_USABLE_P (cpu_features, BMI2))
+ {
+ if (CPU_FEATURE_USABLE_P (cpu_features, ERMS))
+ return OPTIMIZE1 (evex_unaligned_erms);
+
+ return OPTIMIZE1 (evex_unaligned);
+ }
+
+ if (CPU_FEATURE_USABLE_P (cpu_features, RTM))
+ {
+ if (CPU_FEATURE_USABLE_P (cpu_features, ERMS))
+ return OPTIMIZE1 (avx2_unaligned_erms_rtm);
+
+ return OPTIMIZE1 (avx2_unaligned_rtm);
+ }
+
+ if (!CPU_FEATURES_ARCH_P (cpu_features, Prefer_No_VZEROUPPER))
+ {
+ if (CPU_FEATURE_USABLE_P (cpu_features, ERMS))
+ return OPTIMIZE1 (avx2_unaligned_erms);
+
+ return OPTIMIZE1 (avx2_unaligned);
+ }
+ }
+
+ if (CPU_FEATURE_USABLE_P (cpu_features, ERMS))
+ return OPTIMIZE1 (sse2_unaligned_erms);
+
+ return OPTIMIZE1 (sse2_unaligned);
+}
+
+libc_ifunc_redirected (__redirect___bzero, __bzero, IFUNC_SELECTOR ());
+
+weak_alias (__bzero, bzero)
+#endif
diff --git a/sysdeps/x86_64/multiarch/ifunc-impl-list.c b/sysdeps/x86_64/multiarch/ifunc-impl-list.c
|
||||
index 39ab10613bb0ffea..4992d7bd3206a7c0 100644
|
||||
--- a/sysdeps/x86_64/multiarch/ifunc-impl-list.c
|
||||
+++ b/sysdeps/x86_64/multiarch/ifunc-impl-list.c
|
||||
@@ -282,6 +282,48 @@ __libc_ifunc_impl_list (const char *name, struct libc_ifunc_impl *array,
|
||||
__memset_avx512_no_vzeroupper)
|
||||
)
|
||||
|
||||
+ /* Support sysdeps/x86_64/multiarch/bzero.c. */
|
||||
+ IFUNC_IMPL (i, name, bzero,
|
||||
+ IFUNC_IMPL_ADD (array, i, bzero, 1,
|
||||
+ __bzero_sse2_unaligned)
|
||||
+ IFUNC_IMPL_ADD (array, i, bzero, 1,
|
||||
+ __bzero_sse2_unaligned_erms)
|
||||
+ IFUNC_IMPL_ADD (array, i, bzero,
|
||||
+ CPU_FEATURE_USABLE (AVX2),
|
||||
+ __bzero_avx2_unaligned)
|
||||
+ IFUNC_IMPL_ADD (array, i, bzero,
|
||||
+ CPU_FEATURE_USABLE (AVX2),
|
||||
+ __bzero_avx2_unaligned_erms)
|
||||
+ IFUNC_IMPL_ADD (array, i, bzero,
|
||||
+ (CPU_FEATURE_USABLE (AVX2)
|
||||
+ && CPU_FEATURE_USABLE (RTM)),
|
||||
+ __bzero_avx2_unaligned_rtm)
|
||||
+ IFUNC_IMPL_ADD (array, i, bzero,
|
||||
+ (CPU_FEATURE_USABLE (AVX2)
|
||||
+ && CPU_FEATURE_USABLE (RTM)),
|
||||
+ __bzero_avx2_unaligned_erms_rtm)
|
||||
+ IFUNC_IMPL_ADD (array, i, bzero,
|
||||
+ (CPU_FEATURE_USABLE (AVX512VL)
|
||||
+ && CPU_FEATURE_USABLE (AVX512BW)
|
||||
+ && CPU_FEATURE_USABLE (BMI2)),
|
||||
+ __bzero_evex_unaligned)
|
||||
+ IFUNC_IMPL_ADD (array, i, bzero,
|
||||
+ (CPU_FEATURE_USABLE (AVX512VL)
|
||||
+ && CPU_FEATURE_USABLE (AVX512BW)
|
||||
+ && CPU_FEATURE_USABLE (BMI2)),
|
||||
+ __bzero_evex_unaligned_erms)
|
||||
+ IFUNC_IMPL_ADD (array, i, bzero,
|
||||
+ (CPU_FEATURE_USABLE (AVX512VL)
|
||||
+ && CPU_FEATURE_USABLE (AVX512BW)
|
||||
+ && CPU_FEATURE_USABLE (BMI2)),
|
||||
+ __bzero_avx512_unaligned_erms)
|
||||
+ IFUNC_IMPL_ADD (array, i, bzero,
|
||||
+ (CPU_FEATURE_USABLE (AVX512VL)
|
||||
+ && CPU_FEATURE_USABLE (AVX512BW)
|
||||
+ && CPU_FEATURE_USABLE (BMI2)),
|
||||
+ __bzero_avx512_unaligned)
|
||||
+ )
|
||||
+
|
||||
/* Support sysdeps/x86_64/multiarch/rawmemchr.c. */
|
||||
IFUNC_IMPL (i, name, rawmemchr,
|
||||
IFUNC_IMPL_ADD (array, i, rawmemchr,
|
||||
diff --git a/sysdeps/x86_64/multiarch/memset-avx2-unaligned-erms-rtm.S b/sysdeps/x86_64/multiarch/memset-avx2-unaligned-erms-rtm.S
|
||||
index 8ac3e479bba488be..5a5ee6f67299400b 100644
|
||||
--- a/sysdeps/x86_64/multiarch/memset-avx2-unaligned-erms-rtm.S
|
||||
+++ b/sysdeps/x86_64/multiarch/memset-avx2-unaligned-erms-rtm.S
|
||||
@@ -5,6 +5,7 @@
|
||||
|
||||
#define SECTION(p) p##.avx.rtm
|
||||
#define MEMSET_SYMBOL(p,s) p##_avx2_##s##_rtm
|
||||
+#define BZERO_SYMBOL(p,s) p##_avx2_##s##_rtm
|
||||
#define WMEMSET_SYMBOL(p,s) p##_avx2_##s##_rtm
|
||||
|
||||
#include "memset-avx2-unaligned-erms.S"
|
||||
diff --git a/sysdeps/x86_64/multiarch/memset-avx2-unaligned-erms.S b/sysdeps/x86_64/multiarch/memset-avx2-unaligned-erms.S
|
||||
index c0bf2875d03d51ab..a093a2831f3dfa0d 100644
|
||||
--- a/sysdeps/x86_64/multiarch/memset-avx2-unaligned-erms.S
|
||||
+++ b/sysdeps/x86_64/multiarch/memset-avx2-unaligned-erms.S
|
||||
@@ -14,6 +14,9 @@
|
||||
vmovd d, %xmm0; \
|
||||
movq r, %rax;
|
||||
|
||||
+# define BZERO_ZERO_VEC0() \
|
||||
+ vpxor %xmm0, %xmm0, %xmm0
|
||||
+
|
||||
# define WMEMSET_SET_VEC0_AND_SET_RETURN(d, r) \
|
||||
MEMSET_SET_VEC0_AND_SET_RETURN(d, r)
|
||||
|
||||
@@ -29,6 +32,9 @@
|
||||
# ifndef MEMSET_SYMBOL
|
||||
# define MEMSET_SYMBOL(p,s) p##_avx2_##s
|
||||
# endif
|
||||
+# ifndef BZERO_SYMBOL
|
||||
+# define BZERO_SYMBOL(p,s) p##_avx2_##s
|
||||
+# endif
|
||||
# ifndef WMEMSET_SYMBOL
|
||||
# define WMEMSET_SYMBOL(p,s) p##_avx2_##s
|
||||
# endif
|
||||
diff --git a/sysdeps/x86_64/multiarch/memset-avx512-unaligned-erms.S b/sysdeps/x86_64/multiarch/memset-avx512-unaligned-erms.S
|
||||
index 5241216a77bf72b7..727c92133a15900f 100644
|
||||
--- a/sysdeps/x86_64/multiarch/memset-avx512-unaligned-erms.S
|
||||
+++ b/sysdeps/x86_64/multiarch/memset-avx512-unaligned-erms.S
|
||||
@@ -19,6 +19,9 @@
|
||||
vpbroadcastb d, %VEC0; \
|
||||
movq r, %rax
|
||||
|
||||
+# define BZERO_ZERO_VEC0() \
|
||||
+ vpxorq %XMM0, %XMM0, %XMM0
|
||||
+
|
||||
# define WMEMSET_SET_VEC0_AND_SET_RETURN(d, r) \
|
||||
vpbroadcastd d, %VEC0; \
|
||||
movq r, %rax
|
||||
diff --git a/sysdeps/x86_64/multiarch/memset-evex-unaligned-erms.S b/sysdeps/x86_64/multiarch/memset-evex-unaligned-erms.S
|
||||
index 637002150659123c..5d8fa78f05476b10 100644
|
||||
--- a/sysdeps/x86_64/multiarch/memset-evex-unaligned-erms.S
|
||||
+++ b/sysdeps/x86_64/multiarch/memset-evex-unaligned-erms.S
|
||||
@@ -19,6 +19,9 @@
|
||||
vpbroadcastb d, %VEC0; \
|
||||
movq r, %rax
|
||||
|
||||
+# define BZERO_ZERO_VEC0() \
|
||||
+ vpxorq %XMM0, %XMM0, %XMM0
|
||||
+
|
||||
# define WMEMSET_SET_VEC0_AND_SET_RETURN(d, r) \
|
||||
vpbroadcastd d, %VEC0; \
|
||||
movq r, %rax
|
||||
diff --git a/sysdeps/x86_64/multiarch/memset-sse2-unaligned-erms.S b/sysdeps/x86_64/multiarch/memset-sse2-unaligned-erms.S
|
||||
index e4e95fc19fe48d2d..bac74ac37fd3c144 100644
|
||||
--- a/sysdeps/x86_64/multiarch/memset-sse2-unaligned-erms.S
|
||||
+++ b/sysdeps/x86_64/multiarch/memset-sse2-unaligned-erms.S
|
||||
@@ -22,6 +22,7 @@
|
||||
|
||||
#if IS_IN (libc)
|
||||
# define MEMSET_SYMBOL(p,s) p##_sse2_##s
|
||||
+# define BZERO_SYMBOL(p,s) MEMSET_SYMBOL (p, s)
|
||||
# define WMEMSET_SYMBOL(p,s) p##_sse2_##s
|
||||
|
||||
# ifdef SHARED
|
||||
diff --git a/sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S b/sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S
|
||||
index c8db87dcbf69f0d8..39a096a594ccb5b6 100644
|
||||
--- a/sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S
|
||||
+++ b/sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S
|
||||
@@ -26,6 +26,10 @@
|
||||
|
||||
#include <sysdep.h>
|
||||
|
||||
+#ifndef BZERO_SYMBOL
|
||||
+# define BZERO_SYMBOL(p,s) MEMSET_SYMBOL (p, s)
|
||||
+#endif
|
||||
+
|
||||
#ifndef MEMSET_CHK_SYMBOL
|
||||
# define MEMSET_CHK_SYMBOL(p,s) MEMSET_SYMBOL(p, s)
|
||||
#endif
|
||||
@@ -87,6 +91,18 @@
|
||||
# define XMM_SMALL 0
|
||||
#endif
|
||||
|
||||
+#ifdef USE_LESS_VEC_MASK_STORE
|
||||
+# define SET_REG64 rcx
|
||||
+# define SET_REG32 ecx
|
||||
+# define SET_REG16 cx
|
||||
+# define SET_REG8 cl
|
||||
+#else
|
||||
+# define SET_REG64 rsi
|
||||
+# define SET_REG32 esi
|
||||
+# define SET_REG16 si
|
||||
+# define SET_REG8 sil
|
||||
+#endif
|
||||
+
|
||||
#define PAGE_SIZE 4096
|
||||
|
||||
/* Macro to calculate size of small memset block for aligning
|
||||
@@ -96,18 +112,6 @@
|
||||
|
||||
#ifndef SECTION
|
||||
# error SECTION is not defined!
|
||||
-#endif
|
||||
-
|
||||
- .section SECTION(.text),"ax",@progbits
|
||||
-#if VEC_SIZE == 16 && IS_IN (libc)
|
||||
-ENTRY (__bzero)
|
||||
- mov %RDI_LP, %RAX_LP /* Set return value. */
|
||||
- mov %RSI_LP, %RDX_LP /* Set n. */
|
||||
- xorl %esi, %esi
|
||||
- pxor %XMM0, %XMM0
|
||||
- jmp L(entry_from_bzero)
|
||||
-END (__bzero)
|
||||
-weak_alias (__bzero, bzero)
|
||||
#endif
|
||||
|
||||
#if IS_IN (libc)
|
||||
@@ -123,12 +127,37 @@ ENTRY (WMEMSET_SYMBOL (__wmemset, unaligned))
|
||||
WMEMSET_SET_VEC0_AND_SET_RETURN (%esi, %rdi)
|
||||
WMEMSET_VDUP_TO_VEC0_LOW()
|
||||
cmpq $VEC_SIZE, %rdx
|
||||
- jb L(less_vec_no_vdup)
|
||||
+ jb L(less_vec_from_wmemset)
|
||||
WMEMSET_VDUP_TO_VEC0_HIGH()
|
||||
jmp L(entry_from_wmemset)
|
||||
END (WMEMSET_SYMBOL (__wmemset, unaligned))
|
||||
#endif
|
||||
|
||||
+ENTRY (BZERO_SYMBOL(__bzero, unaligned))
|
||||
+#if VEC_SIZE > 16
|
||||
+ BZERO_ZERO_VEC0 ()
|
||||
+#endif
|
||||
+ mov %RDI_LP, %RAX_LP
|
||||
+ mov %RSI_LP, %RDX_LP
|
||||
+#ifndef USE_LESS_VEC_MASK_STORE
|
||||
+ xorl %esi, %esi
|
||||
+#endif
|
||||
+ cmp $VEC_SIZE, %RDX_LP
|
||||
+ jb L(less_vec_no_vdup)
|
||||
+#ifdef USE_LESS_VEC_MASK_STORE
|
||||
+ xorl %esi, %esi
|
||||
+#endif
|
||||
+#if VEC_SIZE <= 16
|
||||
+ BZERO_ZERO_VEC0 ()
|
||||
+#endif
|
||||
+ cmp $(VEC_SIZE * 2), %RDX_LP
|
||||
+ ja L(more_2x_vec)
|
||||
+ /* From VEC and to 2 * VEC. No branch when size == VEC_SIZE. */
|
||||
+ VMOVU %VEC(0), (%rdi)
|
||||
+ VMOVU %VEC(0), (VEC_SIZE * -1)(%rdi, %rdx)
|
||||
+ VZEROUPPER_RETURN
|
||||
+END (BZERO_SYMBOL(__bzero, unaligned))
|
||||
+
|
||||
#if defined SHARED && IS_IN (libc)
|
||||
ENTRY_CHK (MEMSET_CHK_SYMBOL (__memset_chk, unaligned))
|
||||
cmp %RDX_LP, %RCX_LP
|
||||
@@ -142,7 +171,6 @@ ENTRY (MEMSET_SYMBOL (__memset, unaligned))
|
||||
/* Clear the upper 32 bits. */
|
||||
mov %edx, %edx
|
||||
# endif
|
||||
-L(entry_from_bzero):
|
||||
cmpq $VEC_SIZE, %rdx
|
||||
jb L(less_vec)
|
||||
MEMSET_VDUP_TO_VEC0_HIGH()
|
||||
@@ -187,6 +215,31 @@ END (__memset_erms)
|
||||
END (MEMSET_SYMBOL (__memset, erms))
|
||||
# endif
|
||||
|
||||
+ENTRY_P2ALIGN (BZERO_SYMBOL(__bzero, unaligned_erms), 6)
|
||||
+# if VEC_SIZE > 16
|
||||
+ BZERO_ZERO_VEC0 ()
|
||||
+# endif
|
||||
+ mov %RDI_LP, %RAX_LP
|
||||
+ mov %RSI_LP, %RDX_LP
|
||||
+# ifndef USE_LESS_VEC_MASK_STORE
|
||||
+ xorl %esi, %esi
|
||||
+# endif
|
||||
+ cmp $VEC_SIZE, %RDX_LP
|
||||
+ jb L(less_vec_no_vdup)
|
||||
+# ifdef USE_LESS_VEC_MASK_STORE
|
||||
+ xorl %esi, %esi
|
||||
+# endif
|
||||
+# if VEC_SIZE <= 16
|
||||
+ BZERO_ZERO_VEC0 ()
|
||||
+# endif
|
||||
+ cmp $(VEC_SIZE * 2), %RDX_LP
|
||||
+ ja L(stosb_more_2x_vec)
|
||||
+ /* From VEC and to 2 * VEC. No branch when size == VEC_SIZE. */
|
||||
+ VMOVU %VEC(0), (%rdi)
|
||||
+ VMOVU %VEC(0), (VEC_SIZE * -1)(%rdi, %rdx)
|
||||
+ VZEROUPPER_RETURN
|
||||
+END (BZERO_SYMBOL(__bzero, unaligned_erms))
|
||||
+
|
||||
# if defined SHARED && IS_IN (libc)
|
||||
ENTRY_CHK (MEMSET_CHK_SYMBOL (__memset_chk, unaligned_erms))
|
||||
cmp %RDX_LP, %RCX_LP
|
||||
@@ -229,6 +282,7 @@ L(last_2x_vec):
|
||||
.p2align 4,, 10
|
||||
L(less_vec):
|
||||
L(less_vec_no_vdup):
|
||||
+L(less_vec_from_wmemset):
|
||||
/* Less than 1 VEC. */
|
||||
# if VEC_SIZE != 16 && VEC_SIZE != 32 && VEC_SIZE != 64
|
||||
# error Unsupported VEC_SIZE!
|
||||
@@ -374,8 +428,11 @@ L(less_vec):
|
||||
/* Broadcast esi to partial register (i.e VEC_SIZE == 32 broadcast to
|
||||
xmm). This is only does anything for AVX2. */
|
||||
MEMSET_VDUP_TO_VEC0_LOW ()
|
||||
+L(less_vec_from_wmemset):
|
||||
+#if VEC_SIZE > 16
|
||||
L(less_vec_no_vdup):
|
||||
#endif
|
||||
+#endif
|
||||
L(cross_page):
|
||||
#if VEC_SIZE > 32
|
||||
cmpl $32, %edx
|
||||
@@ -386,7 +443,10 @@ L(cross_page):
|
||||
jge L(between_16_31)
|
||||
#endif
|
||||
#ifndef USE_XMM_LESS_VEC
|
||||
- MOVQ %XMM0, %rcx
|
||||
+ MOVQ %XMM0, %SET_REG64
|
||||
+#endif
|
||||
+#if VEC_SIZE <= 16
|
||||
+L(less_vec_no_vdup):
|
||||
#endif
|
||||
cmpl $8, %edx
|
||||
jge L(between_8_15)
|
||||
@@ -395,7 +455,7 @@ L(cross_page):
|
||||
cmpl $1, %edx
|
||||
jg L(between_2_3)
|
||||
jl L(between_0_0)
|
||||
- movb %sil, (%LESS_VEC_REG)
|
||||
+ movb %SET_REG8, (%LESS_VEC_REG)
|
||||
L(between_0_0):
|
||||
ret
|
||||
|
||||
@@ -428,8 +488,8 @@ L(between_8_15):
|
||||
MOVQ %XMM0, (%rdi)
|
||||
MOVQ %XMM0, -8(%rdi, %rdx)
|
||||
#else
|
||||
- movq %rcx, (%LESS_VEC_REG)
|
||||
- movq %rcx, -8(%LESS_VEC_REG, %rdx)
|
||||
+ movq %SET_REG64, (%LESS_VEC_REG)
|
||||
+ movq %SET_REG64, -8(%LESS_VEC_REG, %rdx)
|
||||
#endif
|
||||
ret
|
||||
|
||||
@@ -442,8 +502,8 @@ L(between_4_7):
|
||||
MOVD %XMM0, (%rdi)
|
||||
MOVD %XMM0, -4(%rdi, %rdx)
|
||||
#else
|
||||
- movl %ecx, (%LESS_VEC_REG)
|
||||
- movl %ecx, -4(%LESS_VEC_REG, %rdx)
|
||||
+ movl %SET_REG32, (%LESS_VEC_REG)
|
||||
+ movl %SET_REG32, -4(%LESS_VEC_REG, %rdx)
|
||||
#endif
|
||||
ret
|
||||
|
||||
@@ -452,12 +512,12 @@ L(between_4_7):
|
||||
L(between_2_3):
|
||||
/* From 2 to 3. No branch when size == 2. */
|
||||
#ifdef USE_XMM_LESS_VEC
|
||||
- movb %sil, (%rdi)
|
||||
- movb %sil, 1(%rdi)
|
||||
- movb %sil, -1(%rdi, %rdx)
|
||||
+ movb %SET_REG8, (%rdi)
|
||||
+ movb %SET_REG8, 1(%rdi)
|
||||
+ movb %SET_REG8, -1(%rdi, %rdx)
|
||||
#else
|
||||
- movw %cx, (%LESS_VEC_REG)
|
||||
- movb %sil, -1(%LESS_VEC_REG, %rdx)
|
||||
+ movw %SET_REG16, (%LESS_VEC_REG)
|
||||
+ movb %SET_REG8, -1(%LESS_VEC_REG, %rdx)
|
||||
#endif
|
||||
ret
|
||||
END (MEMSET_SYMBOL (__memset, unaligned_erms))
|
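The selector in bzero.c above picks one implementation per process based on CPU features (AVX512, then EVEX, then AVX2 with or without RTM, then SSE2, each with an ERMS variant). The following is only a minimal sketch of that dispatch idea, not glibc's code: it assumes GCC or Clang, uses the compiler builtin `__builtin_cpu_supports` instead of glibc's internal `cpu_features`, checks a single feature, and the names `bzero_generic`, `bzero_wide`, `select_bzero` and `my_bzero` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef void (*bzero_fn) (void *, size_t);

/* Baseline implementation (hypothetical stand-in for the SSE2 variant).  */
static void
bzero_generic (void *s, size_t n)
{
  memset (s, 0, n);
}

/* Stand-in for a wider-vector variant; here it behaves identically.  */
static void
bzero_wide (void *s, size_t n)
{
  memset (s, 0, n);
}

/* Choose an implementation from the CPU features, most capable first.  */
static bzero_fn
select_bzero (void)
{
#if defined (__GNUC__) && (defined (__x86_64__) || defined (__i386__))
  __builtin_cpu_init ();
  if (__builtin_cpu_supports ("avx2"))
    return bzero_wide;
#endif
  return bzero_generic;
}

/* Resolve once on first use, then call through the cached pointer.  */
static void
my_bzero (void *s, size_t n)
{
  static bzero_fn impl;
  if (impl == NULL)
    impl = select_bzero ();
  assert (impl != NULL);
  impl (s, n);
}
```

A real IFUNC is resolved by the dynamic linker at relocation time rather than lazily on first call; the cached function pointer here only approximates that behaviour.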
29
glibc-upstream-2.34-208.patch
Normal file
@ -0,0 +1,29 @@
|
||||
commit 70509f9b4807295b2b4b43bffe110580fc0381ef
|
||||
Author: Noah Goldstein <goldstein.w.n@gmail.com>
|
||||
Date: Sat Feb 12 00:45:00 2022 -0600
|
||||
|
||||
x86: Set .text section in memset-vec-unaligned-erms
|
||||
|
||||
commit 3d9f171bfb5325bd5f427e9fc386453358c6e840
|
||||
Author: H.J. Lu <hjl.tools@gmail.com>
|
||||
Date: Mon Feb 7 05:55:15 2022 -0800
|
||||
|
||||
x86-64: Optimize bzero
|
||||
|
||||
Remove setting the .text section for the code. This commit
|
||||
adds that back.
|
||||
|
||||
(cherry picked from commit 7912236f4a597deb092650ca79f33504ddb4af28)
|
||||
|
||||
diff --git a/sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S b/sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S
|
||||
index 39a096a594ccb5b6..d9c577fb5ff9700f 100644
|
||||
--- a/sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S
|
||||
+++ b/sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S
|
||||
@@ -114,6 +114,7 @@
|
||||
# error SECTION is not defined!
|
||||
#endif
|
||||
|
||||
+ .section SECTION(.text), "ax", @progbits
|
||||
#if IS_IN (libc)
|
||||
# if defined SHARED
|
||||
ENTRY_CHK (WMEMSET_CHK_SYMBOL (__wmemset_chk, unaligned))
|
76
glibc-upstream-2.34-209.patch
Normal file
@ -0,0 +1,76 @@
|
||||
commit 5373c90f2ea3c3fa9931a684c9b81c648dfbe8d7
|
||||
Author: Noah Goldstein <goldstein.w.n@gmail.com>
|
||||
Date: Tue Feb 15 20:27:21 2022 -0600
|
||||
|
||||
x86: Fix bug in strncmp-evex and strncmp-avx2 [BZ #28895]
|
||||
|
||||
Logic can read before the start of `s1` / `s2` if both `s1` and `s2`
|
||||
are near the start of a page. To avoid having the result contaminated by
|
||||
these comparisons the `strcmp` variants would mask off these
|
||||
comparisons. This was missing in the `strncmp` variants causing
|
||||
the bug. This commit adds the masking to `strncmp` so that out of
|
||||
range comparisons don't affect the result.
|
||||
|
||||
test-strcmp, test-strncmp, test-wcscmp, and test-wcsncmp all pass as
|
||||
well as a full xcheck on x86_64 linux.
|
||||
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
|
||||
|
||||
(cherry picked from commit e108c02a5e23c8c88ce66d8705d4a24bb6b9a8bf)
|
||||
|
||||
diff --git a/string/test-strncmp.c b/string/test-strncmp.c
|
||||
index 97e831d88fd24316..56e23670ae7f90e4 100644
|
||||
--- a/string/test-strncmp.c
|
||||
+++ b/string/test-strncmp.c
|
||||
@@ -438,13 +438,23 @@ check3 (void)
|
||||
static void
|
||||
check4 (void)
|
||||
{
|
||||
- const CHAR *s1 = L ("abc");
|
||||
- CHAR *s2 = STRDUP (s1);
|
||||
+ /* To trigger bug 28895; We need 1) both s1 and s2 to be within 32 bytes of
|
||||
+ the end of the page. 2) For there to be no mismatch/null byte before the
|
||||
+ first page cross. 3) For length (`n`) to be large enough for one string to
|
||||
+ cross the page. And 4) for there to be either mismatch/null bytes before
|
||||
+ the start of the strings. */
|
||||
+
|
||||
+ size_t size = 10;
|
||||
+ size_t addr_mask = (getpagesize () - 1) ^ (sizeof (CHAR) - 1);
|
||||
+ CHAR *s1 = (CHAR *)(buf1 + (addr_mask & 0xffa));
|
||||
+ CHAR *s2 = (CHAR *)(buf2 + (addr_mask & 0xfed));
|
||||
+ int exp_result;
|
||||
|
||||
+ STRCPY (s1, L ("tst-tlsmod%"));
|
||||
+ STRCPY (s2, L ("tst-tls-manydynamic73mod"));
|
||||
+ exp_result = SIMPLE_STRNCMP (s1, s2, size);
|
||||
FOR_EACH_IMPL (impl, 0)
|
||||
- check_result (impl, s1, s2, SIZE_MAX, 0);
|
||||
-
|
||||
- free (s2);
|
||||
+ check_result (impl, s1, s2, size, exp_result);
|
||||
}
|
||||
|
||||
int
|
||||
diff --git a/sysdeps/x86_64/multiarch/strcmp-avx2.S b/sysdeps/x86_64/multiarch/strcmp-avx2.S
|
||||
index cdded412a70bad10..f9bdc5ccd03aa1f9 100644
|
||||
--- a/sysdeps/x86_64/multiarch/strcmp-avx2.S
|
||||
+++ b/sysdeps/x86_64/multiarch/strcmp-avx2.S
|
||||
@@ -661,6 +661,7 @@ L(ret8):
|
||||
# ifdef USE_AS_STRNCMP
|
||||
.p2align 4,, 10
|
||||
L(return_page_cross_end_check):
|
||||
+ andl %r10d, %ecx
|
||||
tzcntl %ecx, %ecx
|
||||
leal -VEC_SIZE(%rax, %rcx), %ecx
|
||||
cmpl %ecx, %edx
|
||||
diff --git a/sysdeps/x86_64/multiarch/strcmp-evex.S b/sysdeps/x86_64/multiarch/strcmp-evex.S
|
||||
index ed56af8ecdad48b2..0dfa62bd149c02b4 100644
|
||||
--- a/sysdeps/x86_64/multiarch/strcmp-evex.S
|
||||
+++ b/sysdeps/x86_64/multiarch/strcmp-evex.S
|
||||
@@ -689,6 +689,7 @@ L(ret8):
|
||||
# ifdef USE_AS_STRNCMP
|
||||
.p2align 4,, 10
|
||||
L(return_page_cross_end_check):
|
||||
+ andl %r10d, %ecx
|
||||
tzcntl %ecx, %ecx
|
||||
leal -VEC_SIZE(%rax, %rcx, SIZE_OF_CHAR), %ecx
|
||||
# ifdef USE_AS_WCSCMP
|
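The fix for BZ #28895 above adds an `andl %r10d, %ecx` before the `tzcntl`, so that comparison bits belonging to bytes outside the strings are discarded. As a rough illustration of that masking idea only (not the register-level logic of the patch), the sketch below assumes a 32-bit comparison mask whose low `offset` bits come from bytes before the start of the string. The function name `first_valid_hit` is hypothetical, and `__builtin_ctz` is a GCC/Clang builtin.

```c
#include <assert.h>
#include <stdint.h>

/* CMP_MASK has one bit per byte of a 32-byte load; a set bit marks a
   mismatch or null byte.  The first OFFSET bits describe bytes that lie
   before the start of the string and must be ignored.  Returns the
   index, relative to the string start, of the first valid hit, or -1
   if there is none.  */
static int
first_valid_hit (uint32_t cmp_mask, unsigned int offset)
{
  assert (offset < 32);
  uint32_t valid = UINT32_MAX << offset;  /* Clear the out-of-range bits.  */
  cmp_mask &= valid;
  if (cmp_mask == 0)
    return -1;
  return __builtin_ctz (cmp_mask) - (int) offset;
}
```

Without the `&=` step, a stray bit from before the string start would win the trailing-zero count and produce a wrong position, which is the kind of result corruption the commit message describes.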
71
glibc-upstream-2.34-210.patch
Normal file
@ -0,0 +1,71 @@
|
||||
commit e123f08ad5ea4691bc37430ce536988c221332d6
|
||||
Author: Noah Goldstein <goldstein.w.n@gmail.com>
|
||||
Date: Thu Mar 24 15:50:33 2022 -0500
|
||||
|
||||
x86: Fix fallback for wcsncmp_avx2 in strcmp-avx2.S [BZ #28896]
|
||||
|
||||
Overflow case for __wcsncmp_avx2_rtm should be __wcscmp_avx2_rtm not
|
||||
__wcscmp_avx2.
|
||||
|
||||
commit ddf0992cf57a93200e0c782e2a94d0733a5a0b87
|
||||
Author: Noah Goldstein <goldstein.w.n@gmail.com>
|
||||
Date: Sun Jan 9 16:02:21 2022 -0600
|
||||
|
||||
x86: Fix __wcsncmp_avx2 in strcmp-avx2.S [BZ# 28755]
|
||||
|
||||
Set the wrong fallback function for `__wcsncmp_avx2_rtm`. It was set
|
||||
to fallback on to `__wcscmp_avx2` instead of `__wcscmp_avx2_rtm` which
|
||||
can cause spurious aborts.
|
||||
|
||||
This change will need to be backported.
|
||||
|
||||
All string/memory tests pass.
|
||||
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
|
||||
|
||||
(cherry picked from commit 9fef7039a7d04947bc89296ee0d187bc8d89b772)
|
||||
|
||||
diff --git a/sysdeps/x86/tst-strncmp-rtm.c b/sysdeps/x86/tst-strncmp-rtm.c
|
||||
index aef9866cf2fbe774..ba6543be8ce13927 100644
|
||||
--- a/sysdeps/x86/tst-strncmp-rtm.c
|
||||
+++ b/sysdeps/x86/tst-strncmp-rtm.c
|
||||
@@ -70,6 +70,16 @@ function_overflow (void)
|
||||
return 1;
|
||||
}
|
||||
|
||||
+__attribute__ ((noinline, noclone))
|
||||
+static int
|
||||
+function_overflow2 (void)
|
||||
+{
|
||||
+ if (STRNCMP (string1, string2, SIZE_MAX >> 4) == 0)
|
||||
+ return 0;
|
||||
+ else
|
||||
+ return 1;
|
||||
+}
|
||||
+
|
||||
static int
|
||||
do_test (void)
|
||||
{
|
||||
@@ -77,5 +87,10 @@ do_test (void)
|
||||
if (status != EXIT_SUCCESS)
|
||||
return status;
|
||||
status = do_test_1 (TEST_NAME, LOOP, prepare, function_overflow);
|
||||
+ if (status != EXIT_SUCCESS)
|
||||
+ return status;
|
||||
+ status = do_test_1 (TEST_NAME, LOOP, prepare, function_overflow2);
|
||||
+ if (status != EXIT_SUCCESS)
|
||||
+ return status;
|
||||
return status;
|
||||
}
|
||||
diff --git a/sysdeps/x86_64/multiarch/strcmp-avx2.S b/sysdeps/x86_64/multiarch/strcmp-avx2.S
|
||||
index f9bdc5ccd03aa1f9..09a73942086f9c9f 100644
|
||||
--- a/sysdeps/x86_64/multiarch/strcmp-avx2.S
|
||||
+++ b/sysdeps/x86_64/multiarch/strcmp-avx2.S
|
||||
@@ -122,7 +122,7 @@ ENTRY(STRCMP)
|
||||
are cases where length is large enough that it can never be a
|
||||
bound on valid memory so just use wcscmp. */
|
||||
shrq $56, %rcx
|
||||
- jnz __wcscmp_avx2
|
||||
+ jnz OVERFLOW_STRCMP
|
||||
|
||||
leaq (, %rdx, 4), %rdx
|
||||
# endif
|
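The hunk above guards the conversion of a wide-character count into a byte count (`leaq (, %rdx, 4), %rdx`) with a check of the top eight bits of the length (`shrq $56, %rcx`). A minimal C sketch of that check follows; it assumes 4-byte wide characters and that `%rcx` holds a copy of the length, and the helper names `length_needs_fallback` and `length_in_bytes` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if any of the top 8 bits of the character count are set.  Such a
   length can never bound valid memory, so the caller falls back to the
   unbounded comparison instead of scaling the count.  */
static bool
length_needs_fallback (uint64_t n_chars)
{
  return (n_chars >> 56) != 0;
}

/* Convert a character count to a byte count; only valid after the
   fallback check has passed, so the multiplication cannot wrap.  */
static uint64_t
length_in_bytes (uint64_t n_chars)
{
  assert (!length_needs_fallback (n_chars));
  return n_chars * 4;
}
```

The new test case in the patch passes `SIZE_MAX >> 4`, which has bits set above bit 55 and therefore takes the fallback path in this sketch as well.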
170
glibc-upstream-2.34-211.patch
Normal file
@ -0,0 +1,170 @@
|
||||
commit e4a2fb76efb45210c541ee3f8ef32f317783c3a8
|
||||
Author: Florian Weimer <fweimer@redhat.com>
|
||||
Date: Wed May 11 20:30:49 2022 +0200
|
||||
|
||||
manual: Document the dlinfo function
|
||||
|
||||
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
|
||||
Tested-by: Carlos O'Donell <carlos@rehdat.com>
|
||||
(cherry picked from commit 93804a1ee084d4bdc620b2b9f91615c7da0fabe1)
|
||||
|
||||
Also includes partial backport of commit 5d28a8962dcb6ec056b81d730e
|
||||
(the addition of manual/dynlink.texi).
|
||||
|
||||
diff --git a/manual/Makefile b/manual/Makefile
|
||||
index e83444341e282916..31678681ef059e0f 100644
|
||||
--- a/manual/Makefile
|
||||
+++ b/manual/Makefile
|
||||
@@ -39,7 +39,7 @@ chapters = $(addsuffix .texi, \
|
||||
pipe socket terminal syslog math arith time \
|
||||
resource setjmp signal startup process ipc job \
|
||||
nss users sysinfo conf crypt debug threads \
|
||||
- probes tunables)
|
||||
+ dynlink probes tunables)
|
||||
appendices = lang.texi header.texi install.texi maint.texi platform.texi \
|
||||
contrib.texi
|
||||
licenses = freemanuals.texi lgpl-2.1.texi fdl-1.3.texi
|
||||
diff --git a/manual/dynlink.texi b/manual/dynlink.texi
|
||||
new file mode 100644
|
||||
index 0000000000000000..dbf3de11769d8e57
|
||||
--- /dev/null
|
||||
+++ b/manual/dynlink.texi
|
||||
@@ -0,0 +1,100 @@
|
||||
+@node Dynamic Linker
|
||||
+@c @node Dynamic Linker, Internal Probes, Threads, Top
|
||||
+@c %MENU% Loading programs and shared objects.
|
||||
+@chapter Dynamic Linker
|
||||
+@cindex dynamic linker
|
||||
+@cindex dynamic loader
|
||||
+
|
||||
+The @dfn{dynamic linker} is responsible for loading dynamically linked
|
||||
+programs and their dependencies (in the form of shared objects). The
|
||||
+dynamic linker in @theglibc{} also supports loading shared objects (such
|
||||
+as plugins) later at run time.
|
||||
+
|
||||
+Dynamic linkers are sometimes called @dfn{dynamic loaders}.
|
||||
+
|
||||
+@menu
|
||||
+* Dynamic Linker Introspection:: Interfaces for querying mapping information.
|
||||
+@end menu
|
||||
+
|
||||
+@node Dynamic Linker Introspection
|
||||
+@section Dynamic Linker Introspection
|
||||
+
|
||||
+@Theglibc{} provides various functions for querying information from the
|
||||
+dynamic linker.
|
||||
+
|
||||
+@deftypefun {int} dlinfo (void *@var{handle}, int @var{request}, void *@var{arg})
|
||||
+@safety{@mtsafe{}@asunsafe{@asucorrupt{}}@acunsafe{@acucorrupt{}}}
|
||||
+@standards{GNU, dlfcn.h}
|
||||
+This function returns information about @var{handle} in the memory
|
||||
+location @var{arg}, based on @var{request}. The @var{handle} argument
|
||||
+must be a pointer returned by @code{dlopen} or @code{dlmopen}; it must
|
||||
+not have been closed by @code{dlclose}.
|
||||
+
|
||||
+On success, @code{dlinfo} returns 0. If there is an error, the function
|
||||
+returns @math{-1}, and @code{dlerror} can be used to obtain a
|
||||
+corresponding error message.
|
||||
+
|
||||
+The following operations are defined for use with @var{request}:
|
||||
+
|
||||
+@vtable @code
|
||||
+@item RTLD_DI_LINKMAP
|
||||
+The corresponding @code{struct link_map} pointer for @var{handle} is
|
||||
+written to @code{*@var{arg}}. The @var{arg} argument must be the
|
||||
+address of an object of type @code{struct link_map *}.
|
||||
+
|
||||
+@item RTLD_DI_LMID
|
||||
+The namespace identifier of @var{handle} is written to
|
||||
+@code{*@var{arg}}. The @var{arg} argument must be the address of an
|
||||
+object of type @code{Lmid_t}.
|
||||
+
|
||||
+@item RTLD_DI_ORIGIN
|
||||
+The value of the @code{$ORIGIN} dynamic string token for @var{handle} is
|
||||
+written to the character array starting at @var{arg} as a
|
||||
+null-terminated string.
|
||||
+
|
||||
+This request type should not be used because it is prone to buffer
|
||||
+overflows.
|
||||
+
|
||||
+@item RTLD_DI_SERINFO
|
||||
+@itemx RTLD_DI_SERINFOSIZE
|
||||
+These requests can be used to obtain search path information for
|
||||
+@var{handle}. For both requests, @var{arg} must point to a
|
||||
+@code{Dl_serinfo} object. The @code{RTLD_DI_SERINFOSIZE} request must
|
||||
+be made first; it updates the @code{dls_size} and @code{dls_cnt} members
|
||||
+of the @code{Dl_serinfo} object. The caller should then allocate memory
|
||||
+to store at least @code{dls_size} bytes and pass that buffer to a
|
||||
+@code{RTLD_DI_SERINFO} request. This second request fills the
|
||||
+@code{dls_serpath} array. The number of array elements was returned in
|
||||
+the @code{dls_cnt} member in the initial @code{RTLD_DI_SERINFOSIZE}
|
||||
+request. The caller is responsible for freeing the allocated buffer.
|
||||
+
|
||||
+This interface is prone to buffer overflows in multi-threaded processes
|
||||
+because the required size can change between the
|
||||
+@code{RTLD_DI_SERINFOSIZE} and @code{RTLD_DI_SERINFO} requests.
|
||||
+
|
||||
+@item RTLD_DI_TLS_DATA
|
||||
+This request writes the address of the TLS block (in the current thread)
|
||||
+for the shared object identified by @var{handle} to @code{*@var{arg}}.
|
||||
+The argument @var{arg} must be the address of an object of type
|
||||
+@code{void *}. A null pointer is written if the object does not have
|
||||
+any associated TLS block.
|
||||
+
|
||||
+@item RTLD_DI_TLS_MODID
|
||||
+This request writes the TLS module ID for the shared object @var{handle}
|
||||
+to @code{*@var{arg}}. The argument @var{arg} must be the address of an
|
||||
+object of type @code{size_t}. The module ID is zero if the object
|
||||
+does not have an associated TLS block.
|
||||
+@end vtable
|
||||
+
|
||||
+The @code{dlinfo} function is a GNU extension.
|
||||
+@end deftypefun
|
||||
+
|
||||
+@c FIXME these are undocumented:
|
||||
+@c dladdr
|
||||
+@c dladdr1
|
||||
+@c dlclose
|
||||
+@c dlerror
|
||||
+@c dlmopen
|
||||
+@c dlopen
|
||||
+@c dlsym
|
||||
+@c dlvsym
|
||||
diff --git a/manual/libdl.texi b/manual/libdl.texi
|
||||
deleted file mode 100644
|
||||
index e3fe0452d9f41d47..0000000000000000
|
||||
--- a/manual/libdl.texi
|
||||
+++ /dev/null
|
||||
@@ -1,10 +0,0 @@
|
||||
-@c FIXME these are undocumented:
|
||||
-@c dladdr
|
||||
-@c dladdr1
|
||||
-@c dlclose
|
||||
-@c dlerror
|
||||
-@c dlinfo
|
||||
-@c dlmopen
|
||||
-@c dlopen
|
||||
-@c dlsym
|
||||
-@c dlvsym
|
||||
diff --git a/manual/probes.texi b/manual/probes.texi
|
||||
index 4aae76b81921f347..ee019e651706f492 100644
|
||||
--- a/manual/probes.texi
|
||||
+++ b/manual/probes.texi
|
||||
@@ -1,5 +1,5 @@
|
||||
@node Internal Probes
|
||||
-@c @node Internal Probes, Tunables, Threads, Top
|
||||
+@c @node Internal Probes, Tunables, Dynamic Linker, Top
|
||||
@c %MENU% Probes to monitor libc internal behavior
|
||||
@chapter Internal probes
|
||||
|
||||
diff --git a/manual/threads.texi b/manual/threads.texi
|
||||
index 06b6b277a1228af1..7f166bfa87e88c36 100644
|
||||
--- a/manual/threads.texi
|
||||
+++ b/manual/threads.texi
|
||||
@@ -1,5 +1,5 @@
|
||||
@node Threads
|
||||
-@c @node Threads, Internal Probes, Debugging Support, Top
|
||||
+@c @node Threads, Dynamic Linker, Debugging Support, Top
|
||||
@c %MENU% Functions, constants, and data types for working with threads
|
||||
@chapter Threads
|
||||
@cindex threads
|
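The manual text added above describes `RTLD_DI_LINKMAP` and the two-step `RTLD_DI_SERINFOSIZE` / `RTLD_DI_SERINFO` protocol. The following sketch shows one way a caller might use them, assuming a glibc system where `dlinfo` is declared under `_GNU_SOURCE`. The helper names `get_link_map` and `count_search_paths` are hypothetical, and the second `RTLD_DI_SERINFOSIZE` call on the allocated buffer follows the pattern shown in the Linux `dlinfo(3)` manual page rather than anything stated in the patch.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <dlfcn.h>
#include <link.h>
#include <stdio.h>
#include <stdlib.h>

/* Store the link map of HANDLE in *MAP.  Returns 0 on success and -1
   on error, after printing the dlerror message.  */
static int
get_link_map (void *handle, struct link_map **map)
{
  assert (map != NULL);
  if (dlinfo (handle, RTLD_DI_LINKMAP, map) != 0)
    {
      fprintf (stderr, "dlinfo: %s\n", dlerror ());
      return -1;
    }
  return 0;
}

/* Print the library search path of HANDLE and return the number of
   entries, or -1 on error.  */
static int
count_search_paths (void *handle)
{
  Dl_serinfo probe;

  /* Step 1: ask for the required buffer size and entry count.  */
  if (dlinfo (handle, RTLD_DI_SERINFOSIZE, &probe) != 0)
    return -1;

  Dl_serinfo *info = malloc (probe.dls_size);
  if (info == NULL)
    return -1;

  /* Step 2: initialize the size fields in the buffer, then fill it.  */
  if (dlinfo (handle, RTLD_DI_SERINFOSIZE, info) != 0
      || dlinfo (handle, RTLD_DI_SERINFO, info) != 0)
    {
      free (info);
      return -1;
    }

  int count = (int) info->dls_cnt;
  for (unsigned int i = 0; i < info->dls_cnt; i++)
    printf ("%s\n", info->dls_serpath[i].dls_name);

  free (info);  /* The caller owns the buffer.  */
  return count;
}
```

A handle for the main program can be obtained with `dlopen (NULL, RTLD_NOW)`. As the manual text warns, the size can change between the two requests in a multi-threaded process, so this pattern is only safe when no other thread is loading or unloading objects. On glibc versions before 2.34 the program may need to be linked with `-ldl`.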
256
glibc-upstream-2.34-212.patch
Normal file
@ -0,0 +1,256 @@
|
||||
commit 91c2e6c3db44297bf4cb3a2e3c40236c5b6a0b23
|
||||
Author: Florian Weimer <fweimer@redhat.com>
|
||||
Date: Fri Apr 29 17:00:53 2022 +0200
|
||||
|
||||
dlfcn: Implement the RTLD_DI_PHDR request type for dlinfo
|
||||
|
||||
The information is theoretically available via dl_iterate_phdr as
|
||||
well, but that approach is very slow if there are many shared
|
||||
objects.
|
||||
|
||||
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
|
||||
Tested-by: Carlos O'Donell <carlos@rehdat.com>
|
||||
(cherry picked from commit d056c212130280c0a54d9a4f72170ec621b70ce5)
|
||||
|
||||
diff --git a/dlfcn/Makefile b/dlfcn/Makefile
|
||||
index 6bbfbb8344da05cb..d3965427dabed898 100644
|
||||
--- a/dlfcn/Makefile
|
||||
+++ b/dlfcn/Makefile
|
||||
@@ -73,6 +73,10 @@ tststatic3-ENV = $(tststatic-ENV)
|
||||
tststatic4-ENV = $(tststatic-ENV)
|
||||
tststatic5-ENV = $(tststatic-ENV)
|
||||
|
||||
+tests-internal += \
|
||||
+ tst-dlinfo-phdr \
|
||||
+ # tests-internal
|
||||
+
|
||||
ifneq (,$(CXX))
|
||||
modules-names += bug-atexit3-lib
|
||||
else
|
||||
diff --git a/dlfcn/dlfcn.h b/dlfcn/dlfcn.h
|
||||
index 4a3b870a487ea789..24388cfedae4dd67 100644
|
||||
--- a/dlfcn/dlfcn.h
|
||||
+++ b/dlfcn/dlfcn.h
|
||||
@@ -162,7 +162,12 @@ enum
|
||||
segment, or if the calling thread has not allocated a block for it. */
|
||||
RTLD_DI_TLS_DATA = 10,
|
||||
|
||||
- RTLD_DI_MAX = 10
|
||||
+ /* Treat ARG as const ElfW(Phdr) **, and store the address of the
|
||||
+ program header array at that location. The dlinfo call returns
|
||||
+ the number of program headers in the array. */
|
||||
+ RTLD_DI_PHDR = 11,
|
||||
+
|
||||
+ RTLD_DI_MAX = 11
|
||||
};
|
||||
|
||||
|
||||
diff --git a/dlfcn/dlinfo.c b/dlfcn/dlinfo.c
|
||||
index 47d2daa96fa5986f..1842925fb7c594dd 100644
|
||||
--- a/dlfcn/dlinfo.c
|
||||
+++ b/dlfcn/dlinfo.c
|
||||
@@ -28,6 +28,10 @@ struct dlinfo_args
|
||||
void *handle;
|
||||
int request;
|
||||
void *arg;
|
||||
+
|
||||
+ /* This is the value that is returned from dlinfo if no error is
|
||||
+ signaled. */
|
||||
+ int result;
|
||||
};
|
||||
|
||||
static void
|
||||
@@ -40,6 +44,7 @@ dlinfo_doit (void *argsblock)
|
||||
{
|
||||
case RTLD_DI_CONFIGADDR:
|
||||
default:
|
||||
+ args->result = -1;
|
||||
_dl_signal_error (0, NULL, NULL, N_("unsupported dlinfo request"));
|
||||
break;
|
||||
|
||||
@@ -75,6 +80,11 @@ dlinfo_doit (void *argsblock)
|
||||
*(void **) args->arg = data;
|
||||
break;
|
||||
}
|
||||
+
|
||||
+ case RTLD_DI_PHDR:
|
||||
+ *(const ElfW(Phdr) **) args->arg = l->l_phdr;
|
||||
+ args->result = l->l_phnum;
|
||||
+ break;
|
||||
}
|
||||
}
|
||||
|
||||
@@ -82,7 +92,8 @@ static int
|
||||
dlinfo_implementation (void *handle, int request, void *arg)
|
||||
{
|
||||
struct dlinfo_args args = { handle, request, arg };
|
||||
- return _dlerror_run (&dlinfo_doit, &args) ? -1 : 0;
|
||||
+ _dlerror_run (&dlinfo_doit, &args);
|
||||
+ return args.result;
|
||||
}
|
||||
|
||||
#ifdef SHARED
|
||||
diff --git a/dlfcn/tst-dlinfo-phdr.c b/dlfcn/tst-dlinfo-phdr.c
|
||||
new file mode 100644
|
||||
index 0000000000000000..a15a7d48ebd3b976
|
||||
--- /dev/null
|
||||
+++ b/dlfcn/tst-dlinfo-phdr.c
|
||||
@@ -0,0 +1,125 @@
|
||||
+/* Test for dlinfo (RTLD_DI_PHDR).
|
||||
+ Copyright (C) 2022 Free Software Foundation, Inc.
|
||||
+ This file is part of the GNU C Library.
|
||||
+
|
||||
+ The GNU C Library is free software; you can redistribute it and/or
|
||||
+ modify it under the terms of the GNU Lesser General Public
|
||||
+ License as published by the Free Software Foundation; either
|
||||
+ version 2.1 of the License, or (at your option) any later version.
|
||||
+
|
||||
+ The GNU C Library is distributed in the hope that it will be useful,
|
||||
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
|
||||
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
|
||||
+ Lesser General Public License for more details.
|
||||
+
|
||||
+ You should have received a copy of the GNU Lesser General Public
|
||||
+ License along with the GNU C Library; if not, see
|
||||
+ <https://www.gnu.org/licenses/>. */
|
||||
+
|
||||
+#include <dlfcn.h>
|
||||
+#include <link.h>
|
||||
+#include <stdbool.h>
|
||||
+#include <stdio.h>
|
||||
+#include <string.h>
|
||||
+#include <sys/auxv.h>
|
||||
+
|
||||
+#include <support/check.h>
|
||||
+#include <support/xdlfcn.h>
|
||||
+
|
||||
+/* Used to verify that the program header array appears as expected
|
||||
+ among the dl_iterate_phdr callback invocations. */
|
||||
+
|
||||
+struct dlip_callback_args
|
||||
+{
|
||||
+ struct link_map *l; /* l->l_addr is used to find the object. */
|
||||
+ const ElfW(Phdr) *phdr; /* Expected program header pointed. */
|
||||
+ int phnum; /* Expected program header count. */
|
||||
+ bool found; /* True if l->l_addr has been found. */
|
||||
+};
|
||||
+
|
||||
+static int
|
||||
+dlip_callback (struct dl_phdr_info *dlpi, size_t size, void *closure)
|
||||
+{
|
||||
+ TEST_COMPARE (sizeof (*dlpi), size);
|
||||
+ struct dlip_callback_args *args = closure;
|
||||
+
|
||||
+ if (dlpi->dlpi_addr == args->l->l_addr)
|
||||
+ {
|
||||
+ TEST_VERIFY (!args->found);
|
||||
+ args->found = true;
|
||||
+ TEST_VERIFY (args->phdr == dlpi->dlpi_phdr);
|
||||
+ TEST_COMPARE (args->phnum, dlpi->dlpi_phnum);
|
||||
+ }
|
||||
+
|
||||
+ return 0;
|
||||
+}
|
||||
+
|
||||
+static int
|
||||
+do_test (void)
|
||||
+{
|
||||
+ /* Avoid a copy relocation. */
|
||||
+ struct r_debug *debug = xdlsym (RTLD_DEFAULT, "_r_debug");
|
||||
+ struct link_map *l = (struct link_map *) debug->r_map;
|
||||
+ TEST_VERIFY_EXIT (l != NULL);
|
||||
+
|
||||
+ do
|
||||
+ {
|
||||
+ printf ("info: checking link map %p (%p) for \"%s\"\n",
|
||||
+ l, l->l_phdr, l->l_name);
|
||||
+
|
||||
+ /* Cause dlerror () to return an error message. */
|
||||
+ dlsym (RTLD_DEFAULT, "does-not-exist");
|
||||
+
|
||||
+ /* Use the extension that link maps are valid dlopen handles. */
|
||||
+ const ElfW(Phdr) *phdr;
|
||||
+ int phnum = dlinfo (l, RTLD_DI_PHDR, &phdr);
|
||||
+ TEST_VERIFY (phnum >= 0);
|
||||
+ /* Verify that the error message has been cleared. */
|
||||
+ TEST_COMPARE_STRING (dlerror (), NULL);
|
||||
+
|
||||
+ TEST_VERIFY (phdr == l->l_phdr);
|
||||
+ TEST_COMPARE (phnum, l->l_phnum);
|
||||
+
|
||||
+ /* Check that we can find PT_DYNAMIC among the array. */
|
||||
+ {
|
||||
+ bool dynamic_found = false;
|
||||
+ for (int i = 0; i < phnum; ++i)
|
||||
+ if (phdr[i].p_type == PT_DYNAMIC)
|
||||
+ {
|
||||
+ dynamic_found = true;
|
||||
+ TEST_COMPARE ((ElfW(Addr)) l->l_ld, l->l_addr + phdr[i].p_vaddr);
|
||||
+ }
|
||||
+ TEST_VERIFY (dynamic_found);
|
||||
+ }
|
||||
+
|
||||
+ /* Check that dl_iterate_phdr finds the link map with the same
|
||||
+ program headers. */
|
||||
+ {
|
||||
+ struct dlip_callback_args args =
|
||||
+ {
|
||||
+ .l = l,
|
||||
+ .phdr = phdr,
|
||||
+ .phnum = phnum,
|
||||
+ .found = false,
|
||||
+ };
|
||||
+ TEST_COMPARE (dl_iterate_phdr (dlip_callback, &args), 0);
|
||||
+ TEST_VERIFY (args.found);
|
||||
+ }
|
||||
+
|
||||
+ if (l->l_prev == NULL)
|
||||
+ {
|
||||
+ /* This is the executable, so the information is also
|
||||
+ available via getauxval. */
|
||||
+ TEST_COMPARE_STRING (l->l_name, "");
|
||||
+ TEST_VERIFY (phdr == (const ElfW(Phdr) *) getauxval (AT_PHDR));
|
||||
+ TEST_COMPARE (phnum, getauxval (AT_PHNUM));
|
||||
+ }
|
||||
+
|
||||
+ l = l->l_next;
|
||||
+ }
|
||||
+ while (l != NULL);
|
||||
+
|
||||
+ return 0;
|
||||
+}
|
||||
+
|
||||
+#include <support/test-driver.c>
|
||||
diff --git a/manual/dynlink.texi b/manual/dynlink.texi
|
||||
index dbf3de11769d8e57..7dcac64889e389fd 100644
|
||||
--- a/manual/dynlink.texi
|
||||
+++ b/manual/dynlink.texi
|
||||
@@ -30,9 +30,9 @@ location @var{arg}, based on @var{request}. The @var{handle} argument
|
||||
must be a pointer returned by @code{dlopen} or @code{dlmopen}; it must
|
||||
not have been closed by @code{dlclose}.
|
||||
|
||||
-On success, @code{dlinfo} returns 0. If there is an error, the function
|
||||
-returns @math{-1}, and @code{dlerror} can be used to obtain a
|
||||
-corresponding error message.
|
||||
+On success, @code{dlinfo} returns 0 for most request types; exceptions
|
||||
+are noted below. If there is an error, the function returns @math{-1},
|
||||
+and @code{dlerror} can be used to obtain a corresponding error message.
|
||||
|
||||
The following operations are defined for use with @var{request}:
|
||||
|
||||
@@ -84,6 +84,15 @@ This request writes the TLS module ID for the shared object @var{handle}
|
||||
to @code{*@var{arg}}. The argument @var{arg} must be the address of an
|
||||
object of type @code{size_t}. The module ID is zero if the object
|
||||
does not have an associated TLS block.
|
||||
+
|
||||
+@item RTLD_DI_PHDR
|
||||
+This request writes the address of the program header array to
|
||||
+@code{*@var{arg}}. The argument @var{arg} must be the address of an
|
||||
+object of type @code{const ElfW(Phdr) *} (that is,
|
||||
+@code{const Elf32_Phdr *} or @code{const Elf64_Phdr *}, as appropriate
|
||||
+for the current architecture). For this request, the value returned by
|
||||
+@code{dlinfo} is the number of program headers in the program header
|
||||
+array.
|
||||
@end vtable
|
||||
|
||||
The @code{dlinfo} function is a GNU extension.
|
31
glibc-upstream-2.34-213.patch
Normal file
@@ -0,0 +1,31 @@
commit b72bbba23687ed67887d1d18c51cce5cc9c575ca
Author: Siddhesh Poyarekar <siddhesh@sourceware.org>
Date: Fri May 13 10:01:47 2022 +0530

fortify: Ensure that __glibc_fortify condition is a constant [BZ #29141]

The fix c8ee1c85 introduced a -1 check for object size without also
checking that object size is a constant. Because of this, the tree
optimizer passes in gcc fail to fold away one of the branches in
__glibc_fortify and trips on a spurious Wstringop-overflow. The warning
itself is incorrect and the branch does go away eventually in DCE in the
rtl passes in gcc, but the constant check is a helpful hint to simplify
code early, so add it in.

Resolves: BZ #29141
Signed-off-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
(cherry picked from commit 61a87530108ec9181e1b18a9b727ec3cc3ba7532)

diff --git a/misc/sys/cdefs.h b/misc/sys/cdefs.h
|
||||
index b36013b9a6b4d9c3..e0ecd9147ee3ce48 100644
|
||||
--- a/misc/sys/cdefs.h
|
||||
+++ b/misc/sys/cdefs.h
|
||||
@@ -163,7 +163,7 @@
|
||||
/* Length is known to be safe at compile time if the __L * __S <= __OBJSZ
|
||||
condition can be folded to a constant and if it is true, or unknown (-1) */
|
||||
#define __glibc_safe_or_unknown_len(__l, __s, __osz) \
|
||||
- ((__osz) == (__SIZE_TYPE__) -1 \
|
||||
+ ((__builtin_constant_p (__osz) && (__osz) == (__SIZE_TYPE__) -1) \
|
||||
|| (__glibc_unsigned_or_positive (__l) \
|
||||
&& __builtin_constant_p (__glibc_safe_len_cond ((__SIZE_TYPE__) (__l), \
|
||||
(__s), (__osz))) \
|
22
glibc-upstream-2.34-214.patch
Normal file
@@ -0,0 +1,22 @@
commit 8de6e4a199ba6cc8aaeb43924b974eed67164bd6
Author: H.J. Lu <hjl.tools@gmail.com>
Date: Sat Feb 5 11:06:01 2022 -0800

x86: Improve L to support L(XXX_SYMBOL (YYY, ZZZ))

(cherry picked from commit 1283948f236f209b7d3f44b69a42b96806fa6da0)

diff --git a/sysdeps/x86/sysdep.h b/sysdeps/x86/sysdep.h
|
||||
index 937180c1bd791570..deda1c4e492f6176 100644
|
||||
--- a/sysdeps/x86/sysdep.h
|
||||
+++ b/sysdeps/x86/sysdep.h
|
||||
@@ -111,7 +111,8 @@ enum cf_protection_level
|
||||
/* Local label name for asm code. */
|
||||
#ifndef L
|
||||
/* ELF-like local names start with `.L'. */
|
||||
-# define L(name) .L##name
|
||||
+# define LOCAL_LABEL(name) .L##name
|
||||
+# define L(name) LOCAL_LABEL(name)
|
||||
#endif
|
||||
|
||||
#define atom_text_section .section ".text.atom", "ax"
|
98
glibc-upstream-2.34-215.patch
Normal file
@@ -0,0 +1,98 @@
commit 6cba46c85804988f4fd41ef03e8a170a4c987a86
Author: H.J. Lu <hjl.tools@gmail.com>
Date: Sat Feb 5 11:52:33 2022 -0800

x86_64/multiarch: Sort sysdep_routines and put one entry per line

(cherry picked from commit c328d0152d4b14cca58407ec68143894c8863004)

diff --git a/sysdeps/x86_64/multiarch/Makefile b/sysdeps/x86_64/multiarch/Makefile
|
||||
index 37d8d6f0bd2d10cc..8c9e7812c6af10b8 100644
|
||||
--- a/sysdeps/x86_64/multiarch/Makefile
|
||||
+++ b/sysdeps/x86_64/multiarch/Makefile
|
||||
@@ -132,37 +132,55 @@ CFLAGS-strspn-c.c += -msse4
|
||||
endif
|
||||
|
||||
ifeq ($(subdir),wcsmbs)
|
||||
-sysdep_routines += wmemcmp-sse4 wmemcmp-ssse3 wmemcmp-c \
|
||||
- wmemcmp-avx2-movbe \
|
||||
- wmemchr-sse2 wmemchr-avx2 \
|
||||
- wcscmp-sse2 wcscmp-avx2 \
|
||||
- wcsncmp-sse2 wcsncmp-avx2 \
|
||||
- wcscpy-ssse3 wcscpy-c \
|
||||
- wcschr-sse2 wcschr-avx2 \
|
||||
- wcsrchr-sse2 wcsrchr-avx2 \
|
||||
- wcslen-sse2 wcslen-sse4_1 wcslen-avx2 \
|
||||
- wcsnlen-c wcsnlen-sse4_1 wcsnlen-avx2 \
|
||||
- wcschr-avx2-rtm \
|
||||
- wcscmp-avx2-rtm \
|
||||
- wcslen-avx2-rtm \
|
||||
- wcsncmp-avx2-rtm \
|
||||
- wcsnlen-avx2-rtm \
|
||||
- wcsrchr-avx2-rtm \
|
||||
- wmemchr-avx2-rtm \
|
||||
- wmemcmp-avx2-movbe-rtm \
|
||||
- wcschr-evex \
|
||||
- wcscmp-evex \
|
||||
- wcslen-evex \
|
||||
- wcsncmp-evex \
|
||||
- wcsnlen-evex \
|
||||
- wcsrchr-evex \
|
||||
- wmemchr-evex \
|
||||
- wmemcmp-evex-movbe \
|
||||
- wmemchr-evex-rtm
|
||||
+sysdep_routines += \
|
||||
+ wcschr-avx2 \
|
||||
+ wcschr-avx2-rtm \
|
||||
+ wcschr-evex \
|
||||
+ wcschr-sse2 \
|
||||
+ wcscmp-avx2 \
|
||||
+ wcscmp-avx2-rtm \
|
||||
+ wcscmp-evex \
|
||||
+ wcscmp-sse2 \
|
||||
+ wcscpy-c \
|
||||
+ wcscpy-ssse3 \
|
||||
+ wcslen-avx2 \
|
||||
+ wcslen-avx2-rtm \
|
||||
+ wcslen-evex \
|
||||
+ wcslen-sse2 \
|
||||
+ wcslen-sse4_1 \
|
||||
+ wcsncmp-avx2 \
|
||||
+ wcsncmp-avx2-rtm \
|
||||
+ wcsncmp-evex \
|
||||
+ wcsncmp-sse2 \
|
||||
+ wcsnlen-avx2 \
|
||||
+ wcsnlen-avx2-rtm \
|
||||
+ wcsnlen-c \
|
||||
+ wcsnlen-evex \
|
||||
+ wcsnlen-sse4_1 \
|
||||
+ wcsrchr-avx2 \
|
||||
+ wcsrchr-avx2-rtm \
|
||||
+ wcsrchr-evex \
|
||||
+ wcsrchr-sse2 \
|
||||
+ wmemchr-avx2 \
|
||||
+ wmemchr-avx2-rtm \
|
||||
+ wmemchr-evex \
|
||||
+ wmemchr-evex-rtm \
|
||||
+ wmemchr-sse2 \
|
||||
+ wmemcmp-avx2-movbe \
|
||||
+ wmemcmp-avx2-movbe-rtm \
|
||||
+ wmemcmp-c \
|
||||
+ wmemcmp-evex-movbe \
|
||||
+ wmemcmp-sse4 \
|
||||
+ wmemcmp-ssse3 \
|
||||
+# sysdep_routines
|
||||
endif
|
||||
|
||||
ifeq ($(subdir),debug)
|
||||
-sysdep_routines += memcpy_chk-nonshared mempcpy_chk-nonshared \
|
||||
- memmove_chk-nonshared memset_chk-nonshared \
|
||||
- wmemset_chk-nonshared
|
||||
+sysdep_routines += \
|
||||
+ memcpy_chk-nonshared \
|
||||
+ memmove_chk-nonshared \
|
||||
+ mempcpy_chk-nonshared \
|
||||
+ memset_chk-nonshared \
|
||||
+ wmemset_chk-nonshared \
|
||||
+# sysdep_routines
|
||||
endif
|
32
glibc-upstream-2.34-216.patch
Normal file
@@ -0,0 +1,32 @@
commit 37f373e33496ea437cc7e375cc835c20d4b35fb2
Author: H.J. Lu <hjl.tools@gmail.com>
Date: Thu Feb 10 11:52:50 2022 -0800

x86-64: Remove bzero weak alias in SS2 memset

commit 3d9f171bfb5325bd5f427e9fc386453358c6e840
Author: H.J. Lu <hjl.tools@gmail.com>
Date: Mon Feb 7 05:55:15 2022 -0800

x86-64: Optimize bzero

added the optimized bzero. Remove bzero weak alias in SS2 memset to
avoid undefined __bzero in memset-sse2-unaligned-erms.

(cherry picked from commit 0fb8800029d230b3711bf722b2a47db92d0e273f)

diff --git a/sysdeps/x86_64/multiarch/memset-sse2-unaligned-erms.S b/sysdeps/x86_64/multiarch/memset-sse2-unaligned-erms.S
|
||||
index bac74ac37fd3c144..2951f7f5f70e274a 100644
|
||||
--- a/sysdeps/x86_64/multiarch/memset-sse2-unaligned-erms.S
|
||||
+++ b/sysdeps/x86_64/multiarch/memset-sse2-unaligned-erms.S
|
||||
@@ -31,9 +31,7 @@
|
||||
# endif
|
||||
|
||||
# undef weak_alias
|
||||
-# define weak_alias(original, alias) \
|
||||
- .weak bzero; bzero = __bzero
|
||||
-
|
||||
+# define weak_alias(original, alias)
|
||||
# undef strong_alias
|
||||
# define strong_alias(ignored1, ignored2)
|
||||
#endif
|
24
glibc-upstream-2.34-217.patch
Normal file
@@ -0,0 +1,24 @@
commit dd457606ca4583b4a5e83d4e8956e6f9db61df6d
Author: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Date: Thu Feb 10 11:23:24 2022 -0300

x86_64: Remove bcopy optimizations

The symbols is not present in current POSIX specification and compiler
already generates memmove call.

(cherry picked from commit bf92893a14ebc161b08b28acc24fa06ae6be19cb)

diff --git a/sysdeps/x86_64/multiarch/bcopy.S b/sysdeps/x86_64/multiarch/bcopy.S
|
||||
deleted file mode 100644
|
||||
index 639f02bde3ac3ed1..0000000000000000
|
||||
--- a/sysdeps/x86_64/multiarch/bcopy.S
|
||||
+++ /dev/null
|
||||
@@ -1,7 +0,0 @@
|
||||
-#include <sysdep.h>
|
||||
-
|
||||
- .text
|
||||
-ENTRY(bcopy)
|
||||
- xchg %rdi, %rsi
|
||||
- jmp __libc_memmove /* Branch to IFUNC memmove. */
|
||||
-END(bcopy)
|
367
glibc-upstream-2.34-218.patch
Normal file
@@ -0,0 +1,367 @@
commit 3c55c207564c0ae30d78d01689b4ae16bf38dd63
Author: Noah Goldstein <goldstein.w.n@gmail.com>
Date: Wed Mar 23 16:57:16 2022 -0500

x86: Code cleanup in strchr-avx2 and comment justifying branch

Small code cleanup for size: -53 bytes.

Add comment justifying using a branch to do NULL/non-null return.

All string/memory tests pass and no regressions in benchtests.

geometric_mean(N=20) of all benchmarks Original / New: 1.00
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>

(cherry picked from commit a6fbf4d51e9ba8063c4f8331564892ead9c67344)

diff --git a/sysdeps/x86_64/multiarch/strchr-avx2.S b/sysdeps/x86_64/multiarch/strchr-avx2.S
|
||||
index 413942b96a835c4a..ef4ce0f3677e30c8 100644
|
||||
--- a/sysdeps/x86_64/multiarch/strchr-avx2.S
|
||||
+++ b/sysdeps/x86_64/multiarch/strchr-avx2.S
|
||||
@@ -48,13 +48,13 @@
|
||||
# define PAGE_SIZE 4096
|
||||
|
||||
.section SECTION(.text),"ax",@progbits
|
||||
-ENTRY (STRCHR)
|
||||
+ENTRY_P2ALIGN (STRCHR, 5)
|
||||
/* Broadcast CHAR to YMM0. */
|
||||
vmovd %esi, %xmm0
|
||||
movl %edi, %eax
|
||||
andl $(PAGE_SIZE - 1), %eax
|
||||
VPBROADCAST %xmm0, %ymm0
|
||||
- vpxor %xmm9, %xmm9, %xmm9
|
||||
+ vpxor %xmm1, %xmm1, %xmm1
|
||||
|
||||
/* Check if we cross page boundary with one vector load. */
|
||||
cmpl $(PAGE_SIZE - VEC_SIZE), %eax
|
||||
@@ -62,37 +62,29 @@ ENTRY (STRCHR)
|
||||
|
||||
/* Check the first VEC_SIZE bytes. Search for both CHAR and the
|
||||
null byte. */
|
||||
- vmovdqu (%rdi), %ymm8
|
||||
- VPCMPEQ %ymm8, %ymm0, %ymm1
|
||||
- VPCMPEQ %ymm8, %ymm9, %ymm2
|
||||
- vpor %ymm1, %ymm2, %ymm1
|
||||
- vpmovmskb %ymm1, %eax
|
||||
+ vmovdqu (%rdi), %ymm2
|
||||
+ VPCMPEQ %ymm2, %ymm0, %ymm3
|
||||
+ VPCMPEQ %ymm2, %ymm1, %ymm2
|
||||
+ vpor %ymm3, %ymm2, %ymm3
|
||||
+ vpmovmskb %ymm3, %eax
|
||||
testl %eax, %eax
|
||||
jz L(aligned_more)
|
||||
tzcntl %eax, %eax
|
||||
# ifndef USE_AS_STRCHRNUL
|
||||
- /* Found CHAR or the null byte. */
|
||||
- cmp (%rdi, %rax), %CHAR_REG
|
||||
- jne L(zero)
|
||||
-# endif
|
||||
- addq %rdi, %rax
|
||||
- VZEROUPPER_RETURN
|
||||
-
|
||||
- /* .p2align 5 helps keep performance more consistent if ENTRY()
|
||||
- alignment % 32 was either 16 or 0. As well this makes the
|
||||
- alignment % 32 of the loop_4x_vec fixed which makes tuning it
|
||||
- easier. */
|
||||
- .p2align 5
|
||||
-L(first_vec_x4):
|
||||
- tzcntl %eax, %eax
|
||||
- addq $(VEC_SIZE * 3 + 1), %rdi
|
||||
-# ifndef USE_AS_STRCHRNUL
|
||||
- /* Found CHAR or the null byte. */
|
||||
+ /* Found CHAR or the null byte. */
|
||||
cmp (%rdi, %rax), %CHAR_REG
|
||||
+ /* NB: Use a branch instead of cmovcc here. The expectation is
|
||||
+ that with strchr the user will branch based on input being
|
||||
+ null. Since this branch will be 100% predictive of the user
|
||||
+ branch a branch miss here should save what otherwise would
|
||||
+ be branch miss in the user code. Otherwise using a branch 1)
|
||||
+ saves code size and 2) is faster in highly predictable
|
||||
+ environments. */
|
||||
jne L(zero)
|
||||
# endif
|
||||
addq %rdi, %rax
|
||||
- VZEROUPPER_RETURN
|
||||
+L(return_vzeroupper):
|
||||
+ ZERO_UPPER_VEC_REGISTERS_RETURN
|
||||
|
||||
# ifndef USE_AS_STRCHRNUL
|
||||
L(zero):
|
||||
@@ -103,7 +95,8 @@ L(zero):
|
||||
|
||||
.p2align 4
|
||||
L(first_vec_x1):
|
||||
- tzcntl %eax, %eax
|
||||
+ /* Use bsf to save code size. */
|
||||
+ bsfl %eax, %eax
|
||||
incq %rdi
|
||||
# ifndef USE_AS_STRCHRNUL
|
||||
/* Found CHAR or the null byte. */
|
||||
@@ -113,9 +106,10 @@ L(first_vec_x1):
|
||||
addq %rdi, %rax
|
||||
VZEROUPPER_RETURN
|
||||
|
||||
- .p2align 4
|
||||
+ .p2align 4,, 10
|
||||
L(first_vec_x2):
|
||||
- tzcntl %eax, %eax
|
||||
+ /* Use bsf to save code size. */
|
||||
+ bsfl %eax, %eax
|
||||
addq $(VEC_SIZE + 1), %rdi
|
||||
# ifndef USE_AS_STRCHRNUL
|
||||
/* Found CHAR or the null byte. */
|
||||
@@ -125,9 +119,10 @@ L(first_vec_x2):
|
||||
addq %rdi, %rax
|
||||
VZEROUPPER_RETURN
|
||||
|
||||
- .p2align 4
|
||||
+ .p2align 4,, 8
|
||||
L(first_vec_x3):
|
||||
- tzcntl %eax, %eax
|
||||
+ /* Use bsf to save code size. */
|
||||
+ bsfl %eax, %eax
|
||||
addq $(VEC_SIZE * 2 + 1), %rdi
|
||||
# ifndef USE_AS_STRCHRNUL
|
||||
/* Found CHAR or the null byte. */
|
||||
@@ -137,6 +132,21 @@ L(first_vec_x3):
|
||||
addq %rdi, %rax
|
||||
VZEROUPPER_RETURN
|
||||
|
||||
+ .p2align 4,, 10
|
||||
+L(first_vec_x4):
|
||||
+ /* Use bsf to save code size. */
|
||||
+ bsfl %eax, %eax
|
||||
+ addq $(VEC_SIZE * 3 + 1), %rdi
|
||||
+# ifndef USE_AS_STRCHRNUL
|
||||
+ /* Found CHAR or the null byte. */
|
||||
+ cmp (%rdi, %rax), %CHAR_REG
|
||||
+ jne L(zero)
|
||||
+# endif
|
||||
+ addq %rdi, %rax
|
||||
+ VZEROUPPER_RETURN
|
||||
+
|
||||
+
|
||||
+
|
||||
.p2align 4
|
||||
L(aligned_more):
|
||||
/* Align data to VEC_SIZE - 1. This is the same number of
|
||||
@@ -146,90 +156,92 @@ L(aligned_more):
|
||||
L(cross_page_continue):
|
||||
/* Check the next 4 * VEC_SIZE. Only one VEC_SIZE at a time
|
||||
since data is only aligned to VEC_SIZE. */
|
||||
- vmovdqa 1(%rdi), %ymm8
|
||||
- VPCMPEQ %ymm8, %ymm0, %ymm1
|
||||
- VPCMPEQ %ymm8, %ymm9, %ymm2
|
||||
- vpor %ymm1, %ymm2, %ymm1
|
||||
- vpmovmskb %ymm1, %eax
|
||||
+ vmovdqa 1(%rdi), %ymm2
|
||||
+ VPCMPEQ %ymm2, %ymm0, %ymm3
|
||||
+ VPCMPEQ %ymm2, %ymm1, %ymm2
|
||||
+ vpor %ymm3, %ymm2, %ymm3
|
||||
+ vpmovmskb %ymm3, %eax
|
||||
testl %eax, %eax
|
||||
jnz L(first_vec_x1)
|
||||
|
||||
- vmovdqa (VEC_SIZE + 1)(%rdi), %ymm8
|
||||
- VPCMPEQ %ymm8, %ymm0, %ymm1
|
||||
- VPCMPEQ %ymm8, %ymm9, %ymm2
|
||||
- vpor %ymm1, %ymm2, %ymm1
|
||||
- vpmovmskb %ymm1, %eax
|
||||
+ vmovdqa (VEC_SIZE + 1)(%rdi), %ymm2
|
||||
+ VPCMPEQ %ymm2, %ymm0, %ymm3
|
||||
+ VPCMPEQ %ymm2, %ymm1, %ymm2
|
||||
+ vpor %ymm3, %ymm2, %ymm3
|
||||
+ vpmovmskb %ymm3, %eax
|
||||
testl %eax, %eax
|
||||
jnz L(first_vec_x2)
|
||||
|
||||
- vmovdqa (VEC_SIZE * 2 + 1)(%rdi), %ymm8
|
||||
- VPCMPEQ %ymm8, %ymm0, %ymm1
|
||||
- VPCMPEQ %ymm8, %ymm9, %ymm2
|
||||
- vpor %ymm1, %ymm2, %ymm1
|
||||
- vpmovmskb %ymm1, %eax
|
||||
+ vmovdqa (VEC_SIZE * 2 + 1)(%rdi), %ymm2
|
||||
+ VPCMPEQ %ymm2, %ymm0, %ymm3
|
||||
+ VPCMPEQ %ymm2, %ymm1, %ymm2
|
||||
+ vpor %ymm3, %ymm2, %ymm3
|
||||
+ vpmovmskb %ymm3, %eax
|
||||
testl %eax, %eax
|
||||
jnz L(first_vec_x3)
|
||||
|
||||
- vmovdqa (VEC_SIZE * 3 + 1)(%rdi), %ymm8
|
||||
- VPCMPEQ %ymm8, %ymm0, %ymm1
|
||||
- VPCMPEQ %ymm8, %ymm9, %ymm2
|
||||
- vpor %ymm1, %ymm2, %ymm1
|
||||
- vpmovmskb %ymm1, %eax
|
||||
+ vmovdqa (VEC_SIZE * 3 + 1)(%rdi), %ymm2
|
||||
+ VPCMPEQ %ymm2, %ymm0, %ymm3
|
||||
+ VPCMPEQ %ymm2, %ymm1, %ymm2
|
||||
+ vpor %ymm3, %ymm2, %ymm3
|
||||
+ vpmovmskb %ymm3, %eax
|
||||
testl %eax, %eax
|
||||
jnz L(first_vec_x4)
|
||||
- /* Align data to VEC_SIZE * 4 - 1. */
|
||||
- addq $(VEC_SIZE * 4 + 1), %rdi
|
||||
- andq $-(VEC_SIZE * 4), %rdi
|
||||
+ /* Align data to VEC_SIZE * 4 - 1. */
|
||||
+ incq %rdi
|
||||
+ orq $(VEC_SIZE * 4 - 1), %rdi
|
||||
.p2align 4
|
||||
L(loop_4x_vec):
|
||||
/* Compare 4 * VEC at a time forward. */
|
||||
- vmovdqa (%rdi), %ymm5
|
||||
- vmovdqa (VEC_SIZE)(%rdi), %ymm6
|
||||
- vmovdqa (VEC_SIZE * 2)(%rdi), %ymm7
|
||||
- vmovdqa (VEC_SIZE * 3)(%rdi), %ymm8
|
||||
+ vmovdqa 1(%rdi), %ymm6
|
||||
+ vmovdqa (VEC_SIZE + 1)(%rdi), %ymm7
|
||||
|
||||
/* Leaves only CHARS matching esi as 0. */
|
||||
- vpxor %ymm5, %ymm0, %ymm1
|
||||
vpxor %ymm6, %ymm0, %ymm2
|
||||
vpxor %ymm7, %ymm0, %ymm3
|
||||
- vpxor %ymm8, %ymm0, %ymm4
|
||||
|
||||
- VPMINU %ymm1, %ymm5, %ymm1
|
||||
VPMINU %ymm2, %ymm6, %ymm2
|
||||
VPMINU %ymm3, %ymm7, %ymm3
|
||||
- VPMINU %ymm4, %ymm8, %ymm4
|
||||
|
||||
- VPMINU %ymm1, %ymm2, %ymm5
|
||||
- VPMINU %ymm3, %ymm4, %ymm6
|
||||
+ vmovdqa (VEC_SIZE * 2 + 1)(%rdi), %ymm6
|
||||
+ vmovdqa (VEC_SIZE * 3 + 1)(%rdi), %ymm7
|
||||
+
|
||||
+ vpxor %ymm6, %ymm0, %ymm4
|
||||
+ vpxor %ymm7, %ymm0, %ymm5
|
||||
+
|
||||
+ VPMINU %ymm4, %ymm6, %ymm4
|
||||
+ VPMINU %ymm5, %ymm7, %ymm5
|
||||
|
||||
- VPMINU %ymm5, %ymm6, %ymm6
|
||||
+ VPMINU %ymm2, %ymm3, %ymm6
|
||||
+ VPMINU %ymm4, %ymm5, %ymm7
|
||||
|
||||
- VPCMPEQ %ymm6, %ymm9, %ymm6
|
||||
- vpmovmskb %ymm6, %ecx
|
||||
+ VPMINU %ymm6, %ymm7, %ymm7
|
||||
+
|
||||
+ VPCMPEQ %ymm7, %ymm1, %ymm7
|
||||
+ vpmovmskb %ymm7, %ecx
|
||||
subq $-(VEC_SIZE * 4), %rdi
|
||||
testl %ecx, %ecx
|
||||
jz L(loop_4x_vec)
|
||||
|
||||
-
|
||||
- VPCMPEQ %ymm1, %ymm9, %ymm1
|
||||
- vpmovmskb %ymm1, %eax
|
||||
+ VPCMPEQ %ymm2, %ymm1, %ymm2
|
||||
+ vpmovmskb %ymm2, %eax
|
||||
testl %eax, %eax
|
||||
jnz L(last_vec_x0)
|
||||
|
||||
|
||||
- VPCMPEQ %ymm5, %ymm9, %ymm2
|
||||
- vpmovmskb %ymm2, %eax
|
||||
+ VPCMPEQ %ymm3, %ymm1, %ymm3
|
||||
+ vpmovmskb %ymm3, %eax
|
||||
testl %eax, %eax
|
||||
jnz L(last_vec_x1)
|
||||
|
||||
- VPCMPEQ %ymm3, %ymm9, %ymm3
|
||||
- vpmovmskb %ymm3, %eax
|
||||
+ VPCMPEQ %ymm4, %ymm1, %ymm4
|
||||
+ vpmovmskb %ymm4, %eax
|
||||
/* rcx has combined result from all 4 VEC. It will only be used
|
||||
if the first 3 other VEC all did not contain a match. */
|
||||
salq $32, %rcx
|
||||
orq %rcx, %rax
|
||||
tzcntq %rax, %rax
|
||||
- subq $(VEC_SIZE * 2), %rdi
|
||||
+ subq $(VEC_SIZE * 2 - 1), %rdi
|
||||
# ifndef USE_AS_STRCHRNUL
|
||||
/* Found CHAR or the null byte. */
|
||||
cmp (%rdi, %rax), %CHAR_REG
|
||||
@@ -239,10 +251,11 @@ L(loop_4x_vec):
|
||||
VZEROUPPER_RETURN
|
||||
|
||||
|
||||
- .p2align 4
|
||||
+ .p2align 4,, 10
|
||||
L(last_vec_x0):
|
||||
- tzcntl %eax, %eax
|
||||
- addq $-(VEC_SIZE * 4), %rdi
|
||||
+ /* Use bsf to save code size. */
|
||||
+ bsfl %eax, %eax
|
||||
+ addq $-(VEC_SIZE * 4 - 1), %rdi
|
||||
# ifndef USE_AS_STRCHRNUL
|
||||
/* Found CHAR or the null byte. */
|
||||
cmp (%rdi, %rax), %CHAR_REG
|
||||
@@ -251,16 +264,11 @@ L(last_vec_x0):
|
||||
addq %rdi, %rax
|
||||
VZEROUPPER_RETURN
|
||||
|
||||
-# ifndef USE_AS_STRCHRNUL
|
||||
-L(zero_end):
|
||||
- xorl %eax, %eax
|
||||
- VZEROUPPER_RETURN
|
||||
-# endif
|
||||
|
||||
- .p2align 4
|
||||
+ .p2align 4,, 10
|
||||
L(last_vec_x1):
|
||||
tzcntl %eax, %eax
|
||||
- subq $(VEC_SIZE * 3), %rdi
|
||||
+ subq $(VEC_SIZE * 3 - 1), %rdi
|
||||
# ifndef USE_AS_STRCHRNUL
|
||||
/* Found CHAR or the null byte. */
|
||||
cmp (%rdi, %rax), %CHAR_REG
|
||||
@@ -269,18 +277,23 @@ L(last_vec_x1):
|
||||
addq %rdi, %rax
|
||||
VZEROUPPER_RETURN
|
||||
|
||||
+# ifndef USE_AS_STRCHRNUL
|
||||
+L(zero_end):
|
||||
+ xorl %eax, %eax
|
||||
+ VZEROUPPER_RETURN
|
||||
+# endif
|
||||
|
||||
/* Cold case for crossing page with first load. */
|
||||
- .p2align 4
|
||||
+ .p2align 4,, 8
|
||||
L(cross_page_boundary):
|
||||
movq %rdi, %rdx
|
||||
/* Align rdi to VEC_SIZE - 1. */
|
||||
orq $(VEC_SIZE - 1), %rdi
|
||||
- vmovdqa -(VEC_SIZE - 1)(%rdi), %ymm8
|
||||
- VPCMPEQ %ymm8, %ymm0, %ymm1
|
||||
- VPCMPEQ %ymm8, %ymm9, %ymm2
|
||||
- vpor %ymm1, %ymm2, %ymm1
|
||||
- vpmovmskb %ymm1, %eax
|
||||
+ vmovdqa -(VEC_SIZE - 1)(%rdi), %ymm2
|
||||
+ VPCMPEQ %ymm2, %ymm0, %ymm3
|
||||
+ VPCMPEQ %ymm2, %ymm1, %ymm2
|
||||
+ vpor %ymm3, %ymm2, %ymm3
|
||||
+ vpmovmskb %ymm3, %eax
|
||||
/* Remove the leading bytes. sarxl only uses bits [5:0] of COUNT
|
||||
so no need to manually mod edx. */
|
||||
sarxl %edx, %eax, %eax
|
||||
@@ -291,13 +304,10 @@ L(cross_page_boundary):
|
||||
xorl %ecx, %ecx
|
||||
/* Found CHAR or the null byte. */
|
||||
cmp (%rdx, %rax), %CHAR_REG
|
||||
- leaq (%rdx, %rax), %rax
|
||||
- cmovne %rcx, %rax
|
||||
-# else
|
||||
- addq %rdx, %rax
|
||||
+ jne L(zero_end)
|
||||
# endif
|
||||
-L(return_vzeroupper):
|
||||
- ZERO_UPPER_VEC_REGISTERS_RETURN
|
||||
+ addq %rdx, %rax
|
||||
+ VZEROUPPER_RETURN
|
||||
|
||||
END (STRCHR)
|
||||
-# endif
|
||||
+#endif
|
338
glibc-upstream-2.34-219.patch
Normal file
@@ -0,0 +1,338 @@
commit dd6d3a0bbcc67cb2b50b0add0c599f9f99491d8b
Author: Noah Goldstein <goldstein.w.n@gmail.com>
Date: Wed Mar 23 16:57:18 2022 -0500

x86: Code cleanup in strchr-evex and comment justifying branch

Small code cleanup for size: -81 bytes.

Add comment justifying using a branch to do NULL/non-null return.

All string/memory tests pass and no regressions in benchtests.

geometric_mean(N=20) of all benchmarks New / Original: .985
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>

(cherry picked from commit ec285ea90415458225623ddc0492ae3f705af043)

diff --git a/sysdeps/x86_64/multiarch/strchr-evex.S b/sysdeps/x86_64/multiarch/strchr-evex.S
|
||||
index 7f9d4ee48ddaa998..0b49e0ac54e7b0dd 100644
|
||||
--- a/sysdeps/x86_64/multiarch/strchr-evex.S
|
||||
+++ b/sysdeps/x86_64/multiarch/strchr-evex.S
|
||||
@@ -30,6 +30,7 @@
|
||||
# ifdef USE_AS_WCSCHR
|
||||
# define VPBROADCAST vpbroadcastd
|
||||
# define VPCMP vpcmpd
|
||||
+# define VPTESTN vptestnmd
|
||||
# define VPMINU vpminud
|
||||
# define CHAR_REG esi
|
||||
# define SHIFT_REG ecx
|
||||
@@ -37,6 +38,7 @@
|
||||
# else
|
||||
# define VPBROADCAST vpbroadcastb
|
||||
# define VPCMP vpcmpb
|
||||
+# define VPTESTN vptestnmb
|
||||
# define VPMINU vpminub
|
||||
# define CHAR_REG sil
|
||||
# define SHIFT_REG edx
|
||||
@@ -61,13 +63,11 @@
|
||||
# define CHAR_PER_VEC (VEC_SIZE / CHAR_SIZE)
|
||||
|
||||
.section .text.evex,"ax",@progbits
|
||||
-ENTRY (STRCHR)
|
||||
+ENTRY_P2ALIGN (STRCHR, 5)
|
||||
/* Broadcast CHAR to YMM0. */
|
||||
VPBROADCAST %esi, %YMM0
|
||||
movl %edi, %eax
|
||||
andl $(PAGE_SIZE - 1), %eax
|
||||
- vpxorq %XMMZERO, %XMMZERO, %XMMZERO
|
||||
-
|
||||
/* Check if we cross page boundary with one vector load.
|
||||
Otherwise it is safe to use an unaligned load. */
|
||||
cmpl $(PAGE_SIZE - VEC_SIZE), %eax
|
||||
@@ -81,49 +81,35 @@ ENTRY (STRCHR)
|
||||
vpxorq %YMM1, %YMM0, %YMM2
|
||||
VPMINU %YMM2, %YMM1, %YMM2
|
||||
/* Each bit in K0 represents a CHAR or a null byte in YMM1. */
|
||||
- VPCMP $0, %YMMZERO, %YMM2, %k0
|
||||
+ VPTESTN %YMM2, %YMM2, %k0
|
||||
kmovd %k0, %eax
|
||||
testl %eax, %eax
|
||||
jz L(aligned_more)
|
||||
tzcntl %eax, %eax
|
||||
+# ifndef USE_AS_STRCHRNUL
|
||||
+ /* Found CHAR or the null byte. */
|
||||
+ cmp (%rdi, %rax, CHAR_SIZE), %CHAR_REG
|
||||
+ /* NB: Use a branch instead of cmovcc here. The expectation is
|
||||
+ that with strchr the user will branch based on input being
|
||||
+ null. Since this branch will be 100% predictive of the user
|
||||
+ branch a branch miss here should save what otherwise would
|
||||
+ be branch miss in the user code. Otherwise using a branch 1)
|
||||
+ saves code size and 2) is faster in highly predictable
|
||||
+ environments. */
|
||||
+ jne L(zero)
|
||||
+# endif
|
||||
# ifdef USE_AS_WCSCHR
|
||||
/* NB: Multiply wchar_t count by 4 to get the number of bytes.
|
||||
*/
|
||||
leaq (%rdi, %rax, CHAR_SIZE), %rax
|
||||
# else
|
||||
addq %rdi, %rax
|
||||
-# endif
|
||||
-# ifndef USE_AS_STRCHRNUL
|
||||
- /* Found CHAR or the null byte. */
|
||||
- cmp (%rax), %CHAR_REG
|
||||
- jne L(zero)
|
||||
# endif
|
||||
ret
|
||||
|
||||
- /* .p2align 5 helps keep performance more consistent if ENTRY()
|
||||
- alignment % 32 was either 16 or 0. As well this makes the
|
||||
- alignment % 32 of the loop_4x_vec fixed which makes tuning it
|
||||
- easier. */
|
||||
- .p2align 5
|
||||
-L(first_vec_x3):
|
||||
- tzcntl %eax, %eax
|
||||
-# ifndef USE_AS_STRCHRNUL
|
||||
- /* Found CHAR or the null byte. */
|
||||
- cmp (VEC_SIZE * 3)(%rdi, %rax, CHAR_SIZE), %CHAR_REG
|
||||
- jne L(zero)
|
||||
-# endif
|
||||
- /* NB: Multiply sizeof char type (1 or 4) to get the number of
|
||||
- bytes. */
|
||||
- leaq (VEC_SIZE * 3)(%rdi, %rax, CHAR_SIZE), %rax
|
||||
- ret
|
||||
|
||||
-# ifndef USE_AS_STRCHRNUL
|
||||
-L(zero):
|
||||
- xorl %eax, %eax
|
||||
- ret
|
||||
-# endif
|
||||
|
||||
- .p2align 4
|
||||
+ .p2align 4,, 10
|
||||
L(first_vec_x4):
|
||||
# ifndef USE_AS_STRCHRNUL
|
||||
/* Check to see if first match was CHAR (k0) or null (k1). */
|
||||
@@ -144,9 +130,18 @@ L(first_vec_x4):
|
||||
leaq (VEC_SIZE * 4)(%rdi, %rax, CHAR_SIZE), %rax
|
||||
ret
|
||||
|
||||
+# ifndef USE_AS_STRCHRNUL
|
||||
+L(zero):
|
||||
+ xorl %eax, %eax
|
||||
+ ret
|
||||
+# endif
|
||||
+
|
||||
+
|
||||
.p2align 4
|
||||
L(first_vec_x1):
|
||||
- tzcntl %eax, %eax
|
||||
+ /* Use bsf here to save 1-byte keeping keeping the block in 1x
|
||||
+ fetch block. eax guranteed non-zero. */
|
||||
+ bsfl %eax, %eax
|
||||
# ifndef USE_AS_STRCHRNUL
|
||||
/* Found CHAR or the null byte. */
|
||||
cmp (VEC_SIZE)(%rdi, %rax, CHAR_SIZE), %CHAR_REG
|
||||
@@ -158,7 +153,7 @@ L(first_vec_x1):
|
||||
leaq (VEC_SIZE)(%rdi, %rax, CHAR_SIZE), %rax
|
||||
ret
|
||||
|
||||
- .p2align 4
|
||||
+ .p2align 4,, 10
|
||||
L(first_vec_x2):
|
||||
# ifndef USE_AS_STRCHRNUL
|
||||
/* Check to see if first match was CHAR (k0) or null (k1). */
|
||||
@@ -179,6 +174,21 @@ L(first_vec_x2):
|
||||
leaq (VEC_SIZE * 2)(%rdi, %rax, CHAR_SIZE), %rax
|
||||
ret
|
||||
|
||||
+ .p2align 4,, 10
|
||||
+L(first_vec_x3):
|
||||
+ /* Use bsf here to save 1-byte keeping keeping the block in 1x
|
||||
+ fetch block. eax guranteed non-zero. */
|
||||
+ bsfl %eax, %eax
|
||||
+# ifndef USE_AS_STRCHRNUL
|
||||
+ /* Found CHAR or the null byte. */
|
||||
+ cmp (VEC_SIZE * 3)(%rdi, %rax, CHAR_SIZE), %CHAR_REG
|
||||
+ jne L(zero)
|
||||
+# endif
|
||||
+ /* NB: Multiply sizeof char type (1 or 4) to get the number of
|
||||
+ bytes. */
|
||||
+ leaq (VEC_SIZE * 3)(%rdi, %rax, CHAR_SIZE), %rax
|
||||
+ ret
|
||||
+
|
||||
.p2align 4
|
||||
L(aligned_more):
|
||||
/* Align data to VEC_SIZE. */
|
||||
@@ -195,7 +205,7 @@ L(cross_page_continue):
|
||||
vpxorq %YMM1, %YMM0, %YMM2
|
||||
VPMINU %YMM2, %YMM1, %YMM2
|
||||
/* Each bit in K0 represents a CHAR or a null byte in YMM1. */
|
||||
- VPCMP $0, %YMMZERO, %YMM2, %k0
|
||||
+ VPTESTN %YMM2, %YMM2, %k0
|
||||
kmovd %k0, %eax
|
||||
testl %eax, %eax
|
||||
jnz L(first_vec_x1)
|
||||
@@ -206,7 +216,7 @@ L(cross_page_continue):
|
||||
/* Each bit in K0 represents a CHAR in YMM1. */
|
||||
VPCMP $0, %YMM1, %YMM0, %k0
|
||||
/* Each bit in K1 represents a CHAR in YMM1. */
|
||||
- VPCMP $0, %YMM1, %YMMZERO, %k1
|
||||
+ VPTESTN %YMM1, %YMM1, %k1
|
||||
kortestd %k0, %k1
|
||||
jnz L(first_vec_x2)
|
||||
|
||||
@@ -215,7 +225,7 @@ L(cross_page_continue):
|
||||
vpxorq %YMM1, %YMM0, %YMM2
|
||||
VPMINU %YMM2, %YMM1, %YMM2
|
||||
/* Each bit in K0 represents a CHAR or a null byte in YMM1. */
|
||||
- VPCMP $0, %YMMZERO, %YMM2, %k0
|
||||
+ VPTESTN %YMM2, %YMM2, %k0
|
||||
kmovd %k0, %eax
|
||||
testl %eax, %eax
|
||||
jnz L(first_vec_x3)
|
||||
@@ -224,7 +234,7 @@ L(cross_page_continue):
|
||||
/* Each bit in K0 represents a CHAR in YMM1. */
|
||||
VPCMP $0, %YMM1, %YMM0, %k0
|
||||
/* Each bit in K1 represents a CHAR in YMM1. */
|
||||
- VPCMP $0, %YMM1, %YMMZERO, %k1
|
||||
+ VPTESTN %YMM1, %YMM1, %k1
|
||||
kortestd %k0, %k1
|
||||
jnz L(first_vec_x4)
|
||||
|
||||
@@ -265,33 +275,33 @@ L(loop_4x_vec):
|
||||
VPMINU %YMM3, %YMM4, %YMM4
|
||||
VPMINU %YMM2, %YMM4, %YMM4{%k4}{z}
|
||||
|
||||
- VPCMP $0, %YMMZERO, %YMM4, %k1
|
||||
+ VPTESTN %YMM4, %YMM4, %k1
|
||||
kmovd %k1, %ecx
|
||||
subq $-(VEC_SIZE * 4), %rdi
|
||||
testl %ecx, %ecx
|
||||
jz L(loop_4x_vec)
|
||||
|
||||
- VPCMP $0, %YMMZERO, %YMM1, %k0
|
||||
+ VPTESTN %YMM1, %YMM1, %k0
|
||||
kmovd %k0, %eax
|
||||
testl %eax, %eax
|
||||
jnz L(last_vec_x1)
|
||||
|
||||
- VPCMP $0, %YMMZERO, %YMM2, %k0
|
||||
+ VPTESTN %YMM2, %YMM2, %k0
|
||||
kmovd %k0, %eax
|
||||
testl %eax, %eax
|
||||
jnz L(last_vec_x2)
|
||||
|
||||
- VPCMP $0, %YMMZERO, %YMM3, %k0
|
||||
+ VPTESTN %YMM3, %YMM3, %k0
|
||||
kmovd %k0, %eax
|
||||
/* Combine YMM3 matches (eax) with YMM4 matches (ecx). */
|
||||
# ifdef USE_AS_WCSCHR
|
||||
sall $8, %ecx
|
||||
orl %ecx, %eax
|
||||
- tzcntl %eax, %eax
|
||||
+ bsfl %eax, %eax
|
||||
# else
|
||||
salq $32, %rcx
|
||||
orq %rcx, %rax
|
||||
- tzcntq %rax, %rax
|
||||
+ bsfq %rax, %rax
|
||||
# endif
|
||||
# ifndef USE_AS_STRCHRNUL
|
||||
/* Check if match was CHAR or null. */
|
||||
@@ -303,28 +313,28 @@ L(loop_4x_vec):
|
||||
leaq (VEC_SIZE * 2)(%rdi, %rax, CHAR_SIZE), %rax
|
||||
ret
|
||||
|
||||
-# ifndef USE_AS_STRCHRNUL
|
||||
-L(zero_end):
|
||||
- xorl %eax, %eax
|
||||
- ret
|
||||
+ .p2align 4,, 8
|
||||
+L(last_vec_x1):
|
||||
+ bsfl %eax, %eax
|
||||
+# ifdef USE_AS_WCSCHR
|
||||
+ /* NB: Multiply wchar_t count by 4 to get the number of bytes.
|
||||
+ */
|
||||
+ leaq (%rdi, %rax, CHAR_SIZE), %rax
|
||||
+# else
|
||||
+ addq %rdi, %rax
|
||||
# endif
|
||||
|
||||
- .p2align 4
|
||||
-L(last_vec_x1):
|
||||
- tzcntl %eax, %eax
|
||||
# ifndef USE_AS_STRCHRNUL
|
||||
/* Check if match was null. */
|
||||
- cmp (%rdi, %rax, CHAR_SIZE), %CHAR_REG
|
||||
+ cmp (%rax), %CHAR_REG
|
||||
jne L(zero_end)
|
||||
# endif
|
||||
- /* NB: Multiply sizeof char type (1 or 4) to get the number of
|
||||
- bytes. */
|
||||
- leaq (%rdi, %rax, CHAR_SIZE), %rax
|
||||
+
|
||||
ret
|
||||
|
||||
- .p2align 4
|
||||
+ .p2align 4,, 8
|
||||
L(last_vec_x2):
|
||||
- tzcntl %eax, %eax
|
||||
+ bsfl %eax, %eax
|
||||
# ifndef USE_AS_STRCHRNUL
|
||||
/* Check if match was null. */
|
||||
cmp (VEC_SIZE)(%rdi, %rax, CHAR_SIZE), %CHAR_REG
|
||||
@@ -336,7 +346,7 @@ L(last_vec_x2):
|
||||
ret
|
||||
|
||||
/* Cold case for crossing page with first load. */
|
||||
- .p2align 4
|
||||
+ .p2align 4,, 8
|
||||
L(cross_page_boundary):
|
||||
movq %rdi, %rdx
|
||||
/* Align rdi. */
|
||||
@@ -346,9 +356,9 @@ L(cross_page_boundary):
|
||||
vpxorq %YMM1, %YMM0, %YMM2
|
||||
VPMINU %YMM2, %YMM1, %YMM2
|
||||
/* Each bit in K0 represents a CHAR or a null byte in YMM1. */
|
||||
- VPCMP $0, %YMMZERO, %YMM2, %k0
|
||||
+ VPTESTN %YMM2, %YMM2, %k0
|
||||
kmovd %k0, %eax
|
||||
- /* Remove the leading bits. */
|
||||
+ /* Remove the leading bits. */
|
||||
# ifdef USE_AS_WCSCHR
|
||||
movl %edx, %SHIFT_REG
|
||||
/* NB: Divide shift count by 4 since each bit in K1 represent 4
|
||||
@@ -360,20 +370,24 @@ L(cross_page_boundary):
|
||||
/* If eax is zero continue. */
|
||||
testl %eax, %eax
|
||||
jz L(cross_page_continue)
|
||||
- tzcntl %eax, %eax
|
||||
-# ifndef USE_AS_STRCHRNUL
|
||||
- /* Check to see if match was CHAR or null. */
|
||||
- cmp (%rdx, %rax, CHAR_SIZE), %CHAR_REG
|
||||
- jne L(zero_end)
|
||||
-# endif
|
||||
+ bsfl %eax, %eax
|
||||
+
|
||||
# ifdef USE_AS_WCSCHR
|
||||
/* NB: Multiply wchar_t count by 4 to get the number of
|
||||
bytes. */
|
||||
leaq (%rdx, %rax, CHAR_SIZE), %rax
|
||||
# else
|
||||
addq %rdx, %rax
|
||||
+# endif
|
||||
+# ifndef USE_AS_STRCHRNUL
|
||||
+ /* Check to see if match was CHAR or null. */
|
||||
+ cmp (%rax), %CHAR_REG
|
||||
+ je L(cross_page_ret)
|
||||
+L(zero_end):
|
||||
+ xorl %eax, %eax
|
||||
+L(cross_page_ret):
|
||||
# endif
|
||||
ret
|
||||
|
||||
END (STRCHR)
|
||||
-# endif
|
||||
+#endif
|
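Editor's note on the strchr-evex hunks above: the following is a minimal scalar sketch in C of what a single byte lane computes. It assumes, from the instruction names rather than from anything stated in the patch text, that `vpxorq` yields `data ^ ch`, that `VPMINU` is an unsigned minimum, and that `VPTESTN a, a` sets the mask bit when the lane is zero. All function names here are hypothetical and the code is a model, not the glibc implementation.

```c
#include <stdint.h>

/* Scalar model of one byte lane: the result is zero exactly when the
   data byte equals the search byte or is the NUL terminator.  */
static uint8_t
lane_min_xor (uint8_t data, uint8_t ch)
{
  uint8_t x = (uint8_t) (data ^ ch);   /* 0 iff data == ch (vpxorq).  */
  return x < data ? x : data;          /* Unsigned minimum (VPMINU).  */
}

/* Old form: compare the lane for equality against a zero register.  */
static int
mask_bit_cmp_zero (uint8_t lane)
{
  const uint8_t zero = 0;
  return lane == zero;
}

/* New form: AND the lane with itself and set the bit when the result
   is zero, so no zero register is needed.  */
static int
mask_bit_testn (uint8_t lane)
{
  return (uint8_t) (lane & lane) == 0;
}

/* Count (data, ch) pairs for which either form disagrees with the
   plain predicate "data == ch or data == 0".  */
static unsigned int
count_mismatches (void)
{
  unsigned int bad = 0;
  for (unsigned int d = 0; d < 256; d++)
    for (unsigned int c = 0; c < 256; c++)
      {
        uint8_t lane = lane_min_xor ((uint8_t) d, (uint8_t) c);
        int want = (d == c) || (d == 0);
        if (mask_bit_cmp_zero (lane) != want
            || mask_bit_testn (lane) != want)
          bad++;
      }
  return bad;
}
```

Under these assumptions both forms produce the same mask bit for every input, which is consistent with the patch swapping one for the other and no longer needing `%YMMZERO` at those sites.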
143
glibc-upstream-2.34-220.patch
Normal file
@@ -0,0 +1,143 @@
|
||||
commit 0ae1006967eef11909fbed0f6ecef2f260b133d3
|
||||
Author: Noah Goldstein <goldstein.w.n@gmail.com>
|
||||
Date: Wed Mar 23 16:57:22 2022 -0500
|
||||
|
||||
x86: Optimize strcspn and strpbrk in strcspn-c.c
|
||||
|
||||
Use _mm_cmpeq_epi8 and _mm_movemask_epi8 to get strlen instead of
|
||||
_mm_cmpistri. Also change offset to unsigned to avoid unnecessary
|
||||
sign extensions.
|
||||
|
||||
geometric_mean(N=20) of all benchmarks that don't fall back on
|
||||
sse2/strlen; New / Original: .928
|
||||
|
||||
All string/memory tests pass.
|
||||
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
|
||||
|
||||
(cherry picked from commit 30d627d477d7255345a4b713cf352ac32d644d61)
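Editor's note: the commit message names `_mm_cmpeq_epi8` and `_mm_movemask_epi8` as the replacement for `_mm_cmpistri` when locating the terminator of the set string. The sketch below shows that idiom in isolation. The helper names `nul_mask16` and `terminates_within_16` are hypothetical, and the scalar fallback exists only so the example also builds on targets without SSE2; neither is part of the patch.

```c
#if defined (__SSE2__)
# include <emmintrin.h>
#endif

/* Return a 16-bit mask with bit i set when chunk[i] == 0.
   CHUNK must point to 16 readable bytes; no alignment is required.  */
static unsigned int
nul_mask16 (const unsigned char *chunk)
{
#if defined (__SSE2__)
  __m128i v = _mm_loadu_si128 ((const __m128i *) chunk);
  /* Each byte lane becomes 0xff where the input byte is zero.  */
  __m128i eq = _mm_cmpeq_epi8 (v, _mm_setzero_si128 ());
  /* Collect the top bit of every lane into the low 16 bits.  */
  return (unsigned int) _mm_movemask_epi8 (eq);
#else
  /* Portable fallback with the same result.  */
  unsigned int m = 0;
  for (unsigned int i = 0; i < 16; i++)
    if (chunk[i] == 0)
      m |= 1u << i;
  return m;
#endif
}

/* Nonzero when a NUL byte occurs within the 16 bytes at CHUNK.  */
static int
terminates_within_16 (const unsigned char *chunk)
{
  return nul_mask16 (chunk) != 0;
}
```

A zero mask corresponds to the `maskz_bits == 0` case in the patch, where the code then checks `a[16]` to decide whether to fall back to the SSE2 variant.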
|
||||
|
||||
diff --git a/sysdeps/x86_64/multiarch/strcspn-c.c b/sysdeps/x86_64/multiarch/strcspn-c.c
|
||||
index c56ddbd22f014653..2436b6dcd90d8efe 100644
|
||||
--- a/sysdeps/x86_64/multiarch/strcspn-c.c
|
||||
+++ b/sysdeps/x86_64/multiarch/strcspn-c.c
|
||||
@@ -85,83 +85,74 @@ STRCSPN_SSE42 (const char *s, const char *a)
|
||||
RETURN (NULL, strlen (s));
|
||||
|
||||
const char *aligned;
|
||||
- __m128i mask;
|
||||
- int offset = (int) ((size_t) a & 15);
|
||||
+ __m128i mask, maskz, zero;
|
||||
+ unsigned int maskz_bits;
|
||||
+ unsigned int offset = (unsigned int) ((size_t) a & 15);
|
||||
+ zero = _mm_set1_epi8 (0);
|
||||
if (offset != 0)
|
||||
{
|
||||
/* Load masks. */
|
||||
aligned = (const char *) ((size_t) a & -16L);
|
||||
__m128i mask0 = _mm_load_si128 ((__m128i *) aligned);
|
||||
-
|
||||
- mask = __m128i_shift_right (mask0, offset);
|
||||
+ maskz = _mm_cmpeq_epi8 (mask0, zero);
|
||||
|
||||
/* Find where the NULL terminator is. */
|
||||
- int length = _mm_cmpistri (mask, mask, 0x3a);
|
||||
- if (length == 16 - offset)
|
||||
- {
|
||||
- /* There is no NULL terminator. */
|
||||
- __m128i mask1 = _mm_load_si128 ((__m128i *) (aligned + 16));
|
||||
- int index = _mm_cmpistri (mask1, mask1, 0x3a);
|
||||
- length += index;
|
||||
-
|
||||
- /* Don't use SSE4.2 if the length of A > 16. */
|
||||
- if (length > 16)
|
||||
- return STRCSPN_SSE2 (s, a);
|
||||
-
|
||||
- if (index != 0)
|
||||
- {
|
||||
- /* Combine mask0 and mask1. We could play games with
|
||||
- palignr, but frankly this data should be in L1 now
|
||||
- so do the merge via an unaligned load. */
|
||||
- mask = _mm_loadu_si128 ((__m128i *) a);
|
||||
- }
|
||||
- }
|
||||
+ maskz_bits = _mm_movemask_epi8 (maskz) >> offset;
|
||||
+ if (maskz_bits != 0)
|
||||
+ {
|
||||
+ mask = __m128i_shift_right (mask0, offset);
|
||||
+ offset = (unsigned int) ((size_t) s & 15);
|
||||
+ if (offset)
|
||||
+ goto start_unaligned;
|
||||
+
|
||||
+ aligned = s;
|
||||
+ goto start_loop;
|
||||
+ }
|
||||
}
|
||||
- else
|
||||
- {
|
||||
- /* A is aligned. */
|
||||
- mask = _mm_load_si128 ((__m128i *) a);
|
||||
|
||||
- /* Find where the NULL terminator is. */
|
||||
- int length = _mm_cmpistri (mask, mask, 0x3a);
|
||||
- if (length == 16)
|
||||
- {
|
||||
- /* There is no NULL terminator. Don't use SSE4.2 if the length
|
||||
- of A > 16. */
|
||||
- if (a[16] != 0)
|
||||
- return STRCSPN_SSE2 (s, a);
|
||||
- }
|
||||
+ /* A is aligned. */
|
||||
+ mask = _mm_loadu_si128 ((__m128i *) a);
|
||||
+ /* Find where the NULL terminator is. */
|
||||
+ maskz = _mm_cmpeq_epi8 (mask, zero);
|
||||
+ maskz_bits = _mm_movemask_epi8 (maskz);
|
||||
+ if (maskz_bits == 0)
|
||||
+ {
|
||||
+ /* There is no NULL terminator. Don't use SSE4.2 if the length
|
||||
+ of A > 16. */
|
||||
+ if (a[16] != 0)
|
||||
+ return STRCSPN_SSE2 (s, a);
|
||||
}
|
||||
|
||||
- offset = (int) ((size_t) s & 15);
|
||||
+ aligned = s;
|
||||
+ offset = (unsigned int) ((size_t) s & 15);
|
||||
if (offset != 0)
|
||||
{
|
||||
+ start_unaligned:
|
||||
/* Check partial string. */
|
||||
aligned = (const char *) ((size_t) s & -16L);
|
||||
__m128i value = _mm_load_si128 ((__m128i *) aligned);
|
||||
|
||||
value = __m128i_shift_right (value, offset);
|
||||
|
||||
- int length = _mm_cmpistri (mask, value, 0x2);
|
||||
+ unsigned int length = _mm_cmpistri (mask, value, 0x2);
|
||||
/* No need to check ZFlag since ZFlag is always 1. */
|
||||
- int cflag = _mm_cmpistrc (mask, value, 0x2);
|
||||
+ unsigned int cflag = _mm_cmpistrc (mask, value, 0x2);
|
||||
if (cflag)
|
||||
RETURN ((char *) (s + length), length);
|
||||
/* Find where the NULL terminator is. */
|
||||
- int index = _mm_cmpistri (value, value, 0x3a);
|
||||
+ unsigned int index = _mm_cmpistri (value, value, 0x3a);
|
||||
if (index < 16 - offset)
|
||||
RETURN (NULL, index);
|
||||
aligned += 16;
|
||||
}
|
||||
- else
|
||||
- aligned = s;
|
||||
|
||||
+start_loop:
|
||||
while (1)
|
||||
{
|
||||
__m128i value = _mm_load_si128 ((__m128i *) aligned);
|
||||
- int index = _mm_cmpistri (mask, value, 0x2);
|
||||
- int cflag = _mm_cmpistrc (mask, value, 0x2);
|
||||
- int zflag = _mm_cmpistrz (mask, value, 0x2);
|
||||
+ unsigned int index = _mm_cmpistri (mask, value, 0x2);
|
||||
+ unsigned int cflag = _mm_cmpistrc (mask, value, 0x2);
|
||||
+ unsigned int zflag = _mm_cmpistrz (mask, value, 0x2);
|
||||
if (cflag)
|
||||
RETURN ((char *) (aligned + index), (size_t) (aligned + index - s));
|
||||
if (zflag)
|
143
glibc-upstream-2.34-221.patch
Normal file
@@ -0,0 +1,143 @@
|
||||
commit 0a2da0111037b1cc214f8f40ca5bdebf36f35cbd
|
||||
Author: Noah Goldstein <goldstein.w.n@gmail.com>
|
||||
Date: Wed Mar 23 16:57:24 2022 -0500
|
||||
|
||||
x86: Optimize strspn in strspn-c.c
|
||||
|
||||
Use _mm_cmpeq_epi8 and _mm_movemask_epi8 to get strlen instead of
|
||||
_mm_cmpistri. Also change offset to unsigned to avoid unnecessary
|
||||
sign extensions.
|
||||
|
||||
geometric_mean(N=20) of all benchmarks that don't fall back on
|
||||
sse2; New / Original: .901
|
||||
|
||||
All string/memory tests pass.
|
||||
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
|
||||
|
||||
(cherry picked from commit 412d10343168b05b8cf6c3683457cf9711d28046)
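Editor's note: when the pointer is not 16-byte aligned, the patch loads the enclosing aligned block and shifts the movemask result right by the offset, so that bytes before the start of the string are ignored. A minimal sketch of that step follows. The function name `nul_mask_from_offset` is hypothetical, the caller is assumed to pass a 16-byte-aligned block, and the scalar branch is only a portability fallback.

```c
#if defined (__SSE2__)
# include <emmintrin.h>
#endif

/* BLOCK is a 16-byte-aligned block and OFFSET (0..15) is where the
   string starts inside it.  Return a mask of NUL bytes at or after
   OFFSET, with bit 0 corresponding to the byte at OFFSET.  */
static unsigned int
nul_mask_from_offset (const unsigned char *block, unsigned int offset)
{
  unsigned int bits = 0;
#if defined (__SSE2__)
  /* Aligned load: the caller guarantees BLOCK is 16-byte aligned.  */
  __m128i v = _mm_load_si128 ((const __m128i *) block);
  __m128i eq = _mm_cmpeq_epi8 (v, _mm_setzero_si128 ());
  bits = (unsigned int) _mm_movemask_epi8 (eq);
#else
  for (unsigned int i = 0; i < 16; i++)
    if (block[i] == 0)
      bits |= 1u << i;
#endif
  /* Unsigned shift drops the bits for bytes that precede the string.  */
  return bits >> offset;
}
```

A nonzero result means the terminator lies inside the aligned block, which matches the `maskz_bits != 0` tests in the patch. Keeping both operands unsigned means the shift is a plain logical shift with no sign extension.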
|
||||
|
||||
diff --git a/sysdeps/x86_64/multiarch/strspn-c.c b/sysdeps/x86_64/multiarch/strspn-c.c
|
||||
index a17196296b9ebe52..3bcc479f1b52ff6a 100644
|
||||
--- a/sysdeps/x86_64/multiarch/strspn-c.c
|
||||
+++ b/sysdeps/x86_64/multiarch/strspn-c.c
|
||||
@@ -63,81 +63,73 @@ __strspn_sse42 (const char *s, const char *a)
|
||||
return 0;
|
||||
|
||||
const char *aligned;
|
||||
- __m128i mask;
|
||||
- int offset = (int) ((size_t) a & 15);
|
||||
+ __m128i mask, maskz, zero;
|
||||
+ unsigned int maskz_bits;
|
||||
+ unsigned int offset = (int) ((size_t) a & 15);
|
||||
+ zero = _mm_set1_epi8 (0);
|
||||
if (offset != 0)
|
||||
{
|
||||
/* Load masks. */
|
||||
aligned = (const char *) ((size_t) a & -16L);
|
||||
__m128i mask0 = _mm_load_si128 ((__m128i *) aligned);
|
||||
-
|
||||
- mask = __m128i_shift_right (mask0, offset);
|
||||
+ maskz = _mm_cmpeq_epi8 (mask0, zero);
|
||||
|
||||
/* Find where the NULL terminator is. */
|
||||
- int length = _mm_cmpistri (mask, mask, 0x3a);
|
||||
- if (length == 16 - offset)
|
||||
- {
|
||||
- /* There is no NULL terminator. */
|
||||
- __m128i mask1 = _mm_load_si128 ((__m128i *) (aligned + 16));
|
||||
- int index = _mm_cmpistri (mask1, mask1, 0x3a);
|
||||
- length += index;
|
||||
-
|
||||
- /* Don't use SSE4.2 if the length of A > 16. */
|
||||
- if (length > 16)
|
||||
- return __strspn_sse2 (s, a);
|
||||
-
|
||||
- if (index != 0)
|
||||
- {
|
||||
- /* Combine mask0 and mask1. We could play games with
|
||||
- palignr, but frankly this data should be in L1 now
|
||||
- so do the merge via an unaligned load. */
|
||||
- mask = _mm_loadu_si128 ((__m128i *) a);
|
||||
- }
|
||||
- }
|
||||
+ maskz_bits = _mm_movemask_epi8 (maskz) >> offset;
|
||||
+ if (maskz_bits != 0)
|
||||
+ {
|
||||
+ mask = __m128i_shift_right (mask0, offset);
|
||||
+ offset = (unsigned int) ((size_t) s & 15);
|
||||
+ if (offset)
|
||||
+ goto start_unaligned;
|
||||
+
|
||||
+ aligned = s;
|
||||
+ goto start_loop;
|
||||
+ }
|
||||
}
|
||||
- else
|
||||
- {
|
||||
- /* A is aligned. */
|
||||
- mask = _mm_load_si128 ((__m128i *) a);
|
||||
|
||||
- /* Find where the NULL terminator is. */
|
||||
- int length = _mm_cmpistri (mask, mask, 0x3a);
|
||||
- if (length == 16)
|
||||
- {
|
||||
- /* There is no NULL terminator. Don't use SSE4.2 if the length
|
||||
- of A > 16. */
|
||||
- if (a[16] != 0)
|
||||
- return __strspn_sse2 (s, a);
|
||||
- }
|
||||
+ /* A is aligned. */
|
||||
+ mask = _mm_loadu_si128 ((__m128i *) a);
|
||||
+
|
||||
+ /* Find where the NULL terminator is. */
|
||||
+ maskz = _mm_cmpeq_epi8 (mask, zero);
|
||||
+ maskz_bits = _mm_movemask_epi8 (maskz);
|
||||
+ if (maskz_bits == 0)
|
||||
+ {
|
||||
+ /* There is no NULL terminator. Don't use SSE4.2 if the length
|
||||
+ of A > 16. */
|
||||
+ if (a[16] != 0)
|
||||
+ return __strspn_sse2 (s, a);
|
||||
}
|
||||
+ aligned = s;
|
||||
+ offset = (unsigned int) ((size_t) s & 15);
|
||||
|
||||
- offset = (int) ((size_t) s & 15);
|
||||
if (offset != 0)
|
||||
{
|
||||
+ start_unaligned:
|
||||
/* Check partial string. */
|
||||
aligned = (const char *) ((size_t) s & -16L);
|
||||
__m128i value = _mm_load_si128 ((__m128i *) aligned);
|
||||
+ __m128i adj_value = __m128i_shift_right (value, offset);
|
||||
|
||||
- value = __m128i_shift_right (value, offset);
|
||||
-
|
||||
- int length = _mm_cmpistri (mask, value, 0x12);
|
||||
+ unsigned int length = _mm_cmpistri (mask, adj_value, 0x12);
|
||||
/* No need to check CFlag since it is always 1. */
|
||||
if (length < 16 - offset)
|
||||
return length;
|
||||
/* Find where the NULL terminator is. */
|
||||
- int index = _mm_cmpistri (value, value, 0x3a);
|
||||
- if (index < 16 - offset)
|
||||
+ maskz = _mm_cmpeq_epi8 (value, zero);
|
||||
+ maskz_bits = _mm_movemask_epi8 (maskz) >> offset;
|
||||
+ if (maskz_bits != 0)
|
||||
return length;
|
||||
aligned += 16;
|
||||
}
|
||||
- else
|
||||
- aligned = s;
|
||||
|
||||
+start_loop:
|
||||
while (1)
|
||||
{
|
||||
__m128i value = _mm_load_si128 ((__m128i *) aligned);
|
||||
- int index = _mm_cmpistri (mask, value, 0x12);
|
||||
- int cflag = _mm_cmpistrc (mask, value, 0x12);
|
||||
+ unsigned int index = _mm_cmpistri (mask, value, 0x12);
|
||||
+ unsigned int cflag = _mm_cmpistrc (mask, value, 0x12);
|
||||
if (cflag)
|
||||
return (size_t) (aligned + index - s);
|
||||
aligned += 16;
|
164
glibc-upstream-2.34-222.patch
Normal file
@@ -0,0 +1,164 @@
|
||||
commit 0dafa75e3c42994d0f23db62651d1802577272f2
|
||||
Author: Noah Goldstein <goldstein.w.n@gmail.com>
|
||||
Date: Wed Mar 23 16:57:26 2022 -0500
|
||||
|
||||
x86: Remove strcspn-sse2.S and use the generic implementation
|
||||
|
||||
The generic implementation is faster.
|
||||
|
||||
geometric_mean(N=20) of all benchmarks New / Original: .678
|
||||
|
||||
All string/memory tests pass.
|
||||
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
|
||||
|
||||
(cherry picked from commit fe28e7d9d9535ebab4081d195c553b4fbf39d9ae)
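Editor's note: the assembly removed by this patch builds a 256-byte table on the stack, marks every byte of the reject set, and relies on NUL terminating the scan without a separate test. The C sketch below illustrates that table technique only. It is not the code in `string/strcspn.c`, and the name `table_strcspn` is hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Table-driven strcspn sketch: return the length of the initial
   segment of S that contains no byte from REJECT.  */
static size_t
table_strcspn (const char *s, const char *reject)
{
  unsigned char table[256];
  memset (table, 0, sizeof table);

  /* Mark every byte that appears in REJECT.  */
  for (const unsigned char *r = (const unsigned char *) reject;
       *r != 0; r++)
    table[*r] = 1;

  /* Mark NUL as well, so the scan loop needs one test per byte
     instead of separate "in set" and "end of string" tests.  */
  table[0] = 1;

  const unsigned char *p = (const unsigned char *) s;
  while (!table[*p])
    p++;
  return (size_t) (p - (const unsigned char *) s);
}
```

The removed assembly stores each byte's own value at its index rather than a flag, which gives the same "NUL always stops the loop" property because the entry for NUL compares equal to NUL.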
|
||||
|
||||
diff --git a/sysdeps/x86_64/multiarch/strcspn-sse2.S b/sysdeps/x86_64/multiarch/strcspn-sse2.c
|
||||
similarity index 89%
|
||||
rename from sysdeps/x86_64/multiarch/strcspn-sse2.S
|
||||
rename to sysdeps/x86_64/multiarch/strcspn-sse2.c
|
||||
index 63b260a9ed265230..9bd3dac82d90b3a5 100644
|
||||
--- a/sysdeps/x86_64/multiarch/strcspn-sse2.S
|
||||
+++ b/sysdeps/x86_64/multiarch/strcspn-sse2.c
|
||||
@@ -19,10 +19,10 @@
|
||||
#if IS_IN (libc)
|
||||
|
||||
# include <sysdep.h>
|
||||
-# define strcspn __strcspn_sse2
|
||||
+# define STRCSPN __strcspn_sse2
|
||||
|
||||
# undef libc_hidden_builtin_def
|
||||
-# define libc_hidden_builtin_def(strcspn)
|
||||
+# define libc_hidden_builtin_def(STRCSPN)
|
||||
#endif
|
||||
|
||||
-#include <sysdeps/x86_64/strcspn.S>
|
||||
+#include <string/strcspn.c>
|
||||
diff --git a/sysdeps/x86_64/strcspn.S b/sysdeps/x86_64/strcspn.S
|
||||
deleted file mode 100644
|
||||
index 6035a274c87bafb0..0000000000000000
|
||||
--- a/sysdeps/x86_64/strcspn.S
|
||||
+++ /dev/null
|
||||
@@ -1,122 +0,0 @@
|
||||
-/* strcspn (str, ss) -- Return the length of the initial segment of STR
|
||||
- which contains no characters from SS.
|
||||
- For AMD x86-64.
|
||||
- Copyright (C) 1994-2021 Free Software Foundation, Inc.
|
||||
- This file is part of the GNU C Library.
|
||||
- Contributed by Ulrich Drepper <drepper@gnu.ai.mit.edu>.
|
||||
- Bug fixes by Alan Modra <Alan@SPRI.Levels.UniSA.Edu.Au>.
|
||||
- Adopted for x86-64 by Andreas Jaeger <aj@suse.de>.
|
||||
-
|
||||
- The GNU C Library is free software; you can redistribute it and/or
|
||||
- modify it under the terms of the GNU Lesser General Public
|
||||
- License as published by the Free Software Foundation; either
|
||||
- version 2.1 of the License, or (at your option) any later version.
|
||||
-
|
||||
- The GNU C Library is distributed in the hope that it will be useful,
|
||||
- but WITHOUT ANY WARRANTY; without even the implied warranty of
|
||||
- MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
|
||||
- Lesser General Public License for more details.
|
||||
-
|
||||
- You should have received a copy of the GNU Lesser General Public
|
||||
- License along with the GNU C Library; if not, see
|
||||
- <https://www.gnu.org/licenses/>. */
|
||||
-
|
||||
-#include <sysdep.h>
|
||||
-#include "asm-syntax.h"
|
||||
-
|
||||
- .text
|
||||
-ENTRY (strcspn)
|
||||
-
|
||||
- movq %rdi, %rdx /* Save SRC. */
|
||||
-
|
||||
- /* First we create a table with flags for all possible characters.
|
||||
- For the ASCII (7bit/8bit) or ISO-8859-X character sets which are
|
||||
- supported by the C string functions we have 256 characters.
|
||||
- Before inserting marks for the stop characters we clear the whole
|
||||
- table. */
|
||||
- movq %rdi, %r8 /* Save value. */
|
||||
- subq $256, %rsp /* Make space for 256 bytes. */
|
||||
- cfi_adjust_cfa_offset(256)
|
||||
- movl $32, %ecx /* 32*8 bytes = 256 bytes. */
|
||||
- movq %rsp, %rdi
|
||||
- xorl %eax, %eax /* We store 0s. */
|
||||
- cld
|
||||
- rep
|
||||
- stosq
|
||||
-
|
||||
- movq %rsi, %rax /* Setup skipset. */
|
||||
-
|
||||
-/* For understanding the following code remember that %rcx == 0 now.
|
||||
- Although all the following instruction only modify %cl we always
|
||||
- have a correct zero-extended 64-bit value in %rcx. */
|
||||
-
|
||||
- .p2align 4
|
||||
-L(2): movb (%rax), %cl /* get byte from skipset */
|
||||
- testb %cl, %cl /* is NUL char? */
|
||||
- jz L(1) /* yes => start compare loop */
|
||||
- movb %cl, (%rsp,%rcx) /* set corresponding byte in skipset table */
|
||||
-
|
||||
- movb 1(%rax), %cl /* get byte from skipset */
|
||||
- testb $0xff, %cl /* is NUL char? */
|
||||
- jz L(1) /* yes => start compare loop */
|
||||
- movb %cl, (%rsp,%rcx) /* set corresponding byte in skipset table */
|
||||
-
|
||||
- movb 2(%rax), %cl /* get byte from skipset */
|
||||
- testb $0xff, %cl /* is NUL char? */
|
||||
- jz L(1) /* yes => start compare loop */
|
||||
- movb %cl, (%rsp,%rcx) /* set corresponding byte in skipset table */
|
||||
-
|
||||
- movb 3(%rax), %cl /* get byte from skipset */
|
||||
- addq $4, %rax /* increment skipset pointer */
|
||||
- movb %cl, (%rsp,%rcx) /* set corresponding byte in skipset table */
|
||||
- testb $0xff, %cl /* is NUL char? */
|
||||
- jnz L(2) /* no => process next dword from skipset */
|
||||
-
|
||||
-L(1): leaq -4(%rdx), %rax /* prepare loop */
|
||||
-
|
||||
- /* We use a neat trick for the following loop. Normally we would
|
||||
- have to test for two termination conditions
|
||||
- 1. a character in the skipset was found
|
||||
- and
|
||||
- 2. the end of the string was found
|
||||
- But as a sign that the character is in the skipset we store its
|
||||
- value in the table. But the value of NUL is NUL so the loop
|
||||
- terminates for NUL in every case. */
|
||||
-
|
||||
- .p2align 4
|
||||
-L(3): addq $4, %rax /* adjust pointer for full loop round */
|
||||
-
|
||||
- movb (%rax), %cl /* get byte from string */
|
||||
- cmpb %cl, (%rsp,%rcx) /* is it contained in skipset? */
|
||||
- je L(4) /* yes => return */
|
||||
-
|
||||
- movb 1(%rax), %cl /* get byte from string */
|
||||
- cmpb %cl, (%rsp,%rcx) /* is it contained in skipset? */
|
||||
- je L(5) /* yes => return */
|
||||
-
|
||||
- movb 2(%rax), %cl /* get byte from string */
|
||||
- cmpb %cl, (%rsp,%rcx) /* is it contained in skipset? */
|
||||
- jz L(6) /* yes => return */
|
||||
-
|
||||
- movb 3(%rax), %cl /* get byte from string */
|
||||
- cmpb %cl, (%rsp,%rcx) /* is it contained in skipset? */
|
||||
- jne L(3) /* no => start loop again */
|
||||
-
|
||||
- incq %rax /* adjust pointer */
|
||||
-L(6): incq %rax
|
||||
-L(5): incq %rax
|
||||
-
|
||||
-L(4): addq $256, %rsp /* remove skipset */
|
||||
- cfi_adjust_cfa_offset(-256)
|
||||
-#ifdef USE_AS_STRPBRK
|
||||
- xorl %edx,%edx
|
||||
- orb %cl, %cl /* was last character NUL? */
|
||||
- cmovzq %rdx, %rax /* Yes: return NULL */
|
||||
-#else
|
||||
- subq %rdx, %rax /* we have to return the number of valid
|
||||
- characters, so compute distance to first
|
||||
- non-valid character */
|
||||
-#endif
|
||||
- ret
|
||||
-END (strcspn)
|
||||
-libc_hidden_builtin_def (strcspn)
|
44
glibc-upstream-2.34-223.patch
Normal file
@@ -0,0 +1,44 @@
|
||||
commit 38115446558e6d0976299eb592ba7266681c27d5
|
||||
Author: Noah Goldstein <goldstein.w.n@gmail.com>
|
||||
Date: Wed Mar 23 16:57:27 2022 -0500
|
||||
|
||||
x86: Remove strpbrk-sse2.S and use the generic implementation
|
||||
|
||||
The generic implementation is faster (see strcspn commit).
|
||||
|
||||
All string/memory tests pass.
|
||||
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
|
||||
|
||||
(cherry picked from commit 653358535280a599382cb6c77538a187dac6a87f)
|
||||
|
||||
diff --git a/sysdeps/x86_64/multiarch/strpbrk-sse2.S b/sysdeps/x86_64/multiarch/strpbrk-sse2.c
|
||||
similarity index 87%
|
||||
rename from sysdeps/x86_64/multiarch/strpbrk-sse2.S
|
||||
rename to sysdeps/x86_64/multiarch/strpbrk-sse2.c
|
||||
index c5b95d08ff09cb27..8a58f051c35163dd 100644
|
||||
--- a/sysdeps/x86_64/multiarch/strpbrk-sse2.S
|
||||
+++ b/sysdeps/x86_64/multiarch/strpbrk-sse2.c
|
||||
@@ -19,11 +19,10 @@
|
||||
#if IS_IN (libc)
|
||||
|
||||
# include <sysdep.h>
|
||||
-# define strcspn __strpbrk_sse2
|
||||
+# define STRPBRK __strpbrk_sse2
|
||||
|
||||
# undef libc_hidden_builtin_def
|
||||
-# define libc_hidden_builtin_def(strpbrk)
|
||||
+# define libc_hidden_builtin_def(STRPBRK)
|
||||
#endif
|
||||
|
||||
-#define USE_AS_STRPBRK
|
||||
-#include <sysdeps/x86_64/strcspn.S>
|
||||
+#include <string/strpbrk.c>
|
||||
diff --git a/sysdeps/x86_64/strpbrk.S b/sysdeps/x86_64/strpbrk.S
|
||||
deleted file mode 100644
|
||||
index 21888a5b923974f9..0000000000000000
|
||||
--- a/sysdeps/x86_64/strpbrk.S
|
||||
+++ /dev/null
|
||||
@@ -1,3 +0,0 @@
|
||||
-#define strcspn strpbrk
|
||||
-#define USE_AS_STRPBRK
|
||||
-#include <sysdeps/x86_64/strcspn.S>
|
157
glibc-upstream-2.34-224.patch
Normal file
@@ -0,0 +1,157 @@
|
||||
commit a4b1cae068d4d6e3117dd49e7d0599e4c62ac39f
|
||||
Author: Noah Goldstein <goldstein.w.n@gmail.com>
|
||||
Date: Wed Mar 23 16:57:29 2022 -0500
|
||||
|
||||
x86: Remove strspn-sse2.S and use the generic implementation
|
||||
|
||||
The generic implementation is faster.
|
||||
|
||||
geometric_mean(N=20) of all benchmarks New / Original: .710
|
||||
|
||||
All string/memory tests pass.
|
||||
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
|
||||
|
||||
(cherry picked from commit 9c8a6ad620b49a27120ecdd7049c26bf05900397)
|
||||
|
||||
diff --git a/sysdeps/x86_64/multiarch/strspn-sse2.S b/sysdeps/x86_64/multiarch/strspn-sse2.c
|
||||
similarity index 89%
|
||||
rename from sysdeps/x86_64/multiarch/strspn-sse2.S
|
||||
rename to sysdeps/x86_64/multiarch/strspn-sse2.c
|
||||
index e919fe492cc15151..f5e5686db1037740 100644
|
||||
--- a/sysdeps/x86_64/multiarch/strspn-sse2.S
|
||||
+++ b/sysdeps/x86_64/multiarch/strspn-sse2.c
|
||||
@@ -19,10 +19,10 @@
|
||||
#if IS_IN (libc)
|
||||
|
||||
# include <sysdep.h>
|
||||
-# define strspn __strspn_sse2
|
||||
+# define STRSPN __strspn_sse2
|
||||
|
||||
# undef libc_hidden_builtin_def
|
||||
-# define libc_hidden_builtin_def(strspn)
|
||||
+# define libc_hidden_builtin_def(STRSPN)
|
||||
#endif
|
||||
|
||||
-#include <sysdeps/x86_64/strspn.S>
|
||||
+#include <string/strspn.c>
|
||||
diff --git a/sysdeps/x86_64/strspn.S b/sysdeps/x86_64/strspn.S
|
||||
deleted file mode 100644
|
||||
index e878f328852792db..0000000000000000
|
||||
--- a/sysdeps/x86_64/strspn.S
|
||||
+++ /dev/null
|
||||
@@ -1,115 +0,0 @@
|
||||
-/* strspn (str, ss) -- Return the length of the initial segment of STR
|
||||
- which contains only characters from SS.
|
||||
- For AMD x86-64.
|
||||
- Copyright (C) 1994-2021 Free Software Foundation, Inc.
|
||||
- This file is part of the GNU C Library.
|
||||
- Contributed by Ulrich Drepper <drepper@gnu.ai.mit.edu>.
|
||||
- Bug fixes by Alan Modra <Alan@SPRI.Levels.UniSA.Edu.Au>.
|
||||
- Adopted for x86-64 by Andreas Jaeger <aj@suse.de>.
|
||||
-
|
||||
- The GNU C Library is free software; you can redistribute it and/or
|
||||
- modify it under the terms of the GNU Lesser General Public
|
||||
- License as published by the Free Software Foundation; either
|
||||
- version 2.1 of the License, or (at your option) any later version.
|
||||
-
|
||||
- The GNU C Library is distributed in the hope that it will be useful,
|
||||
- but WITHOUT ANY WARRANTY; without even the implied warranty of
|
||||
- MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
|
||||
- Lesser General Public License for more details.
|
||||
-
|
||||
- You should have received a copy of the GNU Lesser General Public
|
||||
- License along with the GNU C Library; if not, see
|
||||
- <https://www.gnu.org/licenses/>. */
|
||||
-
|
||||
-#include <sysdep.h>
|
||||
-
|
||||
- .text
|
||||
-ENTRY (strspn)
|
||||
-
|
||||
- movq %rdi, %rdx /* Save SRC. */
|
||||
-
|
||||
- /* First we create a table with flags for all possible characters.
|
||||
- For the ASCII (7bit/8bit) or ISO-8859-X character sets which are
|
||||
- supported by the C string functions we have 256 characters.
|
||||
- Before inserting marks for the stop characters we clear the whole
|
||||
- table. */
|
||||
- movq %rdi, %r8 /* Save value. */
|
||||
- subq $256, %rsp /* Make space for 256 bytes. */
|
||||
- cfi_adjust_cfa_offset(256)
|
||||
- movl $32, %ecx /* 32*8 bytes = 256 bytes. */
|
||||
- movq %rsp, %rdi
|
||||
- xorl %eax, %eax /* We store 0s. */
|
||||
- cld
|
||||
- rep
|
||||
- stosq
|
||||
-
|
||||
- movq %rsi, %rax /* Setup stopset. */
|
||||
-
|
||||
-/* For understanding the following code remember that %rcx == 0 now.
|
||||
- Although all the following instruction only modify %cl we always
|
||||
- have a correct zero-extended 64-bit value in %rcx. */
|
||||
-
|
||||
- .p2align 4
|
||||
-L(2): movb (%rax), %cl /* get byte from stopset */
|
||||
- testb %cl, %cl /* is NUL char? */
|
||||
- jz L(1) /* yes => start compare loop */
|
||||
- movb %cl, (%rsp,%rcx) /* set corresponding byte in stopset table */
|
||||
-
|
||||
- movb 1(%rax), %cl /* get byte from stopset */
|
||||
- testb $0xff, %cl /* is NUL char? */
|
||||
- jz L(1) /* yes => start compare loop */
|
||||
- movb %cl, (%rsp,%rcx) /* set corresponding byte in stopset table */
|
||||
-
|
||||
- movb 2(%rax), %cl /* get byte from stopset */
|
||||
- testb $0xff, %cl /* is NUL char? */
|
||||
- jz L(1) /* yes => start compare loop */
|
||||
- movb %cl, (%rsp,%rcx) /* set corresponding byte in stopset table */
|
||||
-
|
||||
- movb 3(%rax), %cl /* get byte from stopset */
|
||||
- addq $4, %rax /* increment stopset pointer */
|
||||
- movb %cl, (%rsp,%rcx) /* set corresponding byte in stopset table */
|
||||
- testb $0xff, %cl /* is NUL char? */
|
||||
- jnz L(2) /* no => process next dword from stopset */
|
||||
-
|
||||
-L(1): leaq -4(%rdx), %rax /* prepare loop */
|
||||
-
|
||||
- /* We use a neat trick for the following loop. Normally we would
|
||||
- have to test for two termination conditions
|
||||
- 1. a character in the stopset was found
|
||||
- and
|
||||
- 2. the end of the string was found
|
||||
- But as a sign that the character is in the stopset we store its
|
||||
- value in the table. But the value of NUL is NUL so the loop
|
||||
- terminates for NUL in every case. */
|
||||
-
|
||||
- .p2align 4
|
||||
-L(3): addq $4, %rax /* adjust pointer for full loop round */
|
||||
-
|
||||
- movb (%rax), %cl /* get byte from string */
|
||||
- testb %cl, (%rsp,%rcx) /* is it contained in skipset? */
|
||||
- jz L(4) /* no => return */
|
||||
-
|
||||
- movb 1(%rax), %cl /* get byte from string */
|
||||
- testb %cl, (%rsp,%rcx) /* is it contained in skipset? */
|
||||
- jz L(5) /* no => return */
|
||||
-
|
||||
- movb 2(%rax), %cl /* get byte from string */
|
||||
- testb %cl, (%rsp,%rcx) /* is it contained in skipset? */
|
||||
- jz L(6) /* no => return */
|
||||
-
|
||||
- movb 3(%rax), %cl /* get byte from string */
|
||||
- testb %cl, (%rsp,%rcx) /* is it contained in skipset? */
|
||||
- jnz L(3) /* yes => start loop again */
|
||||
-
|
||||
- incq %rax /* adjust pointer */
|
||||
-L(6): incq %rax
|
||||
-L(5): incq %rax
|
||||
-
|
||||
-L(4): addq $256, %rsp /* remove stopset */
|
||||
- cfi_adjust_cfa_offset(-256)
|
||||
- subq %rdx, %rax /* we have to return the number of valid
|
||||
- characters, so compute distance to first
|
||||
- non-valid character */
|
||||
- ret
|
||||
-END (strspn)
|
||||
-libc_hidden_builtin_def (strspn)
|
118
glibc-upstream-2.34-225.patch
Normal file
@@ -0,0 +1,118 @@
|
||||
commit 5997011826b7bbb7015f56bf143a6e4fd0f5a7df
|
||||
Author: Noah Goldstein <goldstein.w.n@gmail.com>
|
||||
Date: Wed Mar 23 16:57:36 2022 -0500
|
||||
|
||||
x86: Optimize str{n}casecmp TOLOWER logic in strcmp.S
|
||||
|
||||
Slightly faster method of doing TOLOWER that saves an
|
||||
instruction.
|
||||
|
||||
Also replace the hard-coded 5-byte NOP with .p2align 4. On builds with
|
||||
CET enabled this misaligned the entry to strcasecmp.
|
||||
|
||||
geometric_mean(N=40) of all benchmarks New / Original: .894
|
||||
|
||||
All string/memory tests pass.
|
||||
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
|
||||
|
||||
(cherry picked from commit 670b54bc585ea4a94f3b2e9272ba44aa6b730b73)
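Editor's note: the patch replaces the three TOLOWER constants with `0x3f`, `0x99` and `0x20`. The macro that consumes them lies outside this excerpt, so the sketch below is an assumption about how they combine: add `0x3f` to each byte, do one signed greater-than compare against `0x99`, and add `0x20` where the compare fails. The function name `tolower_lane` is hypothetical and the code models a single byte lane in scalar C.

```c
#include <stdint.h>

/* Scalar model of one byte lane of the assumed TOLOWER sequence.  */
static uint8_t
tolower_lane (uint8_t c)
{
  const uint8_t lcase_min = 0x3f;
  const int lcase_max = -103;        /* 0x99 read as a signed byte.  */
  const uint8_t case_add = 0x20;

  /* Byte-wide add with wraparound.  'A'..'Z' (0x41..0x5a) land on
     0x80..0x99, which are the 26 smallest signed byte values.  */
  uint8_t sum = (uint8_t) (c + lcase_min);
  int as_signed = sum >= 0x80 ? (int) sum - 256 : (int) sum;

  /* One signed compare: all ones for every byte that is not an
     ASCII uppercase letter.  */
  uint8_t not_upper = as_signed > lcase_max ? 0xff : 0x00;

  /* AND-NOT with the case bit, then add it to the original byte.  */
  return (uint8_t) (c + ((uint8_t) ~not_upper & case_add));
}
```

Compared with a range test that needs two compares (above `0x40` and below `0x5b`, as the old constant names suggest), this form needs only one compare per vector, which would account for the saved instruction mentioned in the commit message.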
|
||||
|
||||
diff --git a/sysdeps/x86_64/strcmp.S b/sysdeps/x86_64/strcmp.S
|
||||
index 7f8a1bc756f86aee..ca70b540eb2dd190 100644
|
||||
--- a/sysdeps/x86_64/strcmp.S
|
||||
+++ b/sysdeps/x86_64/strcmp.S
|
||||
@@ -78,9 +78,8 @@ ENTRY2 (__strcasecmp)
|
||||
movq __libc_tsd_LOCALE@gottpoff(%rip),%rax
|
||||
mov %fs:(%rax),%RDX_LP
|
||||
|
||||
- // XXX 5 byte should be before the function
|
||||
- /* 5-byte NOP. */
|
||||
- .byte 0x0f,0x1f,0x44,0x00,0x00
|
||||
+ /* Either 1 or 5 bytes (depending on whether CET is enabled). */
|
||||
+ .p2align 4
|
||||
END2 (__strcasecmp)
|
||||
# ifndef NO_NOLOCALE_ALIAS
|
||||
weak_alias (__strcasecmp, strcasecmp)
|
||||
@@ -97,9 +96,8 @@ ENTRY2 (__strncasecmp)
|
||||
movq __libc_tsd_LOCALE@gottpoff(%rip),%rax
|
||||
mov %fs:(%rax),%RCX_LP
|
||||
|
||||
- // XXX 5 byte should be before the function
|
||||
- /* 5-byte NOP. */
|
||||
- .byte 0x0f,0x1f,0x44,0x00,0x00
|
||||
+ /* Either 1 or 5 bytes (depending on whether CET is enabled). */
|
||||
+ .p2align 4
|
||||
END2 (__strncasecmp)
|
||||
# ifndef NO_NOLOCALE_ALIAS
|
||||
weak_alias (__strncasecmp, strncasecmp)
|
||||
@@ -149,22 +147,22 @@ ENTRY (STRCMP)
|
||||
#if defined USE_AS_STRCASECMP_L || defined USE_AS_STRNCASECMP_L
|
||||
.section .rodata.cst16,"aM",@progbits,16
|
||||
.align 16
|
||||
-.Lbelowupper:
|
||||
- .quad 0x4040404040404040
|
||||
- .quad 0x4040404040404040
|
||||
-.Ltopupper:
|
||||
- .quad 0x5b5b5b5b5b5b5b5b
|
||||
- .quad 0x5b5b5b5b5b5b5b5b
|
||||
-.Ltouppermask:
|
||||
+.Llcase_min:
|
||||
+ .quad 0x3f3f3f3f3f3f3f3f
|
||||
+ .quad 0x3f3f3f3f3f3f3f3f
|
||||
+.Llcase_max:
|
||||
+ .quad 0x9999999999999999
|
||||
+ .quad 0x9999999999999999
|
||||
+.Lcase_add:
|
||||
.quad 0x2020202020202020
|
||||
.quad 0x2020202020202020
|
||||
.previous
|
||||
- movdqa .Lbelowupper(%rip), %xmm5
|
||||
-# define UCLOW_reg %xmm5
|
||||
- movdqa .Ltopupper(%rip), %xmm6
|
||||
-# define UCHIGH_reg %xmm6
|
||||
- movdqa .Ltouppermask(%rip), %xmm7
|
||||
-# define LCQWORD_reg %xmm7
|
||||
+ movdqa .Llcase_min(%rip), %xmm5
|
||||
+# define LCASE_MIN_reg %xmm5
|
||||
+ movdqa .Llcase_max(%rip), %xmm6
|
||||
+# define LCASE_MAX_reg %xmm6
|
||||
+ movdqa .Lcase_add(%rip), %xmm7
|
||||
+# define CASE_ADD_reg %xmm7
|
||||
#endif
|
||||
cmp $0x30, %ecx
|
||||
ja LABEL(crosscache) /* rsi: 16-byte load will cross cache line */
|
||||
@@ -175,22 +173,18 @@ ENTRY (STRCMP)
|
||||
movhpd 8(%rdi), %xmm1
|
||||
movhpd 8(%rsi), %xmm2
|
||||
#if defined USE_AS_STRCASECMP_L || defined USE_AS_STRNCASECMP_L
|
||||
-# define TOLOWER(reg1, reg2) \
|
||||
- movdqa reg1, %xmm8; \
|
||||
- movdqa UCHIGH_reg, %xmm9; \
|
||||
- movdqa reg2, %xmm10; \
|
||||
- movdqa UCHIGH_reg, %xmm11; \
|
||||
- pcmpgtb UCLOW_reg, %xmm8; \
|
||||
- pcmpgtb reg1, %xmm9; \
|
||||
- pcmpgtb UCLOW_reg, %xmm10; \
|
||||
- pcmpgtb reg2, %xmm11; \
|
||||
- pand %xmm9, %xmm8; \
|
||||
- pand %xmm11, %xmm10; \
|
||||
- pand LCQWORD_reg, %xmm8; \
|
||||
- pand LCQWORD_reg, %xmm10; \
|
||||
- por %xmm8, reg1; \
|
||||
- por %xmm10, reg2
|
||||
- TOLOWER (%xmm1, %xmm2)
|
||||
+# define TOLOWER(reg1, reg2) \
|
||||
+ movdqa LCASE_MIN_reg, %xmm8; \
|
||||
+ movdqa LCASE_MIN_reg, %xmm9; \
|
||||
+ paddb reg1, %xmm8; \
|
||||
+ paddb reg2, %xmm9; \
|
||||
+ pcmpgtb LCASE_MAX_reg, %xmm8; \
|
||||
+ pcmpgtb LCASE_MAX_reg, %xmm9; \
|
||||
+ pandn CASE_ADD_reg, %xmm8; \
|
||||
+ pandn CASE_ADD_reg, %xmm9; \
|
||||
+ paddb %xmm8, reg1; \
|
||||
+ paddb %xmm9, reg2
|
||||
+ TOLOWER (%xmm1, %xmm2)
|
||||
#else
|
||||
# define TOLOWER(reg1, reg2)
|
||||
#endif
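
The rewritten TOLOWER macro above can be read one byte lane at a time. The following is a minimal scalar sketch in C of what the paddb / pcmpgtb / pandn / paddb sequence appears to compute per byte, using the constants from the patch (0x3f, 0x99, 0x20). The function names `as_signed8` and `tolower_lane` are hypothetical and exist only for this illustration; the real code operates on 16 bytes at once in SSE registers.

```c
#include <stdint.h>

/* Hypothetical helper: reinterpret a byte as a signed 8-bit value,
   mirroring the signed comparison that pcmpgtb performs. */
static int as_signed8(uint8_t v)
{
    return v >= 0x80 ? (int)v - 256 : (int)v;
}

/* Hypothetical scalar model of one byte lane of the new TOLOWER. */
static uint8_t tolower_lane(uint8_t c)
{
    /* paddb LCASE_MIN: the wrapping add of 0x3f maps 'A'..'Z'
       (0x41..0x5a) onto 0x80..0x99, the 26 most negative signed bytes. */
    uint8_t shifted = (uint8_t)(c + 0x3f);

    /* pcmpgtb LCASE_MAX: all-ones when shifted > 0x99 as signed bytes,
       which is true for every byte that is NOT an upper-case letter. */
    uint8_t not_upper = (as_signed8(shifted) > as_signed8(0x99)) ? 0xff : 0x00;

    /* pandn CASE_ADD: keep 0x20 only where the compare was false. */
    uint8_t add = (uint8_t)(~not_upper & 0x20);

    /* paddb: upper-case bytes gain 0x20, all other bytes are unchanged. */
    return (uint8_t)(c + add);
}
```

Under this reading, a single signed compare replaces the two range compares of the old macro, because the biased range sits at the bottom of the signed byte range.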
|
139
glibc-upstream-2.34-226.patch
Normal file
@@ -0,0 +1,139 @@
commit 3605c744078bb048d876298aaf12a2869e8071b8
Author: Noah Goldstein <goldstein.w.n@gmail.com>
Date: Wed Mar 23 16:57:38 2022 -0500

x86: Optimize str{n}casecmp TOLOWER logic in strcmp-sse42.S

Slightly faster method of doing TOLOWER that saves an
instruction.

Also replace the hard coded 5-byte NOP with .p2align 4. On builds with
CET enabled this misaligned entry to strcasecmp.

geometric_mean(N=40) of all benchmarks New / Original: .920

All string/memory tests pass.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>

(cherry picked from commit d154758e618ec9324f5d339c46db0aa27e8b1226)

diff --git a/sysdeps/x86_64/multiarch/strcmp-sse42.S b/sysdeps/x86_64/multiarch/strcmp-sse42.S
|
||||
index 6197a723b9e0606e..a6825de8195ad8c6 100644
|
||||
--- a/sysdeps/x86_64/multiarch/strcmp-sse42.S
|
||||
+++ b/sysdeps/x86_64/multiarch/strcmp-sse42.S
|
||||
@@ -89,9 +89,8 @@ ENTRY (GLABEL(__strcasecmp))
|
||||
movq __libc_tsd_LOCALE@gottpoff(%rip),%rax
|
||||
mov %fs:(%rax),%RDX_LP
|
||||
|
||||
- // XXX 5 byte should be before the function
|
||||
- /* 5-byte NOP. */
|
||||
- .byte 0x0f,0x1f,0x44,0x00,0x00
|
||||
+ /* Either 1 or 5 bytes (depending on whether CET is enabled). */
|
||||
+ .p2align 4
|
||||
END (GLABEL(__strcasecmp))
|
||||
/* FALLTHROUGH to strcasecmp_l. */
|
||||
#endif
|
||||
@@ -100,9 +99,8 @@ ENTRY (GLABEL(__strncasecmp))
|
||||
movq __libc_tsd_LOCALE@gottpoff(%rip),%rax
|
||||
mov %fs:(%rax),%RCX_LP
|
||||
|
||||
- // XXX 5 byte should be before the function
|
||||
- /* 5-byte NOP. */
|
||||
- .byte 0x0f,0x1f,0x44,0x00,0x00
|
||||
+ /* Either 1 or 5 bytes (depending on whether CET is enabled). */
|
||||
+ .p2align 4
|
||||
END (GLABEL(__strncasecmp))
|
||||
/* FALLTHROUGH to strncasecmp_l. */
|
||||
#endif
|
||||
@@ -170,27 +168,22 @@ STRCMP_SSE42:
|
||||
#if defined USE_AS_STRCASECMP_L || defined USE_AS_STRNCASECMP_L
|
||||
.section .rodata.cst16,"aM",@progbits,16
|
||||
.align 16
|
||||
-LABEL(belowupper):
|
||||
- .quad 0x4040404040404040
|
||||
- .quad 0x4040404040404040
|
||||
-LABEL(topupper):
|
||||
-# ifdef USE_AVX
|
||||
- .quad 0x5a5a5a5a5a5a5a5a
|
||||
- .quad 0x5a5a5a5a5a5a5a5a
|
||||
-# else
|
||||
- .quad 0x5b5b5b5b5b5b5b5b
|
||||
- .quad 0x5b5b5b5b5b5b5b5b
|
||||
-# endif
|
||||
-LABEL(touppermask):
|
||||
+LABEL(lcase_min):
|
||||
+ .quad 0x3f3f3f3f3f3f3f3f
|
||||
+ .quad 0x3f3f3f3f3f3f3f3f
|
||||
+LABEL(lcase_max):
|
||||
+ .quad 0x9999999999999999
|
||||
+ .quad 0x9999999999999999
|
||||
+LABEL(case_add):
|
||||
.quad 0x2020202020202020
|
||||
.quad 0x2020202020202020
|
||||
.previous
|
||||
- movdqa LABEL(belowupper)(%rip), %xmm4
|
||||
-# define UCLOW_reg %xmm4
|
||||
- movdqa LABEL(topupper)(%rip), %xmm5
|
||||
-# define UCHIGH_reg %xmm5
|
||||
- movdqa LABEL(touppermask)(%rip), %xmm6
|
||||
-# define LCQWORD_reg %xmm6
|
||||
+ movdqa LABEL(lcase_min)(%rip), %xmm4
|
||||
+# define LCASE_MIN_reg %xmm4
|
||||
+ movdqa LABEL(lcase_max)(%rip), %xmm5
|
||||
+# define LCASE_MAX_reg %xmm5
|
||||
+ movdqa LABEL(case_add)(%rip), %xmm6
|
||||
+# define CASE_ADD_reg %xmm6
|
||||
#endif
|
||||
cmp $0x30, %ecx
|
||||
ja LABEL(crosscache)/* rsi: 16-byte load will cross cache line */
|
||||
@@ -201,32 +194,26 @@ LABEL(touppermask):
|
||||
#if defined USE_AS_STRCASECMP_L || defined USE_AS_STRNCASECMP_L
|
||||
# ifdef USE_AVX
|
||||
# define TOLOWER(reg1, reg2) \
|
||||
- vpcmpgtb UCLOW_reg, reg1, %xmm7; \
|
||||
- vpcmpgtb UCHIGH_reg, reg1, %xmm8; \
|
||||
- vpcmpgtb UCLOW_reg, reg2, %xmm9; \
|
||||
- vpcmpgtb UCHIGH_reg, reg2, %xmm10; \
|
||||
- vpandn %xmm7, %xmm8, %xmm8; \
|
||||
- vpandn %xmm9, %xmm10, %xmm10; \
|
||||
- vpand LCQWORD_reg, %xmm8, %xmm8; \
|
||||
- vpand LCQWORD_reg, %xmm10, %xmm10; \
|
||||
- vpor reg1, %xmm8, reg1; \
|
||||
- vpor reg2, %xmm10, reg2
|
||||
+ vpaddb LCASE_MIN_reg, reg1, %xmm7; \
|
||||
+ vpaddb LCASE_MIN_reg, reg2, %xmm8; \
|
||||
+ vpcmpgtb LCASE_MAX_reg, %xmm7, %xmm7; \
|
||||
+ vpcmpgtb LCASE_MAX_reg, %xmm8, %xmm8; \
|
||||
+ vpandn CASE_ADD_reg, %xmm7, %xmm7; \
|
||||
+ vpandn CASE_ADD_reg, %xmm8, %xmm8; \
|
||||
+ vpaddb %xmm7, reg1, reg1; \
|
||||
+ vpaddb %xmm8, reg2, reg2
|
||||
# else
|
||||
# define TOLOWER(reg1, reg2) \
|
||||
- movdqa reg1, %xmm7; \
|
||||
- movdqa UCHIGH_reg, %xmm8; \
|
||||
- movdqa reg2, %xmm9; \
|
||||
- movdqa UCHIGH_reg, %xmm10; \
|
||||
- pcmpgtb UCLOW_reg, %xmm7; \
|
||||
- pcmpgtb reg1, %xmm8; \
|
||||
- pcmpgtb UCLOW_reg, %xmm9; \
|
||||
- pcmpgtb reg2, %xmm10; \
|
||||
- pand %xmm8, %xmm7; \
|
||||
- pand %xmm10, %xmm9; \
|
||||
- pand LCQWORD_reg, %xmm7; \
|
||||
- pand LCQWORD_reg, %xmm9; \
|
||||
- por %xmm7, reg1; \
|
||||
- por %xmm9, reg2
|
||||
+ movdqa LCASE_MIN_reg, %xmm7; \
|
||||
+ movdqa LCASE_MIN_reg, %xmm8; \
|
||||
+ paddb reg1, %xmm7; \
|
||||
+ paddb reg2, %xmm8; \
|
||||
+ pcmpgtb LCASE_MAX_reg, %xmm7; \
|
||||
+ pcmpgtb LCASE_MAX_reg, %xmm8; \
|
||||
+ pandn CASE_ADD_reg, %xmm7; \
|
||||
+ pandn CASE_ADD_reg, %xmm8; \
|
||||
+ paddb %xmm7, reg1; \
|
||||
+ paddb %xmm8, reg2
|
||||
# endif
|
||||
TOLOWER (%xmm1, %xmm2)
|
||||
#else
|
744
glibc-upstream-2.34-227.patch
Normal file
@@ -0,0 +1,744 @@
commit 3051cf3e745015a9106cf71be7f7adbb2f83fcac
Author: Noah Goldstein <goldstein.w.n@gmail.com>
Date: Thu Mar 24 18:56:12 2022 -0500

x86: Add AVX2 optimized str{n}casecmp

geometric_mean(N=40) of all benchmarks AVX2 / SSE42: .702

All string/memory tests pass.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>

(cherry picked from commit bbf81222343fed5cd704001a2ae0d86c71544151)

diff --git a/sysdeps/x86_64/multiarch/Makefile b/sysdeps/x86_64/multiarch/Makefile
|
||||
index 8c9e7812c6af10b8..711ecf2ee45d61b9 100644
|
||||
--- a/sysdeps/x86_64/multiarch/Makefile
|
||||
+++ b/sysdeps/x86_64/multiarch/Makefile
|
||||
@@ -51,6 +51,8 @@ sysdep_routines += \
|
||||
stpncpy-sse2-unaligned \
|
||||
stpncpy-ssse3 \
|
||||
strcasecmp_l-avx \
|
||||
+ strcasecmp_l-avx2 \
|
||||
+ strcasecmp_l-avx2-rtm \
|
||||
strcasecmp_l-sse2 \
|
||||
strcasecmp_l-sse4_2 \
|
||||
strcasecmp_l-ssse3 \
|
||||
@@ -89,6 +91,8 @@ sysdep_routines += \
|
||||
strlen-evex \
|
||||
strlen-sse2 \
|
||||
strncase_l-avx \
|
||||
+ strncase_l-avx2 \
|
||||
+ strncase_l-avx2-rtm \
|
||||
strncase_l-sse2 \
|
||||
strncase_l-sse4_2 \
|
||||
strncase_l-ssse3 \
|
||||
diff --git a/sysdeps/x86_64/multiarch/ifunc-impl-list.c b/sysdeps/x86_64/multiarch/ifunc-impl-list.c
|
||||
index 4992d7bd3206a7c0..a687b387c91aa9ae 100644
|
||||
--- a/sysdeps/x86_64/multiarch/ifunc-impl-list.c
|
||||
+++ b/sysdeps/x86_64/multiarch/ifunc-impl-list.c
|
||||
@@ -418,6 +418,13 @@ __libc_ifunc_impl_list (const char *name, struct libc_ifunc_impl *array,
|
||||
|
||||
/* Support sysdeps/x86_64/multiarch/strcasecmp_l.c. */
|
||||
IFUNC_IMPL (i, name, strcasecmp,
|
||||
+ IFUNC_IMPL_ADD (array, i, strcasecmp,
|
||||
+ CPU_FEATURE_USABLE (AVX2),
|
||||
+ __strcasecmp_avx2)
|
||||
+ IFUNC_IMPL_ADD (array, i, strcasecmp,
|
||||
+ (CPU_FEATURE_USABLE (AVX2)
|
||||
+ && CPU_FEATURE_USABLE (RTM)),
|
||||
+ __strcasecmp_avx2_rtm)
|
||||
IFUNC_IMPL_ADD (array, i, strcasecmp,
|
||||
CPU_FEATURE_USABLE (AVX),
|
||||
__strcasecmp_avx)
|
||||
@@ -431,6 +438,13 @@ __libc_ifunc_impl_list (const char *name, struct libc_ifunc_impl *array,
|
||||
|
||||
/* Support sysdeps/x86_64/multiarch/strcasecmp_l.c. */
|
||||
IFUNC_IMPL (i, name, strcasecmp_l,
|
||||
+ IFUNC_IMPL_ADD (array, i, strcasecmp,
|
||||
+ CPU_FEATURE_USABLE (AVX2),
|
||||
+ __strcasecmp_l_avx2)
|
||||
+ IFUNC_IMPL_ADD (array, i, strcasecmp,
|
||||
+ (CPU_FEATURE_USABLE (AVX2)
|
||||
+ && CPU_FEATURE_USABLE (RTM)),
|
||||
+ __strcasecmp_l_avx2_rtm)
|
||||
IFUNC_IMPL_ADD (array, i, strcasecmp_l,
|
||||
CPU_FEATURE_USABLE (AVX),
|
||||
__strcasecmp_l_avx)
|
||||
@@ -558,6 +572,13 @@ __libc_ifunc_impl_list (const char *name, struct libc_ifunc_impl *array,
|
||||
|
||||
/* Support sysdeps/x86_64/multiarch/strncase_l.c. */
|
||||
IFUNC_IMPL (i, name, strncasecmp,
|
||||
+ IFUNC_IMPL_ADD (array, i, strncasecmp,
|
||||
+ CPU_FEATURE_USABLE (AVX2),
|
||||
+ __strncasecmp_avx2)
|
||||
+ IFUNC_IMPL_ADD (array, i, strncasecmp,
|
||||
+ (CPU_FEATURE_USABLE (AVX2)
|
||||
+ && CPU_FEATURE_USABLE (RTM)),
|
||||
+ __strncasecmp_avx2_rtm)
|
||||
IFUNC_IMPL_ADD (array, i, strncasecmp,
|
||||
CPU_FEATURE_USABLE (AVX),
|
||||
__strncasecmp_avx)
|
||||
@@ -572,6 +593,13 @@ __libc_ifunc_impl_list (const char *name, struct libc_ifunc_impl *array,
|
||||
|
||||
/* Support sysdeps/x86_64/multiarch/strncase_l.c. */
|
||||
IFUNC_IMPL (i, name, strncasecmp_l,
|
||||
+ IFUNC_IMPL_ADD (array, i, strncasecmp,
|
||||
+ CPU_FEATURE_USABLE (AVX2),
|
||||
+ __strncasecmp_l_avx2)
|
||||
+ IFUNC_IMPL_ADD (array, i, strncasecmp,
|
||||
+ (CPU_FEATURE_USABLE (AVX2)
|
||||
+ && CPU_FEATURE_USABLE (RTM)),
|
||||
+ __strncasecmp_l_avx2_rtm)
|
||||
IFUNC_IMPL_ADD (array, i, strncasecmp_l,
|
||||
CPU_FEATURE_USABLE (AVX),
|
||||
__strncasecmp_l_avx)
|
||||
diff --git a/sysdeps/x86_64/multiarch/ifunc-strcasecmp.h b/sysdeps/x86_64/multiarch/ifunc-strcasecmp.h
|
||||
index 931770e079fcc69f..64d0cd6ef25f73c0 100644
|
||||
--- a/sysdeps/x86_64/multiarch/ifunc-strcasecmp.h
|
||||
+++ b/sysdeps/x86_64/multiarch/ifunc-strcasecmp.h
|
||||
@@ -23,12 +23,24 @@ extern __typeof (REDIRECT_NAME) OPTIMIZE (sse2) attribute_hidden;
|
||||
extern __typeof (REDIRECT_NAME) OPTIMIZE (ssse3) attribute_hidden;
|
||||
extern __typeof (REDIRECT_NAME) OPTIMIZE (sse42) attribute_hidden;
|
||||
extern __typeof (REDIRECT_NAME) OPTIMIZE (avx) attribute_hidden;
|
||||
+extern __typeof (REDIRECT_NAME) OPTIMIZE (avx2) attribute_hidden;
|
||||
+extern __typeof (REDIRECT_NAME) OPTIMIZE (avx2_rtm) attribute_hidden;
|
||||
|
||||
static inline void *
|
||||
IFUNC_SELECTOR (void)
|
||||
{
|
||||
const struct cpu_features* cpu_features = __get_cpu_features ();
|
||||
|
||||
+ if (CPU_FEATURE_USABLE_P (cpu_features, AVX2)
|
||||
+ && CPU_FEATURES_ARCH_P (cpu_features, AVX_Fast_Unaligned_Load))
|
||||
+ {
|
||||
+ if (CPU_FEATURE_USABLE_P (cpu_features, RTM))
|
||||
+ return OPTIMIZE (avx2_rtm);
|
||||
+
|
||||
+ if (!CPU_FEATURES_ARCH_P (cpu_features, Prefer_No_VZEROUPPER))
|
||||
+ return OPTIMIZE (avx2);
|
||||
+ }
|
||||
+
|
||||
if (CPU_FEATURE_USABLE_P (cpu_features, AVX))
|
||||
return OPTIMIZE (avx);
|
||||
|
||||
diff --git a/sysdeps/x86_64/multiarch/strcasecmp_l-avx2-rtm.S b/sysdeps/x86_64/multiarch/strcasecmp_l-avx2-rtm.S
|
||||
new file mode 100644
|
||||
index 0000000000000000..09957fc3c543b40c
|
||||
--- /dev/null
|
||||
+++ b/sysdeps/x86_64/multiarch/strcasecmp_l-avx2-rtm.S
|
||||
@@ -0,0 +1,15 @@
|
||||
+#ifndef STRCMP
|
||||
+# define STRCMP __strcasecmp_l_avx2_rtm
|
||||
+#endif
|
||||
+
|
||||
+#define _GLABEL(x) x ## _rtm
|
||||
+#define GLABEL(x) _GLABEL(x)
|
||||
+
|
||||
+#define ZERO_UPPER_VEC_REGISTERS_RETURN \
|
||||
+ ZERO_UPPER_VEC_REGISTERS_RETURN_XTEST
|
||||
+
|
||||
+#define VZEROUPPER_RETURN jmp L(return_vzeroupper)
|
||||
+
|
||||
+#define SECTION(p) p##.avx.rtm
|
||||
+
|
||||
+#include "strcasecmp_l-avx2.S"
|
||||
diff --git a/sysdeps/x86_64/multiarch/strcasecmp_l-avx2.S b/sysdeps/x86_64/multiarch/strcasecmp_l-avx2.S
|
||||
new file mode 100644
|
||||
index 0000000000000000..e2762f2a222b2a65
|
||||
--- /dev/null
|
||||
+++ b/sysdeps/x86_64/multiarch/strcasecmp_l-avx2.S
|
||||
@@ -0,0 +1,23 @@
|
||||
+/* strcasecmp_l optimized with AVX2.
|
||||
+ Copyright (C) 2017-2022 Free Software Foundation, Inc.
|
||||
+ This file is part of the GNU C Library.
|
||||
+
|
||||
+ The GNU C Library is free software; you can redistribute it and/or
|
||||
+ modify it under the terms of the GNU Lesser General Public
|
||||
+ License as published by the Free Software Foundation; either
|
||||
+ version 2.1 of the License, or (at your option) any later version.
|
||||
+
|
||||
+ The GNU C Library is distributed in the hope that it will be useful,
|
||||
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
|
||||
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
|
||||
+ Lesser General Public License for more details.
|
||||
+
|
||||
+ You should have received a copy of the GNU Lesser General Public
|
||||
+ License along with the GNU C Library; if not, see
|
||||
+ <https://www.gnu.org/licenses/>. */
|
||||
+
|
||||
+#ifndef STRCMP
|
||||
+# define STRCMP __strcasecmp_l_avx2
|
||||
+#endif
|
||||
+#define USE_AS_STRCASECMP_L
|
||||
+#include "strcmp-avx2.S"
|
||||
diff --git a/sysdeps/x86_64/multiarch/strcmp-avx2.S b/sysdeps/x86_64/multiarch/strcmp-avx2.S
|
||||
index 09a73942086f9c9f..aa91f6e48a0e1ce5 100644
|
||||
--- a/sysdeps/x86_64/multiarch/strcmp-avx2.S
|
||||
+++ b/sysdeps/x86_64/multiarch/strcmp-avx2.S
|
||||
@@ -20,6 +20,10 @@
|
||||
|
||||
# include <sysdep.h>
|
||||
|
||||
+# if defined USE_AS_STRCASECMP_L
|
||||
+# include "locale-defines.h"
|
||||
+# endif
|
||||
+
|
||||
# ifndef STRCMP
|
||||
# define STRCMP __strcmp_avx2
|
||||
# endif
|
||||
@@ -74,13 +78,88 @@
|
||||
# define VEC_OFFSET (-VEC_SIZE)
|
||||
# endif
|
||||
|
||||
+# ifdef USE_AS_STRCASECMP_L
|
||||
+# define BYTE_LOOP_REG OFFSET_REG
|
||||
+# else
|
||||
+# define BYTE_LOOP_REG ecx
|
||||
+# endif
|
||||
+
|
||||
+# ifdef USE_AS_STRCASECMP_L
|
||||
+# ifdef USE_AS_STRNCMP
|
||||
+# define STRCASECMP __strncasecmp_avx2
|
||||
+# define LOCALE_REG rcx
|
||||
+# define LOCALE_REG_LP RCX_LP
|
||||
+# define STRCASECMP_NONASCII __strncasecmp_l_nonascii
|
||||
+# else
|
||||
+# define STRCASECMP __strcasecmp_avx2
|
||||
+# define LOCALE_REG rdx
|
||||
+# define LOCALE_REG_LP RDX_LP
|
||||
+# define STRCASECMP_NONASCII __strcasecmp_l_nonascii
|
||||
+# endif
|
||||
+# endif
|
||||
+
|
||||
# define xmmZERO xmm15
|
||||
# define ymmZERO ymm15
|
||||
|
||||
+# define LCASE_MIN_ymm %ymm10
|
||||
+# define LCASE_MAX_ymm %ymm11
|
||||
+# define CASE_ADD_ymm %ymm12
|
||||
+
|
||||
+# define LCASE_MIN_xmm %xmm10
|
||||
+# define LCASE_MAX_xmm %xmm11
|
||||
+# define CASE_ADD_xmm %xmm12
|
||||
+
|
||||
+ /* r11 is never used elsewhere so this is safe to maintain. */
|
||||
+# define TOLOWER_BASE %r11
|
||||
+
|
||||
# ifndef SECTION
|
||||
# define SECTION(p) p##.avx
|
||||
# endif
|
||||
|
||||
+# ifdef USE_AS_STRCASECMP_L
|
||||
+# define REG(x, y) x ## y
|
||||
+# define TOLOWER(reg1_in, reg1_out, reg2_in, reg2_out, ext) \
|
||||
+ vpaddb REG(LCASE_MIN_, ext), reg1_in, REG(%ext, 8); \
|
||||
+ vpaddb REG(LCASE_MIN_, ext), reg2_in, REG(%ext, 9); \
|
||||
+ vpcmpgtb REG(LCASE_MAX_, ext), REG(%ext, 8), REG(%ext, 8); \
|
||||
+ vpcmpgtb REG(LCASE_MAX_, ext), REG(%ext, 9), REG(%ext, 9); \
|
||||
+ vpandn REG(CASE_ADD_, ext), REG(%ext, 8), REG(%ext, 8); \
|
||||
+ vpandn REG(CASE_ADD_, ext), REG(%ext, 9), REG(%ext, 9); \
|
||||
+ vpaddb REG(%ext, 8), reg1_in, reg1_out; \
|
||||
+ vpaddb REG(%ext, 9), reg2_in, reg2_out
|
||||
+
|
||||
+# define TOLOWER_gpr(src, dst) movl (TOLOWER_BASE, src, 4), dst
|
||||
+# define TOLOWER_ymm(...) TOLOWER(__VA_ARGS__, ymm)
|
||||
+# define TOLOWER_xmm(...) TOLOWER(__VA_ARGS__, xmm)
|
||||
+
|
||||
+# define CMP_R1_R2(s1_reg, s2_reg, scratch_reg, reg_out, ext) \
|
||||
+ TOLOWER (s1_reg, scratch_reg, s2_reg, s2_reg, ext); \
|
||||
+ VPCMPEQ scratch_reg, s2_reg, reg_out
|
||||
+
|
||||
+# define CMP_R1_S2(s1_reg, s2_mem, scratch_reg, reg_out, ext) \
|
||||
+ VMOVU s2_mem, reg_out; \
|
||||
+ CMP_R1_R2(s1_reg, reg_out, scratch_reg, reg_out, ext)
|
||||
+
|
||||
+# define CMP_R1_R2_ymm(...) CMP_R1_R2(__VA_ARGS__, ymm)
|
||||
+# define CMP_R1_R2_xmm(...) CMP_R1_R2(__VA_ARGS__, xmm)
|
||||
+
|
||||
+# define CMP_R1_S2_ymm(...) CMP_R1_S2(__VA_ARGS__, ymm)
|
||||
+# define CMP_R1_S2_xmm(...) CMP_R1_S2(__VA_ARGS__, xmm)
|
||||
+
|
||||
+# else
|
||||
+# define TOLOWER_gpr(...)
|
||||
+# define TOLOWER_ymm(...)
|
||||
+# define TOLOWER_xmm(...)
|
||||
+
|
||||
+# define CMP_R1_R2_ymm(s1_reg, s2_reg, scratch_reg, reg_out) \
|
||||
+ VPCMPEQ s2_reg, s1_reg, reg_out
|
||||
+
|
||||
+# define CMP_R1_R2_xmm(...) CMP_R1_R2_ymm(__VA_ARGS__)
|
||||
+
|
||||
+# define CMP_R1_S2_ymm(...) CMP_R1_R2_ymm(__VA_ARGS__)
|
||||
+# define CMP_R1_S2_xmm(...) CMP_R1_R2_xmm(__VA_ARGS__)
|
||||
+# endif
|
||||
+
|
||||
/* Warning!
|
||||
wcscmp/wcsncmp have to use SIGNED comparison for elements.
|
||||
strcmp/strncmp have to use UNSIGNED comparison for elements.
|
||||
@@ -102,8 +181,49 @@
|
||||
returned. */
|
||||
|
||||
.section SECTION(.text), "ax", @progbits
|
||||
-ENTRY(STRCMP)
|
||||
+ .align 16
|
||||
+ .type STRCMP, @function
|
||||
+ .globl STRCMP
|
||||
+ .hidden STRCMP
|
||||
+
|
||||
+# ifndef GLABEL
|
||||
+# define GLABEL(...) __VA_ARGS__
|
||||
+# endif
|
||||
+
|
||||
+# ifdef USE_AS_STRCASECMP_L
|
||||
+ENTRY (GLABEL(STRCASECMP))
|
||||
+ movq __libc_tsd_LOCALE@gottpoff(%rip), %rax
|
||||
+ mov %fs:(%rax), %LOCALE_REG_LP
|
||||
+
|
||||
+ /* Either 1 or 5 bytes (depending on whether CET is enabled). */
|
||||
+ .p2align 4
|
||||
+END (GLABEL(STRCASECMP))
|
||||
+ /* FALLTHROUGH to strcasecmp/strncasecmp_l. */
|
||||
+# endif
|
||||
+
|
||||
+ .p2align 4
|
||||
+STRCMP:
|
||||
+ cfi_startproc
|
||||
+ _CET_ENDBR
|
||||
+ CALL_MCOUNT
|
||||
+
|
||||
+# if defined USE_AS_STRCASECMP_L
|
||||
+ /* We have to fall back on the C implementation for locales with
|
||||
+ encodings not matching ASCII for single bytes. */
|
||||
+# if LOCALE_T___LOCALES != 0 || LC_CTYPE != 0
|
||||
+ mov LOCALE_T___LOCALES + LC_CTYPE * LP_SIZE(%LOCALE_REG), %RAX_LP
|
||||
+# else
|
||||
+ mov (%LOCALE_REG), %RAX_LP
|
||||
+# endif
|
||||
+ testl $1, LOCALE_DATA_VALUES + _NL_CTYPE_NONASCII_CASE * SIZEOF_VALUES(%rax)
|
||||
+ jne STRCASECMP_NONASCII
|
||||
+ leaq _nl_C_LC_CTYPE_tolower + 128 * 4(%rip), TOLOWER_BASE
|
||||
+# endif
|
||||
+
|
||||
# ifdef USE_AS_STRNCMP
|
||||
+ /* Don't overwrite LOCALE_REG (rcx) until we have passed
|
||||
+ L(one_or_less). Otherwise we might use the wrong locale in
|
||||
+ the OVERFLOW_STRCMP (strcasecmp_l). */
|
||||
# ifdef __ILP32__
|
||||
/* Clear the upper 32 bits. */
|
||||
movl %edx, %edx
|
||||
@@ -128,6 +248,30 @@ ENTRY(STRCMP)
|
||||
# endif
|
||||
# endif
|
||||
vpxor %xmmZERO, %xmmZERO, %xmmZERO
|
||||
+# if defined USE_AS_STRCASECMP_L
|
||||
+ .section .rodata.cst32, "aM", @progbits, 32
|
||||
+ .align 32
|
||||
+L(lcase_min):
|
||||
+ .quad 0x3f3f3f3f3f3f3f3f
|
||||
+ .quad 0x3f3f3f3f3f3f3f3f
|
||||
+ .quad 0x3f3f3f3f3f3f3f3f
|
||||
+ .quad 0x3f3f3f3f3f3f3f3f
|
||||
+L(lcase_max):
|
||||
+ .quad 0x9999999999999999
|
||||
+ .quad 0x9999999999999999
|
||||
+ .quad 0x9999999999999999
|
||||
+ .quad 0x9999999999999999
|
||||
+L(case_add):
|
||||
+ .quad 0x2020202020202020
|
||||
+ .quad 0x2020202020202020
|
||||
+ .quad 0x2020202020202020
|
||||
+ .quad 0x2020202020202020
|
||||
+ .previous
|
||||
+
|
||||
+ vmovdqa L(lcase_min)(%rip), LCASE_MIN_ymm
|
||||
+ vmovdqa L(lcase_max)(%rip), LCASE_MAX_ymm
|
||||
+ vmovdqa L(case_add)(%rip), CASE_ADD_ymm
|
||||
+# endif
|
||||
movl %edi, %eax
|
||||
orl %esi, %eax
|
||||
sall $20, %eax
|
||||
@@ -138,8 +282,10 @@ ENTRY(STRCMP)
|
||||
L(no_page_cross):
|
||||
/* Safe to compare 4x vectors. */
|
||||
VMOVU (%rdi), %ymm0
|
||||
- /* 1s where s1 and s2 equal. */
|
||||
- VPCMPEQ (%rsi), %ymm0, %ymm1
|
||||
+ /* 1s where s1 and s2 equal. Just VPCMPEQ if it's not strcasecmp.
|
||||
+ Otherwise converts ymm0 and load from rsi to lower. ymm2 is
|
||||
+ scratch and ymm1 is the return. */
|
||||
+ CMP_R1_S2_ymm (%ymm0, (%rsi), %ymm2, %ymm1)
|
||||
/* 1s at null CHAR. */
|
||||
VPCMPEQ %ymm0, %ymmZERO, %ymm2
|
||||
/* 1s where s1 and s2 equal AND not null CHAR. */
|
||||
@@ -172,6 +318,8 @@ L(return_vec_0):
|
||||
# else
|
||||
movzbl (%rdi, %rcx), %eax
|
||||
movzbl (%rsi, %rcx), %ecx
|
||||
+ TOLOWER_gpr (%rax, %eax)
|
||||
+ TOLOWER_gpr (%rcx, %ecx)
|
||||
subl %ecx, %eax
|
||||
# endif
|
||||
L(ret0):
|
||||
@@ -192,6 +340,10 @@ L(ret_zero):
|
||||
|
||||
.p2align 4,, 5
|
||||
L(one_or_less):
|
||||
+# ifdef USE_AS_STRCASECMP_L
|
||||
+ /* Set locale argument for strcasecmp. */
|
||||
+ movq %LOCALE_REG, %rdx
|
||||
+# endif
|
||||
jb L(ret_zero)
|
||||
# ifdef USE_AS_WCSCMP
|
||||
/* 'nbe' covers the case where length is negative (large
|
||||
@@ -211,6 +363,8 @@ L(one_or_less):
|
||||
jnbe __strcmp_avx2
|
||||
movzbl (%rdi), %eax
|
||||
movzbl (%rsi), %ecx
|
||||
+ TOLOWER_gpr (%rax, %eax)
|
||||
+ TOLOWER_gpr (%rcx, %ecx)
|
||||
subl %ecx, %eax
|
||||
# endif
|
||||
L(ret1):
|
||||
@@ -238,6 +392,8 @@ L(return_vec_1):
|
||||
# else
|
||||
movzbl VEC_SIZE(%rdi, %rcx), %eax
|
||||
movzbl VEC_SIZE(%rsi, %rcx), %ecx
|
||||
+ TOLOWER_gpr (%rax, %eax)
|
||||
+ TOLOWER_gpr (%rcx, %ecx)
|
||||
subl %ecx, %eax
|
||||
# endif
|
||||
L(ret2):
|
||||
@@ -269,6 +425,8 @@ L(return_vec_2):
|
||||
# else
|
||||
movzbl (VEC_SIZE * 2)(%rdi, %rcx), %eax
|
||||
movzbl (VEC_SIZE * 2)(%rsi, %rcx), %ecx
|
||||
+ TOLOWER_gpr (%rax, %eax)
|
||||
+ TOLOWER_gpr (%rcx, %ecx)
|
||||
subl %ecx, %eax
|
||||
# endif
|
||||
L(ret3):
|
||||
@@ -289,6 +447,8 @@ L(return_vec_3):
|
||||
# else
|
||||
movzbl (VEC_SIZE * 3)(%rdi, %rcx), %eax
|
||||
movzbl (VEC_SIZE * 3)(%rsi, %rcx), %ecx
|
||||
+ TOLOWER_gpr (%rax, %eax)
|
||||
+ TOLOWER_gpr (%rcx, %ecx)
|
||||
subl %ecx, %eax
|
||||
# endif
|
||||
L(ret4):
|
||||
@@ -299,7 +459,7 @@ L(ret4):
|
||||
L(more_3x_vec):
|
||||
/* Safe to compare 4x vectors. */
|
||||
VMOVU VEC_SIZE(%rdi), %ymm0
|
||||
- VPCMPEQ VEC_SIZE(%rsi), %ymm0, %ymm1
|
||||
+ CMP_R1_S2_ymm (%ymm0, VEC_SIZE(%rsi), %ymm2, %ymm1)
|
||||
VPCMPEQ %ymm0, %ymmZERO, %ymm2
|
||||
vpandn %ymm1, %ymm2, %ymm1
|
||||
vpmovmskb %ymm1, %ecx
|
||||
@@ -312,7 +472,7 @@ L(more_3x_vec):
|
||||
# endif
|
||||
|
||||
VMOVU (VEC_SIZE * 2)(%rdi), %ymm0
|
||||
- VPCMPEQ (VEC_SIZE * 2)(%rsi), %ymm0, %ymm1
|
||||
+ CMP_R1_S2_ymm (%ymm0, (VEC_SIZE * 2)(%rsi), %ymm2, %ymm1)
|
||||
VPCMPEQ %ymm0, %ymmZERO, %ymm2
|
||||
vpandn %ymm1, %ymm2, %ymm1
|
||||
vpmovmskb %ymm1, %ecx
|
||||
@@ -320,7 +480,7 @@ L(more_3x_vec):
|
||||
jnz L(return_vec_2)
|
||||
|
||||
VMOVU (VEC_SIZE * 3)(%rdi), %ymm0
|
||||
- VPCMPEQ (VEC_SIZE * 3)(%rsi), %ymm0, %ymm1
|
||||
+ CMP_R1_S2_ymm (%ymm0, (VEC_SIZE * 3)(%rsi), %ymm2, %ymm1)
|
||||
VPCMPEQ %ymm0, %ymmZERO, %ymm2
|
||||
vpandn %ymm1, %ymm2, %ymm1
|
||||
vpmovmskb %ymm1, %ecx
|
||||
@@ -395,12 +555,10 @@ L(loop_skip_page_cross_check):
|
||||
VMOVA (VEC_SIZE * 3)(%rdi), %ymm6
|
||||
|
||||
/* ymm1 all 1s where s1 and s2 equal. All 0s otherwise. */
|
||||
- VPCMPEQ (VEC_SIZE * 0)(%rsi), %ymm0, %ymm1
|
||||
-
|
||||
- VPCMPEQ (VEC_SIZE * 1)(%rsi), %ymm2, %ymm3
|
||||
- VPCMPEQ (VEC_SIZE * 2)(%rsi), %ymm4, %ymm5
|
||||
- VPCMPEQ (VEC_SIZE * 3)(%rsi), %ymm6, %ymm7
|
||||
-
|
||||
+ CMP_R1_S2_ymm (%ymm0, (VEC_SIZE * 0)(%rsi), %ymm3, %ymm1)
|
||||
+ CMP_R1_S2_ymm (%ymm2, (VEC_SIZE * 1)(%rsi), %ymm5, %ymm3)
|
||||
+ CMP_R1_S2_ymm (%ymm4, (VEC_SIZE * 2)(%rsi), %ymm7, %ymm5)
|
||||
+ CMP_R1_S2_ymm (%ymm6, (VEC_SIZE * 3)(%rsi), %ymm13, %ymm7)
|
||||
|
||||
/* If any mismatches or null CHAR then 0 CHAR, otherwise non-
|
||||
zero. */
|
||||
@@ -469,6 +627,8 @@ L(return_vec_2_3_end):
|
||||
# else
|
||||
movzbl (VEC_SIZE * 2 - VEC_OFFSET)(%rdi, %LOOP_REG64), %eax
|
||||
movzbl (VEC_SIZE * 2 - VEC_OFFSET)(%rsi, %LOOP_REG64), %ecx
|
||||
+ TOLOWER_gpr (%rax, %eax)
|
||||
+ TOLOWER_gpr (%rcx, %ecx)
|
||||
subl %ecx, %eax
|
||||
xorl %r8d, %eax
|
||||
subl %r8d, %eax
|
||||
@@ -512,6 +672,8 @@ L(return_vec_0_end):
|
||||
# else
|
||||
movzbl (%rdi, %rcx), %eax
|
||||
movzbl (%rsi, %rcx), %ecx
|
||||
+ TOLOWER_gpr (%rax, %eax)
|
||||
+ TOLOWER_gpr (%rcx, %ecx)
|
||||
subl %ecx, %eax
|
||||
xorl %r8d, %eax
|
||||
subl %r8d, %eax
|
||||
@@ -534,6 +696,8 @@ L(return_vec_1_end):
|
||||
# else
|
||||
movzbl VEC_SIZE(%rdi, %rcx), %eax
|
||||
movzbl VEC_SIZE(%rsi, %rcx), %ecx
|
||||
+ TOLOWER_gpr (%rax, %eax)
|
||||
+ TOLOWER_gpr (%rcx, %ecx)
|
||||
subl %ecx, %eax
|
||||
xorl %r8d, %eax
|
||||
subl %r8d, %eax
|
||||
@@ -560,6 +724,8 @@ L(return_vec_2_end):
|
||||
# else
|
||||
movzbl (VEC_SIZE * 2)(%rdi, %rcx), %eax
|
||||
movzbl (VEC_SIZE * 2)(%rsi, %rcx), %ecx
|
||||
+ TOLOWER_gpr (%rax, %eax)
|
||||
+ TOLOWER_gpr (%rcx, %ecx)
|
||||
subl %ecx, %eax
|
||||
xorl %r8d, %eax
|
||||
subl %r8d, %eax
|
||||
@@ -587,7 +753,7 @@ L(page_cross_during_loop):
|
||||
jle L(less_1x_vec_till_page_cross)
|
||||
|
||||
VMOVA (%rdi), %ymm0
|
||||
- VPCMPEQ (%rsi), %ymm0, %ymm1
|
||||
+ CMP_R1_S2_ymm (%ymm0, (%rsi), %ymm2, %ymm1)
|
||||
VPCMPEQ %ymm0, %ymmZERO, %ymm2
|
||||
vpandn %ymm1, %ymm2, %ymm1
|
||||
vpmovmskb %ymm1, %ecx
|
||||
@@ -609,7 +775,7 @@ L(less_1x_vec_till_page_cross):
|
||||
here, it means the previous page (rdi - VEC_SIZE) has already
|
||||
been loaded earlier so must be valid. */
|
||||
VMOVU -VEC_SIZE(%rdi, %rax), %ymm0
|
||||
- VPCMPEQ -VEC_SIZE(%rsi, %rax), %ymm0, %ymm1
|
||||
+ CMP_R1_S2_ymm (%ymm0, -VEC_SIZE(%rsi, %rax), %ymm2, %ymm1)
|
||||
VPCMPEQ %ymm0, %ymmZERO, %ymm2
|
||||
vpandn %ymm1, %ymm2, %ymm1
|
||||
vpmovmskb %ymm1, %ecx
|
||||
@@ -651,6 +817,8 @@ L(return_page_cross_cmp_mem):
|
||||
# else
|
||||
movzbl VEC_OFFSET(%rdi, %rcx), %eax
|
||||
movzbl VEC_OFFSET(%rsi, %rcx), %ecx
|
||||
+ TOLOWER_gpr (%rax, %eax)
|
||||
+ TOLOWER_gpr (%rcx, %ecx)
|
||||
subl %ecx, %eax
|
||||
xorl %r8d, %eax
|
||||
subl %r8d, %eax
|
||||
@@ -677,7 +845,7 @@ L(more_2x_vec_till_page_cross):
|
||||
iteration here. */
|
||||
|
||||
VMOVU VEC_SIZE(%rdi), %ymm0
|
||||
- VPCMPEQ VEC_SIZE(%rsi), %ymm0, %ymm1
|
||||
+ CMP_R1_S2_ymm (%ymm0, VEC_SIZE(%rsi), %ymm2, %ymm1)
|
||||
VPCMPEQ %ymm0, %ymmZERO, %ymm2
|
||||
vpandn %ymm1, %ymm2, %ymm1
|
||||
vpmovmskb %ymm1, %ecx
|
||||
@@ -693,7 +861,7 @@ L(more_2x_vec_till_page_cross):
|
||||
|
||||
/* Safe to include comparisons from lower bytes. */
|
||||
VMOVU -(VEC_SIZE * 2)(%rdi, %rax), %ymm0
|
||||
- VPCMPEQ -(VEC_SIZE * 2)(%rsi, %rax), %ymm0, %ymm1
|
||||
+ CMP_R1_S2_ymm (%ymm0, -(VEC_SIZE * 2)(%rsi, %rax), %ymm2, %ymm1)
|
||||
VPCMPEQ %ymm0, %ymmZERO, %ymm2
|
||||
vpandn %ymm1, %ymm2, %ymm1
|
||||
vpmovmskb %ymm1, %ecx
|
||||
@@ -701,7 +869,7 @@ L(more_2x_vec_till_page_cross):
|
||||
jnz L(return_vec_page_cross_0)
|
||||
|
||||
VMOVU -(VEC_SIZE * 1)(%rdi, %rax), %ymm0
|
||||
- VPCMPEQ -(VEC_SIZE * 1)(%rsi, %rax), %ymm0, %ymm1
|
||||
+ CMP_R1_S2_ymm (%ymm0, -(VEC_SIZE * 1)(%rsi, %rax), %ymm2, %ymm1)
|
||||
VPCMPEQ %ymm0, %ymmZERO, %ymm2
|
||||
vpandn %ymm1, %ymm2, %ymm1
|
||||
vpmovmskb %ymm1, %ecx
|
||||
@@ -719,8 +887,8 @@ L(more_2x_vec_till_page_cross):
|
||||
VMOVA (VEC_SIZE * 2)(%rdi), %ymm4
|
||||
VMOVA (VEC_SIZE * 3)(%rdi), %ymm6
|
||||
|
||||
- VPCMPEQ (VEC_SIZE * 2)(%rsi), %ymm4, %ymm5
|
||||
- VPCMPEQ (VEC_SIZE * 3)(%rsi), %ymm6, %ymm7
|
||||
+ CMP_R1_S2_ymm (%ymm4, (VEC_SIZE * 2)(%rsi), %ymm7, %ymm5)
|
||||
+ CMP_R1_S2_ymm (%ymm6, (VEC_SIZE * 3)(%rsi), %ymm13, %ymm7)
|
||||
vpand %ymm4, %ymm5, %ymm5
|
||||
vpand %ymm6, %ymm7, %ymm7
|
||||
VPMINU %ymm5, %ymm7, %ymm7
|
||||
@@ -771,6 +939,8 @@ L(return_vec_page_cross_1):
|
||||
# else
|
||||
movzbl VEC_OFFSET(%rdi, %rcx), %eax
|
||||
movzbl VEC_OFFSET(%rsi, %rcx), %ecx
|
||||
+ TOLOWER_gpr (%rax, %eax)
|
||||
+ TOLOWER_gpr (%rcx, %ecx)
|
||||
subl %ecx, %eax
|
||||
xorl %r8d, %eax
|
||||
subl %r8d, %eax
|
||||
@@ -826,7 +996,7 @@ L(page_cross):
|
||||
L(page_cross_loop):
|
||||
|
||||
VMOVU (%rdi, %OFFSET_REG64), %ymm0
|
||||
- VPCMPEQ (%rsi, %OFFSET_REG64), %ymm0, %ymm1
|
||||
+ CMP_R1_S2_ymm (%ymm0, (%rsi, %OFFSET_REG64), %ymm2, %ymm1)
|
||||
VPCMPEQ %ymm0, %ymmZERO, %ymm2
|
||||
vpandn %ymm1, %ymm2, %ymm1
|
||||
vpmovmskb %ymm1, %ecx
|
||||
@@ -844,11 +1014,11 @@ L(page_cross_loop):
|
||||
subl %eax, %OFFSET_REG
|
||||
/* OFFSET_REG has distance to page cross - VEC_SIZE. Guranteed
|
||||
to not cross page so is safe to load. Since we have already
|
||||
- loaded at least 1 VEC from rsi it is also guranteed to be safe.
|
||||
- */
|
||||
+ loaded at least 1 VEC from rsi it is also guranteed to be
|
||||
+ safe. */
|
||||
|
||||
VMOVU (%rdi, %OFFSET_REG64), %ymm0
|
||||
- VPCMPEQ (%rsi, %OFFSET_REG64), %ymm0, %ymm1
|
||||
+ CMP_R1_S2_ymm (%ymm0, (%rsi, %OFFSET_REG64), %ymm2, %ymm1)
|
||||
VPCMPEQ %ymm0, %ymmZERO, %ymm2
|
||||
vpandn %ymm1, %ymm2, %ymm1
|
||||
vpmovmskb %ymm1, %ecx
|
||||
@@ -881,6 +1051,8 @@ L(ret_vec_page_cross_cont):
|
||||
# else
|
||||
movzbl (%rdi, %rcx), %eax
|
||||
movzbl (%rsi, %rcx), %ecx
|
||||
+ TOLOWER_gpr (%rax, %eax)
|
||||
+ TOLOWER_gpr (%rcx, %ecx)
|
||||
subl %ecx, %eax
|
||||
xorl %r8d, %eax
|
||||
subl %r8d, %eax
|
||||
@@ -934,7 +1106,7 @@ L(less_1x_vec_till_page):
|
||||
ja L(less_16_till_page)
|
||||
|
||||
VMOVU (%rdi), %xmm0
|
||||
- VPCMPEQ (%rsi), %xmm0, %xmm1
|
||||
+ CMP_R1_S2_xmm (%xmm0, (%rsi), %xmm2, %xmm1)
|
||||
VPCMPEQ %xmm0, %xmmZERO, %xmm2
|
||||
vpandn %xmm1, %xmm2, %xmm1
|
||||
vpmovmskb %ymm1, %ecx
|
||||
@@ -952,7 +1124,7 @@ L(less_1x_vec_till_page):
|
||||
# endif
|
||||
|
||||
VMOVU (%rdi, %OFFSET_REG64), %xmm0
|
||||
- VPCMPEQ (%rsi, %OFFSET_REG64), %xmm0, %xmm1
|
||||
+ CMP_R1_S2_xmm (%xmm0, (%rsi, %OFFSET_REG64), %xmm2, %xmm1)
|
||||
VPCMPEQ %xmm0, %xmmZERO, %xmm2
|
||||
vpandn %xmm1, %xmm2, %xmm1
|
||||
vpmovmskb %ymm1, %ecx
|
||||
@@ -990,7 +1162,7 @@ L(less_16_till_page):
|
||||
vmovq (%rdi), %xmm0
|
||||
vmovq (%rsi), %xmm1
|
||||
VPCMPEQ %xmm0, %xmmZERO, %xmm2
|
||||
- VPCMPEQ %xmm1, %xmm0, %xmm1
|
||||
+ CMP_R1_R2_xmm (%xmm0, %xmm1, %xmm3, %xmm1)
|
||||
vpandn %xmm1, %xmm2, %xmm1
|
||||
vpmovmskb %ymm1, %ecx
|
||||
incb %cl
|
||||
@@ -1010,7 +1182,7 @@ L(less_16_till_page):
|
||||
vmovq (%rdi, %OFFSET_REG64), %xmm0
|
||||
vmovq (%rsi, %OFFSET_REG64), %xmm1
|
||||
VPCMPEQ %xmm0, %xmmZERO, %xmm2
|
||||
- VPCMPEQ %xmm1, %xmm0, %xmm1
|
||||
+ CMP_R1_R2_xmm (%xmm0, %xmm1, %xmm3, %xmm1)
|
||||
vpandn %xmm1, %xmm2, %xmm1
|
||||
vpmovmskb %ymm1, %ecx
|
||||
incb %cl
|
||||
@@ -1066,7 +1238,7 @@ L(ret_less_8_wcs):
|
||||
vmovd (%rdi), %xmm0
|
||||
vmovd (%rsi), %xmm1
|
||||
VPCMPEQ %xmm0, %xmmZERO, %xmm2
|
||||
- VPCMPEQ %xmm1, %xmm0, %xmm1
|
||||
+ CMP_R1_R2_xmm (%xmm0, %xmm1, %xmm3, %xmm1)
|
||||
vpandn %xmm1, %xmm2, %xmm1
|
||||
vpmovmskb %ymm1, %ecx
|
||||
subl $0xf, %ecx
|
||||
@@ -1085,7 +1257,7 @@ L(ret_less_8_wcs):
|
||||
vmovd (%rdi, %OFFSET_REG64), %xmm0
|
||||
vmovd (%rsi, %OFFSET_REG64), %xmm1
|
||||
VPCMPEQ %xmm0, %xmmZERO, %xmm2
|
||||
- VPCMPEQ %xmm1, %xmm0, %xmm1
|
||||
+ CMP_R1_R2_xmm (%xmm0, %xmm1, %xmm3, %xmm1)
|
||||
vpandn %xmm1, %xmm2, %xmm1
|
||||
vpmovmskb %ymm1, %ecx
|
||||
subl $0xf, %ecx
|
||||
@@ -1119,7 +1291,9 @@ L(less_4_till_page):
|
||||
L(less_4_loop):
|
||||
movzbl (%rdi), %eax
|
||||
movzbl (%rsi, %rdi), %ecx
|
||||
- subl %ecx, %eax
|
||||
+ TOLOWER_gpr (%rax, %eax)
|
||||
+ TOLOWER_gpr (%rcx, %BYTE_LOOP_REG)
|
||||
+ subl %BYTE_LOOP_REG, %eax
|
||||
jnz L(ret_less_4_loop)
|
||||
testl %ecx, %ecx
|
||||
jz L(ret_zero_4_loop)
|
||||
@@ -1146,5 +1320,6 @@ L(ret_less_4_loop):
|
||||
subl %r8d, %eax
|
||||
ret
|
||||
# endif
|
||||
-END(STRCMP)
|
||||
+ cfi_endproc
|
||||
+ .size STRCMP, .-STRCMP
|
||||
#endif
diff --git a/sysdeps/x86_64/multiarch/strncase_l-avx2-rtm.S b/sysdeps/x86_64/multiarch/strncase_l-avx2-rtm.S
new file mode 100644
index 0000000000000000..58c05dcfb8643791
--- /dev/null
+++ b/sysdeps/x86_64/multiarch/strncase_l-avx2-rtm.S
@@ -0,0 +1,16 @@
+#ifndef STRCMP
+# define STRCMP __strncasecmp_l_avx2_rtm
+#endif
+
+#define _GLABEL(x) x ## _rtm
+#define GLABEL(x) _GLABEL(x)
+
+#define ZERO_UPPER_VEC_REGISTERS_RETURN \
+ ZERO_UPPER_VEC_REGISTERS_RETURN_XTEST
+
+#define VZEROUPPER_RETURN jmp L(return_vzeroupper)
+
+#define SECTION(p) p##.avx.rtm
+#define OVERFLOW_STRCMP __strcasecmp_l_avx2_rtm
+
+#include "strncase_l-avx2.S"
|
||||
diff --git a/sysdeps/x86_64/multiarch/strncase_l-avx2.S b/sysdeps/x86_64/multiarch/strncase_l-avx2.S
|
||||
new file mode 100644
|
||||
index 0000000000000000..48c0aa21f84ad32c
|
||||
--- /dev/null
|
||||
+++ b/sysdeps/x86_64/multiarch/strncase_l-avx2.S
|
||||
@@ -0,0 +1,27 @@
|
||||
+/* strncasecmp_l optimized with AVX2.
|
||||
+ Copyright (C) 2017-2022 Free Software Foundation, Inc.
|
||||
+ This file is part of the GNU C Library.
|
||||
+
|
||||
+ The GNU C Library is free software; you can redistribute it and/or
|
||||
+ modify it under the terms of the GNU Lesser General Public
|
||||
+ License as published by the Free Software Foundation; either
|
||||
+ version 2.1 of the License, or (at your option) any later version.
|
||||
+
|
||||
+ The GNU C Library is distributed in the hope that it will be useful,
|
||||
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
|
||||
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
|
||||
+ Lesser General Public License for more details.
|
||||
+
|
||||
+ You should have received a copy of the GNU Lesser General Public
|
||||
+ License along with the GNU C Library; if not, see
|
||||
+ <https://www.gnu.org/licenses/>. */
|
||||
+
|
||||
+#ifndef STRCMP
|
||||
+# define STRCMP __strncasecmp_l_avx2
|
||||
+#endif
|
||||
+#define USE_AS_STRCASECMP_L
|
||||
+#define USE_AS_STRNCMP
|
||||
+#ifndef OVERFLOW_STRCMP
|
||||
+# define OVERFLOW_STRCMP __strcasecmp_l_avx2
|
||||
+#endif
|
||||
+#include "strcmp-avx2.S"
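
Several byte-return paths in the strcmp-avx2.S hunks above end with `xorl %r8d, %eax` followed by `subl %r8d, %eax`. The following is a minimal C sketch of what that pair computes, assuming `r8d` holds either 0 or -1 (the values for which the pair acts as a branch-free conditional negate). The name `flip_if_swapped` is made up for illustration and does not appear in glibc.

```c
#include <stdint.h>

/* Branch-free conditional negate.
   mask ==  0: (diff ^  0) -  0   == diff
   mask == -1: (diff ^ -1) - (-1) == ~diff + 1 == -diff
   ASSUMPTION: mask is exactly 0 or -1.  */
static int32_t
flip_if_swapped (int32_t diff, int32_t mask)
{
  /* Work on unsigned values so that wraparound is well defined.  */
  uint32_t d = (uint32_t) diff;
  uint32_t m = (uint32_t) mask;
  return (int32_t) ((d ^ m) - m);
}
```

With a mask of 0 the difference of the two bytes is returned unchanged; with a mask of -1 its sign is flipped, which is what a caller needs if the two string pointers were exchanged earlier. A zero difference stays zero in both cases.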
|
803
glibc-upstream-2.34-228.patch
Normal file
@@ -0,0 +1,803 @@
commit b13a2e68eb3b84f2a7b587132ec2ea813815febf
Author: Noah Goldstein <goldstein.w.n@gmail.com>
Date: Thu Mar 24 18:56:13 2022 -0500

x86: Add EVEX optimized str{n}casecmp

geometric_mean(N=40) of all benchmarks EVEX / SSE42: .621

All string/memory tests pass.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>

(cherry picked from commit 84e7c46df4086873eae28a1fb87d2cf5388b1e16)

diff --git a/sysdeps/x86_64/multiarch/Makefile b/sysdeps/x86_64/multiarch/Makefile
|
||||
index 711ecf2ee45d61b9..359712c1491a2431 100644
|
||||
--- a/sysdeps/x86_64/multiarch/Makefile
|
||||
+++ b/sysdeps/x86_64/multiarch/Makefile
|
||||
@@ -53,6 +53,7 @@ sysdep_routines += \
|
||||
strcasecmp_l-avx \
|
||||
strcasecmp_l-avx2 \
|
||||
strcasecmp_l-avx2-rtm \
|
||||
+ strcasecmp_l-evex \
|
||||
strcasecmp_l-sse2 \
|
||||
strcasecmp_l-sse4_2 \
|
||||
strcasecmp_l-ssse3 \
|
||||
@@ -93,6 +94,7 @@ sysdep_routines += \
|
||||
strncase_l-avx \
|
||||
strncase_l-avx2 \
|
||||
strncase_l-avx2-rtm \
|
||||
+ strncase_l-evex \
|
||||
strncase_l-sse2 \
|
||||
strncase_l-sse4_2 \
|
||||
strncase_l-ssse3 \
|
||||
diff --git a/sysdeps/x86_64/multiarch/ifunc-impl-list.c b/sysdeps/x86_64/multiarch/ifunc-impl-list.c
|
||||
index a687b387c91aa9ae..f6994e5406933d53 100644
|
||||
--- a/sysdeps/x86_64/multiarch/ifunc-impl-list.c
|
||||
+++ b/sysdeps/x86_64/multiarch/ifunc-impl-list.c
|
||||
@@ -418,6 +418,10 @@ __libc_ifunc_impl_list (const char *name, struct libc_ifunc_impl *array,
|
||||
|
||||
/* Support sysdeps/x86_64/multiarch/strcasecmp_l.c. */
|
||||
IFUNC_IMPL (i, name, strcasecmp,
|
||||
+ IFUNC_IMPL_ADD (array, i, strcasecmp,
|
||||
+ (CPU_FEATURE_USABLE (AVX512VL)
|
||||
+ && CPU_FEATURE_USABLE (AVX512BW)),
|
||||
+ __strcasecmp_evex)
|
||||
IFUNC_IMPL_ADD (array, i, strcasecmp,
|
||||
CPU_FEATURE_USABLE (AVX2),
|
||||
__strcasecmp_avx2)
|
||||
@@ -438,6 +442,10 @@ __libc_ifunc_impl_list (const char *name, struct libc_ifunc_impl *array,
|
||||
|
||||
/* Support sysdeps/x86_64/multiarch/strcasecmp_l.c. */
|
||||
IFUNC_IMPL (i, name, strcasecmp_l,
|
||||
+ IFUNC_IMPL_ADD (array, i, strcasecmp,
|
||||
+ (CPU_FEATURE_USABLE (AVX512VL)
|
||||
+ && CPU_FEATURE_USABLE (AVX512BW)),
|
||||
+ __strcasecmp_l_evex)
|
||||
IFUNC_IMPL_ADD (array, i, strcasecmp,
|
||||
CPU_FEATURE_USABLE (AVX2),
|
||||
__strcasecmp_l_avx2)
|
||||
@@ -572,6 +580,10 @@ __libc_ifunc_impl_list (const char *name, struct libc_ifunc_impl *array,
|
||||
|
||||
/* Support sysdeps/x86_64/multiarch/strncase_l.c. */
|
||||
IFUNC_IMPL (i, name, strncasecmp,
|
||||
+ IFUNC_IMPL_ADD (array, i, strncasecmp,
|
||||
+ (CPU_FEATURE_USABLE (AVX512VL)
|
||||
+ && CPU_FEATURE_USABLE (AVX512BW)),
|
||||
+ __strncasecmp_evex)
|
||||
IFUNC_IMPL_ADD (array, i, strncasecmp,
|
||||
CPU_FEATURE_USABLE (AVX2),
|
||||
__strncasecmp_avx2)
|
||||
@@ -593,6 +605,10 @@ __libc_ifunc_impl_list (const char *name, struct libc_ifunc_impl *array,
|
||||
|
||||
/* Support sysdeps/x86_64/multiarch/strncase_l.c. */
|
||||
IFUNC_IMPL (i, name, strncasecmp_l,
|
||||
+ IFUNC_IMPL_ADD (array, i, strncasecmp,
|
||||
+ (CPU_FEATURE_USABLE (AVX512VL)
|
||||
+ && CPU_FEATURE_USABLE (AVX512BW)),
|
||||
+ __strncasecmp_l_evex)
|
||||
IFUNC_IMPL_ADD (array, i, strncasecmp,
|
||||
CPU_FEATURE_USABLE (AVX2),
|
||||
__strncasecmp_l_avx2)
|
||||
diff --git a/sysdeps/x86_64/multiarch/ifunc-strcasecmp.h b/sysdeps/x86_64/multiarch/ifunc-strcasecmp.h
|
||||
index 64d0cd6ef25f73c0..488e99e4997f379b 100644
|
||||
--- a/sysdeps/x86_64/multiarch/ifunc-strcasecmp.h
|
||||
+++ b/sysdeps/x86_64/multiarch/ifunc-strcasecmp.h
|
||||
@@ -25,6 +25,7 @@ extern __typeof (REDIRECT_NAME) OPTIMIZE (sse42) attribute_hidden;
|
||||
extern __typeof (REDIRECT_NAME) OPTIMIZE (avx) attribute_hidden;
|
||||
extern __typeof (REDIRECT_NAME) OPTIMIZE (avx2) attribute_hidden;
|
||||
extern __typeof (REDIRECT_NAME) OPTIMIZE (avx2_rtm) attribute_hidden;
|
||||
+extern __typeof (REDIRECT_NAME) OPTIMIZE (evex) attribute_hidden;
|
||||
|
||||
static inline void *
|
||||
IFUNC_SELECTOR (void)
|
||||
@@ -34,6 +35,10 @@ IFUNC_SELECTOR (void)
|
||||
if (CPU_FEATURE_USABLE_P (cpu_features, AVX2)
|
||||
&& CPU_FEATURES_ARCH_P (cpu_features, AVX_Fast_Unaligned_Load))
|
||||
{
|
||||
+ if (CPU_FEATURE_USABLE_P (cpu_features, AVX512VL)
|
||||
+ && CPU_FEATURE_USABLE_P (cpu_features, AVX512BW))
|
||||
+ return OPTIMIZE (evex);
|
||||
+
|
||||
if (CPU_FEATURE_USABLE_P (cpu_features, RTM))
|
||||
return OPTIMIZE (avx2_rtm);
|
||||
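
The selector change in the hunk above can be read as a small decision tree. Below is a rough C sketch of just the branches visible in this hunk. `struct cpu_bits`, `enum impl` and `select_strcasecmp` are hypothetical stand-ins invented for this example; the real code uses glibc's `cpu_features` and the `OPTIMIZE` macro, and whatever follows the RTM check is not shown here and is not modelled.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the feature bits the hunk tests.  */
struct cpu_bits
{
  bool avx2;
  bool avx_fast_unaligned_load;
  bool avx512vl;
  bool avx512bw;
  bool rtm;
};

/* IMPL_OTHER covers every path that lies outside the hunk.  */
enum impl { IMPL_EVEX, IMPL_AVX2_RTM, IMPL_OTHER };

static enum impl
select_strcasecmp (const struct cpu_bits *f)
{
  if (f->avx2 && f->avx_fast_unaligned_load)
    {
      /* Added by this patch: EVEX is checked before RTM.  */
      if (f->avx512vl && f->avx512bw)
        return IMPL_EVEX;

      if (f->rtm)
        return IMPL_AVX2_RTM;
    }
  return IMPL_OTHER;
}
```

The point of the sketch is the ordering: on a CPU that reports AVX512VL, AVX512BW and RTM together, the EVEX variant wins because its test comes first.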
|
||||
diff --git a/sysdeps/x86_64/multiarch/strcasecmp_l-evex.S b/sysdeps/x86_64/multiarch/strcasecmp_l-evex.S
|
||||
new file mode 100644
|
||||
index 0000000000000000..58642db748e3db71
|
||||
--- /dev/null
|
||||
+++ b/sysdeps/x86_64/multiarch/strcasecmp_l-evex.S
|
||||
@@ -0,0 +1,23 @@
|
||||
+/* strcasecmp_l optimized with EVEX.
|
||||
+ Copyright (C) 2017-2022 Free Software Foundation, Inc.
|
||||
+ This file is part of the GNU C Library.
|
||||
+
|
||||
+ The GNU C Library is free software; you can redistribute it and/or
|
||||
+ modify it under the terms of the GNU Lesser General Public
|
||||
+ License as published by the Free Software Foundation; either
|
||||
+ version 2.1 of the License, or (at your option) any later version.
|
||||
+
|
||||
+ The GNU C Library is distributed in the hope that it will be useful,
|
||||
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
|
||||
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
|
||||
+ Lesser General Public License for more details.
|
||||
+
|
||||
+ You should have received a copy of the GNU Lesser General Public
|
||||
+ License along with the GNU C Library; if not, see
|
||||
+ <https://www.gnu.org/licenses/>. */
|
||||
+
|
||||
+#ifndef STRCMP
|
||||
+# define STRCMP __strcasecmp_l_evex
|
||||
+#endif
|
||||
+#define USE_AS_STRCASECMP_L
|
||||
+#include "strcmp-evex.S"
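
With `USE_AS_STRCASECMP_L` defined, the included strcmp-evex.S lowercases both inputs through its `TOLOWER` macro, using the constants `L(lcase_min)` (0x41), `L(lcase_max)` (0x1a) and `L(case_add)` (0x20). A scalar C sketch of that per-byte rule follows; it is only an illustration of the arithmetic, not the vector code, and `fold_ascii` and `casecmp_ascii` are names invented here.

```c
#include <stddef.h>
#include <stdint.h>

/* Values mirror L(lcase_min), L(lcase_max) and L(case_add).  */
enum { LCASE_MIN = 0x41, LCASE_MAX = 0x1a, CASE_ADD = 0x20 };

static uint8_t
fold_ascii (uint8_t c)
{
  /* c - 'A' wraps around for c < 'A', so a single unsigned compare
     against 26 checks both ends of the range 'A'..'Z'.  */
  uint8_t off = (uint8_t) (c - LCASE_MIN);
  return off < LCASE_MAX ? (uint8_t) (c + CASE_ADD) : c;
}

/* Byte-at-a-time case-insensitive compare built on fold_ascii.  */
static int
casecmp_ascii (const char *a, const char *b)
{
  for (size_t i = 0;; i++)
    {
      uint8_t x = fold_ascii ((uint8_t) a[i]);
      uint8_t y = fold_ascii ((uint8_t) b[i]);
      if (x != y || x == 0)
        return (int) x - (int) y;
    }
}
```

In the vector macro the subtract, unsigned less-than compare and masked add play the roles of the three steps in `fold_ascii`, applied to every byte of the register at once. Bytes outside `'A'..'Z'`, including those with the high bit set, pass through unchanged.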
|
||||
diff --git a/sysdeps/x86_64/multiarch/strcmp-evex.S b/sysdeps/x86_64/multiarch/strcmp-evex.S
|
||||
index 0dfa62bd149c02b4..b81b57753c38db1f 100644
|
||||
--- a/sysdeps/x86_64/multiarch/strcmp-evex.S
|
||||
+++ b/sysdeps/x86_64/multiarch/strcmp-evex.S
|
||||
@@ -19,6 +19,9 @@
|
||||
#if IS_IN (libc)
|
||||
|
||||
# include <sysdep.h>
|
||||
+# if defined USE_AS_STRCASECMP_L
|
||||
+# include "locale-defines.h"
|
||||
+# endif
|
||||
|
||||
# ifndef STRCMP
|
||||
# define STRCMP __strcmp_evex
|
||||
@@ -34,19 +37,29 @@
|
||||
# define VMOVA vmovdqa64
|
||||
|
||||
# ifdef USE_AS_WCSCMP
|
||||
-# define TESTEQ subl $0xff,
|
||||
+# ifndef OVERFLOW_STRCMP
|
||||
+# define OVERFLOW_STRCMP __wcscmp_evex
|
||||
+# endif
|
||||
+
|
||||
+# define TESTEQ subl $0xff,
|
||||
/* Compare packed dwords. */
|
||||
# define VPCMP vpcmpd
|
||||
# define VPMINU vpminud
|
||||
# define VPTESTM vptestmd
|
||||
+# define VPTESTNM vptestnmd
|
||||
/* 1 dword char == 4 bytes. */
|
||||
# define SIZE_OF_CHAR 4
|
||||
# else
|
||||
+# ifndef OVERFLOW_STRCMP
|
||||
+# define OVERFLOW_STRCMP __strcmp_evex
|
||||
+# endif
|
||||
+
|
||||
# define TESTEQ incl
|
||||
/* Compare packed bytes. */
|
||||
# define VPCMP vpcmpb
|
||||
# define VPMINU vpminub
|
||||
# define VPTESTM vptestmb
|
||||
+# define VPTESTNM vptestnmb
|
||||
/* 1 byte char == 1 byte. */
|
||||
# define SIZE_OF_CHAR 1
|
||||
# endif
|
||||
@@ -73,11 +86,16 @@
|
||||
# define VEC_OFFSET (-VEC_SIZE)
|
||||
# endif
|
||||
|
||||
-# define XMMZERO xmm16
|
||||
# define XMM0 xmm17
|
||||
# define XMM1 xmm18
|
||||
|
||||
-# define YMMZERO ymm16
|
||||
+# define XMM10 xmm27
|
||||
+# define XMM11 xmm28
|
||||
+# define XMM12 xmm29
|
||||
+# define XMM13 xmm30
|
||||
+# define XMM14 xmm31
|
||||
+
|
||||
+
|
||||
# define YMM0 ymm17
|
||||
# define YMM1 ymm18
|
||||
# define YMM2 ymm19
|
||||
@@ -89,6 +107,87 @@
|
||||
# define YMM8 ymm25
|
||||
# define YMM9 ymm26
|
||||
# define YMM10 ymm27
|
||||
+# define YMM11 ymm28
|
||||
+# define YMM12 ymm29
|
||||
+# define YMM13 ymm30
|
||||
+# define YMM14 ymm31
|
||||
+
|
||||
+# ifdef USE_AS_STRCASECMP_L
|
||||
+# define BYTE_LOOP_REG OFFSET_REG
|
||||
+# else
|
||||
+# define BYTE_LOOP_REG ecx
|
||||
+# endif
|
||||
+
|
||||
+# ifdef USE_AS_STRCASECMP_L
|
||||
+# ifdef USE_AS_STRNCMP
|
||||
+# define STRCASECMP __strncasecmp_evex
|
||||
+# define LOCALE_REG rcx
|
||||
+# define LOCALE_REG_LP RCX_LP
|
||||
+# define STRCASECMP_NONASCII __strncasecmp_l_nonascii
|
||||
+# else
|
||||
+# define STRCASECMP __strcasecmp_evex
|
||||
+# define LOCALE_REG rdx
|
||||
+# define LOCALE_REG_LP RDX_LP
|
||||
+# define STRCASECMP_NONASCII __strcasecmp_l_nonascii
|
||||
+# endif
|
||||
+# endif
|
||||
+
|
||||
+# define LCASE_MIN_YMM %YMM12
|
||||
+# define LCASE_MAX_YMM %YMM13
|
||||
+# define CASE_ADD_YMM %YMM14
|
||||
+
|
||||
+# define LCASE_MIN_XMM %XMM12
|
||||
+# define LCASE_MAX_XMM %XMM13
|
||||
+# define CASE_ADD_XMM %XMM14
|
||||
+
|
||||
+ /* NB: wcsncmp uses r11 but strcasecmp is never used in
|
||||
+ conjunction with wcscmp. */
|
||||
+# define TOLOWER_BASE %r11
|
||||
+
|
||||
+# ifdef USE_AS_STRCASECMP_L
|
||||
+# define _REG(x, y) x ## y
|
||||
+# define REG(x, y) _REG(x, y)
|
||||
+# define TOLOWER(reg1, reg2, ext) \
|
||||
+ vpsubb REG(LCASE_MIN_, ext), reg1, REG(%ext, 10); \
|
||||
+ vpsubb REG(LCASE_MIN_, ext), reg2, REG(%ext, 11); \
|
||||
+ vpcmpub $1, REG(LCASE_MAX_, ext), REG(%ext, 10), %k5; \
|
||||
+ vpcmpub $1, REG(LCASE_MAX_, ext), REG(%ext, 11), %k6; \
|
||||
+ vpaddb reg1, REG(CASE_ADD_, ext), reg1{%k5}; \
|
||||
+ vpaddb reg2, REG(CASE_ADD_, ext), reg2{%k6}
|
||||
+
|
||||
+# define TOLOWER_gpr(src, dst) movl (TOLOWER_BASE, src, 4), dst
|
||||
+# define TOLOWER_YMM(...) TOLOWER(__VA_ARGS__, YMM)
|
||||
+# define TOLOWER_XMM(...) TOLOWER(__VA_ARGS__, XMM)
|
||||
+
|
||||
+# define CMP_R1_R2(s1_reg, s2_reg, reg_out, ext) \
|
||||
+ TOLOWER (s1_reg, s2_reg, ext); \
|
||||
+ VPCMP $0, s1_reg, s2_reg, reg_out
|
||||
+
|
||||
+# define CMP_R1_S2(s1_reg, s2_mem, s2_reg, reg_out, ext) \
|
||||
+ VMOVU s2_mem, s2_reg; \
|
||||
+ CMP_R1_R2(s1_reg, s2_reg, reg_out, ext)
|
||||
+
|
||||
+# define CMP_R1_R2_YMM(...) CMP_R1_R2(__VA_ARGS__, YMM)
|
||||
+# define CMP_R1_R2_XMM(...) CMP_R1_R2(__VA_ARGS__, XMM)
|
||||
+
|
||||
+# define CMP_R1_S2_YMM(...) CMP_R1_S2(__VA_ARGS__, YMM)
|
||||
+# define CMP_R1_S2_XMM(...) CMP_R1_S2(__VA_ARGS__, XMM)
|
||||
+
|
||||
+# else
|
||||
+# define TOLOWER_gpr(...)
|
||||
+# define TOLOWER_YMM(...)
|
||||
+# define TOLOWER_XMM(...)
|
||||
+
|
||||
+# define CMP_R1_R2_YMM(s1_reg, s2_reg, reg_out) \
|
||||
+ VPCMP $0, s2_reg, s1_reg, reg_out
|
||||
+
|
||||
+# define CMP_R1_R2_XMM(...) CMP_R1_R2_YMM(__VA_ARGS__)
|
||||
+
|
||||
+# define CMP_R1_S2_YMM(s1_reg, s2_mem, unused, reg_out) \
|
||||
+ VPCMP $0, s2_mem, s1_reg, reg_out
|
||||
+
|
||||
+# define CMP_R1_S2_XMM(...) CMP_R1_S2_YMM(__VA_ARGS__)
|
||||
+# endif
|
||||
|
||||
/* Warning!
|
||||
wcscmp/wcsncmp have to use SIGNED comparison for elements.
|
||||
@@ -112,8 +211,45 @@
|
||||
returned. */
|
||||
|
||||
.section .text.evex, "ax", @progbits
|
||||
-ENTRY(STRCMP)
|
||||
+ .align 16
|
||||
+ .type STRCMP, @function
|
||||
+ .globl STRCMP
|
||||
+ .hidden STRCMP
|
||||
+
|
||||
+# ifdef USE_AS_STRCASECMP_L
|
||||
+ENTRY (STRCASECMP)
|
||||
+ movq __libc_tsd_LOCALE@gottpoff(%rip), %rax
|
||||
+ mov %fs:(%rax), %LOCALE_REG_LP
|
||||
+
|
||||
+ /* Either 1 or 5 bytes (depending on whether CET is enabled). */
|
||||
+ .p2align 4
|
||||
+END (STRCASECMP)
|
||||
+ /* FALLTHROUGH to strcasecmp/strncasecmp_l. */
|
||||
+# endif
|
||||
+
|
||||
+ .p2align 4
|
||||
+STRCMP:
|
||||
+ cfi_startproc
|
||||
+ _CET_ENDBR
|
||||
+ CALL_MCOUNT
|
||||
+
|
||||
+# if defined USE_AS_STRCASECMP_L
|
||||
+ /* We have to fall back on the C implementation for locales with
|
||||
+ encodings not matching ASCII for single bytes. */
|
||||
+# if LOCALE_T___LOCALES != 0 || LC_CTYPE != 0
|
||||
+ mov LOCALE_T___LOCALES + LC_CTYPE * LP_SIZE(%LOCALE_REG), %RAX_LP
|
||||
+# else
|
||||
+ mov (%LOCALE_REG), %RAX_LP
|
||||
+# endif
|
||||
+ testl $1, LOCALE_DATA_VALUES + _NL_CTYPE_NONASCII_CASE * SIZEOF_VALUES(%rax)
|
||||
+ jne STRCASECMP_NONASCII
|
||||
+ leaq _nl_C_LC_CTYPE_tolower + 128 * 4(%rip), TOLOWER_BASE
|
||||
+# endif
|
||||
+
|
||||
# ifdef USE_AS_STRNCMP
|
||||
+ /* Don't overwrite LOCALE_REG (rcx) until we have passed
|
||||
+ L(one_or_less). Otherwise we might use the wrong locale in
|
||||
+ the OVERFLOW_STRCMP (strcasecmp_l). */
|
||||
# ifdef __ILP32__
|
||||
/* Clear the upper 32 bits. */
|
||||
movl %edx, %edx
|
||||
@@ -125,6 +261,32 @@ ENTRY(STRCMP)
|
||||
actually bound the buffer. */
|
||||
jle L(one_or_less)
|
||||
# endif
|
||||
+
|
||||
+# if defined USE_AS_STRCASECMP_L
|
||||
+ .section .rodata.cst32, "aM", @progbits, 32
|
||||
+ .align 32
|
||||
+L(lcase_min):
|
||||
+ .quad 0x4141414141414141
|
||||
+ .quad 0x4141414141414141
|
||||
+ .quad 0x4141414141414141
|
||||
+ .quad 0x4141414141414141
|
||||
+L(lcase_max):
|
||||
+ .quad 0x1a1a1a1a1a1a1a1a
|
||||
+ .quad 0x1a1a1a1a1a1a1a1a
|
||||
+ .quad 0x1a1a1a1a1a1a1a1a
|
||||
+ .quad 0x1a1a1a1a1a1a1a1a
|
||||
+L(case_add):
|
||||
+ .quad 0x2020202020202020
|
||||
+ .quad 0x2020202020202020
|
||||
+ .quad 0x2020202020202020
|
||||
+ .quad 0x2020202020202020
|
||||
+ .previous
|
||||
+
|
||||
+ vmovdqa64 L(lcase_min)(%rip), LCASE_MIN_YMM
|
||||
+ vmovdqa64 L(lcase_max)(%rip), LCASE_MAX_YMM
|
||||
+ vmovdqa64 L(case_add)(%rip), CASE_ADD_YMM
|
||||
+# endif
|
||||
+
|
||||
movl %edi, %eax
|
||||
orl %esi, %eax
|
||||
/* Shift out the bits irrelivant to page boundary ([63:12]). */
|
||||
@@ -139,7 +301,7 @@ L(no_page_cross):
|
||||
VPTESTM %YMM0, %YMM0, %k2
|
||||
/* Each bit cleared in K1 represents a mismatch or a null CHAR
|
||||
in YMM0 and 32 bytes at (%rsi). */
|
||||
- VPCMP $0, (%rsi), %YMM0, %k1{%k2}
|
||||
+ CMP_R1_S2_YMM (%YMM0, (%rsi), %YMM1, %k1){%k2}
|
||||
kmovd %k1, %ecx
|
||||
# ifdef USE_AS_STRNCMP
|
||||
cmpq $CHAR_PER_VEC, %rdx
|
||||
@@ -169,6 +331,8 @@ L(return_vec_0):
|
||||
# else
|
||||
movzbl (%rdi, %rcx), %eax
|
||||
movzbl (%rsi, %rcx), %ecx
|
||||
+ TOLOWER_gpr (%rax, %eax)
|
||||
+ TOLOWER_gpr (%rcx, %ecx)
|
||||
subl %ecx, %eax
|
||||
# endif
|
||||
L(ret0):
|
||||
@@ -188,11 +352,15 @@ L(ret_zero):
|
||||
|
||||
.p2align 4,, 5
|
||||
L(one_or_less):
|
||||
+# ifdef USE_AS_STRCASECMP_L
|
||||
+ /* Set locale argument for strcasecmp. */
|
||||
+ movq %LOCALE_REG, %rdx
|
||||
+# endif
|
||||
jb L(ret_zero)
|
||||
-# ifdef USE_AS_WCSCMP
|
||||
/* 'nbe' covers the case where length is negative (large
|
||||
unsigned). */
|
||||
- jnbe __wcscmp_evex
|
||||
+ jnbe OVERFLOW_STRCMP
|
||||
+# ifdef USE_AS_WCSCMP
|
||||
movl (%rdi), %edx
|
||||
xorl %eax, %eax
|
||||
cmpl (%rsi), %edx
|
||||
@@ -201,11 +369,10 @@ L(one_or_less):
|
||||
negl %eax
|
||||
orl $1, %eax
|
||||
# else
|
||||
- /* 'nbe' covers the case where length is negative (large
|
||||
- unsigned). */
|
||||
- jnbe __strcmp_evex
|
||||
movzbl (%rdi), %eax
|
||||
movzbl (%rsi), %ecx
|
||||
+ TOLOWER_gpr (%rax, %eax)
|
||||
+ TOLOWER_gpr (%rcx, %ecx)
|
||||
subl %ecx, %eax
|
||||
# endif
|
||||
L(ret1):
|
||||
@@ -233,6 +400,8 @@ L(return_vec_1):
|
||||
# else
|
||||
movzbl VEC_SIZE(%rdi, %rcx), %eax
|
||||
movzbl VEC_SIZE(%rsi, %rcx), %ecx
|
||||
+ TOLOWER_gpr (%rax, %eax)
|
||||
+ TOLOWER_gpr (%rcx, %ecx)
|
||||
subl %ecx, %eax
|
||||
# endif
|
||||
L(ret2):
|
||||
@@ -270,6 +439,8 @@ L(return_vec_2):
|
||||
# else
|
||||
movzbl (VEC_SIZE * 2)(%rdi, %rcx), %eax
|
||||
movzbl (VEC_SIZE * 2)(%rsi, %rcx), %ecx
|
||||
+ TOLOWER_gpr (%rax, %eax)
|
||||
+ TOLOWER_gpr (%rcx, %ecx)
|
||||
subl %ecx, %eax
|
||||
# endif
|
||||
L(ret3):
|
||||
@@ -290,6 +461,8 @@ L(return_vec_3):
|
||||
# else
|
||||
movzbl (VEC_SIZE * 3)(%rdi, %rcx), %eax
|
||||
movzbl (VEC_SIZE * 3)(%rsi, %rcx), %ecx
|
||||
+ TOLOWER_gpr (%rax, %eax)
|
||||
+ TOLOWER_gpr (%rcx, %ecx)
|
||||
subl %ecx, %eax
|
||||
# endif
|
||||
L(ret4):
|
||||
@@ -303,7 +476,7 @@ L(more_3x_vec):
|
||||
/* Safe to compare 4x vectors. */
|
||||
VMOVU (VEC_SIZE)(%rdi), %YMM0
|
||||
VPTESTM %YMM0, %YMM0, %k2
|
||||
- VPCMP $0, (VEC_SIZE)(%rsi), %YMM0, %k1{%k2}
|
||||
+ CMP_R1_S2_YMM (%YMM0, VEC_SIZE(%rsi), %YMM1, %k1){%k2}
|
||||
kmovd %k1, %ecx
|
||||
TESTEQ %ecx
|
||||
jnz L(return_vec_1)
|
||||
@@ -315,14 +488,14 @@ L(more_3x_vec):
|
||||
|
||||
VMOVU (VEC_SIZE * 2)(%rdi), %YMM0
|
||||
VPTESTM %YMM0, %YMM0, %k2
|
||||
- VPCMP $0, (VEC_SIZE * 2)(%rsi), %YMM0, %k1{%k2}
|
||||
+ CMP_R1_S2_YMM (%YMM0, (VEC_SIZE * 2)(%rsi), %YMM1, %k1){%k2}
|
||||
kmovd %k1, %ecx
|
||||
TESTEQ %ecx
|
||||
jnz L(return_vec_2)
|
||||
|
||||
VMOVU (VEC_SIZE * 3)(%rdi), %YMM0
|
||||
VPTESTM %YMM0, %YMM0, %k2
|
||||
- VPCMP $0, (VEC_SIZE * 3)(%rsi), %YMM0, %k1{%k2}
|
||||
+ CMP_R1_S2_YMM (%YMM0, (VEC_SIZE * 3)(%rsi), %YMM1, %k1){%k2}
|
||||
kmovd %k1, %ecx
|
||||
TESTEQ %ecx
|
||||
jnz L(return_vec_3)
|
||||
@@ -381,7 +554,6 @@ L(prepare_loop_aligned):
|
||||
subl %esi, %eax
|
||||
andl $(PAGE_SIZE - 1), %eax
|
||||
|
||||
- vpxorq %YMMZERO, %YMMZERO, %YMMZERO
|
||||
|
||||
/* Loop 4x comparisons at a time. */
|
||||
.p2align 4
|
||||
@@ -413,22 +585,35 @@ L(loop_skip_page_cross_check):
|
||||
/* A zero CHAR in YMM9 means that there is a null CHAR. */
|
||||
VPMINU %YMM8, %YMM9, %YMM9
|
||||
|
||||
- /* Each bit set in K1 represents a non-null CHAR in YMM8. */
|
||||
+ /* Each bit set in K1 represents a non-null CHAR in YMM9. */
|
||||
VPTESTM %YMM9, %YMM9, %k1
|
||||
-
|
||||
+# ifndef USE_AS_STRCASECMP_L
|
||||
vpxorq (VEC_SIZE * 0)(%rsi), %YMM0, %YMM1
|
||||
vpxorq (VEC_SIZE * 1)(%rsi), %YMM2, %YMM3
|
||||
vpxorq (VEC_SIZE * 2)(%rsi), %YMM4, %YMM5
|
||||
/* Ternary logic to xor (VEC_SIZE * 3)(%rsi) with YMM6 while
|
||||
oring with YMM1. Result is stored in YMM6. */
|
||||
vpternlogd $0xde, (VEC_SIZE * 3)(%rsi), %YMM1, %YMM6
|
||||
-
|
||||
+# else
|
||||
+ VMOVU (VEC_SIZE * 0)(%rsi), %YMM1
|
||||
+ TOLOWER_YMM (%YMM0, %YMM1)
|
||||
+ VMOVU (VEC_SIZE * 1)(%rsi), %YMM3
|
||||
+ TOLOWER_YMM (%YMM2, %YMM3)
|
||||
+ VMOVU (VEC_SIZE * 2)(%rsi), %YMM5
|
||||
+ TOLOWER_YMM (%YMM4, %YMM5)
|
||||
+ VMOVU (VEC_SIZE * 3)(%rsi), %YMM7
|
||||
+ TOLOWER_YMM (%YMM6, %YMM7)
|
||||
+ vpxorq %YMM0, %YMM1, %YMM1
|
||||
+ vpxorq %YMM2, %YMM3, %YMM3
|
||||
+ vpxorq %YMM4, %YMM5, %YMM5
|
||||
+ vpternlogd $0xde, %YMM7, %YMM1, %YMM6
|
||||
+# endif
|
||||
/* Or together YMM3, YMM5, and YMM6. */
|
||||
vpternlogd $0xfe, %YMM3, %YMM5, %YMM6
|
||||
|
||||
|
||||
/* A non-zero CHAR in YMM6 represents a mismatch. */
|
||||
- VPCMP $0, %YMMZERO, %YMM6, %k0{%k1}
|
||||
+ VPTESTNM %YMM6, %YMM6, %k0{%k1}
|
||||
kmovd %k0, %LOOP_REG
|
||||
|
||||
TESTEQ %LOOP_REG
|
||||
@@ -437,13 +622,13 @@ L(loop_skip_page_cross_check):
|
||||
|
||||
/* Find which VEC has the mismatch of end of string. */
|
||||
VPTESTM %YMM0, %YMM0, %k1
|
||||
- VPCMP $0, %YMMZERO, %YMM1, %k0{%k1}
|
||||
+ VPTESTNM %YMM1, %YMM1, %k0{%k1}
|
||||
kmovd %k0, %ecx
|
||||
TESTEQ %ecx
|
||||
jnz L(return_vec_0_end)
|
||||
|
||||
VPTESTM %YMM2, %YMM2, %k1
|
||||
- VPCMP $0, %YMMZERO, %YMM3, %k0{%k1}
|
||||
+ VPTESTNM %YMM3, %YMM3, %k0{%k1}
|
||||
kmovd %k0, %ecx
|
||||
TESTEQ %ecx
|
||||
jnz L(return_vec_1_end)
|
||||
@@ -457,7 +642,7 @@ L(return_vec_2_3_end):
|
||||
# endif
|
||||
|
||||
VPTESTM %YMM4, %YMM4, %k1
|
||||
- VPCMP $0, %YMMZERO, %YMM5, %k0{%k1}
|
||||
+ VPTESTNM %YMM5, %YMM5, %k0{%k1}
|
||||
kmovd %k0, %ecx
|
||||
TESTEQ %ecx
|
||||
# if CHAR_PER_VEC <= 16
|
||||
@@ -493,6 +678,8 @@ L(return_vec_3_end):
|
||||
# else
|
||||
movzbl (VEC_SIZE * 2)(%rdi, %LOOP_REG64), %eax
|
||||
movzbl (VEC_SIZE * 2)(%rsi, %LOOP_REG64), %ecx
|
||||
+ TOLOWER_gpr (%rax, %eax)
|
||||
+ TOLOWER_gpr (%rcx, %ecx)
|
||||
subl %ecx, %eax
|
||||
xorl %r8d, %eax
|
||||
subl %r8d, %eax
|
||||
@@ -545,6 +732,8 @@ L(return_vec_0_end):
|
||||
# else
|
||||
movzbl (%rdi, %rcx), %eax
|
||||
movzbl (%rsi, %rcx), %ecx
|
||||
+ TOLOWER_gpr (%rax, %eax)
|
||||
+ TOLOWER_gpr (%rcx, %ecx)
|
||||
subl %ecx, %eax
|
||||
/* Flip `eax` if `rdi` and `rsi` where swapped in page cross
|
||||
logic. Subtract `r8d` after xor for zero case. */
|
||||
@@ -569,6 +758,8 @@ L(return_vec_1_end):
|
||||
# else
|
||||
movzbl VEC_SIZE(%rdi, %rcx), %eax
|
||||
movzbl VEC_SIZE(%rsi, %rcx), %ecx
|
||||
+ TOLOWER_gpr (%rax, %eax)
|
||||
+ TOLOWER_gpr (%rcx, %ecx)
|
||||
subl %ecx, %eax
|
||||
xorl %r8d, %eax
|
||||
subl %r8d, %eax
|
||||
@@ -598,7 +789,7 @@ L(page_cross_during_loop):
|
||||
|
||||
VMOVA (%rdi), %YMM0
|
||||
VPTESTM %YMM0, %YMM0, %k2
|
||||
- VPCMP $0, (%rsi), %YMM0, %k1{%k2}
|
||||
+ CMP_R1_S2_YMM (%YMM0, (%rsi), %YMM1, %k1){%k2}
|
||||
kmovd %k1, %ecx
|
||||
TESTEQ %ecx
|
||||
jnz L(return_vec_0_end)
|
||||
@@ -619,8 +810,7 @@ L(less_1x_vec_till_page_cross):
|
||||
been loaded earlier so must be valid. */
|
||||
VMOVU -VEC_SIZE(%rdi, %rax), %YMM0
|
||||
VPTESTM %YMM0, %YMM0, %k2
|
||||
- VPCMP $0, -VEC_SIZE(%rsi, %rax), %YMM0, %k1{%k2}
|
||||
-
|
||||
+ CMP_R1_S2_YMM (%YMM0, -VEC_SIZE(%rsi, %rax), %YMM1, %k1){%k2}
|
||||
/* Mask of potentially valid bits. The lower bits can be out of
|
||||
range comparisons (but safe regarding page crosses). */
|
||||
|
||||
@@ -642,6 +832,8 @@ L(less_1x_vec_till_page_cross):
|
||||
|
||||
# ifdef USE_AS_STRNCMP
|
||||
# ifdef USE_AS_WCSCMP
|
||||
+ /* NB: strcasecmp not used with WCSCMP so this access to r11 is
|
||||
+ safe. */
|
||||
movl %eax, %r11d
|
||||
shrl $2, %r11d
|
||||
cmpq %r11, %rdx
|
||||
@@ -679,6 +871,8 @@ L(return_page_cross_cmp_mem):
|
||||
# else
|
||||
movzbl VEC_OFFSET(%rdi, %rcx), %eax
|
||||
movzbl VEC_OFFSET(%rsi, %rcx), %ecx
|
||||
+ TOLOWER_gpr (%rax, %eax)
|
||||
+ TOLOWER_gpr (%rcx, %ecx)
|
||||
subl %ecx, %eax
|
||||
xorl %r8d, %eax
|
||||
subl %r8d, %eax
|
||||
@@ -709,7 +903,7 @@ L(more_2x_vec_till_page_cross):
|
||||
|
||||
VMOVA VEC_SIZE(%rdi), %YMM0
|
||||
VPTESTM %YMM0, %YMM0, %k2
|
||||
- VPCMP $0, VEC_SIZE(%rsi), %YMM0, %k1{%k2}
|
||||
+ CMP_R1_S2_YMM (%YMM0, VEC_SIZE(%rsi), %YMM1, %k1){%k2}
|
||||
kmovd %k1, %ecx
|
||||
TESTEQ %ecx
|
||||
jnz L(return_vec_1_end)
|
||||
@@ -724,14 +918,14 @@ L(more_2x_vec_till_page_cross):
|
||||
/* Safe to include comparisons from lower bytes. */
|
||||
VMOVU -(VEC_SIZE * 2)(%rdi, %rax), %YMM0
|
||||
VPTESTM %YMM0, %YMM0, %k2
|
||||
- VPCMP $0, -(VEC_SIZE * 2)(%rsi, %rax), %YMM0, %k1{%k2}
|
||||
+ CMP_R1_S2_YMM (%YMM0, -(VEC_SIZE * 2)(%rsi, %rax), %YMM1, %k1){%k2}
|
||||
kmovd %k1, %ecx
|
||||
TESTEQ %ecx
|
||||
jnz L(return_vec_page_cross_0)
|
||||
|
||||
VMOVU -(VEC_SIZE * 1)(%rdi, %rax), %YMM0
|
||||
VPTESTM %YMM0, %YMM0, %k2
|
||||
- VPCMP $0, -(VEC_SIZE * 1)(%rsi, %rax), %YMM0, %k1{%k2}
|
||||
+ CMP_R1_S2_YMM (%YMM0, -(VEC_SIZE * 1)(%rsi, %rax), %YMM1, %k1){%k2}
|
||||
kmovd %k1, %ecx
|
||||
TESTEQ %ecx
|
||||
jnz L(return_vec_page_cross_1)
|
||||
@@ -740,6 +934,8 @@ L(more_2x_vec_till_page_cross):
|
||||
/* Must check length here as length might proclude reading next
|
||||
page. */
|
||||
# ifdef USE_AS_WCSCMP
|
||||
+ /* NB: strcasecmp not used with WCSCMP so this access to r11 is
|
||||
+ safe. */
|
||||
movl %eax, %r11d
|
||||
shrl $2, %r11d
|
||||
cmpq %r11, %rdx
|
||||
@@ -754,12 +950,19 @@ L(more_2x_vec_till_page_cross):
|
||||
VMOVA (VEC_SIZE * 3)(%rdi), %YMM6
|
||||
VPMINU %YMM4, %YMM6, %YMM9
|
||||
VPTESTM %YMM9, %YMM9, %k1
|
||||
-
|
||||
+# ifndef USE_AS_STRCASECMP_L
|
||||
vpxorq (VEC_SIZE * 2)(%rsi), %YMM4, %YMM5
|
||||
/* YMM6 = YMM5 | ((VEC_SIZE * 3)(%rsi) ^ YMM6). */
|
||||
vpternlogd $0xde, (VEC_SIZE * 3)(%rsi), %YMM5, %YMM6
|
||||
-
|
||||
- VPCMP $0, %YMMZERO, %YMM6, %k0{%k1}
|
||||
+# else
|
||||
+ VMOVU (VEC_SIZE * 2)(%rsi), %YMM5
|
||||
+ TOLOWER_YMM (%YMM4, %YMM5)
|
||||
+ VMOVU (VEC_SIZE * 3)(%rsi), %YMM7
|
||||
+ TOLOWER_YMM (%YMM6, %YMM7)
|
||||
+ vpxorq %YMM4, %YMM5, %YMM5
|
||||
+ vpternlogd $0xde, %YMM7, %YMM5, %YMM6
|
||||
+# endif
|
||||
+ VPTESTNM %YMM6, %YMM6, %k0{%k1}
|
||||
kmovd %k0, %LOOP_REG
|
||||
TESTEQ %LOOP_REG
|
||||
jnz L(return_vec_2_3_end)
|
||||
@@ -815,6 +1018,8 @@ L(return_vec_page_cross_1):
|
||||
# else
|
||||
movzbl VEC_OFFSET(%rdi, %rcx), %eax
|
||||
movzbl VEC_OFFSET(%rsi, %rcx), %ecx
|
||||
+ TOLOWER_gpr (%rax, %eax)
|
||||
+ TOLOWER_gpr (%rcx, %ecx)
|
||||
subl %ecx, %eax
|
||||
xorl %r8d, %eax
|
||||
subl %r8d, %eax
|
||||
@@ -871,7 +1076,7 @@ L(page_cross):
|
||||
L(page_cross_loop):
|
||||
VMOVU (%rdi, %OFFSET_REG64, SIZE_OF_CHAR), %YMM0
|
||||
VPTESTM %YMM0, %YMM0, %k2
|
||||
- VPCMP $0, (%rsi, %OFFSET_REG64, SIZE_OF_CHAR), %YMM0, %k1{%k2}
|
||||
+ CMP_R1_S2_YMM (%YMM0, (%rsi, %OFFSET_REG64, SIZE_OF_CHAR), %YMM1, %k1){%k2}
|
||||
kmovd %k1, %ecx
|
||||
TESTEQ %ecx
|
||||
jnz L(check_ret_vec_page_cross)
|
||||
@@ -895,7 +1100,7 @@ L(page_cross_loop):
|
||||
*/
|
||||
VMOVU (%rdi, %OFFSET_REG64, SIZE_OF_CHAR), %YMM0
|
||||
VPTESTM %YMM0, %YMM0, %k2
|
||||
- VPCMP $0, (%rsi, %OFFSET_REG64, SIZE_OF_CHAR), %YMM0, %k1{%k2}
|
||||
+ CMP_R1_S2_YMM (%YMM0, (%rsi, %OFFSET_REG64, SIZE_OF_CHAR), %YMM1, %k1){%k2}
|
||||
|
||||
kmovd %k1, %ecx
|
||||
# ifdef USE_AS_STRNCMP
|
||||
@@ -930,6 +1135,8 @@ L(ret_vec_page_cross_cont):
|
||||
# else
|
||||
movzbl (%rdi, %rcx, SIZE_OF_CHAR), %eax
|
||||
movzbl (%rsi, %rcx, SIZE_OF_CHAR), %ecx
|
||||
+ TOLOWER_gpr (%rax, %eax)
|
||||
+ TOLOWER_gpr (%rcx, %ecx)
|
||||
subl %ecx, %eax
|
||||
xorl %r8d, %eax
|
||||
subl %r8d, %eax
|
||||
@@ -989,7 +1196,7 @@ L(less_1x_vec_till_page):
/* Use 16 byte comparison. */
vmovdqu (%rdi), %xmm0
VPTESTM %xmm0, %xmm0, %k2
- VPCMP $0, (%rsi), %xmm0, %k1{%k2}
+ CMP_R1_S2_XMM (%xmm0, (%rsi), %xmm1, %k1){%k2}
kmovd %k1, %ecx
# ifdef USE_AS_WCSCMP
subl $0xf, %ecx
@@ -1009,7 +1216,7 @@ L(less_1x_vec_till_page):
# endif
vmovdqu (%rdi, %OFFSET_REG64, SIZE_OF_CHAR), %xmm0
VPTESTM %xmm0, %xmm0, %k2
- VPCMP $0, (%rsi, %OFFSET_REG64, SIZE_OF_CHAR), %xmm0, %k1{%k2}
+ CMP_R1_S2_XMM (%xmm0, (%rsi, %OFFSET_REG64, SIZE_OF_CHAR), %xmm1, %k1){%k2}
kmovd %k1, %ecx
# ifdef USE_AS_WCSCMP
subl $0xf, %ecx
@@ -1048,7 +1255,7 @@ L(less_16_till_page):
vmovq (%rdi), %xmm0
vmovq (%rsi), %xmm1
VPTESTM %xmm0, %xmm0, %k2
- VPCMP $0, %xmm1, %xmm0, %k1{%k2}
+ CMP_R1_R2_XMM (%xmm0, %xmm1, %k1){%k2}
kmovd %k1, %ecx
# ifdef USE_AS_WCSCMP
subl $0x3, %ecx
@@ -1068,7 +1275,7 @@ L(less_16_till_page):
vmovq (%rdi, %OFFSET_REG64, SIZE_OF_CHAR), %xmm0
vmovq (%rsi, %OFFSET_REG64, SIZE_OF_CHAR), %xmm1
VPTESTM %xmm0, %xmm0, %k2
- VPCMP $0, %xmm1, %xmm0, %k1{%k2}
+ CMP_R1_R2_XMM (%xmm0, %xmm1, %k1){%k2}
kmovd %k1, %ecx
# ifdef USE_AS_WCSCMP
subl $0x3, %ecx
@@ -1128,7 +1335,7 @@ L(ret_less_8_wcs):
vmovd (%rdi), %xmm0
vmovd (%rsi), %xmm1
VPTESTM %xmm0, %xmm0, %k2
- VPCMP $0, %xmm1, %xmm0, %k1{%k2}
+ CMP_R1_R2_XMM (%xmm0, %xmm1, %k1){%k2}
kmovd %k1, %ecx
subl $0xf, %ecx
jnz L(check_ret_vec_page_cross)
@@ -1143,7 +1350,7 @@ L(ret_less_8_wcs):
vmovd (%rdi, %OFFSET_REG64, SIZE_OF_CHAR), %xmm0
vmovd (%rsi, %OFFSET_REG64, SIZE_OF_CHAR), %xmm1
VPTESTM %xmm0, %xmm0, %k2
- VPCMP $0, %xmm1, %xmm0, %k1{%k2}
+ CMP_R1_R2_XMM (%xmm0, %xmm1, %k1){%k2}
kmovd %k1, %ecx
subl $0xf, %ecx
jnz L(check_ret_vec_page_cross)
@@ -1176,7 +1383,9 @@ L(less_4_till_page):
L(less_4_loop):
movzbl (%rdi), %eax
movzbl (%rsi, %rdi), %ecx
- subl %ecx, %eax
+ TOLOWER_gpr (%rax, %eax)
+ TOLOWER_gpr (%rcx, %BYTE_LOOP_REG)
+ subl %BYTE_LOOP_REG, %eax
jnz L(ret_less_4_loop)
testl %ecx, %ecx
jz L(ret_zero_4_loop)
@@ -1203,5 +1412,6 @@ L(ret_less_4_loop):
subl %r8d, %eax
ret
# endif
-END(STRCMP)
+ cfi_endproc
+ .size STRCMP, .-STRCMP
#endif
diff --git a/sysdeps/x86_64/multiarch/strncase_l-evex.S b/sysdeps/x86_64/multiarch/strncase_l-evex.S
new file mode 100644
index 0000000000000000..8a5af3695cb8cfff
--- /dev/null
+++ b/sysdeps/x86_64/multiarch/strncase_l-evex.S
@@ -0,0 +1,25 @@
+/* strncasecmp_l optimized with EVEX.
+ Copyright (C) 2017-2022 Free Software Foundation, Inc.
+ This file is part of the GNU C Library.
+
+ The GNU C Library is free software; you can redistribute it and/or
+ modify it under the terms of the GNU Lesser General Public
+ License as published by the Free Software Foundation; either
+ version 2.1 of the License, or (at your option) any later version.
+
+ The GNU C Library is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ Lesser General Public License for more details.
+
+ You should have received a copy of the GNU Lesser General Public
+ License along with the GNU C Library; if not, see
+ <https://www.gnu.org/licenses/>. */
+
+#ifndef STRCMP
+# define STRCMP __strncasecmp_l_evex
+#endif
+#define OVERFLOW_STRCMP __strcasecmp_l_evex
+#define USE_AS_STRCASECMP_L
+#define USE_AS_STRNCMP
+#include "strcmp-evex.S"
902
glibc-upstream-2.34-229.patch
Normal file
@ -0,0 +1,902 @@
commit 80883f43545f4f9afcb26beef9358dfdcd021bd6
Author: Noah Goldstein <goldstein.w.n@gmail.com>
Date: Wed Mar 23 16:57:46 2022 -0500

x86: Remove AVX str{n}casecmp

The rationale is:

1. SSE42 has nearly identical logic so any benefit is minimal (3.4%
regression on Tigerlake using SSE42 versus AVX across the
benchtest suite).
2. AVX2 version covers the majority of targets that previously
preferred it.
3. The targets where AVX would still be best (SnB and IVB) are
becoming outdated.

All in all, the saving in code size is worth it.

All string/memory tests pass.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>

(cherry picked from commit 305769b2a15c2e96f9e1b5195d3c4e0d6f0f4b68)

diff --git a/sysdeps/x86_64/multiarch/Makefile b/sysdeps/x86_64/multiarch/Makefile
index 359712c1491a2431..bca82e38d86cc440 100644
--- a/sysdeps/x86_64/multiarch/Makefile
+++ b/sysdeps/x86_64/multiarch/Makefile
@@ -50,7 +50,6 @@ sysdep_routines += \
stpncpy-evex \
stpncpy-sse2-unaligned \
stpncpy-ssse3 \
- strcasecmp_l-avx \
strcasecmp_l-avx2 \
strcasecmp_l-avx2-rtm \
strcasecmp_l-evex \
@@ -91,7 +90,6 @@ sysdep_routines += \
strlen-avx2-rtm \
strlen-evex \
strlen-sse2 \
- strncase_l-avx \
strncase_l-avx2 \
strncase_l-avx2-rtm \
strncase_l-evex \
diff --git a/sysdeps/x86_64/multiarch/ifunc-impl-list.c b/sysdeps/x86_64/multiarch/ifunc-impl-list.c
index f6994e5406933d53..4c7834dd0b951fa4 100644
--- a/sysdeps/x86_64/multiarch/ifunc-impl-list.c
+++ b/sysdeps/x86_64/multiarch/ifunc-impl-list.c
@@ -429,9 +429,6 @@ __libc_ifunc_impl_list (const char *name, struct libc_ifunc_impl *array,
(CPU_FEATURE_USABLE (AVX2)
&& CPU_FEATURE_USABLE (RTM)),
__strcasecmp_avx2_rtm)
- IFUNC_IMPL_ADD (array, i, strcasecmp,
- CPU_FEATURE_USABLE (AVX),
- __strcasecmp_avx)
IFUNC_IMPL_ADD (array, i, strcasecmp,
CPU_FEATURE_USABLE (SSE4_2),
__strcasecmp_sse42)
@@ -453,9 +450,6 @@ __libc_ifunc_impl_list (const char *name, struct libc_ifunc_impl *array,
(CPU_FEATURE_USABLE (AVX2)
&& CPU_FEATURE_USABLE (RTM)),
__strcasecmp_l_avx2_rtm)
- IFUNC_IMPL_ADD (array, i, strcasecmp_l,
- CPU_FEATURE_USABLE (AVX),
- __strcasecmp_l_avx)
IFUNC_IMPL_ADD (array, i, strcasecmp_l,
CPU_FEATURE_USABLE (SSE4_2),
__strcasecmp_l_sse42)
@@ -591,9 +585,6 @@ __libc_ifunc_impl_list (const char *name, struct libc_ifunc_impl *array,
(CPU_FEATURE_USABLE (AVX2)
&& CPU_FEATURE_USABLE (RTM)),
__strncasecmp_avx2_rtm)
- IFUNC_IMPL_ADD (array, i, strncasecmp,
- CPU_FEATURE_USABLE (AVX),
- __strncasecmp_avx)
IFUNC_IMPL_ADD (array, i, strncasecmp,
CPU_FEATURE_USABLE (SSE4_2),
__strncasecmp_sse42)
@@ -616,9 +607,6 @@ __libc_ifunc_impl_list (const char *name, struct libc_ifunc_impl *array,
(CPU_FEATURE_USABLE (AVX2)
&& CPU_FEATURE_USABLE (RTM)),
__strncasecmp_l_avx2_rtm)
- IFUNC_IMPL_ADD (array, i, strncasecmp_l,
- CPU_FEATURE_USABLE (AVX),
- __strncasecmp_l_avx)
IFUNC_IMPL_ADD (array, i, strncasecmp_l,
CPU_FEATURE_USABLE (SSE4_2),
__strncasecmp_l_sse42)
diff --git a/sysdeps/x86_64/multiarch/ifunc-strcasecmp.h b/sysdeps/x86_64/multiarch/ifunc-strcasecmp.h
index 488e99e4997f379b..40819caf5ab10337 100644
--- a/sysdeps/x86_64/multiarch/ifunc-strcasecmp.h
+++ b/sysdeps/x86_64/multiarch/ifunc-strcasecmp.h
@@ -22,7 +22,6 @@
extern __typeof (REDIRECT_NAME) OPTIMIZE (sse2) attribute_hidden;
extern __typeof (REDIRECT_NAME) OPTIMIZE (ssse3) attribute_hidden;
extern __typeof (REDIRECT_NAME) OPTIMIZE (sse42) attribute_hidden;
-extern __typeof (REDIRECT_NAME) OPTIMIZE (avx) attribute_hidden;
extern __typeof (REDIRECT_NAME) OPTIMIZE (avx2) attribute_hidden;
extern __typeof (REDIRECT_NAME) OPTIMIZE (avx2_rtm) attribute_hidden;
extern __typeof (REDIRECT_NAME) OPTIMIZE (evex) attribute_hidden;
@@ -46,9 +45,6 @@ IFUNC_SELECTOR (void)
return OPTIMIZE (avx2);
}

- if (CPU_FEATURE_USABLE_P (cpu_features, AVX))
- return OPTIMIZE (avx);
-
if (CPU_FEATURE_USABLE_P (cpu_features, SSE4_2)
&& !CPU_FEATURES_ARCH_P (cpu_features, Slow_SSE4_2))
return OPTIMIZE (sse42);
diff --git a/sysdeps/x86_64/multiarch/strcasecmp_l-avx.S b/sysdeps/x86_64/multiarch/strcasecmp_l-avx.S
deleted file mode 100644
index 647aa05714d7a36c..0000000000000000
--- a/sysdeps/x86_64/multiarch/strcasecmp_l-avx.S
+++ /dev/null
@@ -1,22 +0,0 @@
-/* strcasecmp_l optimized with AVX.
- Copyright (C) 2017-2021 Free Software Foundation, Inc.
- This file is part of the GNU C Library.
-
- The GNU C Library is free software; you can redistribute it and/or
- modify it under the terms of the GNU Lesser General Public
- License as published by the Free Software Foundation; either
- version 2.1 of the License, or (at your option) any later version.
-
- The GNU C Library is distributed in the hope that it will be useful,
- but WITHOUT ANY WARRANTY; without even the implied warranty of
- MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
- Lesser General Public License for more details.
-
- You should have received a copy of the GNU Lesser General Public
- License along with the GNU C Library; if not, see
- <https://www.gnu.org/licenses/>. */
-
-#define STRCMP_SSE42 __strcasecmp_l_avx
-#define USE_AVX 1
-#define USE_AS_STRCASECMP_L
-#include "strcmp-sse42.S"
diff --git a/sysdeps/x86_64/multiarch/strcmp-sse42.S b/sysdeps/x86_64/multiarch/strcmp-sse42.S
index a6825de8195ad8c6..466c6a92a612ebcb 100644
--- a/sysdeps/x86_64/multiarch/strcmp-sse42.S
+++ b/sysdeps/x86_64/multiarch/strcmp-sse42.S
@@ -42,13 +42,8 @@
# define UPDATE_STRNCMP_COUNTER
#endif

-#ifdef USE_AVX
-# define SECTION avx
-# define GLABEL(l) l##_avx
-#else
-# define SECTION sse4.2
-# define GLABEL(l) l##_sse42
-#endif
+#define SECTION sse4.2
+#define GLABEL(l) l##_sse42

#define LABEL(l) .L##l

@@ -106,21 +101,7 @@ END (GLABEL(__strncasecmp))
#endif


-#ifdef USE_AVX
-# define movdqa vmovdqa
-# define movdqu vmovdqu
-# define pmovmskb vpmovmskb
-# define pcmpistri vpcmpistri
-# define psubb vpsubb
-# define pcmpeqb vpcmpeqb
-# define psrldq vpsrldq
-# define pslldq vpslldq
-# define palignr vpalignr
-# define pxor vpxor
-# define D(arg) arg, arg
-#else
-# define D(arg) arg
-#endif
+#define arg arg

STRCMP_SSE42:
cfi_startproc
@@ -192,18 +173,7 @@ LABEL(case_add):
movdqu (%rdi), %xmm1
movdqu (%rsi), %xmm2
#if defined USE_AS_STRCASECMP_L || defined USE_AS_STRNCASECMP_L
-# ifdef USE_AVX
-# define TOLOWER(reg1, reg2) \
- vpaddb LCASE_MIN_reg, reg1, %xmm7; \
- vpaddb LCASE_MIN_reg, reg2, %xmm8; \
- vpcmpgtb LCASE_MAX_reg, %xmm7, %xmm7; \
- vpcmpgtb LCASE_MAX_reg, %xmm8, %xmm8; \
- vpandn CASE_ADD_reg, %xmm7, %xmm7; \
- vpandn CASE_ADD_reg, %xmm8, %xmm8; \
- vpaddb %xmm7, reg1, reg1; \
- vpaddb %xmm8, reg2, reg2
-# else
-# define TOLOWER(reg1, reg2) \
+# define TOLOWER(reg1, reg2) \
movdqa LCASE_MIN_reg, %xmm7; \
movdqa LCASE_MIN_reg, %xmm8; \
paddb reg1, %xmm7; \
@@ -214,15 +184,15 @@ LABEL(case_add):
pandn CASE_ADD_reg, %xmm8; \
paddb %xmm7, reg1; \
paddb %xmm8, reg2
-# endif
+
TOLOWER (%xmm1, %xmm2)
#else
# define TOLOWER(reg1, reg2)
#endif
- pxor %xmm0, D(%xmm0) /* clear %xmm0 for null char checks */
- pcmpeqb %xmm1, D(%xmm0) /* Any null chars? */
- pcmpeqb %xmm2, D(%xmm1) /* compare first 16 bytes for equality */
- psubb %xmm0, D(%xmm1) /* packed sub of comparison results*/
+ pxor %xmm0, %xmm0 /* clear %xmm0 for null char checks */
+ pcmpeqb %xmm1, %xmm0 /* Any null chars? */
+ pcmpeqb %xmm2, %xmm1 /* compare first 16 bytes for equality */
+ psubb %xmm0, %xmm1 /* packed sub of comparison results*/
pmovmskb %xmm1, %edx
sub $0xffff, %edx /* if first 16 bytes are same, edx == 0xffff */
jnz LABEL(less16bytes)/* If not, find different value or null char */
@@ -246,7 +216,7 @@ LABEL(crosscache):
xor %r8d, %r8d
and $0xf, %ecx /* offset of rsi */
and $0xf, %eax /* offset of rdi */
- pxor %xmm0, D(%xmm0) /* clear %xmm0 for null char check */
+ pxor %xmm0, %xmm0 /* clear %xmm0 for null char check */
cmp %eax, %ecx
je LABEL(ashr_0) /* rsi and rdi relative offset same */
ja LABEL(bigger)
@@ -260,7 +230,7 @@ LABEL(bigger):
sub %rcx, %r9
lea LABEL(unaligned_table)(%rip), %r10
movslq (%r10, %r9,4), %r9
- pcmpeqb %xmm1, D(%xmm0) /* Any null chars? */
+ pcmpeqb %xmm1, %xmm0 /* Any null chars? */
lea (%r10, %r9), %r10
_CET_NOTRACK jmp *%r10 /* jump to corresponding case */

@@ -273,15 +243,15 @@ LABEL(bigger):
LABEL(ashr_0):

movdqa (%rsi), %xmm1
- pcmpeqb %xmm1, D(%xmm0) /* Any null chars? */
+ pcmpeqb %xmm1, %xmm0 /* Any null chars? */
#if !defined USE_AS_STRCASECMP_L && !defined USE_AS_STRNCASECMP_L
- pcmpeqb (%rdi), D(%xmm1) /* compare 16 bytes for equality */
+ pcmpeqb (%rdi), %xmm1 /* compare 16 bytes for equality */
#else
movdqa (%rdi), %xmm2
TOLOWER (%xmm1, %xmm2)
- pcmpeqb %xmm2, D(%xmm1) /* compare 16 bytes for equality */
+ pcmpeqb %xmm2, %xmm1 /* compare 16 bytes for equality */
#endif
- psubb %xmm0, D(%xmm1) /* packed sub of comparison results*/
+ psubb %xmm0, %xmm1 /* packed sub of comparison results*/
pmovmskb %xmm1, %r9d
shr %cl, %edx /* adjust 0xffff for offset */
shr %cl, %r9d /* adjust for 16-byte offset */
@@ -361,10 +331,10 @@ LABEL(ashr_0_exit_use):
*/
.p2align 4
LABEL(ashr_1):
- pslldq $15, D(%xmm2) /* shift first string to align with second */
+ pslldq $15, %xmm2 /* shift first string to align with second */
TOLOWER (%xmm1, %xmm2)
- pcmpeqb %xmm1, D(%xmm2) /* compare 16 bytes for equality */
- psubb %xmm0, D(%xmm2) /* packed sub of comparison results*/
+ pcmpeqb %xmm1, %xmm2 /* compare 16 bytes for equality */
+ psubb %xmm0, %xmm2 /* packed sub of comparison results*/
pmovmskb %xmm2, %r9d
shr %cl, %edx /* adjust 0xffff for offset */
shr %cl, %r9d /* adjust for 16-byte offset */
@@ -392,7 +362,7 @@ LABEL(loop_ashr_1_use):

LABEL(nibble_ashr_1_restart_use):
movdqa (%rdi, %rdx), %xmm0
- palignr $1, -16(%rdi, %rdx), D(%xmm0)
+ palignr $1, -16(%rdi, %rdx), %xmm0
#if !defined USE_AS_STRCASECMP_L && !defined USE_AS_STRNCASECMP_L
pcmpistri $0x1a,(%rsi,%rdx), %xmm0
#else
@@ -411,7 +381,7 @@ LABEL(nibble_ashr_1_restart_use):
jg LABEL(nibble_ashr_1_use)

movdqa (%rdi, %rdx), %xmm0
- palignr $1, -16(%rdi, %rdx), D(%xmm0)
+ palignr $1, -16(%rdi, %rdx), %xmm0
#if !defined USE_AS_STRCASECMP_L && !defined USE_AS_STRNCASECMP_L
pcmpistri $0x1a,(%rsi,%rdx), %xmm0
#else
@@ -431,7 +401,7 @@ LABEL(nibble_ashr_1_restart_use):
LABEL(nibble_ashr_1_use):
sub $0x1000, %r10
movdqa -16(%rdi, %rdx), %xmm0
- psrldq $1, D(%xmm0)
+ psrldq $1, %xmm0
pcmpistri $0x3a,%xmm0, %xmm0
#if defined USE_AS_STRNCMP || defined USE_AS_STRNCASECMP_L
cmp %r11, %rcx
@@ -449,10 +419,10 @@ LABEL(nibble_ashr_1_use):
*/
.p2align 4
LABEL(ashr_2):
- pslldq $14, D(%xmm2)
+ pslldq $14, %xmm2
TOLOWER (%xmm1, %xmm2)
- pcmpeqb %xmm1, D(%xmm2)
- psubb %xmm0, D(%xmm2)
+ pcmpeqb %xmm1, %xmm2
+ psubb %xmm0, %xmm2
pmovmskb %xmm2, %r9d
shr %cl, %edx
shr %cl, %r9d
@@ -480,7 +450,7 @@ LABEL(loop_ashr_2_use):

LABEL(nibble_ashr_2_restart_use):
movdqa (%rdi, %rdx), %xmm0
- palignr $2, -16(%rdi, %rdx), D(%xmm0)
+ palignr $2, -16(%rdi, %rdx), %xmm0
#if !defined USE_AS_STRCASECMP_L && !defined USE_AS_STRNCASECMP_L
pcmpistri $0x1a,(%rsi,%rdx), %xmm0
#else
@@ -499,7 +469,7 @@ LABEL(nibble_ashr_2_restart_use):
jg LABEL(nibble_ashr_2_use)

movdqa (%rdi, %rdx), %xmm0
- palignr $2, -16(%rdi, %rdx), D(%xmm0)
+ palignr $2, -16(%rdi, %rdx), %xmm0
#if !defined USE_AS_STRCASECMP_L && !defined USE_AS_STRNCASECMP_L
pcmpistri $0x1a,(%rsi,%rdx), %xmm0
#else
@@ -519,7 +489,7 @@ LABEL(nibble_ashr_2_restart_use):
LABEL(nibble_ashr_2_use):
sub $0x1000, %r10
movdqa -16(%rdi, %rdx), %xmm0
- psrldq $2, D(%xmm0)
+ psrldq $2, %xmm0
pcmpistri $0x3a,%xmm0, %xmm0
#if defined USE_AS_STRNCMP || defined USE_AS_STRNCASECMP_L
cmp %r11, %rcx
@@ -537,10 +507,10 @@ LABEL(nibble_ashr_2_use):
*/
.p2align 4
LABEL(ashr_3):
- pslldq $13, D(%xmm2)
+ pslldq $13, %xmm2
TOLOWER (%xmm1, %xmm2)
- pcmpeqb %xmm1, D(%xmm2)
- psubb %xmm0, D(%xmm2)
+ pcmpeqb %xmm1, %xmm2
+ psubb %xmm0, %xmm2
pmovmskb %xmm2, %r9d
shr %cl, %edx
shr %cl, %r9d
@@ -568,7 +538,7 @@ LABEL(loop_ashr_3_use):

LABEL(nibble_ashr_3_restart_use):
movdqa (%rdi, %rdx), %xmm0
- palignr $3, -16(%rdi, %rdx), D(%xmm0)
+ palignr $3, -16(%rdi, %rdx), %xmm0
#if !defined USE_AS_STRCASECMP_L && !defined USE_AS_STRNCASECMP_L
pcmpistri $0x1a,(%rsi,%rdx), %xmm0
#else
@@ -587,7 +557,7 @@ LABEL(nibble_ashr_3_restart_use):
jg LABEL(nibble_ashr_3_use)

movdqa (%rdi, %rdx), %xmm0
- palignr $3, -16(%rdi, %rdx), D(%xmm0)
+ palignr $3, -16(%rdi, %rdx), %xmm0
#if !defined USE_AS_STRCASECMP_L && !defined USE_AS_STRNCASECMP_L
pcmpistri $0x1a,(%rsi,%rdx), %xmm0
#else
@@ -607,7 +577,7 @@ LABEL(nibble_ashr_3_restart_use):
LABEL(nibble_ashr_3_use):
sub $0x1000, %r10
movdqa -16(%rdi, %rdx), %xmm0
- psrldq $3, D(%xmm0)
+ psrldq $3, %xmm0
pcmpistri $0x3a,%xmm0, %xmm0
#if defined USE_AS_STRNCMP || defined USE_AS_STRNCASECMP_L
cmp %r11, %rcx
@@ -625,10 +595,10 @@ LABEL(nibble_ashr_3_use):
*/
.p2align 4
LABEL(ashr_4):
- pslldq $12, D(%xmm2)
+ pslldq $12, %xmm2
TOLOWER (%xmm1, %xmm2)
- pcmpeqb %xmm1, D(%xmm2)
- psubb %xmm0, D(%xmm2)
+ pcmpeqb %xmm1, %xmm2
+ psubb %xmm0, %xmm2
pmovmskb %xmm2, %r9d
shr %cl, %edx
shr %cl, %r9d
@@ -657,7 +627,7 @@ LABEL(loop_ashr_4_use):

LABEL(nibble_ashr_4_restart_use):
movdqa (%rdi, %rdx), %xmm0
- palignr $4, -16(%rdi, %rdx), D(%xmm0)
+ palignr $4, -16(%rdi, %rdx), %xmm0
#if !defined USE_AS_STRCASECMP_L && !defined USE_AS_STRNCASECMP_L
pcmpistri $0x1a,(%rsi,%rdx), %xmm0
#else
@@ -676,7 +646,7 @@ LABEL(nibble_ashr_4_restart_use):
jg LABEL(nibble_ashr_4_use)

movdqa (%rdi, %rdx), %xmm0
- palignr $4, -16(%rdi, %rdx), D(%xmm0)
+ palignr $4, -16(%rdi, %rdx), %xmm0
#if !defined USE_AS_STRCASECMP_L && !defined USE_AS_STRNCASECMP_L
pcmpistri $0x1a,(%rsi,%rdx), %xmm0
#else
@@ -696,7 +666,7 @@ LABEL(nibble_ashr_4_restart_use):
LABEL(nibble_ashr_4_use):
sub $0x1000, %r10
movdqa -16(%rdi, %rdx), %xmm0
- psrldq $4, D(%xmm0)
+ psrldq $4, %xmm0
pcmpistri $0x3a,%xmm0, %xmm0
#if defined USE_AS_STRNCMP || defined USE_AS_STRNCASECMP_L
cmp %r11, %rcx
@@ -714,10 +684,10 @@ LABEL(nibble_ashr_4_use):
*/
.p2align 4
LABEL(ashr_5):
- pslldq $11, D(%xmm2)
+ pslldq $11, %xmm2
TOLOWER (%xmm1, %xmm2)
- pcmpeqb %xmm1, D(%xmm2)
- psubb %xmm0, D(%xmm2)
+ pcmpeqb %xmm1, %xmm2
+ psubb %xmm0, %xmm2
pmovmskb %xmm2, %r9d
shr %cl, %edx
shr %cl, %r9d
@@ -746,7 +716,7 @@ LABEL(loop_ashr_5_use):

LABEL(nibble_ashr_5_restart_use):
movdqa (%rdi, %rdx), %xmm0
- palignr $5, -16(%rdi, %rdx), D(%xmm0)
+ palignr $5, -16(%rdi, %rdx), %xmm0
#if !defined USE_AS_STRCASECMP_L && !defined USE_AS_STRNCASECMP_L
pcmpistri $0x1a,(%rsi,%rdx), %xmm0
#else
@@ -766,7 +736,7 @@ LABEL(nibble_ashr_5_restart_use):

movdqa (%rdi, %rdx), %xmm0

- palignr $5, -16(%rdi, %rdx), D(%xmm0)
+ palignr $5, -16(%rdi, %rdx), %xmm0
#if !defined USE_AS_STRCASECMP_L && !defined USE_AS_STRNCASECMP_L
pcmpistri $0x1a,(%rsi,%rdx), %xmm0
#else
@@ -786,7 +756,7 @@ LABEL(nibble_ashr_5_restart_use):
LABEL(nibble_ashr_5_use):
sub $0x1000, %r10
movdqa -16(%rdi, %rdx), %xmm0
- psrldq $5, D(%xmm0)
+ psrldq $5, %xmm0
pcmpistri $0x3a,%xmm0, %xmm0
#if defined USE_AS_STRNCMP || defined USE_AS_STRNCASECMP_L
cmp %r11, %rcx
@@ -804,10 +774,10 @@ LABEL(nibble_ashr_5_use):
*/
.p2align 4
LABEL(ashr_6):
- pslldq $10, D(%xmm2)
+ pslldq $10, %xmm2
TOLOWER (%xmm1, %xmm2)
- pcmpeqb %xmm1, D(%xmm2)
- psubb %xmm0, D(%xmm2)
+ pcmpeqb %xmm1, %xmm2
+ psubb %xmm0, %xmm2
pmovmskb %xmm2, %r9d
shr %cl, %edx
shr %cl, %r9d
@@ -836,7 +806,7 @@ LABEL(loop_ashr_6_use):

LABEL(nibble_ashr_6_restart_use):
movdqa (%rdi, %rdx), %xmm0
- palignr $6, -16(%rdi, %rdx), D(%xmm0)
+ palignr $6, -16(%rdi, %rdx), %xmm0
#if !defined USE_AS_STRCASECMP_L && !defined USE_AS_STRNCASECMP_L
pcmpistri $0x1a,(%rsi,%rdx), %xmm0
#else
@@ -855,7 +825,7 @@ LABEL(nibble_ashr_6_restart_use):
jg LABEL(nibble_ashr_6_use)

movdqa (%rdi, %rdx), %xmm0
- palignr $6, -16(%rdi, %rdx), D(%xmm0)
+ palignr $6, -16(%rdi, %rdx), %xmm0
#if !defined USE_AS_STRCASECMP_L && !defined USE_AS_STRNCASECMP_L
pcmpistri $0x1a,(%rsi,%rdx), %xmm0
#else
@@ -875,7 +845,7 @@ LABEL(nibble_ashr_6_restart_use):
LABEL(nibble_ashr_6_use):
sub $0x1000, %r10
movdqa -16(%rdi, %rdx), %xmm0
- psrldq $6, D(%xmm0)
+ psrldq $6, %xmm0
pcmpistri $0x3a,%xmm0, %xmm0
#if defined USE_AS_STRNCMP || defined USE_AS_STRNCASECMP_L
cmp %r11, %rcx
@@ -893,10 +863,10 @@ LABEL(nibble_ashr_6_use):
*/
.p2align 4
LABEL(ashr_7):
- pslldq $9, D(%xmm2)
+ pslldq $9, %xmm2
TOLOWER (%xmm1, %xmm2)
- pcmpeqb %xmm1, D(%xmm2)
- psubb %xmm0, D(%xmm2)
+ pcmpeqb %xmm1, %xmm2
+ psubb %xmm0, %xmm2
pmovmskb %xmm2, %r9d
shr %cl, %edx
shr %cl, %r9d
@@ -925,7 +895,7 @@ LABEL(loop_ashr_7_use):

LABEL(nibble_ashr_7_restart_use):
movdqa (%rdi, %rdx), %xmm0
- palignr $7, -16(%rdi, %rdx), D(%xmm0)
+ palignr $7, -16(%rdi, %rdx), %xmm0
#if !defined USE_AS_STRCASECMP_L && !defined USE_AS_STRNCASECMP_L
pcmpistri $0x1a,(%rsi,%rdx), %xmm0
#else
@@ -944,7 +914,7 @@ LABEL(nibble_ashr_7_restart_use):
jg LABEL(nibble_ashr_7_use)

movdqa (%rdi, %rdx), %xmm0
- palignr $7, -16(%rdi, %rdx), D(%xmm0)
+ palignr $7, -16(%rdi, %rdx), %xmm0
#if !defined USE_AS_STRCASECMP_L && !defined USE_AS_STRNCASECMP_L
pcmpistri $0x1a,(%rsi,%rdx), %xmm0
#else
@@ -964,7 +934,7 @@ LABEL(nibble_ashr_7_restart_use):
LABEL(nibble_ashr_7_use):
sub $0x1000, %r10
movdqa -16(%rdi, %rdx), %xmm0
- psrldq $7, D(%xmm0)
+ psrldq $7, %xmm0
pcmpistri $0x3a,%xmm0, %xmm0
#if defined USE_AS_STRNCMP || defined USE_AS_STRNCASECMP_L
cmp %r11, %rcx
@@ -982,10 +952,10 @@ LABEL(nibble_ashr_7_use):
*/
.p2align 4
LABEL(ashr_8):
- pslldq $8, D(%xmm2)
+ pslldq $8, %xmm2
TOLOWER (%xmm1, %xmm2)
- pcmpeqb %xmm1, D(%xmm2)
- psubb %xmm0, D(%xmm2)
+ pcmpeqb %xmm1, %xmm2
+ psubb %xmm0, %xmm2
pmovmskb %xmm2, %r9d
shr %cl, %edx
shr %cl, %r9d
@@ -1014,7 +984,7 @@ LABEL(loop_ashr_8_use):

LABEL(nibble_ashr_8_restart_use):
movdqa (%rdi, %rdx), %xmm0
- palignr $8, -16(%rdi, %rdx), D(%xmm0)
+ palignr $8, -16(%rdi, %rdx), %xmm0
#if !defined USE_AS_STRCASECMP_L && !defined USE_AS_STRNCASECMP_L
pcmpistri $0x1a, (%rsi,%rdx), %xmm0
#else
@@ -1033,7 +1003,7 @@ LABEL(nibble_ashr_8_restart_use):
jg LABEL(nibble_ashr_8_use)

movdqa (%rdi, %rdx), %xmm0
- palignr $8, -16(%rdi, %rdx), D(%xmm0)
+ palignr $8, -16(%rdi, %rdx), %xmm0
#if !defined USE_AS_STRCASECMP_L && !defined USE_AS_STRNCASECMP_L
pcmpistri $0x1a, (%rsi,%rdx), %xmm0
#else
@@ -1053,7 +1023,7 @@ LABEL(nibble_ashr_8_restart_use):
LABEL(nibble_ashr_8_use):
sub $0x1000, %r10
movdqa -16(%rdi, %rdx), %xmm0
- psrldq $8, D(%xmm0)
+ psrldq $8, %xmm0
pcmpistri $0x3a,%xmm0, %xmm0
#if defined USE_AS_STRNCMP || defined USE_AS_STRNCASECMP_L
cmp %r11, %rcx
@@ -1071,10 +1041,10 @@ LABEL(nibble_ashr_8_use):
*/
.p2align 4
LABEL(ashr_9):
- pslldq $7, D(%xmm2)
+ pslldq $7, %xmm2
TOLOWER (%xmm1, %xmm2)
- pcmpeqb %xmm1, D(%xmm2)
- psubb %xmm0, D(%xmm2)
+ pcmpeqb %xmm1, %xmm2
+ psubb %xmm0, %xmm2
pmovmskb %xmm2, %r9d
shr %cl, %edx
shr %cl, %r9d
@@ -1104,7 +1074,7 @@ LABEL(loop_ashr_9_use):
LABEL(nibble_ashr_9_restart_use):
movdqa (%rdi, %rdx), %xmm0

- palignr $9, -16(%rdi, %rdx), D(%xmm0)
+ palignr $9, -16(%rdi, %rdx), %xmm0
#if !defined USE_AS_STRCASECMP_L && !defined USE_AS_STRNCASECMP_L
pcmpistri $0x1a, (%rsi,%rdx), %xmm0
#else
@@ -1123,7 +1093,7 @@ LABEL(nibble_ashr_9_restart_use):
jg LABEL(nibble_ashr_9_use)

movdqa (%rdi, %rdx), %xmm0
- palignr $9, -16(%rdi, %rdx), D(%xmm0)
+ palignr $9, -16(%rdi, %rdx), %xmm0
#if !defined USE_AS_STRCASECMP_L && !defined USE_AS_STRNCASECMP_L
pcmpistri $0x1a, (%rsi,%rdx), %xmm0
#else
@@ -1143,7 +1113,7 @@ LABEL(nibble_ashr_9_restart_use):
LABEL(nibble_ashr_9_use):
sub $0x1000, %r10
movdqa -16(%rdi, %rdx), %xmm0
- psrldq $9, D(%xmm0)
+ psrldq $9, %xmm0
pcmpistri $0x3a,%xmm0, %xmm0
#if defined USE_AS_STRNCMP || defined USE_AS_STRNCASECMP_L
cmp %r11, %rcx
@@ -1161,10 +1131,10 @@ LABEL(nibble_ashr_9_use):
*/
.p2align 4
LABEL(ashr_10):
- pslldq $6, D(%xmm2)
+ pslldq $6, %xmm2
TOLOWER (%xmm1, %xmm2)
- pcmpeqb %xmm1, D(%xmm2)
- psubb %xmm0, D(%xmm2)
+ pcmpeqb %xmm1, %xmm2
+ psubb %xmm0, %xmm2
pmovmskb %xmm2, %r9d
shr %cl, %edx
shr %cl, %r9d
@@ -1193,7 +1163,7 @@ LABEL(loop_ashr_10_use):

LABEL(nibble_ashr_10_restart_use):
movdqa (%rdi, %rdx), %xmm0
- palignr $10, -16(%rdi, %rdx), D(%xmm0)
+ palignr $10, -16(%rdi, %rdx), %xmm0
#if !defined USE_AS_STRCASECMP_L && !defined USE_AS_STRNCASECMP_L
pcmpistri $0x1a, (%rsi,%rdx), %xmm0
#else
@@ -1212,7 +1182,7 @@ LABEL(nibble_ashr_10_restart_use):
jg LABEL(nibble_ashr_10_use)

movdqa (%rdi, %rdx), %xmm0
- palignr $10, -16(%rdi, %rdx), D(%xmm0)
+ palignr $10, -16(%rdi, %rdx), %xmm0
#if !defined USE_AS_STRCASECMP_L && !defined USE_AS_STRNCASECMP_L
pcmpistri $0x1a, (%rsi,%rdx), %xmm0
#else
@@ -1232,7 +1202,7 @@ LABEL(nibble_ashr_10_restart_use):
LABEL(nibble_ashr_10_use):
sub $0x1000, %r10
movdqa -16(%rdi, %rdx), %xmm0
- psrldq $10, D(%xmm0)
+ psrldq $10, %xmm0
pcmpistri $0x3a,%xmm0, %xmm0
#if defined USE_AS_STRNCMP || defined USE_AS_STRNCASECMP_L
cmp %r11, %rcx
@@ -1250,10 +1220,10 @@ LABEL(nibble_ashr_10_use):
*/
.p2align 4
LABEL(ashr_11):
- pslldq $5, D(%xmm2)
+ pslldq $5, %xmm2
TOLOWER (%xmm1, %xmm2)
- pcmpeqb %xmm1, D(%xmm2)
- psubb %xmm0, D(%xmm2)
+ pcmpeqb %xmm1, %xmm2
+ psubb %xmm0, %xmm2
pmovmskb %xmm2, %r9d
shr %cl, %edx
shr %cl, %r9d
@@ -1282,7 +1252,7 @@ LABEL(loop_ashr_11_use):

LABEL(nibble_ashr_11_restart_use):
movdqa (%rdi, %rdx), %xmm0
- palignr $11, -16(%rdi, %rdx), D(%xmm0)
+ palignr $11, -16(%rdi, %rdx), %xmm0
#if !defined USE_AS_STRCASECMP_L && !defined USE_AS_STRNCASECMP_L
pcmpistri $0x1a, (%rsi,%rdx), %xmm0
#else
@@ -1301,7 +1271,7 @@ LABEL(nibble_ashr_11_restart_use):
jg LABEL(nibble_ashr_11_use)

movdqa (%rdi, %rdx), %xmm0
- palignr $11, -16(%rdi, %rdx), D(%xmm0)
+ palignr $11, -16(%rdi, %rdx), %xmm0
#if !defined USE_AS_STRCASECMP_L && !defined USE_AS_STRNCASECMP_L
pcmpistri $0x1a, (%rsi,%rdx), %xmm0
#else
@@ -1321,7 +1291,7 @@ LABEL(nibble_ashr_11_restart_use):
LABEL(nibble_ashr_11_use):
sub $0x1000, %r10
movdqa -16(%rdi, %rdx), %xmm0
- psrldq $11, D(%xmm0)
+ psrldq $11, %xmm0
pcmpistri $0x3a,%xmm0, %xmm0
#if defined USE_AS_STRNCMP || defined USE_AS_STRNCASECMP_L
cmp %r11, %rcx
@@ -1339,10 +1309,10 @@ LABEL(nibble_ashr_11_use):
*/
.p2align 4
LABEL(ashr_12):
- pslldq $4, D(%xmm2)
+ pslldq $4, %xmm2
TOLOWER (%xmm1, %xmm2)
- pcmpeqb %xmm1, D(%xmm2)
- psubb %xmm0, D(%xmm2)
+ pcmpeqb %xmm1, %xmm2
+ psubb %xmm0, %xmm2
pmovmskb %xmm2, %r9d
shr %cl, %edx
shr %cl, %r9d
@@ -1371,7 +1341,7 @@ LABEL(loop_ashr_12_use):

LABEL(nibble_ashr_12_restart_use):
movdqa (%rdi, %rdx), %xmm0
- palignr $12, -16(%rdi, %rdx), D(%xmm0)
+ palignr $12, -16(%rdi, %rdx), %xmm0
#if !defined USE_AS_STRCASECMP_L && !defined USE_AS_STRNCASECMP_L
pcmpistri $0x1a, (%rsi,%rdx), %xmm0
#else
@@ -1390,7 +1360,7 @@ LABEL(nibble_ashr_12_restart_use):
jg LABEL(nibble_ashr_12_use)

movdqa (%rdi, %rdx), %xmm0
- palignr $12, -16(%rdi, %rdx), D(%xmm0)
+ palignr $12, -16(%rdi, %rdx), %xmm0
#if !defined USE_AS_STRCASECMP_L && !defined USE_AS_STRNCASECMP_L
pcmpistri $0x1a, (%rsi,%rdx), %xmm0
#else
@@ -1410,7 +1380,7 @@ LABEL(nibble_ashr_12_restart_use):
LABEL(nibble_ashr_12_use):
sub $0x1000, %r10
movdqa -16(%rdi, %rdx), %xmm0
- psrldq $12, D(%xmm0)
+ psrldq $12, %xmm0
pcmpistri $0x3a,%xmm0, %xmm0
#if defined USE_AS_STRNCMP || defined USE_AS_STRNCASECMP_L
cmp %r11, %rcx
@@ -1428,10 +1398,10 @@ LABEL(nibble_ashr_12_use):
*/
.p2align 4
LABEL(ashr_13):
- pslldq $3, D(%xmm2)
+ pslldq $3, %xmm2
TOLOWER (%xmm1, %xmm2)
- pcmpeqb %xmm1, D(%xmm2)
- psubb %xmm0, D(%xmm2)
+ pcmpeqb %xmm1, %xmm2
+ psubb %xmm0, %xmm2
pmovmskb %xmm2, %r9d
shr %cl, %edx
shr %cl, %r9d
@@ -1461,7 +1431,7 @@ LABEL(loop_ashr_13_use):

LABEL(nibble_ashr_13_restart_use):
movdqa (%rdi, %rdx), %xmm0
- palignr $13, -16(%rdi, %rdx), D(%xmm0)
+ palignr $13, -16(%rdi, %rdx), %xmm0
#if !defined USE_AS_STRCASECMP_L && !defined USE_AS_STRNCASECMP_L
pcmpistri $0x1a, (%rsi,%rdx), %xmm0
#else
@@ -1480,7 +1450,7 @@ LABEL(nibble_ashr_13_restart_use):
jg LABEL(nibble_ashr_13_use)

movdqa (%rdi, %rdx), %xmm0
- palignr $13, -16(%rdi, %rdx), D(%xmm0)
+ palignr $13, -16(%rdi, %rdx), %xmm0
#if !defined USE_AS_STRCASECMP_L && !defined USE_AS_STRNCASECMP_L
pcmpistri $0x1a, (%rsi,%rdx), %xmm0
#else
@@ -1500,7 +1470,7 @@ LABEL(nibble_ashr_13_restart_use):
LABEL(nibble_ashr_13_use):
sub $0x1000, %r10
movdqa -16(%rdi, %rdx), %xmm0
- psrldq $13, D(%xmm0)
+ psrldq $13, %xmm0
pcmpistri $0x3a,%xmm0, %xmm0
#if defined USE_AS_STRNCMP || defined USE_AS_STRNCASECMP_L
cmp %r11, %rcx
@@ -1518,10 +1488,10 @@ LABEL(nibble_ashr_13_use):
*/
.p2align 4
LABEL(ashr_14):
- pslldq $2, D(%xmm2)
+ pslldq $2, %xmm2
TOLOWER (%xmm1, %xmm2)
- pcmpeqb %xmm1, D(%xmm2)
- psubb %xmm0, D(%xmm2)
+ pcmpeqb %xmm1, %xmm2
+ psubb %xmm0, %xmm2
pmovmskb %xmm2, %r9d
shr %cl, %edx
shr %cl, %r9d
@@ -1551,7 +1521,7 @@ LABEL(loop_ashr_14_use):

LABEL(nibble_ashr_14_restart_use):
movdqa (%rdi, %rdx), %xmm0
- palignr $14, -16(%rdi, %rdx), D(%xmm0)
+ palignr $14, -16(%rdi, %rdx), %xmm0
#if !defined USE_AS_STRCASECMP_L && !defined USE_AS_STRNCASECMP_L
pcmpistri $0x1a, (%rsi,%rdx), %xmm0
#else
@@ -1570,7 +1540,7 @@ LABEL(nibble_ashr_14_restart_use):
jg LABEL(nibble_ashr_14_use)

movdqa (%rdi, %rdx), %xmm0
- palignr $14, -16(%rdi, %rdx), D(%xmm0)
+ palignr $14, -16(%rdi, %rdx), %xmm0
#if !defined USE_AS_STRCASECMP_L && !defined USE_AS_STRNCASECMP_L
pcmpistri $0x1a, (%rsi,%rdx), %xmm0
#else
@@ -1590,7 +1560,7 @@ LABEL(nibble_ashr_14_restart_use):
LABEL(nibble_ashr_14_use):
sub $0x1000, %r10
movdqa -16(%rdi, %rdx), %xmm0
- psrldq $14, D(%xmm0)
+ psrldq $14, %xmm0
pcmpistri $0x3a,%xmm0, %xmm0
#if defined USE_AS_STRNCMP || defined USE_AS_STRNCASECMP_L
cmp %r11, %rcx
@@ -1608,10 +1578,10 @@ LABEL(nibble_ashr_14_use):
|
||||
*/
|
||||
.p2align 4
|
||||
LABEL(ashr_15):
|
||||
- pslldq $1, D(%xmm2)
|
||||
+ pslldq $1, %xmm2
|
||||
TOLOWER (%xmm1, %xmm2)
|
||||
- pcmpeqb %xmm1, D(%xmm2)
|
||||
- psubb %xmm0, D(%xmm2)
|
||||
+ pcmpeqb %xmm1, %xmm2
|
||||
+ psubb %xmm0, %xmm2
|
||||
pmovmskb %xmm2, %r9d
|
||||
shr %cl, %edx
|
||||
shr %cl, %r9d
|
||||
@@ -1643,7 +1613,7 @@ LABEL(loop_ashr_15_use):
|
||||
|
||||
LABEL(nibble_ashr_15_restart_use):
|
||||
movdqa (%rdi, %rdx), %xmm0
|
||||
- palignr $15, -16(%rdi, %rdx), D(%xmm0)
|
||||
+ palignr $15, -16(%rdi, %rdx), %xmm0
|
||||
#if !defined USE_AS_STRCASECMP_L && !defined USE_AS_STRNCASECMP_L
|
||||
pcmpistri $0x1a, (%rsi,%rdx), %xmm0
|
||||
#else
|
||||
@@ -1662,7 +1632,7 @@ LABEL(nibble_ashr_15_restart_use):
|
||||
jg LABEL(nibble_ashr_15_use)
|
||||
|
||||
movdqa (%rdi, %rdx), %xmm0
|
||||
- palignr $15, -16(%rdi, %rdx), D(%xmm0)
|
||||
+ palignr $15, -16(%rdi, %rdx), %xmm0
|
||||
#if !defined USE_AS_STRCASECMP_L && !defined USE_AS_STRNCASECMP_L
|
||||
pcmpistri $0x1a, (%rsi,%rdx), %xmm0
|
||||
#else
|
||||
@@ -1682,7 +1652,7 @@ LABEL(nibble_ashr_15_restart_use):
|
||||
LABEL(nibble_ashr_15_use):
|
||||
sub $0x1000, %r10
|
||||
movdqa -16(%rdi, %rdx), %xmm0
|
||||
- psrldq $15, D(%xmm0)
|
||||
+ psrldq $15, %xmm0
|
||||
pcmpistri $0x3a,%xmm0, %xmm0
|
||||
#if defined USE_AS_STRNCMP || defined USE_AS_STRNCASECMP_L
|
||||
cmp %r11, %rcx
|
||||
diff --git a/sysdeps/x86_64/multiarch/strncase_l-avx.S b/sysdeps/x86_64/multiarch/strncase_l-avx.S
deleted file mode 100644
index f1d3fefdd94674b8..0000000000000000
--- a/sysdeps/x86_64/multiarch/strncase_l-avx.S
+++ /dev/null
@@ -1,22 +0,0 @@
-/* strncasecmp_l optimized with AVX.
- Copyright (C) 2017-2021 Free Software Foundation, Inc.
- This file is part of the GNU C Library.
-
- The GNU C Library is free software; you can redistribute it and/or
- modify it under the terms of the GNU Lesser General Public
- License as published by the Free Software Foundation; either
- version 2.1 of the License, or (at your option) any later version.
-
- The GNU C Library is distributed in the hope that it will be useful,
- but WITHOUT ANY WARRANTY; without even the implied warranty of
- MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
- Lesser General Public License for more details.
-
- You should have received a copy of the GNU Lesser General Public
- License along with the GNU C Library; if not, see
- <https://www.gnu.org/licenses/>. */
-
-#define STRCMP_SSE42 __strncasecmp_l_avx
-#define USE_AVX 1
-#define USE_AS_STRNCASECMP_L
-#include "strcmp-sse42.S"
253
glibc-upstream-2.34-230.patch
Normal file
@@ -0,0 +1,253 @@
commit 4ff6ae069b7caacd5f99088abd755717b994f660
Author: Noah Goldstein <goldstein.w.n@gmail.com>
Date: Fri Mar 25 17:13:33 2022 -0500

x86: Small improvements for wcslen

Just a few QOL changes.
1. Prefer `add` over `lea`, as `add` can run on more execution units.
2. Don't break macro-fusion between `test` and `jcc`.
3. Reduce code size by removing gratuitous padding bytes (-90
bytes).

geometric_mean(N=20) of all benchmarks New / Original: 0.959

All string/memory tests pass.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>

(cherry picked from commit 244b415d386487521882debb845a040a4758cb18)
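
The `L(exit)` path touched by this patch decodes a `pmovmskb` mask produced by `pcmpeqd`: each 4-byte lane that compared equal to zero contributes four consecutive mask bits, so testing one nibble per lane is enough to locate the first null `wchar_t`. A minimal C sketch of that decoding follows; the helper name `first_zero_lane` is hypothetical and does not appear in glibc, and the assembly adds the returned lane index to the element count already held in `%rax`.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: given a nonzero 16-bit pmovmskb mask built from a
   pcmpeqd against zero, return the index (0..3) of the first matching
   32-bit lane.  Mirrors the test %dl / andl $15 / andl $(15 << 8)
   sequence in the exit path.  */
static unsigned first_zero_lane(uint32_t mask)
{
    assert(mask != 0 && mask <= 0xffff);
    if (mask & 0xff) {
        /* A match lies in lane 0 or 1: the low nibble decides which.  */
        return (mask & 0x0f) ? 0 : 1;
    }
    /* Otherwise the match lies in lane 2 or 3: bits 8..11 decide.  */
    return (mask & (0x0f << 8)) ? 2 : 3;
}
```

Masking `%edx` in place, as the patched code does, avoids the extra `mov` into `%cl`/`%ch` that the original sequence needed.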

diff --git a/sysdeps/x86_64/wcslen.S b/sysdeps/x86_64/wcslen.S
index 61edea1d14d454c6..ad066863a44ea0a5 100644
--- a/sysdeps/x86_64/wcslen.S
+++ b/sysdeps/x86_64/wcslen.S
@@ -41,82 +41,82 @@ ENTRY (__wcslen)
pxor %xmm0, %xmm0

lea 32(%rdi), %rax
- lea 16(%rdi), %rcx
+ addq $16, %rdi
and $-16, %rax

pcmpeqd (%rax), %xmm0
pmovmskb %xmm0, %edx
pxor %xmm1, %xmm1
+ addq $16, %rax
test %edx, %edx
- lea 16(%rax), %rax
jnz L(exit)

pcmpeqd (%rax), %xmm1
pmovmskb %xmm1, %edx
pxor %xmm2, %xmm2
+ addq $16, %rax
test %edx, %edx
- lea 16(%rax), %rax
jnz L(exit)

pcmpeqd (%rax), %xmm2
pmovmskb %xmm2, %edx
pxor %xmm3, %xmm3
+ addq $16, %rax
test %edx, %edx
- lea 16(%rax), %rax
jnz L(exit)

pcmpeqd (%rax), %xmm3
pmovmskb %xmm3, %edx
+ addq $16, %rax
test %edx, %edx
- lea 16(%rax), %rax
jnz L(exit)

pcmpeqd (%rax), %xmm0
pmovmskb %xmm0, %edx
+ addq $16, %rax
test %edx, %edx
- lea 16(%rax), %rax
jnz L(exit)

pcmpeqd (%rax), %xmm1
pmovmskb %xmm1, %edx
+ addq $16, %rax
test %edx, %edx
- lea 16(%rax), %rax
jnz L(exit)

pcmpeqd (%rax), %xmm2
pmovmskb %xmm2, %edx
+ addq $16, %rax
test %edx, %edx
- lea 16(%rax), %rax
jnz L(exit)

pcmpeqd (%rax), %xmm3
pmovmskb %xmm3, %edx
+ addq $16, %rax
test %edx, %edx
- lea 16(%rax), %rax
jnz L(exit)

pcmpeqd (%rax), %xmm0
pmovmskb %xmm0, %edx
+ addq $16, %rax
test %edx, %edx
- lea 16(%rax), %rax
jnz L(exit)

pcmpeqd (%rax), %xmm1
pmovmskb %xmm1, %edx
+ addq $16, %rax
test %edx, %edx
- lea 16(%rax), %rax
jnz L(exit)

pcmpeqd (%rax), %xmm2
pmovmskb %xmm2, %edx
+ addq $16, %rax
test %edx, %edx
- lea 16(%rax), %rax
jnz L(exit)

pcmpeqd (%rax), %xmm3
pmovmskb %xmm3, %edx
+ addq $16, %rax
test %edx, %edx
- lea 16(%rax), %rax
jnz L(exit)

and $-0x40, %rax
@@ -133,104 +133,100 @@ L(aligned_64_loop):
pminub %xmm0, %xmm2
pcmpeqd %xmm3, %xmm2
pmovmskb %xmm2, %edx
+ addq $64, %rax
test %edx, %edx
- lea 64(%rax), %rax
jz L(aligned_64_loop)

pcmpeqd -64(%rax), %xmm3
pmovmskb %xmm3, %edx
+ addq $48, %rdi
test %edx, %edx
- lea 48(%rcx), %rcx
jnz L(exit)

pcmpeqd %xmm1, %xmm3
pmovmskb %xmm3, %edx
+ addq $-16, %rdi
test %edx, %edx
- lea -16(%rcx), %rcx
jnz L(exit)

pcmpeqd -32(%rax), %xmm3
pmovmskb %xmm3, %edx
+ addq $-16, %rdi
test %edx, %edx
- lea -16(%rcx), %rcx
jnz L(exit)

pcmpeqd %xmm6, %xmm3
pmovmskb %xmm3, %edx
+ addq $-16, %rdi
test %edx, %edx
- lea -16(%rcx), %rcx
- jnz L(exit)
-
- jmp L(aligned_64_loop)
+ jz L(aligned_64_loop)

.p2align 4
L(exit):
- sub %rcx, %rax
+ sub %rdi, %rax
shr $2, %rax
test %dl, %dl
jz L(exit_high)

- mov %dl, %cl
- and $15, %cl
+ andl $15, %edx
jz L(exit_1)
ret

- .p2align 4
+ /* No align here. Naturally aligned % 16 == 1. */
L(exit_high):
- mov %dh, %ch
- and $15, %ch
+ andl $(15 << 8), %edx
jz L(exit_3)
add $2, %rax
ret

- .p2align 4
+ .p2align 3
L(exit_1):
add $1, %rax
ret

- .p2align 4
+ .p2align 3
L(exit_3):
add $3, %rax
ret

- .p2align 4
+ .p2align 3
L(exit_tail0):
- xor %rax, %rax
+ xorl %eax, %eax
ret

- .p2align 4
+ .p2align 3
L(exit_tail1):
- mov $1, %rax
+ movl $1, %eax
ret

- .p2align 4
+ .p2align 3
L(exit_tail2):
- mov $2, %rax
+ movl $2, %eax
ret

- .p2align 4
+ .p2align 3
L(exit_tail3):
- mov $3, %rax
+ movl $3, %eax
ret

- .p2align 4
+ .p2align 3
L(exit_tail4):
- mov $4, %rax
+ movl $4, %eax
ret

- .p2align 4
+ .p2align 3
L(exit_tail5):
- mov $5, %rax
+ movl $5, %eax
ret

- .p2align 4
+ .p2align 3
L(exit_tail6):
- mov $6, %rax
+ movl $6, %eax
ret

- .p2align 4
+ .p2align 3
L(exit_tail7):
- mov $7, %rax
+ movl $7, %eax
ret

END (__wcslen)
956
glibc-upstream-2.34-231.patch
Normal file
@@ -0,0 +1,956 @@
commit ffe75982cc0bb2d25d55ed566a3731b9c3017e6f
Author: Noah Goldstein <goldstein.w.n@gmail.com>
Date: Fri Apr 15 12:28:00 2022 -0500

x86: Remove memcmp-sse4.S

Code didn't actually use any sse4 instructions since `ptest` was
removed in:

commit 2f9062d7171850451e6044ef78d91ff8c017b9c0
Author: Noah Goldstein <goldstein.w.n@gmail.com>
Date: Wed Nov 10 16:18:56 2021 -0600

x86: Shrink memcmp-sse4.S code size

The new memcmp-sse2 implementation is also faster.

geometric_mean(N=20) of page cross cases SSE2 / SSE4: 0.905

Note there are two regressions when preferring SSE2, for Size = 1 and
Size = 65.

Size = 1:
size, align0, align1, ret, New Time/Old Time
1, 1, 1, 0, 1.2
1, 1, 1, 1, 1.197
1, 1, 1, -1, 1.2

This is intentional. Size == 1 is significantly less hot based on
profiles of GCC11 and Python3 than sizes [4, 8] (which are made
hotter).

Python3 Size = 1 -> 13.64%
Python3 Size = [4, 8] -> 60.92%

GCC11 Size = 1 -> 1.29%
GCC11 Size = [4, 8] -> 33.86%

size, align0, align1, ret, New Time/Old Time
4, 4, 4, 0, 0.622
4, 4, 4, 1, 0.797
4, 4, 4, -1, 0.805
5, 5, 5, 0, 0.623
5, 5, 5, 1, 0.777
5, 5, 5, -1, 0.802
6, 6, 6, 0, 0.625
6, 6, 6, 1, 0.813
6, 6, 6, -1, 0.788
7, 7, 7, 0, 0.625
7, 7, 7, 1, 0.799
7, 7, 7, -1, 0.795
8, 8, 8, 0, 0.625
8, 8, 8, 1, 0.848
8, 8, 8, -1, 0.914
9, 9, 9, 0, 0.625

Size = 65:
size, align0, align1, ret, New Time/Old Time
65, 0, 0, 0, 1.103
65, 0, 0, 1, 1.216
65, 0, 0, -1, 1.227
65, 65, 0, 0, 1.091
65, 0, 65, 1, 1.19
65, 65, 65, -1, 1.215

This is because A) the checks in range [65, 96] are now unrolled 2x
and B) smaller values <= 16 are now given a hotter path. By
contrast the SSE4 version has a branch for Size = 80. The unrolled
version gets better performance for returns which need both
comparisons.

size, align0, align1, ret, New Time/Old Time
128, 4, 8, 0, 0.858
128, 4, 8, 1, 0.879
128, 4, 8, -1, 0.888

As well, outside of microbenchmark environments, where branches are
not fully predictable, the branch will have a real cost.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>

(cherry picked from commit 7cbc03d03091d5664060924789afe46d30a5477e)
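
With the SSE4.1 entry gone, the tail of the selector in `ifunc-memcmp.h` falls through from the AVX2/EVEX checks straight to SSSE3 and then to the SSE2 baseline. A simplified C sketch of that fallback order follows; the `cpu_caps` struct, the `memcmp_variant` enum, and `pick_memcmp_tail` are hypothetical stand-ins for illustration rather than glibc's `cpu_features` interface, and the earlier EVEX/AVX2 conditions are deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical capability flags; glibc consults cpu_features instead.  */
struct cpu_caps {
    bool ssse3;
    bool sse4_1;   /* kept only to show it is no longer consulted */
};

/* Hypothetical labels for the two remaining non-AVX variants.  */
enum memcmp_variant { MEMCMP_SSE2, MEMCMP_SSSE3 };

/* Tail of the selection order after this patch: SSSE3 if usable,
   otherwise the SSE2 baseline.  SSE4.1 no longer affects the choice.  */
static enum memcmp_variant pick_memcmp_tail(const struct cpu_caps *caps)
{
    assert(caps != NULL);
    if (caps->ssse3)
        return MEMCMP_SSSE3;
    return MEMCMP_SSE2;
}
```

A CPU that reports SSE4.1 and SSSE3 but none of the AVX2/EVEX features would therefore get the SSSE3 variant rather than the removed `__memcmp_sse4_1`.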

diff --git a/sysdeps/x86_64/multiarch/Makefile b/sysdeps/x86_64/multiarch/Makefile
index bca82e38d86cc440..b503e4b81e92a11c 100644
--- a/sysdeps/x86_64/multiarch/Makefile
+++ b/sysdeps/x86_64/multiarch/Makefile
@@ -11,7 +11,6 @@ sysdep_routines += \
memcmp-avx2-movbe-rtm \
memcmp-evex-movbe \
memcmp-sse2 \
- memcmp-sse4 \
memcmp-ssse3 \
memcpy-ssse3 \
memcpy-ssse3-back \
@@ -174,7 +173,6 @@ sysdep_routines += \
wmemcmp-avx2-movbe-rtm \
wmemcmp-c \
wmemcmp-evex-movbe \
- wmemcmp-sse4 \
wmemcmp-ssse3 \
# sysdep_routines
endif
diff --git a/sysdeps/x86_64/multiarch/ifunc-impl-list.c b/sysdeps/x86_64/multiarch/ifunc-impl-list.c
index 4c7834dd0b951fa4..e5e48b36c3175e68 100644
--- a/sysdeps/x86_64/multiarch/ifunc-impl-list.c
+++ b/sysdeps/x86_64/multiarch/ifunc-impl-list.c
@@ -78,8 +78,6 @@ __libc_ifunc_impl_list (const char *name, struct libc_ifunc_impl *array,
&& CPU_FEATURE_USABLE (BMI2)
&& CPU_FEATURE_USABLE (MOVBE)),
__memcmp_evex_movbe)
- IFUNC_IMPL_ADD (array, i, memcmp, CPU_FEATURE_USABLE (SSE4_1),
- __memcmp_sse4_1)
IFUNC_IMPL_ADD (array, i, memcmp, CPU_FEATURE_USABLE (SSSE3),
__memcmp_ssse3)
IFUNC_IMPL_ADD (array, i, memcmp, 1, __memcmp_sse2))
@@ -824,8 +822,6 @@ __libc_ifunc_impl_list (const char *name, struct libc_ifunc_impl *array,
&& CPU_FEATURE_USABLE (BMI2)
&& CPU_FEATURE_USABLE (MOVBE)),
__wmemcmp_evex_movbe)
- IFUNC_IMPL_ADD (array, i, wmemcmp, CPU_FEATURE_USABLE (SSE4_1),
- __wmemcmp_sse4_1)
IFUNC_IMPL_ADD (array, i, wmemcmp, CPU_FEATURE_USABLE (SSSE3),
__wmemcmp_ssse3)
IFUNC_IMPL_ADD (array, i, wmemcmp, 1, __wmemcmp_sse2))
diff --git a/sysdeps/x86_64/multiarch/ifunc-memcmp.h b/sysdeps/x86_64/multiarch/ifunc-memcmp.h
index 89e2129968e1e49c..5b92594093c1e0bb 100644
--- a/sysdeps/x86_64/multiarch/ifunc-memcmp.h
+++ b/sysdeps/x86_64/multiarch/ifunc-memcmp.h
@@ -21,7 +21,6 @@

extern __typeof (REDIRECT_NAME) OPTIMIZE (sse2) attribute_hidden;
extern __typeof (REDIRECT_NAME) OPTIMIZE (ssse3) attribute_hidden;
-extern __typeof (REDIRECT_NAME) OPTIMIZE (sse4_1) attribute_hidden;
extern __typeof (REDIRECT_NAME) OPTIMIZE (avx2_movbe) attribute_hidden;
extern __typeof (REDIRECT_NAME) OPTIMIZE (avx2_movbe_rtm) attribute_hidden;
extern __typeof (REDIRECT_NAME) OPTIMIZE (evex_movbe) attribute_hidden;
@@ -47,9 +46,6 @@ IFUNC_SELECTOR (void)
return OPTIMIZE (avx2_movbe);
}

- if (CPU_FEATURE_USABLE_P (cpu_features, SSE4_1))
- return OPTIMIZE (sse4_1);
-
if (CPU_FEATURE_USABLE_P (cpu_features, SSSE3))
return OPTIMIZE (ssse3);

diff --git a/sysdeps/x86_64/multiarch/memcmp-sse4.S b/sysdeps/x86_64/multiarch/memcmp-sse4.S
deleted file mode 100644
index 97c102a9c5ab2b91..0000000000000000
--- a/sysdeps/x86_64/multiarch/memcmp-sse4.S
+++ /dev/null
@@ -1,804 +0,0 @@
-/* memcmp with SSE4.1, wmemcmp with SSE4.1
- Copyright (C) 2010-2021 Free Software Foundation, Inc.
- Contributed by Intel Corporation.
- This file is part of the GNU C Library.
-
- The GNU C Library is free software; you can redistribute it and/or
- modify it under the terms of the GNU Lesser General Public
- License as published by the Free Software Foundation; either
- version 2.1 of the License, or (at your option) any later version.
-
- The GNU C Library is distributed in the hope that it will be useful,
- but WITHOUT ANY WARRANTY; without even the implied warranty of
- MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
- Lesser General Public License for more details.
-
- You should have received a copy of the GNU Lesser General Public
- License along with the GNU C Library; if not, see
- <https://www.gnu.org/licenses/>. */
-
-#if IS_IN (libc)
-
-# include <sysdep.h>
-
-# ifndef MEMCMP
-# define MEMCMP __memcmp_sse4_1
-# endif
-
-#ifdef USE_AS_WMEMCMP
-# define CMPEQ pcmpeqd
-# define CHAR_SIZE 4
-#else
-# define CMPEQ pcmpeqb
-# define CHAR_SIZE 1
-#endif
-
-
-/* Warning!
- wmemcmp has to use SIGNED comparison for elements.
- memcmp has to use UNSIGNED comparison for elemnts.
-*/
-
- .section .text.sse4.1,"ax",@progbits
-ENTRY (MEMCMP)
-# ifdef USE_AS_WMEMCMP
- shl $2, %RDX_LP
-# elif defined __ILP32__
- /* Clear the upper 32 bits. */
- mov %edx, %edx
-# endif
- cmp $79, %RDX_LP
- ja L(79bytesormore)
-
- cmp $CHAR_SIZE, %RDX_LP
- jbe L(firstbyte)
-
- /* N in (CHAR_SIZE, 79) bytes. */
- cmpl $32, %edx
- ja L(more_32_bytes)
-
- cmpl $16, %edx
- jae L(16_to_32_bytes)
-
-# ifndef USE_AS_WMEMCMP
- cmpl $8, %edx
- jae L(8_to_16_bytes)
-
- cmpl $4, %edx
- jb L(2_to_3_bytes)
-
- movl (%rdi), %eax
- movl (%rsi), %ecx
-
- bswap %eax
- bswap %ecx
-
- shlq $32, %rax
- shlq $32, %rcx
-
- movl -4(%rdi, %rdx), %edi
- movl -4(%rsi, %rdx), %esi
-
- bswap %edi
- bswap %esi
-
- orq %rdi, %rax
- orq %rsi, %rcx
- subq %rcx, %rax
- cmovne %edx, %eax
- sbbl %ecx, %ecx
- orl %ecx, %eax
- ret
-
- .p2align 4,, 8
-L(2_to_3_bytes):
- movzwl (%rdi), %eax
- movzwl (%rsi), %ecx
- shll $8, %eax
- shll $8, %ecx
- bswap %eax
- bswap %ecx
- movzbl -1(%rdi, %rdx), %edi
- movzbl -1(%rsi, %rdx), %esi
- orl %edi, %eax
- orl %esi, %ecx
- subl %ecx, %eax
- ret
-
- .p2align 4,, 8
-L(8_to_16_bytes):
- movq (%rdi), %rax
- movq (%rsi), %rcx
-
- bswap %rax
- bswap %rcx
-
- subq %rcx, %rax
- jne L(8_to_16_bytes_done)
-
- movq -8(%rdi, %rdx), %rax
- movq -8(%rsi, %rdx), %rcx
-
- bswap %rax
- bswap %rcx
-
- subq %rcx, %rax
-
-L(8_to_16_bytes_done):
- cmovne %edx, %eax
- sbbl %ecx, %ecx
- orl %ecx, %eax
- ret
-# else
- xorl %eax, %eax
- movl (%rdi), %ecx
- cmpl (%rsi), %ecx
- jne L(8_to_16_bytes_done)
- movl 4(%rdi), %ecx
- cmpl 4(%rsi), %ecx
- jne L(8_to_16_bytes_done)
- movl -4(%rdi, %rdx), %ecx
- cmpl -4(%rsi, %rdx), %ecx
- jne L(8_to_16_bytes_done)
- ret
-# endif
-
- .p2align 4,, 3
-L(ret_zero):
- xorl %eax, %eax
-L(zero):
- ret
-
- .p2align 4,, 8
-L(firstbyte):
- jb L(ret_zero)
-# ifdef USE_AS_WMEMCMP
- xorl %eax, %eax
- movl (%rdi), %ecx
- cmpl (%rsi), %ecx
- je L(zero)
-L(8_to_16_bytes_done):
- setg %al
- leal -1(%rax, %rax), %eax
-# else
- movzbl (%rdi), %eax
- movzbl (%rsi), %ecx
- sub %ecx, %eax
-# endif
- ret
-
- .p2align 4
-L(vec_return_begin_48):
- addq $16, %rdi
- addq $16, %rsi
-L(vec_return_begin_32):
- bsfl %eax, %eax
-# ifdef USE_AS_WMEMCMP
- movl 32(%rdi, %rax), %ecx
- xorl %edx, %edx
- cmpl 32(%rsi, %rax), %ecx
- setg %dl
- leal -1(%rdx, %rdx), %eax
-# else
- movzbl 32(%rsi, %rax), %ecx
- movzbl 32(%rdi, %rax), %eax
- subl %ecx, %eax
-# endif
- ret
-
- .p2align 4
-L(vec_return_begin_16):
- addq $16, %rdi
- addq $16, %rsi
-L(vec_return_begin):
- bsfl %eax, %eax
-# ifdef USE_AS_WMEMCMP
- movl (%rdi, %rax), %ecx
- xorl %edx, %edx
- cmpl (%rsi, %rax), %ecx
- setg %dl
- leal -1(%rdx, %rdx), %eax
-# else
- movzbl (%rsi, %rax), %ecx
- movzbl (%rdi, %rax), %eax
- subl %ecx, %eax
-# endif
- ret
-
- .p2align 4
-L(vec_return_end_16):
- subl $16, %edx
-L(vec_return_end):
- bsfl %eax, %eax
- addl %edx, %eax
-# ifdef USE_AS_WMEMCMP
- movl -16(%rdi, %rax), %ecx
- xorl %edx, %edx
- cmpl -16(%rsi, %rax), %ecx
- setg %dl
- leal -1(%rdx, %rdx), %eax
-# else
- movzbl -16(%rsi, %rax), %ecx
- movzbl -16(%rdi, %rax), %eax
- subl %ecx, %eax
-# endif
- ret
-
- .p2align 4,, 8
-L(more_32_bytes):
- movdqu (%rdi), %xmm0
- movdqu (%rsi), %xmm1
- CMPEQ %xmm0, %xmm1
- pmovmskb %xmm1, %eax
- incw %ax
- jnz L(vec_return_begin)
-
- movdqu 16(%rdi), %xmm0
- movdqu 16(%rsi), %xmm1
- CMPEQ %xmm0, %xmm1
- pmovmskb %xmm1, %eax
- incw %ax
- jnz L(vec_return_begin_16)
-
- cmpl $64, %edx
- jbe L(32_to_64_bytes)
- movdqu 32(%rdi), %xmm0
- movdqu 32(%rsi), %xmm1
- CMPEQ %xmm0, %xmm1
- pmovmskb %xmm1, %eax
- incw %ax
- jnz L(vec_return_begin_32)
-
- .p2align 4,, 6
-L(32_to_64_bytes):
- movdqu -32(%rdi, %rdx), %xmm0
- movdqu -32(%rsi, %rdx), %xmm1
- CMPEQ %xmm0, %xmm1
- pmovmskb %xmm1, %eax
- incw %ax
- jnz L(vec_return_end_16)
-
- movdqu -16(%rdi, %rdx), %xmm0
- movdqu -16(%rsi, %rdx), %xmm1
- CMPEQ %xmm0, %xmm1
- pmovmskb %xmm1, %eax
- incw %ax
- jnz L(vec_return_end)
- ret
-
- .p2align 4
-L(16_to_32_bytes):
- movdqu (%rdi), %xmm0
- movdqu (%rsi), %xmm1
- CMPEQ %xmm0, %xmm1
- pmovmskb %xmm1, %eax
- incw %ax
- jnz L(vec_return_begin)
-
- movdqu -16(%rdi, %rdx), %xmm0
- movdqu -16(%rsi, %rdx), %xmm1
- CMPEQ %xmm0, %xmm1
- pmovmskb %xmm1, %eax
- incw %ax
- jnz L(vec_return_end)
- ret
-
-
- .p2align 4
-L(79bytesormore):
- movdqu (%rdi), %xmm0
- movdqu (%rsi), %xmm1
- CMPEQ %xmm0, %xmm1
- pmovmskb %xmm1, %eax
- incw %ax
- jnz L(vec_return_begin)
-
-
- mov %rsi, %rcx
- and $-16, %rsi
- add $16, %rsi
- sub %rsi, %rcx
-
- sub %rcx, %rdi
- add %rcx, %rdx
- test $0xf, %rdi
- jz L(2aligned)
-
- cmp $128, %rdx
- ja L(128bytesormore)
-
- .p2align 4,, 6
-L(less128bytes):
- movdqu (%rdi), %xmm1
- CMPEQ (%rsi), %xmm1
- pmovmskb %xmm1, %eax
- incw %ax
- jnz L(vec_return_begin)
-
- movdqu 16(%rdi), %xmm1
- CMPEQ 16(%rsi), %xmm1
- pmovmskb %xmm1, %eax
- incw %ax
- jnz L(vec_return_begin_16)
-
- movdqu 32(%rdi), %xmm1
- CMPEQ 32(%rsi), %xmm1
- pmovmskb %xmm1, %eax
- incw %ax
- jnz L(vec_return_begin_32)
-
- movdqu 48(%rdi), %xmm1
- CMPEQ 48(%rsi), %xmm1
- pmovmskb %xmm1, %eax
- incw %ax
- jnz L(vec_return_begin_48)
-
- cmp $96, %rdx
- jb L(32_to_64_bytes)
-
- addq $64, %rdi
- addq $64, %rsi
- subq $64, %rdx
-
- .p2align 4,, 6
-L(last_64_bytes):
- movdqu (%rdi), %xmm1
- CMPEQ (%rsi), %xmm1
- pmovmskb %xmm1, %eax
- incw %ax
- jnz L(vec_return_begin)
-
- movdqu 16(%rdi), %xmm1
- CMPEQ 16(%rsi), %xmm1
- pmovmskb %xmm1, %eax
- incw %ax
- jnz L(vec_return_begin_16)
-
- movdqu -32(%rdi, %rdx), %xmm0
- movdqu -32(%rsi, %rdx), %xmm1
- CMPEQ %xmm0, %xmm1
- pmovmskb %xmm1, %eax
- incw %ax
- jnz L(vec_return_end_16)
-
- movdqu -16(%rdi, %rdx), %xmm0
- movdqu -16(%rsi, %rdx), %xmm1
- CMPEQ %xmm0, %xmm1
- pmovmskb %xmm1, %eax
- incw %ax
- jnz L(vec_return_end)
- ret
-
- .p2align 4
-L(128bytesormore):
- cmp $256, %rdx
- ja L(unaligned_loop)
-L(less256bytes):
- movdqu (%rdi), %xmm1
- CMPEQ (%rsi), %xmm1
- pmovmskb %xmm1, %eax
- incw %ax
- jnz L(vec_return_begin)
-
- movdqu 16(%rdi), %xmm1
- CMPEQ 16(%rsi), %xmm1
- pmovmskb %xmm1, %eax
- incw %ax
- jnz L(vec_return_begin_16)
-
- movdqu 32(%rdi), %xmm1
- CMPEQ 32(%rsi), %xmm1
- pmovmskb %xmm1, %eax
- incw %ax
- jnz L(vec_return_begin_32)
-
- movdqu 48(%rdi), %xmm1
- CMPEQ 48(%rsi), %xmm1
- pmovmskb %xmm1, %eax
- incw %ax
- jnz L(vec_return_begin_48)
-
- addq $64, %rdi
- addq $64, %rsi
-
- movdqu (%rdi), %xmm1
- CMPEQ (%rsi), %xmm1
- pmovmskb %xmm1, %eax
- incw %ax
- jnz L(vec_return_begin)
-
- movdqu 16(%rdi), %xmm1
- CMPEQ 16(%rsi), %xmm1
- pmovmskb %xmm1, %eax
- incw %ax
- jnz L(vec_return_begin_16)
-
- movdqu 32(%rdi), %xmm1
- CMPEQ 32(%rsi), %xmm1
- pmovmskb %xmm1, %eax
- incw %ax
- jnz L(vec_return_begin_32)
-
- movdqu 48(%rdi), %xmm1
- CMPEQ 48(%rsi), %xmm1
- pmovmskb %xmm1, %eax
- incw %ax
- jnz L(vec_return_begin_48)
-
- addq $-128, %rdx
- subq $-64, %rsi
- subq $-64, %rdi
-
- cmp $64, %rdx
- ja L(less128bytes)
-
- cmp $32, %rdx
- ja L(last_64_bytes)
-
- movdqu -32(%rdi, %rdx), %xmm0
- movdqu -32(%rsi, %rdx), %xmm1
- CMPEQ %xmm0, %xmm1
- pmovmskb %xmm1, %eax
- incw %ax
- jnz L(vec_return_end_16)
-
- movdqu -16(%rdi, %rdx), %xmm0
- movdqu -16(%rsi, %rdx), %xmm1
- CMPEQ %xmm0, %xmm1
- pmovmskb %xmm1, %eax
- incw %ax
- jnz L(vec_return_end)
- ret
-
- .p2align 4
-L(unaligned_loop):
-# ifdef DATA_CACHE_SIZE_HALF
- mov $DATA_CACHE_SIZE_HALF, %R8_LP
-# else
- mov __x86_data_cache_size_half(%rip), %R8_LP
-# endif
- movq %r8, %r9
- addq %r8, %r8
- addq %r9, %r8
- cmpq %r8, %rdx
- ja L(L2_L3_cache_unaligned)
- sub $64, %rdx
- .p2align 4
-L(64bytesormore_loop):
- movdqu (%rdi), %xmm0
- movdqu 16(%rdi), %xmm1
- movdqu 32(%rdi), %xmm2
- movdqu 48(%rdi), %xmm3
-
- CMPEQ (%rsi), %xmm0
- CMPEQ 16(%rsi), %xmm1
- CMPEQ 32(%rsi), %xmm2
- CMPEQ 48(%rsi), %xmm3
-
- pand %xmm0, %xmm1
- pand %xmm2, %xmm3
- pand %xmm1, %xmm3
-
- pmovmskb %xmm3, %eax
- incw %ax
- jnz L(64bytesormore_loop_end)
-
- add $64, %rsi
- add $64, %rdi
- sub $64, %rdx
- ja L(64bytesormore_loop)
-
- .p2align 4,, 6
-L(loop_tail):
- addq %rdx, %rdi
- movdqu (%rdi), %xmm0
- movdqu 16(%rdi), %xmm1
- movdqu 32(%rdi), %xmm2
- movdqu 48(%rdi), %xmm3
-
- addq %rdx, %rsi
- movdqu (%rsi), %xmm4
- movdqu 16(%rsi), %xmm5
- movdqu 32(%rsi), %xmm6
- movdqu 48(%rsi), %xmm7
-
- CMPEQ %xmm4, %xmm0
- CMPEQ %xmm5, %xmm1
- CMPEQ %xmm6, %xmm2
- CMPEQ %xmm7, %xmm3
-
- pand %xmm0, %xmm1
- pand %xmm2, %xmm3
- pand %xmm1, %xmm3
-
- pmovmskb %xmm3, %eax
- incw %ax
- jnz L(64bytesormore_loop_end)
- ret
-
-L(L2_L3_cache_unaligned):
- subq $64, %rdx
- .p2align 4
-L(L2_L3_unaligned_128bytes_loop):
- prefetchnta 0x1c0(%rdi)
- prefetchnta 0x1c0(%rsi)
-
- movdqu (%rdi), %xmm0
- movdqu 16(%rdi), %xmm1
- movdqu 32(%rdi), %xmm2
- movdqu 48(%rdi), %xmm3
-
- CMPEQ (%rsi), %xmm0
- CMPEQ 16(%rsi), %xmm1
- CMPEQ 32(%rsi), %xmm2
- CMPEQ 48(%rsi), %xmm3
-
- pand %xmm0, %xmm1
- pand %xmm2, %xmm3
- pand %xmm1, %xmm3
-
- pmovmskb %xmm3, %eax
- incw %ax
- jnz L(64bytesormore_loop_end)
-
- add $64, %rsi
- add $64, %rdi
- sub $64, %rdx
- ja L(L2_L3_unaligned_128bytes_loop)
- jmp L(loop_tail)
-
-
- /* This case is for machines which are sensitive for unaligned
- * instructions. */
- .p2align 4
-L(2aligned):
- cmp $128, %rdx
- ja L(128bytesormorein2aligned)
-L(less128bytesin2aligned):
- movdqa (%rdi), %xmm1
- CMPEQ (%rsi), %xmm1
- pmovmskb %xmm1, %eax
- incw %ax
- jnz L(vec_return_begin)
-
- movdqa 16(%rdi), %xmm1
- CMPEQ 16(%rsi), %xmm1
- pmovmskb %xmm1, %eax
- incw %ax
- jnz L(vec_return_begin_16)
-
- movdqa 32(%rdi), %xmm1
- CMPEQ 32(%rsi), %xmm1
- pmovmskb %xmm1, %eax
- incw %ax
- jnz L(vec_return_begin_32)
-
- movdqa 48(%rdi), %xmm1
- CMPEQ 48(%rsi), %xmm1
- pmovmskb %xmm1, %eax
- incw %ax
- jnz L(vec_return_begin_48)
-
- cmp $96, %rdx
- jb L(32_to_64_bytes)
-
- addq $64, %rdi
- addq $64, %rsi
- subq $64, %rdx
-
- .p2align 4,, 6
-L(aligned_last_64_bytes):
- movdqa (%rdi), %xmm1
- CMPEQ (%rsi), %xmm1
- pmovmskb %xmm1, %eax
- incw %ax
- jnz L(vec_return_begin)
-
- movdqa 16(%rdi), %xmm1
- CMPEQ 16(%rsi), %xmm1
- pmovmskb %xmm1, %eax
- incw %ax
- jnz L(vec_return_begin_16)
-
- movdqu -32(%rdi, %rdx), %xmm0
- movdqu -32(%rsi, %rdx), %xmm1
- CMPEQ %xmm0, %xmm1
- pmovmskb %xmm1, %eax
- incw %ax
- jnz L(vec_return_end_16)
-
- movdqu -16(%rdi, %rdx), %xmm0
- movdqu -16(%rsi, %rdx), %xmm1
- CMPEQ %xmm0, %xmm1
- pmovmskb %xmm1, %eax
- incw %ax
- jnz L(vec_return_end)
- ret
-
- .p2align 4
-L(128bytesormorein2aligned):
- cmp $256, %rdx
- ja L(aligned_loop)
-L(less256bytesin2alinged):
- movdqa (%rdi), %xmm1
- CMPEQ (%rsi), %xmm1
- pmovmskb %xmm1, %eax
- incw %ax
- jnz L(vec_return_begin)
-
- movdqa 16(%rdi), %xmm1
- CMPEQ 16(%rsi), %xmm1
- pmovmskb %xmm1, %eax
- incw %ax
- jnz L(vec_return_begin_16)
-
- movdqa 32(%rdi), %xmm1
- CMPEQ 32(%rsi), %xmm1
- pmovmskb %xmm1, %eax
- incw %ax
- jnz L(vec_return_begin_32)
-
- movdqa 48(%rdi), %xmm1
- CMPEQ 48(%rsi), %xmm1
- pmovmskb %xmm1, %eax
- incw %ax
- jnz L(vec_return_begin_48)
-
- addq $64, %rdi
- addq $64, %rsi
-
- movdqa (%rdi), %xmm1
- CMPEQ (%rsi), %xmm1
- pmovmskb %xmm1, %eax
- incw %ax
- jnz L(vec_return_begin)
-
- movdqa 16(%rdi), %xmm1
- CMPEQ 16(%rsi), %xmm1
- pmovmskb %xmm1, %eax
- incw %ax
- jnz L(vec_return_begin_16)
-
- movdqa 32(%rdi), %xmm1
- CMPEQ 32(%rsi), %xmm1
- pmovmskb %xmm1, %eax
- incw %ax
- jnz L(vec_return_begin_32)
-
- movdqa 48(%rdi), %xmm1
- CMPEQ 48(%rsi), %xmm1
- pmovmskb %xmm1, %eax
- incw %ax
- jnz L(vec_return_begin_48)
-
- addq $-128, %rdx
- subq $-64, %rsi
- subq $-64, %rdi
-
- cmp $64, %rdx
- ja L(less128bytesin2aligned)
-
- cmp $32, %rdx
- ja L(aligned_last_64_bytes)
-
- movdqu -32(%rdi, %rdx), %xmm0
- movdqu -32(%rsi, %rdx), %xmm1
- CMPEQ %xmm0, %xmm1
- pmovmskb %xmm1, %eax
- incw %ax
- jnz L(vec_return_end_16)
-
- movdqu -16(%rdi, %rdx), %xmm0
- movdqu -16(%rsi, %rdx), %xmm1
- CMPEQ %xmm0, %xmm1
- pmovmskb %xmm1, %eax
- incw %ax
- jnz L(vec_return_end)
- ret
-
- .p2align 4
|
||||
-L(aligned_loop):
|
||||
-# ifdef DATA_CACHE_SIZE_HALF
|
||||
- mov $DATA_CACHE_SIZE_HALF, %R8_LP
|
||||
-# else
|
||||
- mov __x86_data_cache_size_half(%rip), %R8_LP
|
||||
-# endif
|
||||
- movq %r8, %r9
|
||||
- addq %r8, %r8
|
||||
- addq %r9, %r8
|
||||
- cmpq %r8, %rdx
|
||||
- ja L(L2_L3_cache_aligned)
|
||||
-
|
||||
- sub $64, %rdx
|
||||
- .p2align 4
|
||||
-L(64bytesormore_loopin2aligned):
|
||||
- movdqa (%rdi), %xmm0
|
||||
- movdqa 16(%rdi), %xmm1
|
||||
- movdqa 32(%rdi), %xmm2
|
||||
- movdqa 48(%rdi), %xmm3
|
||||
-
|
||||
- CMPEQ (%rsi), %xmm0
|
||||
- CMPEQ 16(%rsi), %xmm1
|
||||
- CMPEQ 32(%rsi), %xmm2
|
||||
- CMPEQ 48(%rsi), %xmm3
|
||||
-
|
||||
- pand %xmm0, %xmm1
|
||||
- pand %xmm2, %xmm3
|
||||
- pand %xmm1, %xmm3
|
||||
-
|
||||
- pmovmskb %xmm3, %eax
|
||||
- incw %ax
|
||||
- jnz L(64bytesormore_loop_end)
|
||||
- add $64, %rsi
|
||||
- add $64, %rdi
|
||||
- sub $64, %rdx
|
||||
- ja L(64bytesormore_loopin2aligned)
|
||||
- jmp L(loop_tail)
|
||||
-
|
||||
-L(L2_L3_cache_aligned):
|
||||
- subq $64, %rdx
|
||||
- .p2align 4
|
||||
-L(L2_L3_aligned_128bytes_loop):
|
||||
- prefetchnta 0x1c0(%rdi)
|
||||
- prefetchnta 0x1c0(%rsi)
|
||||
- movdqa (%rdi), %xmm0
|
||||
- movdqa 16(%rdi), %xmm1
|
||||
- movdqa 32(%rdi), %xmm2
|
||||
- movdqa 48(%rdi), %xmm3
|
||||
-
|
||||
- CMPEQ (%rsi), %xmm0
|
||||
- CMPEQ 16(%rsi), %xmm1
|
||||
- CMPEQ 32(%rsi), %xmm2
|
||||
- CMPEQ 48(%rsi), %xmm3
|
||||
-
|
||||
- pand %xmm0, %xmm1
|
||||
- pand %xmm2, %xmm3
|
||||
- pand %xmm1, %xmm3
|
||||
-
|
||||
- pmovmskb %xmm3, %eax
|
||||
- incw %ax
|
||||
- jnz L(64bytesormore_loop_end)
|
||||
-
|
||||
- addq $64, %rsi
|
||||
- addq $64, %rdi
|
||||
- subq $64, %rdx
|
||||
- ja L(L2_L3_aligned_128bytes_loop)
|
||||
- jmp L(loop_tail)
|
||||
-
|
||||
- .p2align 4
|
||||
-L(64bytesormore_loop_end):
|
||||
- pmovmskb %xmm0, %ecx
|
||||
- incw %cx
|
||||
- jnz L(loop_end_ret)
|
||||
-
|
||||
- pmovmskb %xmm1, %ecx
|
||||
- notw %cx
|
||||
- sall $16, %ecx
|
||||
- jnz L(loop_end_ret)
|
||||
-
|
||||
- pmovmskb %xmm2, %ecx
|
||||
- notw %cx
|
||||
- shlq $32, %rcx
|
||||
- jnz L(loop_end_ret)
|
||||
-
|
||||
- addq $48, %rdi
|
||||
- addq $48, %rsi
|
||||
- movq %rax, %rcx
|
||||
-
|
||||
- .p2align 4,, 6
|
||||
-L(loop_end_ret):
|
||||
- bsfq %rcx, %rcx
|
||||
-# ifdef USE_AS_WMEMCMP
|
||||
- movl (%rdi, %rcx), %eax
|
||||
- xorl %edx, %edx
|
||||
- cmpl (%rsi, %rcx), %eax
|
||||
- setg %dl
|
||||
- leal -1(%rdx, %rdx), %eax
|
||||
-# else
|
||||
- movzbl (%rdi, %rcx), %eax
|
||||
- movzbl (%rsi, %rcx), %ecx
|
||||
- subl %ecx, %eax
|
||||
-# endif
|
||||
- ret
|
||||
-END (MEMCMP)
|
||||
-#endif
|
259
glibc-upstream-2.34-232.patch
Normal file
@@ -0,0 +1,259 @@
|
||||
commit df5de87260dba479873b2850bbe5c0b81c2376f6
|
||||
Author: Noah Goldstein <goldstein.w.n@gmail.com>
|
||||
Date: Fri Apr 15 12:28:01 2022 -0500
|
||||
|
||||
x86: Cleanup page cross code in memcmp-avx2-movbe.S
|
||||
|
||||
Old code was both inefficient and wasted code size. New code is 62
|
||||
bytes smaller and has comparable or better performance in the page cross case.
|
||||
|
||||
geometric_mean(N=20) of page cross cases New / Original: 0.960
|
||||
|
||||
size, align0, align1, ret, New Time/Old Time
|
||||
1, 4095, 0, 0, 1.001
|
||||
1, 4095, 0, 1, 0.999
|
||||
1, 4095, 0, -1, 1.0
|
||||
2, 4094, 0, 0, 1.0
|
||||
2, 4094, 0, 1, 1.0
|
||||
2, 4094, 0, -1, 1.0
|
||||
3, 4093, 0, 0, 1.0
|
||||
3, 4093, 0, 1, 1.0
|
||||
3, 4093, 0, -1, 1.0
|
||||
4, 4092, 0, 0, 0.987
|
||||
4, 4092, 0, 1, 1.0
|
||||
4, 4092, 0, -1, 1.0
|
||||
5, 4091, 0, 0, 0.984
|
||||
5, 4091, 0, 1, 1.002
|
||||
5, 4091, 0, -1, 1.005
|
||||
6, 4090, 0, 0, 0.993
|
||||
6, 4090, 0, 1, 1.001
|
||||
6, 4090, 0, -1, 1.003
|
||||
7, 4089, 0, 0, 0.991
|
||||
7, 4089, 0, 1, 1.0
|
||||
7, 4089, 0, -1, 1.001
|
||||
8, 4088, 0, 0, 0.875
|
||||
8, 4088, 0, 1, 0.881
|
||||
8, 4088, 0, -1, 0.888
|
||||
9, 4087, 0, 0, 0.872
|
||||
9, 4087, 0, 1, 0.879
|
||||
9, 4087, 0, -1, 0.883
|
||||
10, 4086, 0, 0, 0.878
|
||||
10, 4086, 0, 1, 0.886
|
||||
10, 4086, 0, -1, 0.873
|
||||
11, 4085, 0, 0, 0.878
|
||||
11, 4085, 0, 1, 0.881
|
||||
11, 4085, 0, -1, 0.879
|
||||
12, 4084, 0, 0, 0.873
|
||||
12, 4084, 0, 1, 0.889
|
||||
12, 4084, 0, -1, 0.875
|
||||
13, 4083, 0, 0, 0.873
|
||||
13, 4083, 0, 1, 0.863
|
||||
13, 4083, 0, -1, 0.863
|
||||
14, 4082, 0, 0, 0.838
|
||||
14, 4082, 0, 1, 0.869
|
||||
14, 4082, 0, -1, 0.877
|
||||
15, 4081, 0, 0, 0.841
|
||||
15, 4081, 0, 1, 0.869
|
||||
15, 4081, 0, -1, 0.876
|
||||
16, 4080, 0, 0, 0.988
|
||||
16, 4080, 0, 1, 0.99
|
||||
16, 4080, 0, -1, 0.989
|
||||
17, 4079, 0, 0, 0.978
|
||||
17, 4079, 0, 1, 0.981
|
||||
17, 4079, 0, -1, 0.98
|
||||
18, 4078, 0, 0, 0.981
|
||||
18, 4078, 0, 1, 0.98
|
||||
18, 4078, 0, -1, 0.985
|
||||
19, 4077, 0, 0, 0.977
|
||||
19, 4077, 0, 1, 0.979
|
||||
19, 4077, 0, -1, 0.986
|
||||
20, 4076, 0, 0, 0.977
|
||||
20, 4076, 0, 1, 0.986
|
||||
20, 4076, 0, -1, 0.984
|
||||
21, 4075, 0, 0, 0.977
|
||||
21, 4075, 0, 1, 0.983
|
||||
21, 4075, 0, -1, 0.988
|
||||
22, 4074, 0, 0, 0.983
|
||||
22, 4074, 0, 1, 0.994
|
||||
22, 4074, 0, -1, 0.993
|
||||
23, 4073, 0, 0, 0.98
|
||||
23, 4073, 0, 1, 0.992
|
||||
23, 4073, 0, -1, 0.995
|
||||
24, 4072, 0, 0, 0.989
|
||||
24, 4072, 0, 1, 0.989
|
||||
24, 4072, 0, -1, 0.991
|
||||
25, 4071, 0, 0, 0.99
|
||||
25, 4071, 0, 1, 0.999
|
||||
25, 4071, 0, -1, 0.996
|
||||
26, 4070, 0, 0, 0.993
|
||||
26, 4070, 0, 1, 0.995
|
||||
26, 4070, 0, -1, 0.998
|
||||
27, 4069, 0, 0, 0.993
|
||||
27, 4069, 0, 1, 0.999
|
||||
27, 4069, 0, -1, 1.0
|
||||
28, 4068, 0, 0, 0.997
|
||||
28, 4068, 0, 1, 1.0
|
||||
28, 4068, 0, -1, 0.999
|
||||
29, 4067, 0, 0, 0.996
|
||||
29, 4067, 0, 1, 0.999
|
||||
29, 4067, 0, -1, 0.999
|
||||
30, 4066, 0, 0, 0.991
|
||||
30, 4066, 0, 1, 1.001
|
||||
30, 4066, 0, -1, 0.999
|
||||
31, 4065, 0, 0, 0.988
|
||||
31, 4065, 0, 1, 0.998
|
||||
31, 4065, 0, -1, 0.998
|
||||
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
|
||||
|
||||
(cherry picked from commit 23102686ec67b856a2d4fd25ddaa1c0b8d175c4f)
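As a reading aid for the hunks that follow, here is a minimal C sketch of the two small-size tricks they touch: the overlapping big-endian loads used for sizes [4, 7] and the shifted 16-bit load used for sizes [2, 3]. The function names (`load_be32`, `memcmp_4_to_7`, `memcmp_2_to_3`) are hypothetical and exist only for illustration; the real code is the assembly in the diff, which uses `movbe`, `bswap` and `sbbl`/`orl $1` instead of the portable C shown here.

```c
#include <stddef.h>
#include <stdint.h>

/* Load 4 bytes as a big-endian 32-bit value, which is what `movbe`
   does in a single instruction on x86.  */
static uint32_t load_be32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16)
         | ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Compare n bytes for 4 <= n <= 7 without branching on n.  The first
   four bytes go in the high half and the last four bytes (which may
   overlap the first four) go in the low half, so one unsigned 64-bit
   comparison gives the lexicographic order.  */
int memcmp_4_to_7(const void *s1, const void *s2, size_t n)
{
    const unsigned char *a = s1;
    const unsigned char *b = s2;
    uint64_t x = ((uint64_t)load_be32(a) << 32) | load_be32(a + n - 4);
    uint64_t y = ((uint64_t)load_be32(b) << 32) | load_be32(b + n - 4);
    if (x == y)
        return 0;
    /* Same result as `sbbl %eax, %eax; orl $1, %eax` after `subq`.  */
    return x < y ? -1 : 1;
}

/* Compare n bytes for 2 <= n <= 3.  The first two bytes are placed in
   big-endian order at the top of a 32-bit value and shifted right by
   one, so the top bit is zero and a plain signed subtraction cannot
   overflow.  The last byte fills the low 8 bits.  */
int memcmp_2_to_3(const void *s1, const void *s2, size_t n)
{
    const unsigned char *a = s1;
    const unsigned char *b = s2;
    uint32_t x = ((((uint32_t)a[0] << 24) | ((uint32_t)a[1] << 16)) >> 1)
               | a[n - 1];
    uint32_t y = ((((uint32_t)b[0] << 24) | ((uint32_t)b[1] << 16)) >> 1)
               | b[n - 1];
    return (int)x - (int)y;
}
```

Only the sign of the `memcmp_2_to_3` result is meaningful, which is all `memcmp` promises; `memcmp_4_to_7` normalizes to -1, 0 or 1 because a 64-bit difference does not fit in the 32-bit return value.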
|
||||
|
||||
diff --git a/sysdeps/x86_64/multiarch/memcmp-avx2-movbe.S b/sysdeps/x86_64/multiarch/memcmp-avx2-movbe.S
|
||||
index 2621ec907aedb781..ec9cf0852edf216d 100644
|
||||
--- a/sysdeps/x86_64/multiarch/memcmp-avx2-movbe.S
|
||||
+++ b/sysdeps/x86_64/multiarch/memcmp-avx2-movbe.S
|
||||
@@ -429,22 +429,21 @@ L(page_cross_less_vec):
|
||||
# ifndef USE_AS_WMEMCMP
|
||||
cmpl $8, %edx
|
||||
jae L(between_8_15)
|
||||
+ /* Fall through for [4, 7]. */
|
||||
cmpl $4, %edx
|
||||
- jae L(between_4_7)
|
||||
+ jb L(between_2_3)
|
||||
|
||||
- /* Load as big endian to avoid branches. */
|
||||
- movzwl (%rdi), %eax
|
||||
- movzwl (%rsi), %ecx
|
||||
- shll $8, %eax
|
||||
- shll $8, %ecx
|
||||
- bswap %eax
|
||||
- bswap %ecx
|
||||
- movzbl -1(%rdi, %rdx), %edi
|
||||
- movzbl -1(%rsi, %rdx), %esi
|
||||
- orl %edi, %eax
|
||||
- orl %esi, %ecx
|
||||
- /* Subtraction is okay because the upper 8 bits are zero. */
|
||||
- subl %ecx, %eax
|
||||
+ movbe (%rdi), %eax
|
||||
+ movbe (%rsi), %ecx
|
||||
+ shlq $32, %rax
|
||||
+ shlq $32, %rcx
|
||||
+ movbe -4(%rdi, %rdx), %edi
|
||||
+ movbe -4(%rsi, %rdx), %esi
|
||||
+ orq %rdi, %rax
|
||||
+ orq %rsi, %rcx
|
||||
+ subq %rcx, %rax
|
||||
+ /* Fast path for return zero. */
|
||||
+ jnz L(ret_nonzero)
|
||||
/* No ymm register was touched. */
|
||||
ret
|
||||
|
||||
@@ -457,9 +456,33 @@ L(one_or_less):
|
||||
/* No ymm register was touched. */
|
||||
ret
|
||||
|
||||
+ .p2align 4,, 5
|
||||
+L(ret_nonzero):
|
||||
+ sbbl %eax, %eax
|
||||
+ orl $1, %eax
|
||||
+ /* No ymm register was touched. */
|
||||
+ ret
|
||||
+
|
||||
+ .p2align 4,, 2
|
||||
+L(zero):
|
||||
+ xorl %eax, %eax
|
||||
+ /* No ymm register was touched. */
|
||||
+ ret
|
||||
+
|
||||
.p2align 4
|
||||
L(between_8_15):
|
||||
-# endif
|
||||
+ movbe (%rdi), %rax
|
||||
+ movbe (%rsi), %rcx
|
||||
+ subq %rcx, %rax
|
||||
+ jnz L(ret_nonzero)
|
||||
+ movbe -8(%rdi, %rdx), %rax
|
||||
+ movbe -8(%rsi, %rdx), %rcx
|
||||
+ subq %rcx, %rax
|
||||
+ /* Fast path for return zero. */
|
||||
+ jnz L(ret_nonzero)
|
||||
+ /* No ymm register was touched. */
|
||||
+ ret
|
||||
+# else
|
||||
/* If USE_AS_WMEMCMP fall through into 8-15 byte case. */
|
||||
vmovq (%rdi), %xmm1
|
||||
vmovq (%rsi), %xmm2
|
||||
@@ -475,16 +498,13 @@ L(between_8_15):
|
||||
VPCMPEQ %xmm1, %xmm2, %xmm2
|
||||
vpmovmskb %xmm2, %eax
|
||||
subl $0xffff, %eax
|
||||
+ /* Fast path for return zero. */
|
||||
jnz L(return_vec_0)
|
||||
/* No ymm register was touched. */
|
||||
ret
|
||||
+# endif
|
||||
|
||||
- .p2align 4
|
||||
-L(zero):
|
||||
- xorl %eax, %eax
|
||||
- ret
|
||||
-
|
||||
- .p2align 4
|
||||
+ .p2align 4,, 10
|
||||
L(between_16_31):
|
||||
/* From 16 to 31 bytes. No branch when size == 16. */
|
||||
vmovdqu (%rsi), %xmm2
|
||||
@@ -501,11 +521,17 @@ L(between_16_31):
|
||||
VPCMPEQ (%rdi), %xmm2, %xmm2
|
||||
vpmovmskb %xmm2, %eax
|
||||
subl $0xffff, %eax
|
||||
+ /* Fast path for return zero. */
|
||||
jnz L(return_vec_0)
|
||||
/* No ymm register was touched. */
|
||||
ret
|
||||
|
||||
# ifdef USE_AS_WMEMCMP
|
||||
+ .p2align 4,, 2
|
||||
+L(zero):
|
||||
+ xorl %eax, %eax
|
||||
+ ret
|
||||
+
|
||||
.p2align 4
|
||||
L(one_or_less):
|
||||
jb L(zero)
|
||||
@@ -520,22 +546,20 @@ L(one_or_less):
|
||||
# else
|
||||
|
||||
.p2align 4
|
||||
-L(between_4_7):
|
||||
- /* Load as big endian with overlapping movbe to avoid branches.
|
||||
- */
|
||||
- movbe (%rdi), %eax
|
||||
- movbe (%rsi), %ecx
|
||||
- shlq $32, %rax
|
||||
- shlq $32, %rcx
|
||||
- movbe -4(%rdi, %rdx), %edi
|
||||
- movbe -4(%rsi, %rdx), %esi
|
||||
- orq %rdi, %rax
|
||||
- orq %rsi, %rcx
|
||||
- subq %rcx, %rax
|
||||
- jz L(zero_4_7)
|
||||
- sbbl %eax, %eax
|
||||
- orl $1, %eax
|
||||
-L(zero_4_7):
|
||||
+L(between_2_3):
|
||||
+ /* Load as big endian to avoid branches. */
|
||||
+ movzwl (%rdi), %eax
|
||||
+ movzwl (%rsi), %ecx
|
||||
+ bswap %eax
|
||||
+ bswap %ecx
|
||||
+ shrl %eax
|
||||
+ shrl %ecx
|
||||
+ movzbl -1(%rdi, %rdx), %edi
|
||||
+ movzbl -1(%rsi, %rdx), %esi
|
||||
+ orl %edi, %eax
|
||||
+ orl %esi, %ecx
|
||||
+ /* Subtraction is okay because the upper bit is zero. */
|
||||
+ subl %ecx, %eax
|
||||
/* No ymm register was touched. */
|
||||
ret
|
||||
# endif
|
865
glibc-upstream-2.34-233.patch
Normal file
@@ -0,0 +1,865 @@
|
||||
commit 0a11305416e287d85c64f04337cfd64b6b350e0c
|
||||
Author: Noah Goldstein <goldstein.w.n@gmail.com>
|
||||
Date: Thu Apr 21 20:52:28 2022 -0500
|
||||
|
||||
x86: Optimize {str|wcs}rchr-sse2
|
||||
|
||||
The new code unrolls the main loop slightly without adding too much
|
||||
overhead and minimizes the comparisons for the search CHAR.
|
||||
|
||||
Geometric Mean of all benchmarks New / Old: 0.741
|
||||
See email for all results.
|
||||
|
||||
Full xcheck passes on x86_64 with and without multiarch enabled.
|
||||
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
|
||||
|
||||
(cherry picked from commit 5307aa9c1800f36a64c183c091c9af392c1fa75c)
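The mask arithmetic that recurs in the new code (`leal -1(%rcx), %edx; xorl %edx, %ecx; andl %ecx, %eax; bsrl %eax, %eax`, followed by `andq $-CHAR_SIZE, %rax` for wcsrchr) can be sketched in C as follows. This is an illustrative model under stated assumptions, not the implementation: the helper names (`highest_set_bit`, `last_match_before_null`, `wchar_byte_offset`) are hypothetical, and the masks stand for what `pmovmskb` would produce, one bit per byte.

```c
#include <stdint.h>

/* Index of the most significant set bit, like `bsr`.
   The mask must be nonzero.  */
static int highest_set_bit(uint32_t mask)
{
    int idx = 0;
    while (mask >>= 1)
        idx++;
    return idx;
}

/* match_mask: bit i is set if byte i matched the search CHAR.
   zero_mask:  bit i is set if byte i belongs to a null terminator;
               it must be nonzero.
   Returns the byte index of the last match at or before the first
   null, or -1 if there is none.  */
int last_match_before_null(uint32_t match_mask, uint32_t zero_mask)
{
    /* (z - 1) ^ z keeps the lowest set bit of z and every bit below
       it, i.e. all positions up to and including the first null.  */
    uint32_t in_string = (zero_mask - 1) ^ zero_mask;
    uint32_t valid = match_mask & in_string;
    if (valid == 0)
        return -1;
    return highest_set_bit(valid);
}

/* For wcsrchr each 4-byte character contributes 4 mask bits, so the
   index found above points at the last byte of the character (off by
   3).  Rounding down to a multiple of 4 recovers its first byte.  */
int wchar_byte_offset(int byte_index)
{
    return byte_index & -4;
}
```

Keeping the first null position inside `in_string` is what lets a search for the terminator itself succeed, and in that case the index is already a multiple of the character size, so the rounding step is harmless.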
|
||||
|
||||
diff --git a/sysdeps/x86_64/multiarch/strrchr-sse2.S b/sysdeps/x86_64/multiarch/strrchr-sse2.S
|
||||
index 67c30d0260cef8a3..a56300bc1830dedd 100644
|
||||
--- a/sysdeps/x86_64/multiarch/strrchr-sse2.S
|
||||
+++ b/sysdeps/x86_64/multiarch/strrchr-sse2.S
|
||||
@@ -17,7 +17,7 @@
|
||||
<https://www.gnu.org/licenses/>. */
|
||||
|
||||
#if IS_IN (libc)
|
||||
-# define strrchr __strrchr_sse2
|
||||
+# define STRRCHR __strrchr_sse2
|
||||
|
||||
# undef weak_alias
|
||||
# define weak_alias(strrchr, rindex)
|
||||
diff --git a/sysdeps/x86_64/multiarch/wcsrchr-sse2.S b/sysdeps/x86_64/multiarch/wcsrchr-sse2.S
|
||||
index a36034b40afe8d3d..00f69f2be77a43a0 100644
|
||||
--- a/sysdeps/x86_64/multiarch/wcsrchr-sse2.S
|
||||
+++ b/sysdeps/x86_64/multiarch/wcsrchr-sse2.S
|
||||
@@ -17,7 +17,6 @@
|
||||
<https://www.gnu.org/licenses/>. */
|
||||
|
||||
#if IS_IN (libc)
|
||||
-# define wcsrchr __wcsrchr_sse2
|
||||
+# define STRRCHR __wcsrchr_sse2
|
||||
#endif
|
||||
-
|
||||
#include "../wcsrchr.S"
|
||||
diff --git a/sysdeps/x86_64/strrchr.S b/sysdeps/x86_64/strrchr.S
|
||||
index dfd09fe9508cb5bc..fc1598bb11417fd5 100644
|
||||
--- a/sysdeps/x86_64/strrchr.S
|
||||
+++ b/sysdeps/x86_64/strrchr.S
|
||||
@@ -19,210 +19,360 @@
|
||||
|
||||
#include <sysdep.h>
|
||||
|
||||
+#ifndef STRRCHR
|
||||
+# define STRRCHR strrchr
|
||||
+#endif
|
||||
+
|
||||
+#ifdef USE_AS_WCSRCHR
|
||||
+# define PCMPEQ pcmpeqd
|
||||
+# define CHAR_SIZE 4
|
||||
+# define PMINU pminud
|
||||
+#else
|
||||
+# define PCMPEQ pcmpeqb
|
||||
+# define CHAR_SIZE 1
|
||||
+# define PMINU pminub
|
||||
+#endif
|
||||
+
|
||||
+#define PAGE_SIZE 4096
|
||||
+#define VEC_SIZE 16
|
||||
+
|
||||
.text
|
||||
-ENTRY (strrchr)
|
||||
- movd %esi, %xmm1
|
||||
+ENTRY(STRRCHR)
|
||||
+ movd %esi, %xmm0
|
||||
movq %rdi, %rax
|
||||
- andl $4095, %eax
|
||||
- punpcklbw %xmm1, %xmm1
|
||||
- cmpq $4032, %rax
|
||||
- punpcklwd %xmm1, %xmm1
|
||||
- pshufd $0, %xmm1, %xmm1
|
||||
+ andl $(PAGE_SIZE - 1), %eax
|
||||
+#ifndef USE_AS_WCSRCHR
|
||||
+ punpcklbw %xmm0, %xmm0
|
||||
+ punpcklwd %xmm0, %xmm0
|
||||
+#endif
|
||||
+ pshufd $0, %xmm0, %xmm0
|
||||
+ cmpl $(PAGE_SIZE - VEC_SIZE), %eax
|
||||
ja L(cross_page)
|
||||
- movdqu (%rdi), %xmm0
|
||||
+
|
||||
+L(cross_page_continue):
|
||||
+ movups (%rdi), %xmm1
|
||||
pxor %xmm2, %xmm2
|
||||
- movdqa %xmm0, %xmm3
|
||||
- pcmpeqb %xmm1, %xmm0
|
||||
- pcmpeqb %xmm2, %xmm3
|
||||
- pmovmskb %xmm0, %ecx
|
||||
- pmovmskb %xmm3, %edx
|
||||
- testq %rdx, %rdx
|
||||
- je L(next_48_bytes)
|
||||
- leaq -1(%rdx), %rax
|
||||
- xorq %rdx, %rax
|
||||
- andq %rcx, %rax
|
||||
- je L(exit)
|
||||
- bsrq %rax, %rax
|
||||
+ PCMPEQ %xmm1, %xmm2
|
||||
+ pmovmskb %xmm2, %ecx
|
||||
+ testl %ecx, %ecx
|
||||
+ jz L(aligned_more)
|
||||
+
|
||||
+ PCMPEQ %xmm0, %xmm1
|
||||
+ pmovmskb %xmm1, %eax
|
||||
+ leal -1(%rcx), %edx
|
||||
+ xorl %edx, %ecx
|
||||
+ andl %ecx, %eax
|
||||
+ jz L(ret0)
|
||||
+ bsrl %eax, %eax
|
||||
addq %rdi, %rax
|
||||
+ /* We are off by 3 for wcsrchr if search CHAR is non-zero. If
|
||||
+ search CHAR is zero we are correct. Either way `andq
|
||||
+ -CHAR_SIZE, %rax` gets the correct result. */
|
||||
+#ifdef USE_AS_WCSRCHR
|
||||
+ andq $-CHAR_SIZE, %rax
|
||||
+#endif
|
||||
+L(ret0):
|
||||
ret
|
||||
|
||||
+ /* Returns for first vec x1/x2 have hard coded backward search
|
||||
+ path for earlier matches. */
|
||||
.p2align 4
|
||||
-L(next_48_bytes):
|
||||
- movdqu 16(%rdi), %xmm4
|
||||
- movdqa %xmm4, %xmm5
|
||||
- movdqu 32(%rdi), %xmm3
|
||||
- pcmpeqb %xmm1, %xmm4
|
||||
- pcmpeqb %xmm2, %xmm5
|
||||
- movdqu 48(%rdi), %xmm0
|
||||
- pmovmskb %xmm5, %edx
|
||||
- movdqa %xmm3, %xmm5
|
||||
- pcmpeqb %xmm1, %xmm3
|
||||
- pcmpeqb %xmm2, %xmm5
|
||||
- pcmpeqb %xmm0, %xmm2
|
||||
- salq $16, %rdx
|
||||
- pmovmskb %xmm3, %r8d
|
||||
- pmovmskb %xmm5, %eax
|
||||
- pmovmskb %xmm2, %esi
|
||||
- salq $32, %r8
|
||||
- salq $32, %rax
|
||||
- pcmpeqb %xmm1, %xmm0
|
||||
- orq %rdx, %rax
|
||||
- movq %rsi, %rdx
|
||||
- pmovmskb %xmm4, %esi
|
||||
- salq $48, %rdx
|
||||
- salq $16, %rsi
|
||||
- orq %r8, %rsi
|
||||
- orq %rcx, %rsi
|
||||
- pmovmskb %xmm0, %ecx
|
||||
- salq $48, %rcx
|
||||
- orq %rcx, %rsi
|
||||
- orq %rdx, %rax
|
||||
- je L(loop_header2)
|
||||
- leaq -1(%rax), %rcx
|
||||
- xorq %rax, %rcx
|
||||
- andq %rcx, %rsi
|
||||
- je L(exit)
|
||||
- bsrq %rsi, %rsi
|
||||
- leaq (%rdi,%rsi), %rax
|
||||
+L(first_vec_x0_test):
|
||||
+ PCMPEQ %xmm0, %xmm1
|
||||
+ pmovmskb %xmm1, %eax
|
||||
+ testl %eax, %eax
|
||||
+ jz L(ret0)
|
||||
+ bsrl %eax, %eax
|
||||
+ addq %r8, %rax
|
||||
+#ifdef USE_AS_WCSRCHR
|
||||
+ andq $-CHAR_SIZE, %rax
|
||||
+#endif
|
||||
ret
|
||||
|
||||
.p2align 4
|
||||
-L(loop_header2):
|
||||
- testq %rsi, %rsi
|
||||
- movq %rdi, %rcx
|
||||
- je L(no_c_found)
|
||||
-L(loop_header):
|
||||
- addq $64, %rdi
|
||||
- pxor %xmm7, %xmm7
|
||||
- andq $-64, %rdi
|
||||
- jmp L(loop_entry)
|
||||
+L(first_vec_x1):
|
||||
+ PCMPEQ %xmm0, %xmm2
|
||||
+ pmovmskb %xmm2, %eax
|
||||
+ leal -1(%rcx), %edx
|
||||
+ xorl %edx, %ecx
|
||||
+ andl %ecx, %eax
|
||||
+ jz L(first_vec_x0_test)
|
||||
+ bsrl %eax, %eax
|
||||
+ leaq (VEC_SIZE)(%rdi, %rax), %rax
|
||||
+#ifdef USE_AS_WCSRCHR
|
||||
+ andq $-CHAR_SIZE, %rax
|
||||
+#endif
|
||||
+ ret
|
||||
|
||||
.p2align 4
|
||||
-L(loop64):
|
||||
- testq %rdx, %rdx
|
||||
- cmovne %rdx, %rsi
|
||||
- cmovne %rdi, %rcx
|
||||
- addq $64, %rdi
|
||||
-L(loop_entry):
|
||||
- movdqa 32(%rdi), %xmm3
|
||||
- pxor %xmm6, %xmm6
|
||||
- movdqa 48(%rdi), %xmm2
|
||||
- movdqa %xmm3, %xmm0
|
||||
- movdqa 16(%rdi), %xmm4
|
||||
- pminub %xmm2, %xmm0
|
||||
- movdqa (%rdi), %xmm5
|
||||
- pminub %xmm4, %xmm0
|
||||
- pminub %xmm5, %xmm0
|
||||
- pcmpeqb %xmm7, %xmm0
|
||||
- pmovmskb %xmm0, %eax
|
||||
- movdqa %xmm5, %xmm0
|
||||
- pcmpeqb %xmm1, %xmm0
|
||||
- pmovmskb %xmm0, %r9d
|
||||
- movdqa %xmm4, %xmm0
|
||||
- pcmpeqb %xmm1, %xmm0
|
||||
- pmovmskb %xmm0, %edx
|
||||
- movdqa %xmm3, %xmm0
|
||||
- pcmpeqb %xmm1, %xmm0
|
||||
- salq $16, %rdx
|
||||
- pmovmskb %xmm0, %r10d
|
||||
- movdqa %xmm2, %xmm0
|
||||
- pcmpeqb %xmm1, %xmm0
|
||||
- salq $32, %r10
|
||||
- orq %r10, %rdx
|
||||
- pmovmskb %xmm0, %r8d
|
||||
- orq %r9, %rdx
|
||||
- salq $48, %r8
|
||||
- orq %r8, %rdx
|
||||
+L(first_vec_x1_test):
|
||||
+ PCMPEQ %xmm0, %xmm2
|
||||
+ pmovmskb %xmm2, %eax
|
||||
testl %eax, %eax
|
||||
- je L(loop64)
|
||||
- pcmpeqb %xmm6, %xmm4
|
||||
- pcmpeqb %xmm6, %xmm3
|
||||
- pcmpeqb %xmm6, %xmm5
|
||||
- pmovmskb %xmm4, %eax
|
||||
- pmovmskb %xmm3, %r10d
|
||||
- pcmpeqb %xmm6, %xmm2
|
||||
- pmovmskb %xmm5, %r9d
|
||||
- salq $32, %r10
|
||||
- salq $16, %rax
|
||||
- pmovmskb %xmm2, %r8d
|
||||
- orq %r10, %rax
|
||||
- orq %r9, %rax
|
||||
- salq $48, %r8
|
||||
- orq %r8, %rax
|
||||
- leaq -1(%rax), %r8
|
||||
- xorq %rax, %r8
|
||||
- andq %r8, %rdx
|
||||
- cmovne %rdi, %rcx
|
||||
- cmovne %rdx, %rsi
|
||||
- bsrq %rsi, %rsi
|
||||
- leaq (%rcx,%rsi), %rax
|
||||
+ jz L(first_vec_x0_test)
|
||||
+ bsrl %eax, %eax
|
||||
+ leaq (VEC_SIZE)(%rdi, %rax), %rax
|
||||
+#ifdef USE_AS_WCSRCHR
|
||||
+ andq $-CHAR_SIZE, %rax
|
||||
+#endif
|
||||
+ ret
|
||||
+
|
||||
+ .p2align 4
|
||||
+L(first_vec_x2):
|
||||
+ PCMPEQ %xmm0, %xmm3
|
||||
+ pmovmskb %xmm3, %eax
|
||||
+ leal -1(%rcx), %edx
|
||||
+ xorl %edx, %ecx
|
||||
+ andl %ecx, %eax
|
||||
+ jz L(first_vec_x1_test)
|
||||
+ bsrl %eax, %eax
|
||||
+ leaq (VEC_SIZE * 2)(%rdi, %rax), %rax
|
||||
+#ifdef USE_AS_WCSRCHR
|
||||
+ andq $-CHAR_SIZE, %rax
|
||||
+#endif
|
||||
+ ret
|
||||
+
|
||||
+ .p2align 4
|
||||
+L(aligned_more):
|
||||
+ /* Save original pointer if match was in VEC 0. */
|
||||
+ movq %rdi, %r8
|
||||
+ andq $-VEC_SIZE, %rdi
|
||||
+
|
||||
+ movaps VEC_SIZE(%rdi), %xmm2
|
||||
+ pxor %xmm3, %xmm3
|
||||
+ PCMPEQ %xmm2, %xmm3
|
||||
+ pmovmskb %xmm3, %ecx
|
||||
+ testl %ecx, %ecx
|
||||
+ jnz L(first_vec_x1)
|
||||
+
|
||||
+ movaps (VEC_SIZE * 2)(%rdi), %xmm3
|
||||
+ pxor %xmm4, %xmm4
|
||||
+ PCMPEQ %xmm3, %xmm4
|
||||
+ pmovmskb %xmm4, %ecx
|
||||
+ testl %ecx, %ecx
|
||||
+ jnz L(first_vec_x2)
|
||||
+
|
||||
+ addq $VEC_SIZE, %rdi
|
||||
+ /* Save pointer again before realigning. */
|
||||
+ movq %rdi, %rsi
|
||||
+ andq $-(VEC_SIZE * 2), %rdi
|
||||
+ .p2align 4
|
||||
+L(first_loop):
|
||||
+ /* Do 2x VEC at a time. */
|
||||
+ movaps (VEC_SIZE * 2)(%rdi), %xmm4
|
||||
+ movaps (VEC_SIZE * 3)(%rdi), %xmm5
|
||||
+ /* SSE2 has no pminud, so wcsrchr needs separate logic for
|
||||
+ detecting zero. Note if this is found to be a bottleneck it
|
||||
+ may be worth adding an SSE4.1 wcsrchr implementation. */
|
||||
+#ifdef USE_AS_WCSRCHR
|
||||
+ movaps %xmm5, %xmm6
|
||||
+ pxor %xmm8, %xmm8
|
||||
+
|
||||
+ PCMPEQ %xmm8, %xmm5
|
||||
+ PCMPEQ %xmm4, %xmm8
|
||||
+ por %xmm5, %xmm8
|
||||
+#else
|
||||
+ movaps %xmm5, %xmm6
|
||||
+ PMINU %xmm4, %xmm5
|
||||
+#endif
|
||||
+
|
||||
+ movaps %xmm4, %xmm9
|
||||
+ PCMPEQ %xmm0, %xmm4
|
||||
+ PCMPEQ %xmm0, %xmm6
|
||||
+ movaps %xmm6, %xmm7
|
||||
+ por %xmm4, %xmm6
|
||||
+#ifndef USE_AS_WCSRCHR
|
||||
+ pxor %xmm8, %xmm8
|
||||
+ PCMPEQ %xmm5, %xmm8
|
||||
+#endif
|
||||
+ pmovmskb %xmm8, %ecx
|
||||
+ pmovmskb %xmm6, %eax
|
||||
+
|
||||
+ addq $(VEC_SIZE * 2), %rdi
|
||||
+ /* Use `addl` 1) so we can undo it with `subl` and 2) it can
|
||||
+ macro-fuse with `jz`. */
|
||||
+ addl %ecx, %eax
|
||||
+ jz L(first_loop)
|
||||
+
|
||||
+ /* Check if there is zero match. */
|
||||
+ testl %ecx, %ecx
|
||||
+ jz L(second_loop_match)
|
||||
+
|
||||
+ /* Check if there was a match in last iteration. */
|
||||
+ subl %ecx, %eax
|
||||
+ jnz L(new_match)
|
||||
+
|
||||
+L(first_loop_old_match):
|
||||
+ PCMPEQ %xmm0, %xmm2
|
||||
+ PCMPEQ %xmm0, %xmm3
|
||||
+ pmovmskb %xmm2, %ecx
|
||||
+ pmovmskb %xmm3, %eax
|
||||
+ addl %eax, %ecx
|
||||
+ jz L(first_vec_x0_test)
|
||||
+ /* NB: We could move this shift to before the branch and save a
|
||||
+ bit of code size / performance on the fall through. The
|
||||
+ branch leads to the null case which generally seems hotter
|
||||
+ than char in first 3x VEC. */
|
||||
+ sall $16, %eax
|
||||
+ orl %ecx, %eax
|
||||
+
|
||||
+ bsrl %eax, %eax
|
||||
+ addq %rsi, %rax
|
||||
+#ifdef USE_AS_WCSRCHR
|
||||
+ andq $-CHAR_SIZE, %rax
|
||||
+#endif
|
||||
+ ret
|
||||
+
|
||||
+ .p2align 4
|
||||
+L(new_match):
|
||||
+ pxor %xmm6, %xmm6
|
||||
+ PCMPEQ %xmm9, %xmm6
|
||||
+ pmovmskb %xmm6, %eax
|
||||
+ sall $16, %ecx
|
||||
+ orl %eax, %ecx
|
||||
+
|
||||
+ /* We can't reuse either of the old comparisons: since we mask
|
||||
+ off zeros after the first zero (instead of using the full
|
||||
+ comparison) we can't guarantee no interference between match
|
||||
+ after end of string and valid match. */
|
||||
+ pmovmskb %xmm4, %eax
|
||||
+ pmovmskb %xmm7, %edx
|
||||
+ sall $16, %edx
|
||||
+ orl %edx, %eax
|
||||
+
|
||||
+ leal -1(%ecx), %edx
|
||||
+ xorl %edx, %ecx
|
||||
+ andl %ecx, %eax
|
||||
+ jz L(first_loop_old_match)
|
||||
+ bsrl %eax, %eax
|
||||
+ addq %rdi, %rax
|
||||
+#ifdef USE_AS_WCSRCHR
|
||||
+ andq $-CHAR_SIZE, %rax
|
||||
+#endif
|
||||
ret
|
||||
|
||||
+ /* Save minimum state for getting most recent match. We can
|
||||
+ throw out all previous work. */
|
||||
.p2align 4
|
||||
-L(no_c_found):
|
||||
- movl $1, %esi
|
||||
- xorl %ecx, %ecx
|
||||
- jmp L(loop_header)
|
||||
+L(second_loop_match):
|
||||
+ movq %rdi, %rsi
|
||||
+ movaps %xmm4, %xmm2
|
||||
+ movaps %xmm7, %xmm3
|
||||
|
||||
.p2align 4
|
||||
-L(exit):
|
||||
- xorl %eax, %eax
|
||||
+L(second_loop):
|
||||
+ movaps (VEC_SIZE * 2)(%rdi), %xmm4
|
||||
+ movaps (VEC_SIZE * 3)(%rdi), %xmm5
|
||||
+ /* SSE2 has no pminud, so wcsrchr needs separate logic for
|
||||
+ detecting zero. Note if this is found to be a bottleneck it
|
||||
+ may be worth adding an SSE4.1 wcsrchr implementation. */
|
||||
+#ifdef USE_AS_WCSRCHR
|
||||
+ movaps %xmm5, %xmm6
|
||||
+ pxor %xmm8, %xmm8
|
||||
+
|
||||
+ PCMPEQ %xmm8, %xmm5
|
||||
+ PCMPEQ %xmm4, %xmm8
|
||||
+ por %xmm5, %xmm8
|
||||
+#else
|
||||
+ movaps %xmm5, %xmm6
|
||||
+ PMINU %xmm4, %xmm5
|
||||
+#endif
|
||||
+
|
||||
+ movaps %xmm4, %xmm9
|
||||
+ PCMPEQ %xmm0, %xmm4
|
||||
+ PCMPEQ %xmm0, %xmm6
|
||||
+ movaps %xmm6, %xmm7
|
||||
+ por %xmm4, %xmm6
|
||||
+#ifndef USE_AS_WCSRCHR
|
||||
+ pxor %xmm8, %xmm8
|
||||
+ PCMPEQ %xmm5, %xmm8
|
||||
+#endif
|
||||
+
|
||||
+ pmovmskb %xmm8, %ecx
|
||||
+ pmovmskb %xmm6, %eax
|
||||
+
|
||||
+ addq $(VEC_SIZE * 2), %rdi
|
||||
+ /* Either null term or new occurrence of CHAR. */
|
||||
+ addl %ecx, %eax
|
||||
+ jz L(second_loop)
|
||||
+
|
||||
+ /* No null term, so it must be a new occurrence of CHAR. */
|
||||
+ testl %ecx, %ecx
|
||||
+ jz L(second_loop_match)
|
||||
+
|
||||
+
|
||||
+ subl %ecx, %eax
|
||||
+ jnz L(second_loop_new_match)
|
||||
+
|
||||
+L(second_loop_old_match):
|
||||
+ pmovmskb %xmm2, %ecx
|
||||
+ pmovmskb %xmm3, %eax
|
||||
+ sall $16, %eax
|
||||
+ orl %ecx, %eax
|
||||
+ bsrl %eax, %eax
|
||||
+ addq %rsi, %rax
|
||||
+#ifdef USE_AS_WCSRCHR
|
||||
+ andq $-CHAR_SIZE, %rax
|
||||
+#endif
|
||||
ret
|
||||
|
||||
.p2align 4
|
||||
+L(second_loop_new_match):
|
||||
+ pxor %xmm6, %xmm6
|
||||
+ PCMPEQ %xmm9, %xmm6
|
||||
+ pmovmskb %xmm6, %eax
|
||||
+ sall $16, %ecx
|
||||
+ orl %eax, %ecx
|
||||
+
|
||||
+ /* We can't reuse either of the old comparisons: since we mask
|
||||
+ off zeros after the first zero (instead of using the full
|
||||
+ comparison) we can't guarantee no interference between match
|
||||
+ after end of string and valid match. */
|
||||
+ pmovmskb %xmm4, %eax
|
||||
+ pmovmskb %xmm7, %edx
|
||||
+ sall $16, %edx
|
||||
+ orl %edx, %eax
|
||||
+
|
||||
+ leal -1(%ecx), %edx
|
||||
+ xorl %edx, %ecx
|
||||
+ andl %ecx, %eax
|
||||
+ jz L(second_loop_old_match)
|
||||
+ bsrl %eax, %eax
|
||||
+ addq %rdi, %rax
|
||||
+#ifdef USE_AS_WCSRCHR
|
||||
+ andq $-CHAR_SIZE, %rax
|
||||
+#endif
|
||||
+ ret
|
||||
+
|
||||
+ .p2align 4,, 4
|
||||
L(cross_page):
|
||||
- movq %rdi, %rax
|
||||
- pxor %xmm0, %xmm0
|
||||
- andq $-64, %rax
|
||||
- movdqu (%rax), %xmm5
|
||||
- movdqa %xmm5, %xmm6
|
||||
- movdqu 16(%rax), %xmm4
|
||||
- pcmpeqb %xmm1, %xmm5
|
||||
- pcmpeqb %xmm0, %xmm6
|
||||
- movdqu 32(%rax), %xmm3
|
||||
- pmovmskb %xmm6, %esi
|
||||
- movdqa %xmm4, %xmm6
|
||||
- movdqu 48(%rax), %xmm2
|
||||
- pcmpeqb %xmm1, %xmm4
|
||||
- pcmpeqb %xmm0, %xmm6
|
||||
- pmovmskb %xmm6, %edx
|
||||
- movdqa %xmm3, %xmm6
|
||||
- pcmpeqb %xmm1, %xmm3
|
||||
- pcmpeqb %xmm0, %xmm6
|
||||
- pcmpeqb %xmm2, %xmm0
|
||||
- salq $16, %rdx
|
||||
- pmovmskb %xmm3, %r9d
|
||||
- pmovmskb %xmm6, %r8d
|
||||
- pmovmskb %xmm0, %ecx
|
||||
- salq $32, %r9
|
||||
- salq $32, %r8
|
||||
- pcmpeqb %xmm1, %xmm2
|
||||
- orq %r8, %rdx
|
||||
- salq $48, %rcx
|
||||
- pmovmskb %xmm5, %r8d
|
||||
- orq %rsi, %rdx
|
||||
- pmovmskb %xmm4, %esi
|
||||
- orq %rcx, %rdx
|
||||
- pmovmskb %xmm2, %ecx
|
||||
- salq $16, %rsi
|
||||
- salq $48, %rcx
|
||||
- orq %r9, %rsi
|
||||
- orq %r8, %rsi
|
||||
- orq %rcx, %rsi
|
||||
+ movq %rdi, %rsi
|
||||
+ andq $-VEC_SIZE, %rsi
|
||||
+ movaps (%rsi), %xmm1
|
||||
+ pxor %xmm2, %xmm2
|
||||
+ PCMPEQ %xmm1, %xmm2
|
||||
+ pmovmskb %xmm2, %edx
|
||||
movl %edi, %ecx
|
||||
- subl %eax, %ecx
|
||||
- shrq %cl, %rdx
|
||||
- shrq %cl, %rsi
|
||||
- testq %rdx, %rdx
|
||||
- je L(loop_header2)
|
||||
- leaq -1(%rdx), %rax
|
||||
- xorq %rdx, %rax
|
||||
- andq %rax, %rsi
|
||||
- je L(exit)
|
||||
- bsrq %rsi, %rax
|
||||
+ andl $(VEC_SIZE - 1), %ecx
|
||||
+ sarl %cl, %edx
|
||||
+ jz L(cross_page_continue)
|
||||
+ PCMPEQ %xmm0, %xmm1
|
||||
+ pmovmskb %xmm1, %eax
|
||||
+ sarl %cl, %eax
|
||||
+ leal -1(%rdx), %ecx
|
||||
+ xorl %edx, %ecx
|
||||
+ andl %ecx, %eax
|
||||
+ jz L(ret1)
|
||||
+ bsrl %eax, %eax
|
||||
addq %rdi, %rax
|
||||
+#ifdef USE_AS_WCSRCHR
|
||||
+ andq $-CHAR_SIZE, %rax
|
||||
+#endif
|
||||
+L(ret1):
|
||||
ret
|
||||
-END (strrchr)
|
||||
+END(STRRCHR)
|
||||
|
||||
-weak_alias (strrchr, rindex)
|
||||
-libc_hidden_builtin_def (strrchr)
|
||||
+#ifndef USE_AS_WCSRCHR
|
||||
+ weak_alias (STRRCHR, rindex)
|
||||
+ libc_hidden_builtin_def (STRRCHR)
|
||||
+#endif
|
||||
diff --git a/sysdeps/x86_64/wcsrchr.S b/sysdeps/x86_64/wcsrchr.S
|
||||
index 6b318d3f29de9a9e..9006f2220963d76c 100644
|
||||
--- a/sysdeps/x86_64/wcsrchr.S
|
||||
+++ b/sysdeps/x86_64/wcsrchr.S
|
||||
@@ -17,266 +17,12 @@
|
||||
License along with the GNU C Library; if not, see
|
||||
<https://www.gnu.org/licenses/>. */
|
||||
|
||||
-#include <sysdep.h>
|
||||
|
||||
- .text
|
||||
-ENTRY (wcsrchr)
|
||||
+#define USE_AS_WCSRCHR 1
|
||||
+#define NO_PMINU 1
|
||||
|
||||
- movd %rsi, %xmm1
|
||||
- mov %rdi, %rcx
|
||||
- punpckldq %xmm1, %xmm1
|
||||
- pxor %xmm2, %xmm2
|
||||
- punpckldq %xmm1, %xmm1
|
||||
- and $63, %rcx
|
||||
- cmp $48, %rcx
|
||||
- ja L(crosscache)
|
||||
+#ifndef STRRCHR
|
||||
+# define STRRCHR wcsrchr
|
||||
+#endif
|
||||
|
||||
- movdqu (%rdi), %xmm0
|
||||
- pcmpeqd %xmm0, %xmm2
|
||||
- pcmpeqd %xmm1, %xmm0
|
||||
- pmovmskb %xmm2, %rcx
|
||||
- pmovmskb %xmm0, %rax
|
||||
- add $16, %rdi
|
||||
-
|
||||
- test %rax, %rax
|
||||
- jnz L(unaligned_match1)
|
||||
-
|
||||
- test %rcx, %rcx
|
||||
- jnz L(return_null)
|
||||
-
|
||||
- and $-16, %rdi
|
||||
- xor %r8, %r8
|
||||
- jmp L(loop)
|
||||
-
|
||||
- .p2align 4
|
||||
-L(unaligned_match1):
|
||||
- test %rcx, %rcx
|
||||
- jnz L(prolog_find_zero_1)
|
||||
-
|
||||
- mov %rax, %r8
|
||||
- mov %rdi, %rsi
|
||||
- and $-16, %rdi
|
||||
- jmp L(loop)
|
||||
-
|
||||
- .p2align 4
|
||||
-L(crosscache):
|
||||
- and $15, %rcx
|
||||
- and $-16, %rdi
|
||||
- pxor %xmm3, %xmm3
|
||||
- movdqa (%rdi), %xmm0
|
||||
- pcmpeqd %xmm0, %xmm3
|
||||
- pcmpeqd %xmm1, %xmm0
|
||||
- pmovmskb %xmm3, %rdx
|
||||
- pmovmskb %xmm0, %rax
|
||||
- shr %cl, %rdx
|
||||
- shr %cl, %rax
|
||||
- add $16, %rdi
|
||||
-
|
||||
- test %rax, %rax
|
||||
- jnz L(unaligned_match)
|
||||
-
|
||||
- test %rdx, %rdx
|
||||
- jnz L(return_null)
|
||||
-
|
||||
- xor %r8, %r8
|
||||
- jmp L(loop)
|
||||
-
|
||||
- .p2align 4
|
||||
-L(unaligned_match):
|
||||
- test %rdx, %rdx
|
||||
- jnz L(prolog_find_zero)
|
||||
-
|
||||
- mov %rax, %r8
|
||||
- lea (%rdi, %rcx), %rsi
|
||||
-
|
||||
-/* Loop start on aligned string. */
|
||||
- .p2align 4
|
||||
-L(loop):
|
||||
- movdqa (%rdi), %xmm0
|
||||
- pcmpeqd %xmm0, %xmm2
|
||||
- add $16, %rdi
|
||||
- pcmpeqd %xmm1, %xmm0
|
||||
- pmovmskb %xmm2, %rcx
|
||||
- pmovmskb %xmm0, %rax
|
||||
- or %rax, %rcx
|
||||
- jnz L(matches)
|
||||
-
|
||||
- movdqa (%rdi), %xmm3
|
||||
- pcmpeqd %xmm3, %xmm2
|
||||
- add $16, %rdi
|
||||
- pcmpeqd %xmm1, %xmm3
|
||||
- pmovmskb %xmm2, %rcx
|
||||
- pmovmskb %xmm3, %rax
|
||||
- or %rax, %rcx
|
||||
- jnz L(matches)
|
||||
-
|
||||
- movdqa (%rdi), %xmm4
|
||||
- pcmpeqd %xmm4, %xmm2
|
||||
- add $16, %rdi
|
||||
- pcmpeqd %xmm1, %xmm4
|
||||
- pmovmskb %xmm2, %rcx
|
||||
- pmovmskb %xmm4, %rax
|
||||
- or %rax, %rcx
|
||||
- jnz L(matches)
|
||||
-
|
||||
- movdqa (%rdi), %xmm5
|
||||
- pcmpeqd %xmm5, %xmm2
|
||||
- add $16, %rdi
|
||||
- pcmpeqd %xmm1, %xmm5
|
||||
- pmovmskb %xmm2, %rcx
|
||||
- pmovmskb %xmm5, %rax
|
||||
- or %rax, %rcx
|
||||
- jz L(loop)
|
||||
-
|
||||
- .p2align 4
|
||||
-L(matches):
|
||||
- test %rax, %rax
|
||||
- jnz L(match)
|
||||
-L(return_value):
|
||||
- test %r8, %r8
|
||||
- jz L(return_null)
|
||||
- mov %r8, %rax
|
||||
- mov %rsi, %rdi
|
||||
-
|
||||
- test $15 << 4, %ah
|
||||
- jnz L(match_fourth_wchar)
|
||||
- test %ah, %ah
|
||||
- jnz L(match_third_wchar)
|
||||
- test $15 << 4, %al
|
||||
- jnz L(match_second_wchar)
|
||||
- lea -16(%rdi), %rax
|
||||
- ret
|
||||
-
|
||||
- .p2align 4
|
||||
-L(match):
|
||||
- pmovmskb %xmm2, %rcx
|
||||
- test %rcx, %rcx
|
||||
- jnz L(find_zero)
|
||||
- mov %rax, %r8
|
||||
- mov %rdi, %rsi
|
||||
- jmp L(loop)
|
||||
-
|
||||
- .p2align 4
|
||||
-L(find_zero):
|
||||
- test $15, %cl
|
||||
- jnz L(find_zero_in_first_wchar)
|
||||
- test %cl, %cl
|
||||
- jnz L(find_zero_in_second_wchar)
|
||||
- test $15, %ch
|
||||
- jnz L(find_zero_in_third_wchar)
|
||||
-
|
||||
- and $1 << 13 - 1, %rax
|
||||
- jz L(return_value)
|
||||
-
|
||||
- test $15 << 4, %ah
|
||||
- jnz L(match_fourth_wchar)
|
||||
- test %ah, %ah
|
||||
- jnz L(match_third_wchar)
|
||||
- test $15 << 4, %al
|
||||
- jnz L(match_second_wchar)
|
||||
- lea -16(%rdi), %rax
|
||||
- ret
|
||||
-
|
||||
- .p2align 4
|
||||
-L(find_zero_in_first_wchar):
|
||||
- test $1, %rax
|
||||
- jz L(return_value)
|
||||
- lea -16(%rdi), %rax
|
||||
- ret
|
||||
-
|
||||
- .p2align 4
|
||||
-L(find_zero_in_second_wchar):
|
||||
- and $1 << 5 - 1, %rax
|
||||
- jz L(return_value)
|
||||
-
|
||||
- test $15 << 4, %al
|
||||
- jnz L(match_second_wchar)
|
||||
- lea -16(%rdi), %rax
|
||||
- ret
|
||||
-
|
||||
- .p2align 4
|
||||
-L(find_zero_in_third_wchar):
|
||||
- and $1 << 9 - 1, %rax
|
||||
- jz L(return_value)
|
||||
-
|
||||
- test %ah, %ah
|
||||
- jnz L(match_third_wchar)
|
||||
- test $15 << 4, %al
|
||||
- jnz L(match_second_wchar)
|
||||
- lea -16(%rdi), %rax
|
||||
- ret
|
||||
-
|
||||
- .p2align 4
|
||||
-L(prolog_find_zero):
|
||||
- add %rcx, %rdi
|
||||
- mov %rdx, %rcx
|
||||
-L(prolog_find_zero_1):
|
||||
- test $15, %cl
|
||||
- jnz L(prolog_find_zero_in_first_wchar)
|
||||
- test %cl, %cl
|
||||
- jnz L(prolog_find_zero_in_second_wchar)
|
||||
- test $15, %ch
|
||||
- jnz L(prolog_find_zero_in_third_wchar)
|
||||
-
|
||||
- and $1 << 13 - 1, %rax
|
||||
- jz L(return_null)
|
||||
-
|
||||
- test $15 << 4, %ah
|
||||
- jnz L(match_fourth_wchar)
|
||||
- test %ah, %ah
|
||||
- jnz L(match_third_wchar)
|
||||
- test $15 << 4, %al
|
||||
- jnz L(match_second_wchar)
|
||||
- lea -16(%rdi), %rax
|
||||
- ret
|
||||
-
|
||||
- .p2align 4
|
||||
-L(prolog_find_zero_in_first_wchar):
|
||||
- test $1, %rax
|
||||
- jz L(return_null)
|
||||
- lea -16(%rdi), %rax
|
||||
- ret
|
||||
-
|
||||
- .p2align 4
|
||||
-L(prolog_find_zero_in_second_wchar):
|
||||
- and $1 << 5 - 1, %rax
|
||||
- jz L(return_null)
|
||||
-
|
||||
- test $15 << 4, %al
|
||||
- jnz L(match_second_wchar)
|
||||
- lea -16(%rdi), %rax
|
||||
- ret
|
||||
-
|
||||
- .p2align 4
|
||||
-L(prolog_find_zero_in_third_wchar):
|
||||
- and $1 << 9 - 1, %rax
|
||||
- jz L(return_null)
|
||||
-
|
||||
- test %ah, %ah
|
||||
- jnz L(match_third_wchar)
|
||||
- test $15 << 4, %al
|
||||
- jnz L(match_second_wchar)
|
||||
- lea -16(%rdi), %rax
|
||||
- ret
|
||||
-
|
||||
- .p2align 4
|
||||
-L(match_second_wchar):
|
||||
- lea -12(%rdi), %rax
|
||||
- ret
|
||||
-
|
||||
- .p2align 4
|
||||
-L(match_third_wchar):
|
||||
- lea -8(%rdi), %rax
|
||||
- ret
|
||||
-
|
||||
- .p2align 4
|
||||
-L(match_fourth_wchar):
|
||||
- lea -4(%rdi), %rax
|
||||
- ret
|
||||
-
|
||||
- .p2align 4
|
||||
-L(return_null):
|
||||
- xor %rax, %rax
|
||||
- ret
|
||||
-
|
||||
-END (wcsrchr)
|
||||
+#include "../strrchr.S"
|
497
glibc-upstream-2.34-234.patch
Normal file
@ -0,0 +1,497 @@
commit 00f09a14d2818f438959e764834abb3913f2b20a
Author: Noah Goldstein <goldstein.w.n@gmail.com>
Date: Thu Apr 21 20:52:29 2022 -0500

x86: Optimize {str|wcs}rchr-avx2

The new code unrolls the main loop slightly without adding too much
overhead and minimizes the comparisons for the search CHAR.

Geometric Mean of all benchmarks New / Old: 0.832
See email for all results.

Full xcheck passes on x86_64 with and without multiarch enabled.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>

(cherry picked from commit df7e295d18ffa34f629578c0017a9881af7620f6)
|
||||
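Note on the first-vector path in this patch: it discards search-CHAR matches that lie past the first zero CHAR (`blsmskl` followed by `andl`), takes the highest remaining bit (`bsrl`), and for wcsrchr rounds the byte offset down with `andq $-CHAR_SIZE`. The following is a minimal C sketch of that mask arithmetic only, assuming 32-bit byte masks as produced by `vpmovmskb`. The helper names `blsmsk32`, `bsr32` and `last_match_before_nul` are illustrative and do not appear in glibc.

```c
#include <stdint.h>

/* Illustrative model of BLSMSK: all bits up to and including the lowest
   set bit of x. For x == 0 the result is all ones. */
static uint32_t blsmsk32(uint32_t x)
{
    return x ^ (x - 1u);
}

/* Illustrative model of BSR for non-zero x: index of the highest set bit. */
static int bsr32(uint32_t x)
{
    int idx = 0;
    while (x >>= 1)
        idx++;
    return idx;
}

/* Byte offset of the last search-CHAR match at or before the first zero
   CHAR, or -1 if there is none. match_mask and zero_mask hold one bit per
   byte; char_size is 1 (strrchr) or 4 (wcsrchr). */
static int last_match_before_nul(uint32_t match_mask, uint32_t zero_mask,
                                 unsigned char_size)
{
    uint32_t in_range = match_mask & blsmsk32(zero_mask);
    if (in_range == 0)
        return -1;
    /* With 4-byte CHARs a match sets four mask bits, so the highest bit is
       off by 3; rounding down to a CHAR boundary corrects it. */
    return bsr32(in_range) & -(int)char_size;
}
```

When the search CHAR is itself zero, the match mask equals the zero mask and only its lowest bit survives the `blsmsk` step, so the offset is already aligned and the rounding is a no-op.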
|
||||
diff --git a/sysdeps/x86_64/multiarch/strrchr-avx2.S b/sysdeps/x86_64/multiarch/strrchr-avx2.S
|
||||
index 0deba97114d3b83d..b8dec737d5213b25 100644
|
||||
--- a/sysdeps/x86_64/multiarch/strrchr-avx2.S
|
||||
+++ b/sysdeps/x86_64/multiarch/strrchr-avx2.S
|
||||
@@ -27,9 +27,13 @@
|
||||
# ifdef USE_AS_WCSRCHR
|
||||
# define VPBROADCAST vpbroadcastd
|
||||
# define VPCMPEQ vpcmpeqd
|
||||
+# define VPMIN vpminud
|
||||
+# define CHAR_SIZE 4
|
||||
# else
|
||||
# define VPBROADCAST vpbroadcastb
|
||||
# define VPCMPEQ vpcmpeqb
|
||||
+# define VPMIN vpminub
|
||||
+# define CHAR_SIZE 1
|
||||
# endif
|
||||
|
||||
# ifndef VZEROUPPER
|
||||
@@ -41,196 +45,304 @@
|
||||
# endif
|
||||
|
||||
# define VEC_SIZE 32
|
||||
+# define PAGE_SIZE 4096
|
||||
|
||||
- .section SECTION(.text),"ax",@progbits
|
||||
-ENTRY (STRRCHR)
|
||||
- movd %esi, %xmm4
|
||||
- movl %edi, %ecx
|
||||
+ .section SECTION(.text), "ax", @progbits
|
||||
+ENTRY(STRRCHR)
|
||||
+ movd %esi, %xmm7
|
||||
+ movl %edi, %eax
|
||||
/* Broadcast CHAR to YMM4. */
|
||||
- VPBROADCAST %xmm4, %ymm4
|
||||
+ VPBROADCAST %xmm7, %ymm7
|
||||
vpxor %xmm0, %xmm0, %xmm0
|
||||
|
||||
- /* Check if we may cross page boundary with one vector load. */
|
||||
- andl $(2 * VEC_SIZE - 1), %ecx
|
||||
- cmpl $VEC_SIZE, %ecx
|
||||
- ja L(cros_page_boundary)
|
||||
+ /* Shift here instead of `andl` to save code size (saves a fetch
|
||||
+ block). */
|
||||
+ sall $20, %eax
|
||||
+ cmpl $((PAGE_SIZE - VEC_SIZE) << 20), %eax
|
||||
+ ja L(cross_page)
|
||||
|
||||
+L(page_cross_continue):
|
||||
vmovdqu (%rdi), %ymm1
|
||||
- VPCMPEQ %ymm1, %ymm0, %ymm2
|
||||
- VPCMPEQ %ymm1, %ymm4, %ymm3
|
||||
- vpmovmskb %ymm2, %ecx
|
||||
- vpmovmskb %ymm3, %eax
|
||||
- addq $VEC_SIZE, %rdi
|
||||
+ /* Check end of string match. */
|
||||
+ VPCMPEQ %ymm1, %ymm0, %ymm6
|
||||
+ vpmovmskb %ymm6, %ecx
|
||||
+ testl %ecx, %ecx
|
||||
+ jz L(aligned_more)
|
||||
+
|
||||
+ /* Only check match with search CHAR if needed. */
|
||||
+ VPCMPEQ %ymm1, %ymm7, %ymm1
|
||||
+ vpmovmskb %ymm1, %eax
|
||||
+ /* Check if match before first zero. */
|
||||
+ blsmskl %ecx, %ecx
|
||||
+ andl %ecx, %eax
|
||||
+ jz L(ret0)
|
||||
+ bsrl %eax, %eax
|
||||
+ addq %rdi, %rax
|
||||
+ /* We are off by 3 for wcsrchr if search CHAR is non-zero. If
|
||||
+ search CHAR is zero we are correct. Either way `andq
|
||||
+ -CHAR_SIZE, %rax` gets the correct result. */
|
||||
+# ifdef USE_AS_WCSRCHR
|
||||
+ andq $-CHAR_SIZE, %rax
|
||||
+# endif
|
||||
+L(ret0):
|
||||
+L(return_vzeroupper):
|
||||
+ ZERO_UPPER_VEC_REGISTERS_RETURN
|
||||
+
|
||||
+ /* Returns for first vec x1/x2 have hard coded backward search
|
||||
+ path for earlier matches. */
|
||||
+ .p2align 4,, 10
|
||||
+L(first_vec_x1):
|
||||
+ VPCMPEQ %ymm2, %ymm7, %ymm6
|
||||
+ vpmovmskb %ymm6, %eax
|
||||
+ blsmskl %ecx, %ecx
|
||||
+ andl %ecx, %eax
|
||||
+ jnz L(first_vec_x1_return)
|
||||
+
|
||||
+ .p2align 4,, 4
|
||||
+L(first_vec_x0_test):
|
||||
+ VPCMPEQ %ymm1, %ymm7, %ymm6
|
||||
+ vpmovmskb %ymm6, %eax
|
||||
+ testl %eax, %eax
|
||||
+ jz L(ret1)
|
||||
+ bsrl %eax, %eax
|
||||
+ addq %r8, %rax
|
||||
+# ifdef USE_AS_WCSRCHR
|
||||
+ andq $-CHAR_SIZE, %rax
|
||||
+# endif
|
||||
+L(ret1):
|
||||
+ VZEROUPPER_RETURN
|
||||
|
||||
+ .p2align 4,, 10
|
||||
+L(first_vec_x0_x1_test):
|
||||
+ VPCMPEQ %ymm2, %ymm7, %ymm6
|
||||
+ vpmovmskb %ymm6, %eax
|
||||
+ /* Check ymm2 for search CHAR match. If no match then check ymm1
|
||||
+ before returning. */
|
||||
testl %eax, %eax
|
||||
- jnz L(first_vec)
|
||||
+ jz L(first_vec_x0_test)
|
||||
+ .p2align 4,, 4
|
||||
+L(first_vec_x1_return):
|
||||
+ bsrl %eax, %eax
|
||||
+ leaq 1(%rdi, %rax), %rax
|
||||
+# ifdef USE_AS_WCSRCHR
|
||||
+ andq $-CHAR_SIZE, %rax
|
||||
+# endif
|
||||
+ VZEROUPPER_RETURN
|
||||
|
||||
- testl %ecx, %ecx
|
||||
- jnz L(return_null)
|
||||
|
||||
- andq $-VEC_SIZE, %rdi
|
||||
- xorl %edx, %edx
|
||||
- jmp L(aligned_loop)
|
||||
+ .p2align 4,, 10
|
||||
+L(first_vec_x2):
|
||||
+ VPCMPEQ %ymm3, %ymm7, %ymm6
|
||||
+ vpmovmskb %ymm6, %eax
|
||||
+ blsmskl %ecx, %ecx
|
||||
+ /* If no in-range search CHAR match in ymm3 then need to check
|
||||
+ ymm1/ymm2 for an earlier match (we delay checking search
|
||||
+ CHAR matches until needed). */
|
||||
+ andl %ecx, %eax
|
||||
+ jz L(first_vec_x0_x1_test)
|
||||
+ bsrl %eax, %eax
|
||||
+ leaq (VEC_SIZE + 1)(%rdi, %rax), %rax
|
||||
+# ifdef USE_AS_WCSRCHR
|
||||
+ andq $-CHAR_SIZE, %rax
|
||||
+# endif
|
||||
+ VZEROUPPER_RETURN
|
||||
+
|
||||
|
||||
.p2align 4
|
||||
-L(first_vec):
|
||||
- /* Check if there is a nul CHAR. */
|
||||
+L(aligned_more):
|
||||
+ /* Save original pointer if match was in VEC 0. */
|
||||
+ movq %rdi, %r8
|
||||
+
|
||||
+ /* Align src. */
|
||||
+ orq $(VEC_SIZE - 1), %rdi
|
||||
+ vmovdqu 1(%rdi), %ymm2
|
||||
+ VPCMPEQ %ymm2, %ymm0, %ymm6
|
||||
+ vpmovmskb %ymm6, %ecx
|
||||
testl %ecx, %ecx
|
||||
- jnz L(char_and_nul_in_first_vec)
|
||||
+ jnz L(first_vec_x1)
|
||||
|
||||
- /* Remember the match and keep searching. */
|
||||
- movl %eax, %edx
|
||||
- movq %rdi, %rsi
|
||||
- andq $-VEC_SIZE, %rdi
|
||||
- jmp L(aligned_loop)
|
||||
+ vmovdqu (VEC_SIZE + 1)(%rdi), %ymm3
|
||||
+ VPCMPEQ %ymm3, %ymm0, %ymm6
|
||||
+ vpmovmskb %ymm6, %ecx
|
||||
+ testl %ecx, %ecx
|
||||
+ jnz L(first_vec_x2)
|
||||
|
||||
+ /* Save pointer again before realigning. */
|
||||
+ movq %rdi, %rsi
|
||||
+ addq $(VEC_SIZE + 1), %rdi
|
||||
+ andq $-(VEC_SIZE * 2), %rdi
|
||||
.p2align 4
|
||||
-L(cros_page_boundary):
|
||||
- andl $(VEC_SIZE - 1), %ecx
|
||||
- andq $-VEC_SIZE, %rdi
|
||||
- vmovdqa (%rdi), %ymm1
|
||||
- VPCMPEQ %ymm1, %ymm0, %ymm2
|
||||
- VPCMPEQ %ymm1, %ymm4, %ymm3
|
||||
- vpmovmskb %ymm2, %edx
|
||||
- vpmovmskb %ymm3, %eax
|
||||
- shrl %cl, %edx
|
||||
- shrl %cl, %eax
|
||||
- addq $VEC_SIZE, %rdi
|
||||
-
|
||||
- /* Check if there is a CHAR. */
|
||||
+L(first_aligned_loop):
|
||||
+ /* Do 2x VEC at a time. Any more and the cost of finding the
|
||||
+ match outweighs loop benefit. */
|
||||
+ vmovdqa (VEC_SIZE * 0)(%rdi), %ymm4
|
||||
+ vmovdqa (VEC_SIZE * 1)(%rdi), %ymm5
|
||||
+
|
||||
+ VPCMPEQ %ymm4, %ymm7, %ymm6
|
||||
+ VPMIN %ymm4, %ymm5, %ymm8
|
||||
+ VPCMPEQ %ymm5, %ymm7, %ymm10
|
||||
+ vpor %ymm6, %ymm10, %ymm5
|
||||
+ VPCMPEQ %ymm8, %ymm0, %ymm8
|
||||
+ vpor %ymm5, %ymm8, %ymm9
|
||||
+
|
||||
+ vpmovmskb %ymm9, %eax
|
||||
+ addq $(VEC_SIZE * 2), %rdi
|
||||
+ /* No zero or search CHAR. */
|
||||
testl %eax, %eax
|
||||
- jnz L(found_char)
|
||||
-
|
||||
- testl %edx, %edx
|
||||
- jnz L(return_null)
|
||||
+ jz L(first_aligned_loop)
|
||||
|
||||
- jmp L(aligned_loop)
|
||||
-
|
||||
- .p2align 4
|
||||
-L(found_char):
|
||||
- testl %edx, %edx
|
||||
- jnz L(char_and_nul)
|
||||
+ /* If no zero CHAR then go to second loop (this allows us to
|
||||
+ throw away all prior work). */
|
||||
+ vpmovmskb %ymm8, %ecx
|
||||
+ testl %ecx, %ecx
|
||||
+ jz L(second_aligned_loop_prep)
|
||||
|
||||
- /* Remember the match and keep searching. */
|
||||
- movl %eax, %edx
|
||||
- leaq (%rdi, %rcx), %rsi
|
||||
+ /* Search char could be zero so we need to get the true match.
|
||||
+ */
|
||||
+ vpmovmskb %ymm5, %eax
|
||||
+ testl %eax, %eax
|
||||
+ jnz L(first_aligned_loop_return)
|
||||
|
||||
- .p2align 4
|
||||
-L(aligned_loop):
|
||||
- vmovdqa (%rdi), %ymm1
|
||||
- VPCMPEQ %ymm1, %ymm0, %ymm2
|
||||
- addq $VEC_SIZE, %rdi
|
||||
- VPCMPEQ %ymm1, %ymm4, %ymm3
|
||||
- vpmovmskb %ymm2, %ecx
|
||||
- vpmovmskb %ymm3, %eax
|
||||
- orl %eax, %ecx
|
||||
- jnz L(char_nor_null)
|
||||
-
|
||||
- vmovdqa (%rdi), %ymm1
|
||||
- VPCMPEQ %ymm1, %ymm0, %ymm2
|
||||
- add $VEC_SIZE, %rdi
|
||||
- VPCMPEQ %ymm1, %ymm4, %ymm3
|
||||
- vpmovmskb %ymm2, %ecx
|
||||
+ .p2align 4,, 4
|
||||
+L(first_vec_x1_or_x2):
|
||||
+ VPCMPEQ %ymm3, %ymm7, %ymm3
|
||||
+ VPCMPEQ %ymm2, %ymm7, %ymm2
|
||||
vpmovmskb %ymm3, %eax
|
||||
- orl %eax, %ecx
|
||||
- jnz L(char_nor_null)
|
||||
-
|
||||
- vmovdqa (%rdi), %ymm1
|
||||
- VPCMPEQ %ymm1, %ymm0, %ymm2
|
||||
- addq $VEC_SIZE, %rdi
|
||||
- VPCMPEQ %ymm1, %ymm4, %ymm3
|
||||
- vpmovmskb %ymm2, %ecx
|
||||
- vpmovmskb %ymm3, %eax
|
||||
- orl %eax, %ecx
|
||||
- jnz L(char_nor_null)
|
||||
-
|
||||
- vmovdqa (%rdi), %ymm1
|
||||
- VPCMPEQ %ymm1, %ymm0, %ymm2
|
||||
- addq $VEC_SIZE, %rdi
|
||||
- VPCMPEQ %ymm1, %ymm4, %ymm3
|
||||
- vpmovmskb %ymm2, %ecx
|
||||
- vpmovmskb %ymm3, %eax
|
||||
- orl %eax, %ecx
|
||||
- jz L(aligned_loop)
|
||||
-
|
||||
- .p2align 4
|
||||
-L(char_nor_null):
|
||||
- /* Find a CHAR or a nul CHAR in a loop. */
|
||||
- testl %eax, %eax
|
||||
- jnz L(match)
|
||||
-L(return_value):
|
||||
- testl %edx, %edx
|
||||
- jz L(return_null)
|
||||
- movl %edx, %eax
|
||||
- movq %rsi, %rdi
|
||||
+ vpmovmskb %ymm2, %edx
|
||||
+ /* Use add for macro-fusion. */
|
||||
+ addq %rax, %rdx
|
||||
+ jz L(first_vec_x0_test)
|
||||
+ /* NB: We could move this shift to before the branch and save a
|
||||
+ bit of code size / performance on the fall through. The
|
||||
+ branch leads to the null case which generally seems hotter
|
||||
+ than char in first 3x VEC. */
|
||||
+ salq $32, %rax
|
||||
+ addq %rdx, %rax
|
||||
+ bsrq %rax, %rax
|
||||
+ leaq 1(%rsi, %rax), %rax
|
||||
+# ifdef USE_AS_WCSRCHR
|
||||
+ andq $-CHAR_SIZE, %rax
|
||||
+# endif
|
||||
+ VZEROUPPER_RETURN
|
||||
|
||||
+ .p2align 4,, 8
|
||||
+L(first_aligned_loop_return):
|
||||
+ VPCMPEQ %ymm4, %ymm0, %ymm4
|
||||
+ vpmovmskb %ymm4, %edx
|
||||
+ salq $32, %rcx
|
||||
+ orq %rdx, %rcx
|
||||
+
|
||||
+ vpmovmskb %ymm10, %eax
|
||||
+ vpmovmskb %ymm6, %edx
|
||||
+ salq $32, %rax
|
||||
+ orq %rdx, %rax
|
||||
+ blsmskq %rcx, %rcx
|
||||
+ andq %rcx, %rax
|
||||
+ jz L(first_vec_x1_or_x2)
|
||||
+
|
||||
+ bsrq %rax, %rax
|
||||
+ leaq -(VEC_SIZE * 2)(%rdi, %rax), %rax
|
||||
# ifdef USE_AS_WCSRCHR
|
||||
- /* Keep the first bit for each matching CHAR for bsr. */
|
||||
- andl $0x11111111, %eax
|
||||
+ andq $-CHAR_SIZE, %rax
|
||||
# endif
|
||||
- bsrl %eax, %eax
|
||||
- leaq -VEC_SIZE(%rdi, %rax), %rax
|
||||
-L(return_vzeroupper):
|
||||
- ZERO_UPPER_VEC_REGISTERS_RETURN
|
||||
+ VZEROUPPER_RETURN
|
||||
|
||||
+ /* Search char cannot be zero. */
|
||||
.p2align 4
|
||||
-L(match):
|
||||
- /* Find a CHAR. Check if there is a nul CHAR. */
|
||||
- vpmovmskb %ymm2, %ecx
|
||||
- testl %ecx, %ecx
|
||||
- jnz L(find_nul)
|
||||
-
|
||||
- /* Remember the match and keep searching. */
|
||||
- movl %eax, %edx
|
||||
+L(second_aligned_loop_set_furthest_match):
|
||||
+ /* Save VEC and pointer from most recent match. */
|
||||
+L(second_aligned_loop_prep):
|
||||
movq %rdi, %rsi
|
||||
- jmp L(aligned_loop)
|
||||
+ vmovdqu %ymm6, %ymm2
|
||||
+ vmovdqu %ymm10, %ymm3
|
||||
|
||||
.p2align 4
|
||||
-L(find_nul):
|
||||
-# ifdef USE_AS_WCSRCHR
|
||||
- /* Keep the first bit for each matching CHAR for bsr. */
|
||||
- andl $0x11111111, %ecx
|
||||
- andl $0x11111111, %eax
|
||||
-# endif
|
||||
- /* Mask out any matching bits after the nul CHAR. */
|
||||
- movl %ecx, %r8d
|
||||
- subl $1, %r8d
|
||||
- xorl %ecx, %r8d
|
||||
- andl %r8d, %eax
|
||||
+L(second_aligned_loop):
|
||||
+ /* Search 2x at a time. */
|
||||
+ vmovdqa (VEC_SIZE * 0)(%rdi), %ymm4
|
||||
+ vmovdqa (VEC_SIZE * 1)(%rdi), %ymm5
|
||||
+
|
||||
+ VPCMPEQ %ymm4, %ymm7, %ymm6
|
||||
+ VPMIN %ymm4, %ymm5, %ymm1
|
||||
+ VPCMPEQ %ymm5, %ymm7, %ymm10
|
||||
+ vpor %ymm6, %ymm10, %ymm5
|
||||
+ VPCMPEQ %ymm1, %ymm0, %ymm1
|
||||
+ vpor %ymm5, %ymm1, %ymm9
|
||||
+
|
||||
+ vpmovmskb %ymm9, %eax
|
||||
+ addq $(VEC_SIZE * 2), %rdi
|
||||
testl %eax, %eax
|
||||
- /* If there is no CHAR here, return the remembered one. */
|
||||
- jz L(return_value)
|
||||
- bsrl %eax, %eax
|
||||
- leaq -VEC_SIZE(%rdi, %rax), %rax
|
||||
- VZEROUPPER_RETURN
|
||||
-
|
||||
- .p2align 4
|
||||
-L(char_and_nul):
|
||||
- /* Find both a CHAR and a nul CHAR. */
|
||||
- addq %rcx, %rdi
|
||||
- movl %edx, %ecx
|
||||
-L(char_and_nul_in_first_vec):
|
||||
-# ifdef USE_AS_WCSRCHR
|
||||
- /* Keep the first bit for each matching CHAR for bsr. */
|
||||
- andl $0x11111111, %ecx
|
||||
- andl $0x11111111, %eax
|
||||
-# endif
|
||||
- /* Mask out any matching bits after the nul CHAR. */
|
||||
- movl %ecx, %r8d
|
||||
- subl $1, %r8d
|
||||
- xorl %ecx, %r8d
|
||||
- andl %r8d, %eax
|
||||
+ jz L(second_aligned_loop)
|
||||
+ vpmovmskb %ymm1, %ecx
|
||||
+ testl %ecx, %ecx
|
||||
+ jz L(second_aligned_loop_set_furthest_match)
|
||||
+ vpmovmskb %ymm5, %eax
|
||||
testl %eax, %eax
|
||||
- /* Return null pointer if the nul CHAR comes first. */
|
||||
- jz L(return_null)
|
||||
- bsrl %eax, %eax
|
||||
- leaq -VEC_SIZE(%rdi, %rax), %rax
|
||||
+ jnz L(return_new_match)
|
||||
+
|
||||
+ /* This is the hot path. We know CHAR is in bounds and that
|
||||
+ ymm3/ymm2 have latest match. */
|
||||
+ .p2align 4,, 4
|
||||
+L(return_old_match):
|
||||
+ vpmovmskb %ymm3, %eax
|
||||
+ vpmovmskb %ymm2, %edx
|
||||
+ salq $32, %rax
|
||||
+ orq %rdx, %rax
|
||||
+ bsrq %rax, %rax
|
||||
+ /* Search char cannot be zero so safe to just use lea for
|
||||
+ wcsrchr. */
|
||||
+ leaq (VEC_SIZE * -2 -(CHAR_SIZE - 1))(%rsi, %rax), %rax
|
||||
VZEROUPPER_RETURN
|
||||
|
||||
- .p2align 4
|
||||
-L(return_null):
|
||||
- xorl %eax, %eax
|
||||
+ /* Last iteration also potentially has a match. */
|
||||
+ .p2align 4,, 8
|
||||
+L(return_new_match):
|
||||
+ VPCMPEQ %ymm4, %ymm0, %ymm4
|
||||
+ vpmovmskb %ymm4, %edx
|
||||
+ salq $32, %rcx
|
||||
+ orq %rdx, %rcx
|
||||
+
|
||||
+ vpmovmskb %ymm10, %eax
|
||||
+ vpmovmskb %ymm6, %edx
|
||||
+ salq $32, %rax
|
||||
+ orq %rdx, %rax
|
||||
+ blsmskq %rcx, %rcx
|
||||
+ andq %rcx, %rax
|
||||
+ jz L(return_old_match)
|
||||
+ bsrq %rax, %rax
|
||||
+ /* Search char cannot be zero so safe to just use lea for
|
||||
+ wcsrchr. */
|
||||
+ leaq (VEC_SIZE * -2 -(CHAR_SIZE - 1))(%rdi, %rax), %rax
|
||||
VZEROUPPER_RETURN
|
||||
|
||||
-END (STRRCHR)
|
||||
+ .p2align 4,, 4
|
||||
+L(cross_page):
|
||||
+ movq %rdi, %rsi
|
||||
+ andq $-VEC_SIZE, %rsi
|
||||
+ vmovdqu (%rsi), %ymm1
|
||||
+ VPCMPEQ %ymm1, %ymm0, %ymm6
|
||||
+ vpmovmskb %ymm6, %ecx
|
||||
+ /* Shift out zero CHAR matches that are before the beginning of
|
||||
+ src (rdi). */
|
||||
+ shrxl %edi, %ecx, %ecx
|
||||
+ testl %ecx, %ecx
|
||||
+ jz L(page_cross_continue)
|
||||
+ VPCMPEQ %ymm1, %ymm7, %ymm1
|
||||
+ vpmovmskb %ymm1, %eax
|
||||
+
|
||||
+ /* Shift out search CHAR matches that are before the beginning of
|
||||
+ src (rdi). */
|
||||
+ shrxl %edi, %eax, %eax
|
||||
+ blsmskl %ecx, %ecx
|
||||
+ /* Check if any search CHAR match in range. */
|
||||
+ andl %ecx, %eax
|
||||
+ jz L(ret2)
|
||||
+ bsrl %eax, %eax
|
||||
+ addq %rdi, %rax
|
||||
+# ifdef USE_AS_WCSRCHR
|
||||
+ andq $-CHAR_SIZE, %rax
|
||||
+# endif
|
||||
+L(ret2):
|
||||
+ VZEROUPPER_RETURN
|
||||
+END(STRRCHR)
|
||||
#endif
|
554
glibc-upstream-2.34-235.patch
Normal file
@ -0,0 +1,554 @@
commit 596c9a32cc5d5eb82587e92d1e66c9ecb7668456
Author: Noah Goldstein <goldstein.w.n@gmail.com>
Date: Thu Apr 21 20:52:30 2022 -0500

x86: Optimize {str|wcs}rchr-evex

The new code unrolls the main loop slightly without adding too much
overhead and minimizes the comparisons for the search CHAR.

Geometric Mean of all benchmarks New / Old: 0.755
See email for all results.

Full xcheck passes on x86_64 with and without multiarch enabled.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>

(cherry picked from commit c966099cdc3e0fdf92f63eac09b22fa7e5f5f02d)
|
||||
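Note on the aligned loops in this patch: they fold three per-lane conditions into one test by taking the unsigned minimum of the first vector, the second vector, and the second vector XORed with the search CHAR (`vpxord`, two `VPMIN`s, then `VPTESTN`). A lane of that minimum is zero exactly when either vector holds a zero CHAR there or the second vector matches the search CHAR. Matches in the first vector are tracked separately (`k2` in the patch). The sketch below models one byte lane in C and checks the identity exhaustively. The function names are illustrative and are not part of glibc.

```c
#include <stdint.h>

static uint8_t min_u8(uint8_t a, uint8_t b)
{
    return a < b ? a : b;
}

/* One lane of the folded test: a and b are the CHARs from the first and
   second vector, c is the search CHAR. Returns 1 when the lane forces the
   loop to stop and inspect the vectors more closely. */
static int lane_needs_exit(uint8_t a, uint8_t b, uint8_t c)
{
    uint8_t folded = min_u8(min_u8(a, b), (uint8_t)(b ^ c));
    return folded == 0;
}

/* Straightforward statement of the same condition, for comparison. */
static int lane_needs_exit_reference(uint8_t a, uint8_t b, uint8_t c)
{
    return a == 0 || b == 0 || b == c;
}

/* Returns 1 if both formulations agree for every byte triple. */
static int folds_agree(void)
{
    for (unsigned a = 0; a < 256; a++)
        for (unsigned b = 0; b < 256; b++)
            for (unsigned c = 0; c < 256; c++)
                if (lane_needs_exit((uint8_t)a, (uint8_t)b, (uint8_t)c)
                    != lane_needs_exit_reference((uint8_t)a, (uint8_t)b,
                                                 (uint8_t)c))
                    return 0;
    return 1;
}
```

The benefit of the fold is that the hot loop needs a single mask test per pair of vectors. The individual masks are only recomputed once that test fires.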
|
||||
diff --git a/sysdeps/x86_64/multiarch/strrchr-evex.S b/sysdeps/x86_64/multiarch/strrchr-evex.S
|
||||
index f920b5a584edd293..f5b6d755ceb85ae2 100644
|
||||
--- a/sysdeps/x86_64/multiarch/strrchr-evex.S
|
||||
+++ b/sysdeps/x86_64/multiarch/strrchr-evex.S
|
||||
@@ -24,242 +24,351 @@
|
||||
# define STRRCHR __strrchr_evex
|
||||
# endif
|
||||
|
||||
-# define VMOVU vmovdqu64
|
||||
-# define VMOVA vmovdqa64
|
||||
+# define VMOVU vmovdqu64
|
||||
+# define VMOVA vmovdqa64
|
||||
|
||||
# ifdef USE_AS_WCSRCHR
|
||||
+# define SHIFT_REG esi
|
||||
+
|
||||
+# define kunpck kunpckbw
|
||||
+# define kmov_2x kmovd
|
||||
+# define maskz_2x ecx
|
||||
+# define maskm_2x eax
|
||||
+# define CHAR_SIZE 4
|
||||
+# define VPMIN vpminud
|
||||
+# define VPTESTN vptestnmd
|
||||
# define VPBROADCAST vpbroadcastd
|
||||
-# define VPCMP vpcmpd
|
||||
-# define SHIFT_REG r8d
|
||||
+# define VPCMP vpcmpd
|
||||
# else
|
||||
+# define SHIFT_REG edi
|
||||
+
|
||||
+# define kunpck kunpckdq
|
||||
+# define kmov_2x kmovq
|
||||
+# define maskz_2x rcx
|
||||
+# define maskm_2x rax
|
||||
+
|
||||
+# define CHAR_SIZE 1
|
||||
+# define VPMIN vpminub
|
||||
+# define VPTESTN vptestnmb
|
||||
# define VPBROADCAST vpbroadcastb
|
||||
-# define VPCMP vpcmpb
|
||||
-# define SHIFT_REG ecx
|
||||
+# define VPCMP vpcmpb
|
||||
# endif
|
||||
|
||||
# define XMMZERO xmm16
|
||||
# define YMMZERO ymm16
|
||||
# define YMMMATCH ymm17
|
||||
-# define YMM1 ymm18
|
||||
+# define YMMSAVE ymm18
|
||||
+
|
||||
+# define YMM1 ymm19
|
||||
+# define YMM2 ymm20
|
||||
+# define YMM3 ymm21
|
||||
+# define YMM4 ymm22
|
||||
+# define YMM5 ymm23
|
||||
+# define YMM6 ymm24
|
||||
+# define YMM7 ymm25
|
||||
+# define YMM8 ymm26
|
||||
|
||||
-# define VEC_SIZE 32
|
||||
|
||||
- .section .text.evex,"ax",@progbits
|
||||
-ENTRY (STRRCHR)
|
||||
- movl %edi, %ecx
|
||||
+# define VEC_SIZE 32
|
||||
+# define PAGE_SIZE 4096
|
||||
+ .section .text.evex, "ax", @progbits
|
||||
+ENTRY(STRRCHR)
|
||||
+ movl %edi, %eax
|
||||
/* Broadcast CHAR to YMMMATCH. */
|
||||
VPBROADCAST %esi, %YMMMATCH
|
||||
|
||||
- vpxorq %XMMZERO, %XMMZERO, %XMMZERO
|
||||
-
|
||||
- /* Check if we may cross page boundary with one vector load. */
|
||||
- andl $(2 * VEC_SIZE - 1), %ecx
|
||||
- cmpl $VEC_SIZE, %ecx
|
||||
- ja L(cros_page_boundary)
|
||||
+ andl $(PAGE_SIZE - 1), %eax
|
||||
+ cmpl $(PAGE_SIZE - VEC_SIZE), %eax
|
||||
+ jg L(cross_page_boundary)
|
||||
|
||||
+L(page_cross_continue):
|
||||
VMOVU (%rdi), %YMM1
|
||||
-
|
||||
- /* Each bit in K0 represents a null byte in YMM1. */
|
||||
- VPCMP $0, %YMMZERO, %YMM1, %k0
|
||||
- /* Each bit in K1 represents a CHAR in YMM1. */
|
||||
- VPCMP $0, %YMMMATCH, %YMM1, %k1
|
||||
+ /* k0 has a 1 for each zero CHAR in YMM1. */
|
||||
+ VPTESTN %YMM1, %YMM1, %k0
|
||||
kmovd %k0, %ecx
|
||||
- kmovd %k1, %eax
|
||||
-
|
||||
- addq $VEC_SIZE, %rdi
|
||||
-
|
||||
- testl %eax, %eax
|
||||
- jnz L(first_vec)
|
||||
-
|
||||
testl %ecx, %ecx
|
||||
- jnz L(return_null)
|
||||
-
|
||||
- andq $-VEC_SIZE, %rdi
|
||||
- xorl %edx, %edx
|
||||
- jmp L(aligned_loop)
|
||||
-
|
||||
- .p2align 4
|
||||
-L(first_vec):
|
||||
- /* Check if there is a null byte. */
|
||||
- testl %ecx, %ecx
|
||||
- jnz L(char_and_nul_in_first_vec)
|
||||
-
|
||||
- /* Remember the match and keep searching. */
|
||||
- movl %eax, %edx
|
||||
- movq %rdi, %rsi
|
||||
- andq $-VEC_SIZE, %rdi
|
||||
- jmp L(aligned_loop)
|
||||
-
|
||||
- .p2align 4
|
||||
-L(cros_page_boundary):
|
||||
- andl $(VEC_SIZE - 1), %ecx
|
||||
- andq $-VEC_SIZE, %rdi
|
||||
+ jz L(aligned_more)
|
||||
+ /* fallthrough: zero CHAR in first VEC. */
|
||||
|
||||
+ /* K1 has a 1 for each search CHAR match in YMM1. */
|
||||
+ VPCMP $0, %YMMMATCH, %YMM1, %k1
|
||||
+ kmovd %k1, %eax
|
||||
+ /* Build mask up until first zero CHAR (used to mask off
|
||||
+ potential search CHAR matches past the end of the string).
|
||||
+ */
|
||||
+ blsmskl %ecx, %ecx
|
||||
+ andl %ecx, %eax
|
||||
+ jz L(ret0)
|
||||
+ /* Get last match (the `andl` removed any out of bounds
|
||||
+ matches). */
|
||||
+ bsrl %eax, %eax
|
||||
# ifdef USE_AS_WCSRCHR
|
||||
- /* NB: Divide shift count by 4 since each bit in K1 represent 4
|
||||
- bytes. */
|
||||
- movl %ecx, %SHIFT_REG
|
||||
- sarl $2, %SHIFT_REG
|
||||
+ leaq (%rdi, %rax, CHAR_SIZE), %rax
|
||||
+# else
|
||||
+ addq %rdi, %rax
|
||||
# endif
|
||||
+L(ret0):
|
||||
+ ret
|
||||
|
||||
- VMOVA (%rdi), %YMM1
|
||||
-
|
||||
- /* Each bit in K0 represents a null byte in YMM1. */
|
||||
- VPCMP $0, %YMMZERO, %YMM1, %k0
|
||||
- /* Each bit in K1 represents a CHAR in YMM1. */
|
||||
+ /* Returns for first vec x1/x2/x3 have hard coded backward
|
||||
+ search path for earlier matches. */
|
||||
+ .p2align 4,, 6
|
||||
+L(first_vec_x1):
|
||||
+ VPCMP $0, %YMMMATCH, %YMM2, %k1
|
||||
+ kmovd %k1, %eax
|
||||
+ blsmskl %ecx, %ecx
|
||||
+ /* eax non-zero if search CHAR in range. */
|
||||
+ andl %ecx, %eax
|
||||
+ jnz L(first_vec_x1_return)
|
||||
+
|
||||
+ /* fallthrough: no match in YMM2 then need to check for earlier
|
||||
+ matches (in YMM1). */
|
||||
+ .p2align 4,, 4
|
||||
+L(first_vec_x0_test):
|
||||
VPCMP $0, %YMMMATCH, %YMM1, %k1
|
||||
- kmovd %k0, %edx
|
||||
kmovd %k1, %eax
|
||||
-
|
||||
- shrxl %SHIFT_REG, %edx, %edx
|
||||
- shrxl %SHIFT_REG, %eax, %eax
|
||||
- addq $VEC_SIZE, %rdi
|
||||
-
|
||||
- /* Check if there is a CHAR. */
|
||||
testl %eax, %eax
|
||||
- jnz L(found_char)
|
||||
-
|
||||
- testl %edx, %edx
|
||||
- jnz L(return_null)
|
||||
-
|
||||
- jmp L(aligned_loop)
|
||||
-
|
||||
- .p2align 4
|
||||
-L(found_char):
|
||||
- testl %edx, %edx
|
||||
- jnz L(char_and_nul)
|
||||
-
|
||||
- /* Remember the match and keep searching. */
|
||||
- movl %eax, %edx
|
||||
- leaq (%rdi, %rcx), %rsi
|
||||
+ jz L(ret1)
|
||||
+ bsrl %eax, %eax
|
||||
+# ifdef USE_AS_WCSRCHR
|
||||
+ leaq (%rsi, %rax, CHAR_SIZE), %rax
|
||||
+# else
|
||||
+ addq %rsi, %rax
|
||||
+# endif
|
||||
+L(ret1):
|
||||
+ ret
|
||||
|
||||
- .p2align 4
|
||||
-L(aligned_loop):
|
||||
- VMOVA (%rdi), %YMM1
|
||||
- addq $VEC_SIZE, %rdi
|
||||
+ .p2align 4,, 10
|
||||
+L(first_vec_x1_or_x2):
|
||||
+ VPCMP $0, %YMM3, %YMMMATCH, %k3
|
||||
+ VPCMP $0, %YMM2, %YMMMATCH, %k2
|
||||
+ /* K2 and K3 have 1 for any search CHAR match. Test if any
|
||||
+ matches between either of them. Otherwise check YMM1. */
|
||||
+ kortestd %k2, %k3
|
||||
+ jz L(first_vec_x0_test)
|
||||
+
|
||||
+ /* Guaranteed that YMM2 and YMM3 are within range so merge the
|
||||
+ two bitmasks then get last result. */
|
||||
+ kunpck %k2, %k3, %k3
|
||||
+ kmovq %k3, %rax
|
||||
+ bsrq %rax, %rax
|
||||
+ leaq (VEC_SIZE)(%r8, %rax, CHAR_SIZE), %rax
|
||||
+ ret
|
||||
|
||||
- /* Each bit in K0 represents a null byte in YMM1. */
|
||||
- VPCMP $0, %YMMZERO, %YMM1, %k0
|
||||
- /* Each bit in K1 represents a CHAR in YMM1. */
|
||||
- VPCMP $0, %YMMMATCH, %YMM1, %k1
|
||||
- kmovd %k0, %ecx
|
||||
+ .p2align 4,, 6
|
||||
+L(first_vec_x3):
|
||||
+ VPCMP $0, %YMMMATCH, %YMM4, %k1
|
||||
kmovd %k1, %eax
|
||||
- orl %eax, %ecx
|
||||
- jnz L(char_nor_null)
|
||||
+ blsmskl %ecx, %ecx
|
||||
+ /* If no search CHAR match in range check YMM1/YMM2/YMM3. */
|
||||
+ andl %ecx, %eax
|
||||
+ jz L(first_vec_x1_or_x2)
|
||||
+ bsrl %eax, %eax
|
||||
+ leaq (VEC_SIZE * 3)(%rdi, %rax, CHAR_SIZE), %rax
|
||||
+ ret
|
||||
|
||||
- VMOVA (%rdi), %YMM1
|
||||
- add $VEC_SIZE, %rdi
|
||||
+ .p2align 4,, 6
|
||||
+L(first_vec_x0_x1_test):
|
||||
+ VPCMP $0, %YMMMATCH, %YMM2, %k1
|
||||
+ kmovd %k1, %eax
|
||||
+ /* Check YMM2 for last match first. If no match try YMM1. */
|
||||
+ testl %eax, %eax
|
||||
+ jz L(first_vec_x0_test)
|
||||
+ .p2align 4,, 4
|
||||
+L(first_vec_x1_return):
|
||||
+ bsrl %eax, %eax
|
||||
+ leaq (VEC_SIZE)(%rdi, %rax, CHAR_SIZE), %rax
|
||||
+ ret
|
||||
|
||||
- /* Each bit in K0 represents a null byte in YMM1. */
|
||||
- VPCMP $0, %YMMZERO, %YMM1, %k0
|
||||
- /* Each bit in K1 represents a CHAR in YMM1. */
|
||||
- VPCMP $0, %YMMMATCH, %YMM1, %k1
|
||||
- kmovd %k0, %ecx
|
||||
+ .p2align 4,, 10
|
||||
+L(first_vec_x2):
|
||||
+ VPCMP $0, %YMMMATCH, %YMM3, %k1
|
||||
kmovd %k1, %eax
|
||||
- orl %eax, %ecx
|
||||
- jnz L(char_nor_null)
|
||||
+ blsmskl %ecx, %ecx
|
||||
+ /* Check YMM3 for last match first. If no match try YMM2/YMM1.
|
||||
+ */
|
||||
+ andl %ecx, %eax
|
||||
+ jz L(first_vec_x0_x1_test)
|
||||
+ bsrl %eax, %eax
|
||||
+ leaq (VEC_SIZE * 2)(%rdi, %rax, CHAR_SIZE), %rax
|
||||
+ ret
|
||||
|
||||
- VMOVA (%rdi), %YMM1
|
||||
- addq $VEC_SIZE, %rdi
|
||||
|
||||
- /* Each bit in K0 represents a null byte in YMM1. */
|
||||
- VPCMP $0, %YMMZERO, %YMM1, %k0
|
||||
- /* Each bit in K1 represents a CHAR in YMM1. */
|
||||
- VPCMP $0, %YMMMATCH, %YMM1, %k1
|
||||
+ .p2align 4
|
||||
+L(aligned_more):
|
||||
+ /* Need to keep original pointer in case YMM1 has last match. */
|
||||
+ movq %rdi, %rsi
|
||||
+ andq $-VEC_SIZE, %rdi
|
||||
+ VMOVU VEC_SIZE(%rdi), %YMM2
|
||||
+ VPTESTN %YMM2, %YMM2, %k0
|
||||
kmovd %k0, %ecx
|
||||
- kmovd %k1, %eax
|
||||
- orl %eax, %ecx
|
||||
- jnz L(char_nor_null)
|
||||
+ testl %ecx, %ecx
|
||||
+ jnz L(first_vec_x1)
|
||||
|
||||
- VMOVA (%rdi), %YMM1
|
||||
- addq $VEC_SIZE, %rdi
|
||||
+ VMOVU (VEC_SIZE * 2)(%rdi), %YMM3
|
||||
+ VPTESTN %YMM3, %YMM3, %k0
|
||||
+ kmovd %k0, %ecx
|
||||
+ testl %ecx, %ecx
|
||||
+ jnz L(first_vec_x2)
|
||||
|
||||
- /* Each bit in K0 represents a null byte in YMM1. */
|
||||
- VPCMP $0, %YMMZERO, %YMM1, %k0
|
||||
- /* Each bit in K1 represents a CHAR in YMM1. */
|
||||
- VPCMP $0, %YMMMATCH, %YMM1, %k1
|
||||
+ VMOVU (VEC_SIZE * 3)(%rdi), %YMM4
|
||||
+ VPTESTN %YMM4, %YMM4, %k0
|
||||
kmovd %k0, %ecx
|
||||
- kmovd %k1, %eax
|
||||
- orl %eax, %ecx
|
||||
- jz L(aligned_loop)
|
||||
+ movq %rdi, %r8
|
||||
+ testl %ecx, %ecx
|
||||
+ jnz L(first_vec_x3)
|
||||
|
||||
+ andq $-(VEC_SIZE * 2), %rdi
|
||||
.p2align 4
|
||||
-L(char_nor_null):
|
||||
- /* Find a CHAR or a null byte in a loop. */
|
||||
+L(first_aligned_loop):
|
||||
+ /* Preserve YMM1, YMM2, YMM3, and YMM4 until we can guarantee
|
||||
+ they don't store a match. */
|
||||
+ VMOVA (VEC_SIZE * 4)(%rdi), %YMM5
|
||||
+ VMOVA (VEC_SIZE * 5)(%rdi), %YMM6
|
||||
+
|
||||
+ VPCMP $0, %YMM5, %YMMMATCH, %k2
|
||||
+ vpxord %YMM6, %YMMMATCH, %YMM7
|
||||
+
|
||||
+ VPMIN %YMM5, %YMM6, %YMM8
|
||||
+ VPMIN %YMM8, %YMM7, %YMM7
|
||||
+
|
||||
+ VPTESTN %YMM7, %YMM7, %k1
|
||||
+ subq $(VEC_SIZE * -2), %rdi
|
||||
+ kortestd %k1, %k2
|
||||
+ jz L(first_aligned_loop)
|
||||
+
|
||||
+ VPCMP $0, %YMM6, %YMMMATCH, %k3
|
||||
+ VPTESTN %YMM8, %YMM8, %k1
|
||||
+ ktestd %k1, %k1
|
||||
+ jz L(second_aligned_loop_prep)
|
||||
+
|
||||
+ kortestd %k2, %k3
|
||||
+ jnz L(return_first_aligned_loop)
|
||||
+
|
||||
+ .p2align 4,, 6
|
||||
+L(first_vec_x1_or_x2_or_x3):
|
||||
+ VPCMP $0, %YMM4, %YMMMATCH, %k4
|
||||
+ kmovd %k4, %eax
|
||||
testl %eax, %eax
|
||||
- jnz L(match)
|
||||
-L(return_value):
|
||||
- testl %edx, %edx
|
||||
- jz L(return_null)
|
||||
- movl %edx, %eax
|
||||
- movq %rsi, %rdi
|
||||
+ jz L(first_vec_x1_or_x2)
|
||||
bsrl %eax, %eax
|
||||
-# ifdef USE_AS_WCSRCHR
|
||||
- /* NB: Multiply wchar_t count by 4 to get the number of bytes. */
|
||||
- leaq -VEC_SIZE(%rdi, %rax, 4), %rax
|
||||
-# else
|
||||
- leaq -VEC_SIZE(%rdi, %rax), %rax
|
||||
-# endif
|
||||
+ leaq (VEC_SIZE * 3)(%r8, %rax, CHAR_SIZE), %rax
|
||||
ret
|
||||
|
||||
- .p2align 4
|
||||
-L(match):
|
||||
- /* Find a CHAR. Check if there is a null byte. */
|
||||
- kmovd %k0, %ecx
|
||||
- testl %ecx, %ecx
|
||||
- jnz L(find_nul)
|
||||
+ .p2align 4,, 8
|
||||
+L(return_first_aligned_loop):
|
||||
+ VPTESTN %YMM5, %YMM5, %k0
|
||||
+ kunpck %k0, %k1, %k0
|
||||
+ kmov_2x %k0, %maskz_2x
|
||||
+
|
||||
+ blsmsk %maskz_2x, %maskz_2x
|
||||
+ kunpck %k2, %k3, %k3
|
||||
+ kmov_2x %k3, %maskm_2x
|
||||
+ and %maskz_2x, %maskm_2x
|
||||
+ jz L(first_vec_x1_or_x2_or_x3)
|
||||
|
||||
- /* Remember the match and keep searching. */
|
||||
- movl %eax, %edx
|
||||
+ bsr %maskm_2x, %maskm_2x
|
||||
+ leaq (VEC_SIZE * 2)(%rdi, %rax, CHAR_SIZE), %rax
|
||||
+ ret
|
||||
+
|
||||
+ .p2align 4
|
||||
+ /* We can throw away the work done for the first 4x checks here
|
||||
+ as we have a later match. This is the 'fast' path per se.
|
||||
+ */
|
||||
+L(second_aligned_loop_prep):
|
||||
+L(second_aligned_loop_set_furthest_match):
|
||||
movq %rdi, %rsi
|
||||
- jmp L(aligned_loop)
|
||||
+ kunpck %k2, %k3, %k4
|
||||
|
||||
.p2align 4
|
||||
-L(find_nul):
|
||||
- /* Mask out any matching bits after the null byte. */
|
||||
- movl %ecx, %r8d
|
||||
- subl $1, %r8d
|
||||
- xorl %ecx, %r8d
|
||||
- andl %r8d, %eax
|
||||
- testl %eax, %eax
|
||||
- /* If there is no CHAR here, return the remembered one. */
|
||||
- jz L(return_value)
|
||||
- bsrl %eax, %eax
|
||||
+L(second_aligned_loop):
|
||||
+ VMOVU (VEC_SIZE * 4)(%rdi), %YMM1
|
||||
+ VMOVU (VEC_SIZE * 5)(%rdi), %YMM2
|
||||
+
|
||||
+ VPCMP $0, %YMM1, %YMMMATCH, %k2
|
||||
+ vpxord %YMM2, %YMMMATCH, %YMM3
|
||||
+
|
||||
+ VPMIN %YMM1, %YMM2, %YMM4
|
||||
+ VPMIN %YMM3, %YMM4, %YMM3
|
||||
+
|
||||
+ VPTESTN %YMM3, %YMM3, %k1
|
||||
+ subq $(VEC_SIZE * -2), %rdi
|
||||
+ kortestd %k1, %k2
|
||||
+ jz L(second_aligned_loop)
|
||||
+
|
||||
+ VPCMP $0, %YMM2, %YMMMATCH, %k3
|
||||
+ VPTESTN %YMM4, %YMM4, %k1
|
||||
+ ktestd %k1, %k1
|
||||
+ jz L(second_aligned_loop_set_furthest_match)
|
||||
+
|
||||
+ kortestd %k2, %k3
|
||||
+ /* Branch here because there is a significant advantage in terms
|
||||
+ of output dependency chance in using edx. */
|
||||
+ jnz L(return_new_match)
|
||||
+L(return_old_match):
|
||||
+ kmovq %k4, %rax
|
||||
+ bsrq %rax, %rax
|
||||
+ leaq (VEC_SIZE * 2)(%rsi, %rax, CHAR_SIZE), %rax
|
||||
+ ret
|
||||
+
|
||||
+L(return_new_match):
|
||||
+ VPTESTN %YMM1, %YMM1, %k0
|
||||
+ kunpck %k0, %k1, %k0
|
||||
+ kmov_2x %k0, %maskz_2x
|
||||
+
|
||||
+ blsmsk %maskz_2x, %maskz_2x
|
||||
+ kunpck %k2, %k3, %k3
|
||||
+ kmov_2x %k3, %maskm_2x
|
||||
+ and %maskz_2x, %maskm_2x
|
||||
+ jz L(return_old_match)
|
||||
+
|
||||
+ bsr %maskm_2x, %maskm_2x
|
||||
+ leaq (VEC_SIZE * 2)(%rdi, %rax, CHAR_SIZE), %rax
|
||||
+ ret
|
||||
+
|
||||
+L(cross_page_boundary):
|
||||
+ /* eax contains all the page offset bits of src (rdi). `xor rdi,
|
||||
+ rax` sets pointer with all page offset bits cleared so
|
||||
+ offset of (PAGE_SIZE - VEC_SIZE) will get last aligned VEC
|
||||
+ before page cross (guaranteed to be safe to read). Doing this
|
||||
+ as opposed to `movq %rdi, %rax; andq $-VEC_SIZE, %rax` saves
|
||||
+ a bit of code size. */
|
||||
+ xorq %rdi, %rax
|
||||
+ VMOVU (PAGE_SIZE - VEC_SIZE)(%rax), %YMM1
|
||||
+ VPTESTN %YMM1, %YMM1, %k0
|
||||
+ kmovd %k0, %ecx
|
||||
+
|
||||
+ /* Shift out zero CHAR matches that are before the beginning of
|
||||
+ src (rdi). */
|
||||
# ifdef USE_AS_WCSRCHR
|
||||
- /* NB: Multiply wchar_t count by 4 to get the number of bytes. */
|
||||
- leaq -VEC_SIZE(%rdi, %rax, 4), %rax
|
||||
-# else
|
||||
- leaq -VEC_SIZE(%rdi, %rax), %rax
|
||||
+ movl %edi, %esi
|
||||
+ andl $(VEC_SIZE - 1), %esi
|
||||
+ shrl $2, %esi
|
||||
# endif
|
||||
- ret
|
||||
+ shrxl %SHIFT_REG, %ecx, %ecx
|
||||
|
||||
- .p2align 4
|
||||
-L(char_and_nul):
|
||||
- /* Find both a CHAR and a null byte. */
|
||||
- addq %rcx, %rdi
|
||||
- movl %edx, %ecx
|
||||
-L(char_and_nul_in_first_vec):
|
||||
- /* Mask out any matching bits after the null byte. */
|
||||
- movl %ecx, %r8d
|
||||
- subl $1, %r8d
|
||||
- xorl %ecx, %r8d
|
||||
- andl %r8d, %eax
|
||||
- testl %eax, %eax
|
||||
- /* Return null pointer if the null byte comes first. */
|
||||
- jz L(return_null)
|
||||
+ testl %ecx, %ecx
|
||||
+ jz L(page_cross_continue)
|
||||
+
|
||||
+ /* Found zero CHAR so need to test for search CHAR. */
|
||||
+ VPCMP $0, %YMMMATCH, %YMM1, %k1
|
||||
+ kmovd %k1, %eax
|
||||
+ /* Shift out search CHAR matches that are before the beginning of
|
||||
+ src (rdi). */
|
||||
+ shrxl %SHIFT_REG, %eax, %eax
|
||||
+
|
||||
+ /* Check if any search CHAR match in range. */
|
||||
+ blsmskl %ecx, %ecx
|
||||
+ andl %ecx, %eax
|
||||
+ jz L(ret3)
|
||||
bsrl %eax, %eax
|
||||
# ifdef USE_AS_WCSRCHR
|
||||
- /* NB: Multiply wchar_t count by 4 to get the number of bytes. */
|
||||
- leaq -VEC_SIZE(%rdi, %rax, 4), %rax
|
||||
+ leaq (%rdi, %rax, CHAR_SIZE), %rax
|
||||
# else
|
||||
- leaq -VEC_SIZE(%rdi, %rax), %rax
|
||||
+ addq %rdi, %rax
|
||||
# endif
|
||||
+L(ret3):
|
||||
ret
|
||||
|
||||
- .p2align 4
|
||||
-L(return_null):
|
||||
- xorl %eax, %eax
|
||||
- ret
|
||||
-
|
||||
-END (STRRCHR)
|
||||
+END(STRRCHR)
|
||||
#endif
|
35
glibc-upstream-2.34-236.patch
Normal file
@@ -0,0 +1,35 @@
|
||||
commit 1f83d40dfab15a6888759552f24d1b5c0907408b
|
||||
Author: Florian Weimer <fweimer@redhat.com>
|
||||
Date: Thu Dec 23 12:24:30 2021 +0100
|
||||
|
||||
elf: Remove unused NEED_DL_BASE_ADDR and _dl_base_addr
|
||||
|
||||
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
|
||||
(cherry picked from commit cd0c333d2ea82d0ae14719bdbef86d99615bdb00)
|
||||
|
||||
diff --git a/elf/dl-sysdep.c b/elf/dl-sysdep.c
|
||||
index 4dc366eea445e974..1c78dc89c9cbe54d 100644
|
||||
--- a/elf/dl-sysdep.c
|
||||
+++ b/elf/dl-sysdep.c
|
||||
@@ -54,9 +54,6 @@ extern char _end[] attribute_hidden;
|
||||
/* Protect SUID program against misuse of file descriptors. */
|
||||
extern void __libc_check_standard_fds (void);
|
||||
|
||||
-#ifdef NEED_DL_BASE_ADDR
|
||||
-ElfW(Addr) _dl_base_addr;
|
||||
-#endif
|
||||
int __libc_enable_secure attribute_relro = 0;
|
||||
rtld_hidden_data_def (__libc_enable_secure)
|
||||
/* This variable contains the lowest stack address ever used. */
|
||||
@@ -136,11 +133,6 @@ _dl_sysdep_start (void **start_argptr,
|
||||
case AT_ENTRY:
|
||||
user_entry = av->a_un.a_val;
|
||||
break;
|
||||
-#ifdef NEED_DL_BASE_ADDR
|
||||
- case AT_BASE:
|
||||
- _dl_base_addr = av->a_un.a_val;
|
||||
- break;
|
||||
-#endif
|
||||
#ifndef HAVE_AUX_SECURE
|
||||
case AT_UID:
|
||||
case AT_EUID:
|
751
glibc-upstream-2.34-237.patch
Normal file
@@ -0,0 +1,751 @@
|
||||
commit b0bd6a1323c3eccd16c45bae359a76877fa75639
|
||||
Author: Florian Weimer <fweimer@redhat.com>
|
||||
Date: Thu May 19 11:43:53 2022 +0200
|
||||
|
||||
elf: Merge dl-sysdep.c into the Linux version
|
||||
|
||||
The generic version is the de-facto Linux implementation. It
|
||||
requires an auxiliary vector, so Hurd does not use it.
|
||||
|
||||
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
|
||||
(cherry picked from commit 91c0a47ffb66e7cd802de870686465db3b3976a0)
|
||||
|
||||
Conflicts:
|
||||
elf/dl-sysdep.c
|
||||
(missing ld.so dependency sorting optimization upstream)
|
||||
|
||||
diff --git a/elf/dl-sysdep.c b/elf/dl-sysdep.c
|
||||
index 1c78dc89c9cbe54d..7aa90ad6eeb35cad 100644
|
||||
--- a/elf/dl-sysdep.c
|
||||
+++ b/elf/dl-sysdep.c
|
||||
@@ -1,5 +1,5 @@
|
||||
-/* Operating system support for run-time dynamic linker. Generic Unix version.
|
||||
- Copyright (C) 1995-2021 Free Software Foundation, Inc.
|
||||
+/* Operating system support for run-time dynamic linker. Stub version.
|
||||
+ Copyright (C) 1995-2022 Free Software Foundation, Inc.
|
||||
This file is part of the GNU C Library.
|
||||
|
||||
The GNU C Library is free software; you can redistribute it and/or
|
||||
@@ -16,352 +16,4 @@
|
||||
License along with the GNU C Library; if not, see
|
||||
<https://www.gnu.org/licenses/>. */
|
||||
|
||||
-/* We conditionalize the whole of this file rather than simply eliding it
|
||||
- from the static build, because other sysdeps/ versions of this file
|
||||
- might define things needed by a static build. */
|
||||
-
|
||||
-#ifdef SHARED
|
||||
-
|
||||
-#include <assert.h>
|
||||
-#include <elf.h>
|
||||
-#include <errno.h>
|
||||
-#include <fcntl.h>
|
||||
-#include <libintl.h>
|
||||
-#include <stdlib.h>
|
||||
-#include <string.h>
|
||||
-#include <unistd.h>
|
||||
-#include <sys/types.h>
|
||||
-#include <sys/stat.h>
|
||||
-#include <sys/mman.h>
|
||||
-#include <ldsodefs.h>
|
||||
-#include <_itoa.h>
|
||||
-#include <fpu_control.h>
|
||||
-
|
||||
-#include <entry.h>
|
||||
-#include <dl-machine.h>
|
||||
-#include <dl-procinfo.h>
|
||||
-#include <dl-osinfo.h>
|
||||
-#include <libc-internal.h>
|
||||
-#include <tls.h>
|
||||
-
|
||||
-#include <dl-tunables.h>
|
||||
-#include <dl-auxv.h>
|
||||
-#include <dl-hwcap-check.h>
|
||||
-
|
||||
-extern char **_environ attribute_hidden;
|
||||
-extern char _end[] attribute_hidden;
|
||||
-
|
||||
-/* Protect SUID program against misuse of file descriptors. */
|
||||
-extern void __libc_check_standard_fds (void);
|
||||
-
|
||||
-int __libc_enable_secure attribute_relro = 0;
|
||||
-rtld_hidden_data_def (__libc_enable_secure)
|
||||
-/* This variable contains the lowest stack address ever used. */
|
||||
-void *__libc_stack_end attribute_relro = NULL;
|
||||
-rtld_hidden_data_def(__libc_stack_end)
|
||||
-void *_dl_random attribute_relro = NULL;
|
||||
-
|
||||
-#ifndef DL_FIND_ARG_COMPONENTS
|
||||
-# define DL_FIND_ARG_COMPONENTS(cookie, argc, argv, envp, auxp) \
|
||||
- do { \
|
||||
- void **_tmp; \
|
||||
- (argc) = *(long int *) cookie; \
|
||||
- (argv) = (char **) ((long int *) cookie + 1); \
|
||||
- (envp) = (argv) + (argc) + 1; \
|
||||
- for (_tmp = (void **) (envp); *_tmp; ++_tmp) \
|
||||
- continue; \
|
||||
- (auxp) = (void *) ++_tmp; \
|
||||
- } while (0)
|
||||
-#endif
|
||||
-
|
||||
-#ifndef DL_STACK_END
|
||||
-# define DL_STACK_END(cookie) ((void *) (cookie))
|
||||
-#endif
|
||||
-
|
||||
-ElfW(Addr)
|
||||
-_dl_sysdep_start (void **start_argptr,
|
||||
- void (*dl_main) (const ElfW(Phdr) *phdr, ElfW(Word) phnum,
|
||||
- ElfW(Addr) *user_entry, ElfW(auxv_t) *auxv))
|
||||
-{
|
||||
- const ElfW(Phdr) *phdr = NULL;
|
||||
- ElfW(Word) phnum = 0;
|
||||
- ElfW(Addr) user_entry;
|
||||
- ElfW(auxv_t) *av;
|
||||
-#ifdef HAVE_AUX_SECURE
|
||||
-# define set_seen(tag) (tag) /* Evaluate for the side effects. */
|
||||
-# define set_seen_secure() ((void) 0)
|
||||
-#else
|
||||
- uid_t uid = 0;
|
||||
- gid_t gid = 0;
|
||||
- unsigned int seen = 0;
|
||||
-# define set_seen_secure() (seen = -1)
|
||||
-# ifdef HAVE_AUX_XID
|
||||
-# define set_seen(tag) (tag) /* Evaluate for the side effects. */
|
||||
-# else
|
||||
-# define M(type) (1 << (type))
|
||||
-# define set_seen(tag) seen |= M ((tag)->a_type)
|
||||
-# endif
|
||||
-#endif
|
||||
-#ifdef NEED_DL_SYSINFO
|
||||
- uintptr_t new_sysinfo = 0;
|
||||
-#endif
|
||||
-
|
||||
- __libc_stack_end = DL_STACK_END (start_argptr);
|
||||
- DL_FIND_ARG_COMPONENTS (start_argptr, _dl_argc, _dl_argv, _environ,
|
||||
- GLRO(dl_auxv));
|
||||
-
|
||||
- user_entry = (ElfW(Addr)) ENTRY_POINT;
|
||||
- GLRO(dl_platform) = NULL; /* Default to nothing known about the platform. */
|
||||
-
|
||||
- /* NB: Default to a constant CONSTANT_MINSIGSTKSZ. */
|
||||
- _Static_assert (__builtin_constant_p (CONSTANT_MINSIGSTKSZ),
|
||||
- "CONSTANT_MINSIGSTKSZ is constant");
|
||||
- GLRO(dl_minsigstacksize) = CONSTANT_MINSIGSTKSZ;
|
||||
-
|
||||
- for (av = GLRO(dl_auxv); av->a_type != AT_NULL; set_seen (av++))
|
||||
- switch (av->a_type)
|
||||
- {
|
||||
- case AT_PHDR:
|
||||
- phdr = (void *) av->a_un.a_val;
|
||||
- break;
|
||||
- case AT_PHNUM:
|
||||
- phnum = av->a_un.a_val;
|
||||
- break;
|
||||
- case AT_PAGESZ:
|
||||
- GLRO(dl_pagesize) = av->a_un.a_val;
|
||||
- break;
|
||||
- case AT_ENTRY:
|
||||
- user_entry = av->a_un.a_val;
|
||||
- break;
|
||||
-#ifndef HAVE_AUX_SECURE
|
||||
- case AT_UID:
|
||||
- case AT_EUID:
|
||||
- uid ^= av->a_un.a_val;
|
||||
- break;
|
||||
- case AT_GID:
|
||||
- case AT_EGID:
|
||||
- gid ^= av->a_un.a_val;
|
||||
- break;
|
||||
-#endif
|
||||
- case AT_SECURE:
|
||||
-#ifndef HAVE_AUX_SECURE
|
||||
- seen = -1;
|
||||
-#endif
|
||||
- __libc_enable_secure = av->a_un.a_val;
|
||||
- break;
|
||||
- case AT_PLATFORM:
|
||||
- GLRO(dl_platform) = (void *) av->a_un.a_val;
|
||||
- break;
|
||||
- case AT_HWCAP:
|
||||
- GLRO(dl_hwcap) = (unsigned long int) av->a_un.a_val;
|
||||
- break;
|
||||
- case AT_HWCAP2:
|
||||
- GLRO(dl_hwcap2) = (unsigned long int) av->a_un.a_val;
|
||||
- break;
|
||||
- case AT_CLKTCK:
|
||||
- GLRO(dl_clktck) = av->a_un.a_val;
|
||||
- break;
|
||||
- case AT_FPUCW:
|
||||
- GLRO(dl_fpu_control) = av->a_un.a_val;
|
||||
- break;
|
||||
-#ifdef NEED_DL_SYSINFO
|
||||
- case AT_SYSINFO:
|
||||
- new_sysinfo = av->a_un.a_val;
|
||||
- break;
|
||||
-#endif
|
||||
-#ifdef NEED_DL_SYSINFO_DSO
|
||||
- case AT_SYSINFO_EHDR:
|
||||
- GLRO(dl_sysinfo_dso) = (void *) av->a_un.a_val;
|
||||
- break;
|
||||
-#endif
|
||||
- case AT_RANDOM:
|
||||
- _dl_random = (void *) av->a_un.a_val;
|
||||
- break;
|
||||
- case AT_MINSIGSTKSZ:
|
||||
- GLRO(dl_minsigstacksize) = av->a_un.a_val;
|
||||
- break;
|
||||
- DL_PLATFORM_AUXV
|
||||
- }
|
||||
-
|
||||
- dl_hwcap_check ();
|
||||
-
|
||||
-#ifndef HAVE_AUX_SECURE
|
||||
- if (seen != -1)
|
||||
- {
|
||||
- /* Fill in the values we have not gotten from the kernel through the
|
||||
- auxiliary vector. */
|
||||
-# ifndef HAVE_AUX_XID
|
||||
-# define SEE(UID, var, uid) \
|
||||
- if ((seen & M (AT_##UID)) == 0) var ^= __get##uid ()
|
||||
- SEE (UID, uid, uid);
|
||||
- SEE (EUID, uid, euid);
|
||||
- SEE (GID, gid, gid);
|
||||
- SEE (EGID, gid, egid);
|
||||
-# endif
|
||||
-
|
||||
- /* If one of the two pairs of IDs does not match this is a setuid
|
||||
- or setgid run. */
|
||||
- __libc_enable_secure = uid | gid;
|
||||
- }
|
||||
-#endif
|
||||
-
|
||||
-#ifndef HAVE_AUX_PAGESIZE
|
||||
- if (GLRO(dl_pagesize) == 0)
|
||||
- GLRO(dl_pagesize) = __getpagesize ();
|
||||
-#endif
|
||||
-
|
||||
-#ifdef NEED_DL_SYSINFO
|
||||
- if (new_sysinfo != 0)
|
||||
- {
|
||||
-# ifdef NEED_DL_SYSINFO_DSO
|
||||
- /* Only set the sysinfo value if we also have the vsyscall DSO. */
|
||||
- if (GLRO(dl_sysinfo_dso) != 0)
|
||||
-# endif
|
||||
- GLRO(dl_sysinfo) = new_sysinfo;
|
||||
- }
|
||||
-#endif
|
||||
-
|
||||
- __tunables_init (_environ);
|
||||
-
|
||||
- /* Initialize DSO sorting algorithm after tunables. */
|
||||
- _dl_sort_maps_init ();
|
||||
-
|
||||
-#ifdef DL_SYSDEP_INIT
|
||||
- DL_SYSDEP_INIT;
|
||||
-#endif
|
||||
-
|
||||
-#ifdef DL_PLATFORM_INIT
|
||||
- DL_PLATFORM_INIT;
|
||||
-#endif
|
||||
-
|
||||
- /* Determine the length of the platform name. */
|
||||
- if (GLRO(dl_platform) != NULL)
|
||||
- GLRO(dl_platformlen) = strlen (GLRO(dl_platform));
|
||||
-
|
||||
- if (__sbrk (0) == _end)
|
||||
- /* The dynamic linker was run as a program, and so the initial break
|
||||
- starts just after our bss, at &_end. The malloc in dl-minimal.c
|
||||
- will consume the rest of this page, so tell the kernel to move the
|
||||
- break up that far. When the user program examines its break, it
|
||||
- will see this new value and not clobber our data. */
|
||||
- __sbrk (GLRO(dl_pagesize)
|
||||
- - ((_end - (char *) 0) & (GLRO(dl_pagesize) - 1)));
|
||||
-
|
||||
- /* If this is a SUID program we make sure that FDs 0, 1, and 2 are
|
||||
- allocated. If necessary we are doing it ourself. If it is not
|
||||
- possible we stop the program. */
|
||||
- if (__builtin_expect (__libc_enable_secure, 0))
|
||||
- __libc_check_standard_fds ();
|
||||
-
|
||||
- (*dl_main) (phdr, phnum, &user_entry, GLRO(dl_auxv));
|
||||
- return user_entry;
|
||||
-}
|
||||
-
|
||||
-void
|
||||
-_dl_sysdep_start_cleanup (void)
|
||||
-{
|
||||
-}
|
||||
-
|
||||
-void
|
||||
-_dl_show_auxv (void)
|
||||
-{
|
||||
- char buf[64];
|
||||
- ElfW(auxv_t) *av;
|
||||
-
|
||||
- /* Terminate string. */
|
||||
- buf[63] = '\0';
|
||||
-
|
||||
- /* The following code assumes that the AT_* values are encoded
|
||||
- starting from 0 with AT_NULL, 1 for AT_IGNORE, and all other values
|
||||
- close by (otherwise the array will be too large). In case we have
|
||||
- to support a platform where these requirements are not fulfilled
|
||||
- some alternative implementation has to be used. */
|
||||
- for (av = GLRO(dl_auxv); av->a_type != AT_NULL; ++av)
|
||||
- {
|
||||
- static const struct
|
||||
- {
|
||||
- const char label[22];
|
||||
- enum { unknown = 0, dec, hex, str, ignore } form : 8;
|
||||
- } auxvars[] =
|
||||
- {
|
||||
- [AT_EXECFD - 2] = { "EXECFD: ", dec },
|
||||
- [AT_EXECFN - 2] = { "EXECFN: ", str },
|
||||
- [AT_PHDR - 2] = { "PHDR: 0x", hex },
|
||||
- [AT_PHENT - 2] = { "PHENT: ", dec },
|
||||
- [AT_PHNUM - 2] = { "PHNUM: ", dec },
|
||||
- [AT_PAGESZ - 2] = { "PAGESZ: ", dec },
|
||||
- [AT_BASE - 2] = { "BASE: 0x", hex },
|
||||
- [AT_FLAGS - 2] = { "FLAGS: 0x", hex },
|
||||
- [AT_ENTRY - 2] = { "ENTRY: 0x", hex },
|
||||
- [AT_NOTELF - 2] = { "NOTELF: ", hex },
|
||||
- [AT_UID - 2] = { "UID: ", dec },
|
||||
- [AT_EUID - 2] = { "EUID: ", dec },
|
||||
- [AT_GID - 2] = { "GID: ", dec },
|
||||
- [AT_EGID - 2] = { "EGID: ", dec },
|
||||
- [AT_PLATFORM - 2] = { "PLATFORM: ", str },
|
||||
- [AT_HWCAP - 2] = { "HWCAP: ", hex },
|
||||
- [AT_CLKTCK - 2] = { "CLKTCK: ", dec },
|
||||
- [AT_FPUCW - 2] = { "FPUCW: ", hex },
|
||||
- [AT_DCACHEBSIZE - 2] = { "DCACHEBSIZE: 0x", hex },
|
||||
- [AT_ICACHEBSIZE - 2] = { "ICACHEBSIZE: 0x", hex },
|
||||
- [AT_UCACHEBSIZE - 2] = { "UCACHEBSIZE: 0x", hex },
|
||||
- [AT_IGNOREPPC - 2] = { "IGNOREPPC", ignore },
|
||||
- [AT_SECURE - 2] = { "SECURE: ", dec },
|
||||
- [AT_BASE_PLATFORM - 2] = { "BASE_PLATFORM: ", str },
|
||||
- [AT_SYSINFO - 2] = { "SYSINFO: 0x", hex },
|
||||
- [AT_SYSINFO_EHDR - 2] = { "SYSINFO_EHDR: 0x", hex },
|
||||
- [AT_RANDOM - 2] = { "RANDOM: 0x", hex },
|
||||
- [AT_HWCAP2 - 2] = { "HWCAP2: 0x", hex },
|
||||
- [AT_MINSIGSTKSZ - 2] = { "MINSIGSTKSZ: ", dec },
|
||||
- [AT_L1I_CACHESIZE - 2] = { "L1I_CACHESIZE: ", dec },
|
||||
- [AT_L1I_CACHEGEOMETRY - 2] = { "L1I_CACHEGEOMETRY: 0x", hex },
|
||||
- [AT_L1D_CACHESIZE - 2] = { "L1D_CACHESIZE: ", dec },
|
||||
- [AT_L1D_CACHEGEOMETRY - 2] = { "L1D_CACHEGEOMETRY: 0x", hex },
|
||||
- [AT_L2_CACHESIZE - 2] = { "L2_CACHESIZE: ", dec },
|
||||
- [AT_L2_CACHEGEOMETRY - 2] = { "L2_CACHEGEOMETRY: 0x", hex },
|
||||
- [AT_L3_CACHESIZE - 2] = { "L3_CACHESIZE: ", dec },
|
||||
- [AT_L3_CACHEGEOMETRY - 2] = { "L3_CACHEGEOMETRY: 0x", hex },
|
||||
- };
|
||||
- unsigned int idx = (unsigned int) (av->a_type - 2);
|
||||
-
|
||||
- if ((unsigned int) av->a_type < 2u
|
||||
- || (idx < sizeof (auxvars) / sizeof (auxvars[0])
|
||||
- && auxvars[idx].form == ignore))
|
||||
- continue;
|
||||
-
|
||||
- assert (AT_NULL == 0);
|
||||
- assert (AT_IGNORE == 1);
|
||||
-
|
||||
- /* Some entries are handled in a special way per platform. */
|
||||
- if (_dl_procinfo (av->a_type, av->a_un.a_val) == 0)
|
||||
- continue;
|
||||
-
|
||||
- if (idx < sizeof (auxvars) / sizeof (auxvars[0])
|
||||
- && auxvars[idx].form != unknown)
|
||||
- {
|
||||
- const char *val = (char *) av->a_un.a_val;
|
||||
-
|
||||
- if (__builtin_expect (auxvars[idx].form, dec) == dec)
|
||||
- val = _itoa ((unsigned long int) av->a_un.a_val,
|
||||
- buf + sizeof buf - 1, 10, 0);
|
||||
- else if (__builtin_expect (auxvars[idx].form, hex) == hex)
|
||||
- val = _itoa ((unsigned long int) av->a_un.a_val,
|
||||
- buf + sizeof buf - 1, 16, 0);
|
||||
-
|
||||
- _dl_printf ("AT_%s%s\n", auxvars[idx].label, val);
|
||||
-
|
||||
- continue;
|
||||
- }
|
||||
-
|
||||
- /* Unknown value: print a generic line. */
|
||||
- char buf2[17];
|
||||
- buf2[sizeof (buf2) - 1] = '\0';
|
||||
- const char *val2 = _itoa ((unsigned long int) av->a_un.a_val,
|
||||
- buf2 + sizeof buf2 - 1, 16, 0);
|
||||
- const char *val = _itoa ((unsigned long int) av->a_type,
|
||||
- buf + sizeof buf - 1, 16, 0);
|
||||
- _dl_printf ("AT_??? (0x%s): 0x%s\n", val, val2);
|
||||
- }
|
||||
-}
|
||||
-
|
||||
-#endif
|
||||
+#error dl-sysdep support missing.
|
||||
diff --git a/sysdeps/unix/sysv/linux/dl-sysdep.c b/sysdeps/unix/sysv/linux/dl-sysdep.c
|
||||
index 144dc5ce5a1bba17..3e41469bcc395179 100644
|
||||
--- a/sysdeps/unix/sysv/linux/dl-sysdep.c
|
||||
+++ b/sysdeps/unix/sysv/linux/dl-sysdep.c
|
||||
@@ -16,29 +16,352 @@
|
||||
License along with the GNU C Library; if not, see
|
||||
<https://www.gnu.org/licenses/>. */
|
||||
|
||||
-/* Linux needs some special initialization, but otherwise uses
|
||||
- the generic dynamic linker system interface code. */
|
||||
-
|
||||
-#include <string.h>
|
||||
+#include <_itoa.h>
|
||||
+#include <assert.h>
|
||||
+#include <dl-auxv.h>
|
||||
+#include <dl-hwcap-check.h>
|
||||
+#include <dl-osinfo.h>
|
||||
+#include <dl-procinfo.h>
|
||||
+#include <dl-tunables.h>
|
||||
+#include <elf.h>
|
||||
+#include <entry.h>
|
||||
+#include <errno.h>
|
||||
#include <fcntl.h>
|
||||
-#include <unistd.h>
|
||||
-#include <sys/param.h>
|
||||
-#include <sys/utsname.h>
|
||||
+#include <fpu_control.h>
|
||||
#include <ldsodefs.h>
|
||||
+#include <libc-internal.h>
|
||||
+#include <libintl.h>
|
||||
#include <not-cancel.h>
|
||||
+#include <stdlib.h>
|
||||
+#include <string.h>
|
||||
+#include <string.h>
|
||||
+#include <sys/mman.h>
|
||||
+#include <sys/param.h>
|
||||
+#include <sys/stat.h>
|
||||
+#include <sys/types.h>
|
||||
+#include <sys/utsname.h>
|
||||
+#include <tls.h>
|
||||
+#include <unistd.h>
|
||||
+
|
||||
+#include <dl-machine.h>
|
||||
|
||||
#ifdef SHARED
|
||||
-# define DL_SYSDEP_INIT frob_brk ()
|
||||
+extern char **_environ attribute_hidden;
|
||||
+extern char _end[] attribute_hidden;
|
||||
+
|
||||
+/* Protect SUID program against misuse of file descriptors. */
|
||||
+extern void __libc_check_standard_fds (void);
|
||||
|
||||
-static inline void
|
||||
-frob_brk (void)
|
||||
+int __libc_enable_secure attribute_relro = 0;
|
||||
+rtld_hidden_data_def (__libc_enable_secure)
|
||||
+/* This variable contains the lowest stack address ever used. */
|
||||
+void *__libc_stack_end attribute_relro = NULL;
|
||||
+rtld_hidden_data_def(__libc_stack_end)
|
||||
+void *_dl_random attribute_relro = NULL;
|
||||
+
|
||||
+#ifndef DL_FIND_ARG_COMPONENTS
|
||||
+# define DL_FIND_ARG_COMPONENTS(cookie, argc, argv, envp, auxp) \
|
||||
+ do { \
|
||||
+ void **_tmp; \
|
||||
+ (argc) = *(long int *) cookie; \
|
||||
+ (argv) = (char **) ((long int *) cookie + 1); \
|
||||
+ (envp) = (argv) + (argc) + 1; \
|
||||
+ for (_tmp = (void **) (envp); *_tmp; ++_tmp) \
|
||||
+ continue; \
|
||||
+ (auxp) = (void *) ++_tmp; \
|
||||
+ } while (0)
|
||||
+#endif
|
||||
+
|
||||
+#ifndef DL_STACK_END
|
||||
+# define DL_STACK_END(cookie) ((void *) (cookie))
|
||||
+#endif
|
||||
+
|
||||
+ElfW(Addr)
|
||||
+_dl_sysdep_start (void **start_argptr,
|
||||
+ void (*dl_main) (const ElfW(Phdr) *phdr, ElfW(Word) phnum,
|
||||
+ ElfW(Addr) *user_entry, ElfW(auxv_t) *auxv))
|
||||
{
|
||||
+ const ElfW(Phdr) *phdr = NULL;
|
||||
+ ElfW(Word) phnum = 0;
|
||||
+ ElfW(Addr) user_entry;
|
||||
+ ElfW(auxv_t) *av;
|
||||
+#ifdef HAVE_AUX_SECURE
|
||||
+# define set_seen(tag) (tag) /* Evaluate for the side effects. */
|
||||
+# define set_seen_secure() ((void) 0)
|
||||
+#else
|
||||
+ uid_t uid = 0;
|
||||
+ gid_t gid = 0;
|
||||
+ unsigned int seen = 0;
|
||||
+# define set_seen_secure() (seen = -1)
|
||||
+# ifdef HAVE_AUX_XID
|
||||
+# define set_seen(tag) (tag) /* Evaluate for the side effects. */
|
||||
+# else
|
||||
+# define M(type) (1 << (type))
|
||||
+# define set_seen(tag) seen |= M ((tag)->a_type)
|
||||
+# endif
|
||||
+#endif
|
||||
+#ifdef NEED_DL_SYSINFO
|
||||
+ uintptr_t new_sysinfo = 0;
|
||||
+#endif
|
||||
+
|
||||
+ __libc_stack_end = DL_STACK_END (start_argptr);
|
||||
+ DL_FIND_ARG_COMPONENTS (start_argptr, _dl_argc, _dl_argv, _environ,
|
||||
+ GLRO(dl_auxv));
|
||||
+
|
||||
+ user_entry = (ElfW(Addr)) ENTRY_POINT;
|
||||
+ GLRO(dl_platform) = NULL; /* Default to nothing known about the platform. */
|
||||
+
|
||||
+ /* NB: Default to a constant CONSTANT_MINSIGSTKSZ. */
|
||||
+ _Static_assert (__builtin_constant_p (CONSTANT_MINSIGSTKSZ),
|
||||
+ "CONSTANT_MINSIGSTKSZ is constant");
|
||||
+ GLRO(dl_minsigstacksize) = CONSTANT_MINSIGSTKSZ;
|
||||
+
|
||||
+ for (av = GLRO(dl_auxv); av->a_type != AT_NULL; set_seen (av++))
|
||||
+ switch (av->a_type)
|
||||
+ {
|
||||
+ case AT_PHDR:
|
||||
+ phdr = (void *) av->a_un.a_val;
|
||||
+ break;
|
||||
+ case AT_PHNUM:
|
||||
+ phnum = av->a_un.a_val;
|
||||
+ break;
|
||||
+ case AT_PAGESZ:
|
||||
+ GLRO(dl_pagesize) = av->a_un.a_val;
|
||||
+ break;
|
||||
+ case AT_ENTRY:
|
||||
+ user_entry = av->a_un.a_val;
|
||||
+ break;
|
||||
+#ifndef HAVE_AUX_SECURE
|
||||
+ case AT_UID:
|
||||
+ case AT_EUID:
|
||||
+ uid ^= av->a_un.a_val;
|
||||
+ break;
|
||||
+ case AT_GID:
|
||||
+ case AT_EGID:
|
||||
+ gid ^= av->a_un.a_val;
|
||||
+ break;
|
||||
+#endif
|
||||
+ case AT_SECURE:
|
||||
+#ifndef HAVE_AUX_SECURE
|
||||
+ seen = -1;
|
||||
+#endif
|
||||
+ __libc_enable_secure = av->a_un.a_val;
|
||||
+ break;
|
||||
+ case AT_PLATFORM:
|
||||
+ GLRO(dl_platform) = (void *) av->a_un.a_val;
|
||||
+ break;
|
||||
+ case AT_HWCAP:
|
||||
+ GLRO(dl_hwcap) = (unsigned long int) av->a_un.a_val;
|
||||
+ break;
|
||||
+ case AT_HWCAP2:
|
||||
+ GLRO(dl_hwcap2) = (unsigned long int) av->a_un.a_val;
|
||||
+ break;
|
||||
+ case AT_CLKTCK:
|
||||
+ GLRO(dl_clktck) = av->a_un.a_val;
|
||||
+ break;
|
||||
+ case AT_FPUCW:
|
||||
+ GLRO(dl_fpu_control) = av->a_un.a_val;
|
||||
+ break;
|
||||
+#ifdef NEED_DL_SYSINFO
|
||||
+ case AT_SYSINFO:
|
||||
+ new_sysinfo = av->a_un.a_val;
|
||||
+ break;
|
||||
+#endif
|
||||
+#ifdef NEED_DL_SYSINFO_DSO
|
||||
+ case AT_SYSINFO_EHDR:
|
||||
+ GLRO(dl_sysinfo_dso) = (void *) av->a_un.a_val;
|
||||
+ break;
|
||||
+#endif
|
||||
+ case AT_RANDOM:
|
||||
+ _dl_random = (void *) av->a_un.a_val;
|
||||
+ break;
|
||||
+ case AT_MINSIGSTKSZ:
|
||||
+ GLRO(dl_minsigstacksize) = av->a_un.a_val;
|
||||
+ break;
|
||||
+ DL_PLATFORM_AUXV
|
||||
+ }
|
||||
+
|
||||
+ dl_hwcap_check ();
|
||||
+
|
||||
+#ifndef HAVE_AUX_SECURE
|
||||
+ if (seen != -1)
|
||||
+ {
|
||||
+ /* Fill in the values we have not gotten from the kernel through the
|
||||
+ auxiliary vector. */
|
||||
+# ifndef HAVE_AUX_XID
|
||||
+# define SEE(UID, var, uid) \
|
||||
+ if ((seen & M (AT_##UID)) == 0) var ^= __get##uid ()
|
||||
+ SEE (UID, uid, uid);
|
||||
+ SEE (EUID, uid, euid);
|
||||
+ SEE (GID, gid, gid);
|
||||
+ SEE (EGID, gid, egid);
|
||||
+# endif
|
||||
+
|
||||
+ /* If one of the two pairs of IDs does not match this is a setuid
|
||||
+ or setgid run. */
|
||||
+ __libc_enable_secure = uid | gid;
|
||||
+ }
|
||||
+#endif
|
||||
+
|
||||
+#ifndef HAVE_AUX_PAGESIZE
|
||||
+ if (GLRO(dl_pagesize) == 0)
|
||||
+ GLRO(dl_pagesize) = __getpagesize ();
|
||||
+#endif
|
||||
+
|
||||
+#ifdef NEED_DL_SYSINFO
|
||||
+ if (new_sysinfo != 0)
|
||||
+ {
|
||||
+# ifdef NEED_DL_SYSINFO_DSO
|
||||
+ /* Only set the sysinfo value if we also have the vsyscall DSO. */
|
||||
+ if (GLRO(dl_sysinfo_dso) != 0)
|
||||
+# endif
|
||||
+ GLRO(dl_sysinfo) = new_sysinfo;
|
||||
+ }
|
||||
+#endif
|
||||
+
|
||||
+ __tunables_init (_environ);
|
||||
+
|
||||
+ /* Initialize DSO sorting algorithm after tunables. */
|
||||
+ _dl_sort_maps_init ();
|
||||
+
|
||||
__brk (0); /* Initialize the break. */
|
||||
-}
|
||||
|
||||
-# include <elf/dl-sysdep.c>
|
||||
+#ifdef DL_PLATFORM_INIT
|
||||
+ DL_PLATFORM_INIT;
|
||||
#endif
|
||||
|
||||
+ /* Determine the length of the platform name. */
|
||||
+ if (GLRO(dl_platform) != NULL)
|
||||
+ GLRO(dl_platformlen) = strlen (GLRO(dl_platform));
|
||||
+
|
||||
+ if (__sbrk (0) == _end)
|
||||
+ /* The dynamic linker was run as a program, and so the initial break
|
||||
+ starts just after our bss, at &_end. The malloc in dl-minimal.c
|
||||
+ will consume the rest of this page, so tell the kernel to move the
|
||||
+ break up that far. When the user program examines its break, it
|
||||
+ will see this new value and not clobber our data. */
|
||||
+ __sbrk (GLRO(dl_pagesize)
|
||||
+ - ((_end - (char *) 0) & (GLRO(dl_pagesize) - 1)));
|
||||
+
|
||||
+ /* If this is a SUID program we make sure that FDs 0, 1, and 2 are
|
||||
+ allocated. If necessary we are doing it ourself. If it is not
|
||||
+ possible we stop the program. */
|
||||
+ if (__builtin_expect (__libc_enable_secure, 0))
|
||||
+ __libc_check_standard_fds ();
|
||||
+
|
||||
+ (*dl_main) (phdr, phnum, &user_entry, GLRO(dl_auxv));
|
||||
+ return user_entry;
|
||||
+}
|
||||
+
|
||||
+void
|
||||
+_dl_sysdep_start_cleanup (void)
|
||||
+{
|
||||
+}
|
||||
+
|
||||
+void
|
||||
+_dl_show_auxv (void)
|
||||
+{
|
||||
+ char buf[64];
|
||||
+ ElfW(auxv_t) *av;
|
||||
+
|
||||
+ /* Terminate string. */
|
||||
+ buf[63] = '\0';
|
||||
+
|
||||
+ /* The following code assumes that the AT_* values are encoded
|
||||
+ starting from 0 with AT_NULL, 1 for AT_IGNORE, and all other values
|
||||
+ close by (otherwise the array will be too large). In case we have
|
||||
+ to support a platform where these requirements are not fulfilled
|
||||
+ some alternative implementation has to be used. */
|
||||
+ for (av = GLRO(dl_auxv); av->a_type != AT_NULL; ++av)
|
||||
+ {
|
||||
+ static const struct
|
||||
+ {
|
||||
+ const char label[22];
|
||||
+ enum { unknown = 0, dec, hex, str, ignore } form : 8;
|
||||
+ } auxvars[] =
|
||||
+ {
|
||||
+ [AT_EXECFD - 2] = { "EXECFD: ", dec },
|
||||
+ [AT_EXECFN - 2] = { "EXECFN: ", str },
|
||||
+ [AT_PHDR - 2] = { "PHDR: 0x", hex },
|
||||
+ [AT_PHENT - 2] = { "PHENT: ", dec },
|
||||
+ [AT_PHNUM - 2] = { "PHNUM: ", dec },
|
||||
+ [AT_PAGESZ - 2] = { "PAGESZ: ", dec },
|
||||
+ [AT_BASE - 2] = { "BASE: 0x", hex },
|
||||
+ [AT_FLAGS - 2] = { "FLAGS: 0x", hex },
|
||||
+ [AT_ENTRY - 2] = { "ENTRY: 0x", hex },
|
||||
+ [AT_NOTELF - 2] = { "NOTELF: ", hex },
|
||||
+ [AT_UID - 2] = { "UID: ", dec },
|
||||
+ [AT_EUID - 2] = { "EUID: ", dec },
|
||||
+ [AT_GID - 2] = { "GID: ", dec },
|
||||
+ [AT_EGID - 2] = { "EGID: ", dec },
|
||||
+ [AT_PLATFORM - 2] = { "PLATFORM: ", str },
|
||||
+ [AT_HWCAP - 2] = { "HWCAP: ", hex },
|
||||
+ [AT_CLKTCK - 2] = { "CLKTCK: ", dec },
|
||||
+ [AT_FPUCW - 2] = { "FPUCW: ", hex },
|
||||
+ [AT_DCACHEBSIZE - 2] = { "DCACHEBSIZE: 0x", hex },
|
||||
+ [AT_ICACHEBSIZE - 2] = { "ICACHEBSIZE: 0x", hex },
|
||||
+ [AT_UCACHEBSIZE - 2] = { "UCACHEBSIZE: 0x", hex },
|
||||
+ [AT_IGNOREPPC - 2] = { "IGNOREPPC", ignore },
|
||||
+ [AT_SECURE - 2] = { "SECURE: ", dec },
|
||||
+ [AT_BASE_PLATFORM - 2] = { "BASE_PLATFORM: ", str },
|
||||
+ [AT_SYSINFO - 2] = { "SYSINFO: 0x", hex },
|
||||
+ [AT_SYSINFO_EHDR - 2] = { "SYSINFO_EHDR: 0x", hex },
|
||||
+ [AT_RANDOM - 2] = { "RANDOM: 0x", hex },
|
||||
+ [AT_HWCAP2 - 2] = { "HWCAP2: 0x", hex },
|
||||
+ [AT_MINSIGSTKSZ - 2] = { "MINSIGSTKSZ: ", dec },
|
||||
+ [AT_L1I_CACHESIZE - 2] = { "L1I_CACHESIZE: ", dec },
|
||||
+ [AT_L1I_CACHEGEOMETRY - 2] = { "L1I_CACHEGEOMETRY: 0x", hex },
|
||||
+ [AT_L1D_CACHESIZE - 2] = { "L1D_CACHESIZE: ", dec },
|
||||
+ [AT_L1D_CACHEGEOMETRY - 2] = { "L1D_CACHEGEOMETRY: 0x", hex },
|
||||
+ [AT_L2_CACHESIZE - 2] = { "L2_CACHESIZE: ", dec },
|
||||
+ [AT_L2_CACHEGEOMETRY - 2] = { "L2_CACHEGEOMETRY: 0x", hex },
|
||||
+ [AT_L3_CACHESIZE - 2] = { "L3_CACHESIZE: ", dec },
|
||||
+ [AT_L3_CACHEGEOMETRY - 2] = { "L3_CACHEGEOMETRY: 0x", hex },
|
||||
+ };
|
||||
+ unsigned int idx = (unsigned int) (av->a_type - 2);
|
||||
+
|
||||
+ if ((unsigned int) av->a_type < 2u
|
||||
+ || (idx < sizeof (auxvars) / sizeof (auxvars[0])
|
||||
+ && auxvars[idx].form == ignore))
|
||||
+ continue;
|
||||
+
|
||||
+ assert (AT_NULL == 0);
|
||||
+ assert (AT_IGNORE == 1);
|
||||
+
|
||||
+ /* Some entries are handled in a special way per platform. */
|
||||
+ if (_dl_procinfo (av->a_type, av->a_un.a_val) == 0)
|
||||
+ continue;
|
||||
+
|
||||
+ if (idx < sizeof (auxvars) / sizeof (auxvars[0])
|
||||
+ && auxvars[idx].form != unknown)
|
||||
+ {
|
||||
+ const char *val = (char *) av->a_un.a_val;
|
||||
+
|
||||
+ if (__builtin_expect (auxvars[idx].form, dec) == dec)
|
||||
+ val = _itoa ((unsigned long int) av->a_un.a_val,
|
||||
+ buf + sizeof buf - 1, 10, 0);
|
||||
+ else if (__builtin_expect (auxvars[idx].form, hex) == hex)
|
||||
+ val = _itoa ((unsigned long int) av->a_un.a_val,
|
||||
+ buf + sizeof buf - 1, 16, 0);
|
||||
+
|
||||
+ _dl_printf ("AT_%s%s\n", auxvars[idx].label, val);
|
||||
+
|
||||
+ continue;
|
||||
+ }
|
||||
+
|
||||
+ /* Unknown value: print a generic line. */
|
||||
+ char buf2[17];
|
||||
+ buf2[sizeof (buf2) - 1] = '\0';
|
||||
+ const char *val2 = _itoa ((unsigned long int) av->a_un.a_val,
|
||||
+ buf2 + sizeof buf2 - 1, 16, 0);
|
||||
+ const char *val = _itoa ((unsigned long int) av->a_type,
|
||||
+ buf + sizeof buf - 1, 16, 0);
|
||||
+ _dl_printf ("AT_??? (0x%s): 0x%s\n", val, val2);
|
||||
+ }
|
||||
+}
|
||||
+
|
||||
+#endif /* SHARED */
|
||||
+
|
||||
|
||||
int
|
||||
attribute_hidden
|
120
glibc-upstream-2.34-238.patch
Normal file
@ -0,0 +1,120 @@
commit 2139b1848e3e0a960ccc615fe1fd78b5d10b1411
Author: Florian Weimer <fweimer@redhat.com>
Date: Thu Feb 3 10:58:59 2022 +0100

Linux: Remove HAVE_AUX_SECURE, HAVE_AUX_XID, HAVE_AUX_PAGESIZE

They are always defined.

Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
(cherry picked from commit b9c3d3382f6f50e9723002deb2dc8127de720fa6)
|
||||
|
||||
diff --git a/sysdeps/unix/sysv/linux/dl-sysdep.c b/sysdeps/unix/sysv/linux/dl-sysdep.c
|
||||
index 3e41469bcc395179..aae983777ba15fae 100644
|
||||
--- a/sysdeps/unix/sysv/linux/dl-sysdep.c
|
||||
+++ b/sysdeps/unix/sysv/linux/dl-sysdep.c
|
||||
@@ -85,21 +85,6 @@ _dl_sysdep_start (void **start_argptr,
|
||||
ElfW(Word) phnum = 0;
|
||||
ElfW(Addr) user_entry;
|
||||
ElfW(auxv_t) *av;
|
||||
-#ifdef HAVE_AUX_SECURE
|
||||
-# define set_seen(tag) (tag) /* Evaluate for the side effects. */
|
||||
-# define set_seen_secure() ((void) 0)
|
||||
-#else
|
||||
- uid_t uid = 0;
|
||||
- gid_t gid = 0;
|
||||
- unsigned int seen = 0;
|
||||
-# define set_seen_secure() (seen = -1)
|
||||
-# ifdef HAVE_AUX_XID
|
||||
-# define set_seen(tag) (tag) /* Evaluate for the side effects. */
|
||||
-# else
|
||||
-# define M(type) (1 << (type))
|
||||
-# define set_seen(tag) seen |= M ((tag)->a_type)
|
||||
-# endif
|
||||
-#endif
|
||||
#ifdef NEED_DL_SYSINFO
|
||||
uintptr_t new_sysinfo = 0;
|
||||
#endif
|
||||
@@ -116,7 +101,7 @@ _dl_sysdep_start (void **start_argptr,
|
||||
"CONSTANT_MINSIGSTKSZ is constant");
|
||||
GLRO(dl_minsigstacksize) = CONSTANT_MINSIGSTKSZ;
|
||||
|
||||
- for (av = GLRO(dl_auxv); av->a_type != AT_NULL; set_seen (av++))
|
||||
+ for (av = GLRO(dl_auxv); av->a_type != AT_NULL; av++)
|
||||
switch (av->a_type)
|
||||
{
|
||||
case AT_PHDR:
|
||||
@@ -131,20 +116,7 @@ _dl_sysdep_start (void **start_argptr,
|
||||
case AT_ENTRY:
|
||||
user_entry = av->a_un.a_val;
|
||||
break;
|
||||
-#ifndef HAVE_AUX_SECURE
|
||||
- case AT_UID:
|
||||
- case AT_EUID:
|
||||
- uid ^= av->a_un.a_val;
|
||||
- break;
|
||||
- case AT_GID:
|
||||
- case AT_EGID:
|
||||
- gid ^= av->a_un.a_val;
|
||||
- break;
|
||||
-#endif
|
||||
case AT_SECURE:
|
||||
-#ifndef HAVE_AUX_SECURE
|
||||
- seen = -1;
|
||||
-#endif
|
||||
__libc_enable_secure = av->a_un.a_val;
|
||||
break;
|
||||
case AT_PLATFORM:
|
||||
@@ -183,31 +155,6 @@ _dl_sysdep_start (void **start_argptr,
|
||||
|
||||
dl_hwcap_check ();
|
||||
|
||||
-#ifndef HAVE_AUX_SECURE
|
||||
- if (seen != -1)
|
||||
- {
|
||||
- /* Fill in the values we have not gotten from the kernel through the
|
||||
- auxiliary vector. */
|
||||
-# ifndef HAVE_AUX_XID
|
||||
-# define SEE(UID, var, uid) \
|
||||
- if ((seen & M (AT_##UID)) == 0) var ^= __get##uid ()
|
||||
- SEE (UID, uid, uid);
|
||||
- SEE (EUID, uid, euid);
|
||||
- SEE (GID, gid, gid);
|
||||
- SEE (EGID, gid, egid);
|
||||
-# endif
|
||||
-
|
||||
- /* If one of the two pairs of IDs does not match this is a setuid
|
||||
- or setgid run. */
|
||||
- __libc_enable_secure = uid | gid;
|
||||
- }
|
||||
-#endif
|
||||
-
|
||||
-#ifndef HAVE_AUX_PAGESIZE
|
||||
- if (GLRO(dl_pagesize) == 0)
|
||||
- GLRO(dl_pagesize) = __getpagesize ();
|
||||
-#endif
|
||||
-
|
||||
#ifdef NEED_DL_SYSINFO
|
||||
if (new_sysinfo != 0)
|
||||
{
|
||||
diff --git a/sysdeps/unix/sysv/linux/ldsodefs.h b/sysdeps/unix/sysv/linux/ldsodefs.h
|
||||
index 7e01f685b03b984d..0f152c592c2a9b04 100644
|
||||
--- a/sysdeps/unix/sysv/linux/ldsodefs.h
|
||||
+++ b/sysdeps/unix/sysv/linux/ldsodefs.h
|
||||
@@ -24,16 +24,4 @@
|
||||
/* Get the real definitions. */
|
||||
#include_next <ldsodefs.h>
|
||||
|
||||
-/* We can assume that the kernel always provides the AT_UID, AT_EUID,
|
||||
- AT_GID, and AT_EGID values in the auxiliary vector from 2.4.0 or so on. */
|
||||
-#define HAVE_AUX_XID
|
||||
-
|
||||
-/* We can assume that the kernel always provides the AT_SECURE value
|
||||
- in the auxiliary vector from 2.5.74 or so on. */
|
||||
-#define HAVE_AUX_SECURE
|
||||
-
|
||||
-/* Starting with one of the 2.4.0 pre-releases the Linux kernel passes
|
||||
- up the page size information. */
|
||||
-#define HAVE_AUX_PAGESIZE
|
||||
-
|
||||
#endif /* ldsodefs.h */
|
55
glibc-upstream-2.34-239.patch
Normal file
@ -0,0 +1,55 @@
commit 458733fffe2c410418b5f633ffd6ed65efd2aac0
Author: Florian Weimer <fweimer@redhat.com>
Date: Thu Feb 3 10:58:59 2022 +0100

Linux: Remove DL_FIND_ARG_COMPONENTS

The generic definition is always used since the Native Client
port has been removed.

Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
(cherry picked from commit 2d47fa68628e831a692cba8fc9050cef435afc5e)
|
||||
|
||||
diff --git a/sysdeps/unix/sysv/linux/dl-sysdep.c b/sysdeps/unix/sysv/linux/dl-sysdep.c
|
||||
index aae983777ba15fae..e36b3e6b63b1aa7e 100644
|
||||
--- a/sysdeps/unix/sysv/linux/dl-sysdep.c
|
||||
+++ b/sysdeps/unix/sysv/linux/dl-sysdep.c
|
||||
@@ -59,19 +59,6 @@ void *__libc_stack_end attribute_relro = NULL;
|
||||
rtld_hidden_data_def(__libc_stack_end)
|
||||
void *_dl_random attribute_relro = NULL;
|
||||
|
||||
-#ifndef DL_FIND_ARG_COMPONENTS
|
||||
-# define DL_FIND_ARG_COMPONENTS(cookie, argc, argv, envp, auxp) \
|
||||
- do { \
|
||||
- void **_tmp; \
|
||||
- (argc) = *(long int *) cookie; \
|
||||
- (argv) = (char **) ((long int *) cookie + 1); \
|
||||
- (envp) = (argv) + (argc) + 1; \
|
||||
- for (_tmp = (void **) (envp); *_tmp; ++_tmp) \
|
||||
- continue; \
|
||||
- (auxp) = (void *) ++_tmp; \
|
||||
- } while (0)
|
||||
-#endif
|
||||
-
|
||||
#ifndef DL_STACK_END
|
||||
# define DL_STACK_END(cookie) ((void *) (cookie))
|
||||
#endif
|
||||
@@ -90,8 +77,16 @@ _dl_sysdep_start (void **start_argptr,
|
||||
#endif
|
||||
|
||||
__libc_stack_end = DL_STACK_END (start_argptr);
|
||||
- DL_FIND_ARG_COMPONENTS (start_argptr, _dl_argc, _dl_argv, _environ,
|
||||
- GLRO(dl_auxv));
|
||||
+ _dl_argc = (intptr_t) *start_argptr;
|
||||
+ _dl_argv = (char **) (start_argptr + 1); /* Necessary aliasing violation. */
|
||||
+ _environ = _dl_argv + _dl_argc + 1;
|
||||
+ for (char **tmp = _environ + 1; ; ++tmp)
|
||||
+ if (*tmp == NULL)
|
||||
+ {
|
||||
+ /* Another necessary aliasing violation. */
|
||||
+ GLRO(dl_auxv) = (ElfW(auxv_t) *) (tmp + 1);
|
||||
+ break;
|
||||
+ }
|
||||
|
||||
user_entry = (ElfW(Addr)) ENTRY_POINT;
|
||||
GLRO(dl_platform) = NULL; /* Default to nothing known about the platform. */
|
70
glibc-upstream-2.34-240.patch
Normal file
@ -0,0 +1,70 @@
commit 08728256faf69b159b9ecd64f7f8b734f5f456e4
Author: Florian Weimer <fweimer@redhat.com>
Date: Thu Feb 3 10:58:59 2022 +0100

Linux: Assume that NEED_DL_SYSINFO_DSO is always defined

The definition itself is still needed for generic code.

Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
(cherry picked from commit f19fc997a5754a6c0bb9e43618f0597e878061f7)
|
||||
|
||||
diff --git a/sysdeps/unix/sysv/linux/dl-sysdep.c b/sysdeps/unix/sysv/linux/dl-sysdep.c
|
||||
index e36b3e6b63b1aa7e..1829dab4f38b560c 100644
|
||||
--- a/sysdeps/unix/sysv/linux/dl-sysdep.c
|
||||
+++ b/sysdeps/unix/sysv/linux/dl-sysdep.c
|
||||
@@ -134,11 +134,9 @@ _dl_sysdep_start (void **start_argptr,
|
||||
new_sysinfo = av->a_un.a_val;
|
||||
break;
|
||||
#endif
|
||||
-#ifdef NEED_DL_SYSINFO_DSO
|
||||
case AT_SYSINFO_EHDR:
|
||||
GLRO(dl_sysinfo_dso) = (void *) av->a_un.a_val;
|
||||
break;
|
||||
-#endif
|
||||
case AT_RANDOM:
|
||||
_dl_random = (void *) av->a_un.a_val;
|
||||
break;
|
||||
@@ -153,10 +151,8 @@ _dl_sysdep_start (void **start_argptr,
|
||||
#ifdef NEED_DL_SYSINFO
|
||||
if (new_sysinfo != 0)
|
||||
{
|
||||
-# ifdef NEED_DL_SYSINFO_DSO
|
||||
/* Only set the sysinfo value if we also have the vsyscall DSO. */
|
||||
if (GLRO(dl_sysinfo_dso) != 0)
|
||||
-# endif
|
||||
GLRO(dl_sysinfo) = new_sysinfo;
|
||||
}
|
||||
#endif
|
||||
@@ -309,7 +305,7 @@ int
|
||||
attribute_hidden
|
||||
_dl_discover_osversion (void)
|
||||
{
|
||||
-#if defined NEED_DL_SYSINFO_DSO && defined SHARED
|
||||
+#ifdef SHARED
|
||||
if (GLRO(dl_sysinfo_map) != NULL)
|
||||
{
|
||||
/* If the kernel-supplied DSO contains a note indicating the kernel's
|
||||
@@ -340,7 +336,7 @@ _dl_discover_osversion (void)
|
||||
}
|
||||
}
|
||||
}
|
||||
-#endif
|
||||
+#endif /* SHARED */
|
||||
|
||||
char bufmem[64];
|
||||
char *buf = bufmem;
|
||||
diff --git a/sysdeps/unix/sysv/linux/m68k/sysdep.h b/sysdeps/unix/sysv/linux/m68k/sysdep.h
|
||||
index b29986339a7e6cc0..11b93f2fa0af0e71 100644
|
||||
--- a/sysdeps/unix/sysv/linux/m68k/sysdep.h
|
||||
+++ b/sysdeps/unix/sysv/linux/m68k/sysdep.h
|
||||
@@ -301,8 +301,6 @@ SYSCALL_ERROR_LABEL: \
|
||||
#define PTR_MANGLE(var) (void) (var)
|
||||
#define PTR_DEMANGLE(var) (void) (var)
|
||||
|
||||
-#if defined NEED_DL_SYSINFO || defined NEED_DL_SYSINFO_DSO
|
||||
/* M68K needs system-supplied DSO to access TLS helpers
|
||||
even when statically linked. */
|
||||
-# define NEED_STATIC_SYSINFO_DSO 1
|
||||
-#endif
|
||||
+#define NEED_STATIC_SYSINFO_DSO 1
|
410
glibc-upstream-2.34-241.patch
Normal file
@ -0,0 +1,410 @@
commit 4b9cd5465d5158dad7b4f0762bc70a3a1209b481
Author: Florian Weimer <fweimer@redhat.com>
Date: Thu Feb 3 10:58:59 2022 +0100

Linux: Consolidate auxiliary vector parsing

And optimize it slightly.

The large switch statement in _dl_sysdep_start can be replaced with
a large array. This reduces source code and binary size. On
i686-linux-gnu:

Before:

   text    data     bss     dec     hex filename
   7791      12       0    7803    1e7b elf/dl-sysdep.os

After:

   text    data     bss     dec     hex filename
   7135      12       0    7147    1beb elf/dl-sysdep.os

Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
(cherry picked from commit 8c8510ab2790039e58995ef3a22309582413d3ff)
|
||||
|
||||
diff --git a/elf/dl-support.c b/elf/dl-support.c
|
||||
index f29dc965f4d10648..40ef07521336857d 100644
|
||||
--- a/elf/dl-support.c
|
||||
+++ b/elf/dl-support.c
|
||||
@@ -241,93 +241,21 @@ __rtld_lock_define_initialized_recursive (, _dl_load_tls_lock)
|
||||
|
||||
|
||||
#ifdef HAVE_AUX_VECTOR
|
||||
+#include <dl-parse_auxv.h>
|
||||
+
|
||||
int _dl_clktck;
|
||||
|
||||
void
|
||||
_dl_aux_init (ElfW(auxv_t) *av)
|
||||
{
|
||||
- int seen = 0;
|
||||
- uid_t uid = 0;
|
||||
- gid_t gid = 0;
|
||||
-
|
||||
#ifdef NEED_DL_SYSINFO
|
||||
/* NB: Avoid RELATIVE relocation in static PIE. */
|
||||
GL(dl_sysinfo) = DL_SYSINFO_DEFAULT;
|
||||
#endif
|
||||
|
||||
_dl_auxv = av;
|
||||
- for (; av->a_type != AT_NULL; ++av)
|
||||
- switch (av->a_type)
|
||||
- {
|
||||
- case AT_PAGESZ:
|
||||
- if (av->a_un.a_val != 0)
|
||||
- GLRO(dl_pagesize) = av->a_un.a_val;
|
||||
- break;
|
||||
- case AT_CLKTCK:
|
||||
- GLRO(dl_clktck) = av->a_un.a_val;
|
||||
- break;
|
||||
- case AT_PHDR:
|
||||
- GL(dl_phdr) = (const void *) av->a_un.a_val;
|
||||
- break;
|
||||
- case AT_PHNUM:
|
||||
- GL(dl_phnum) = av->a_un.a_val;
|
||||
- break;
|
||||
- case AT_PLATFORM:
|
||||
- GLRO(dl_platform) = (void *) av->a_un.a_val;
|
||||
- break;
|
||||
- case AT_HWCAP:
|
||||
- GLRO(dl_hwcap) = (unsigned long int) av->a_un.a_val;
|
||||
- break;
|
||||
- case AT_HWCAP2:
|
||||
- GLRO(dl_hwcap2) = (unsigned long int) av->a_un.a_val;
|
||||
- break;
|
||||
- case AT_FPUCW:
|
||||
- GLRO(dl_fpu_control) = av->a_un.a_val;
|
||||
- break;
|
||||
-#ifdef NEED_DL_SYSINFO
|
||||
- case AT_SYSINFO:
|
||||
- GL(dl_sysinfo) = av->a_un.a_val;
|
||||
- break;
|
||||
-#endif
|
||||
-#ifdef NEED_DL_SYSINFO_DSO
|
||||
- case AT_SYSINFO_EHDR:
|
||||
- GL(dl_sysinfo_dso) = (void *) av->a_un.a_val;
|
||||
- break;
|
||||
-#endif
|
||||
- case AT_UID:
|
||||
- uid ^= av->a_un.a_val;
|
||||
- seen |= 1;
|
||||
- break;
|
||||
- case AT_EUID:
|
||||
- uid ^= av->a_un.a_val;
|
||||
- seen |= 2;
|
||||
- break;
|
||||
- case AT_GID:
|
||||
- gid ^= av->a_un.a_val;
|
||||
- seen |= 4;
|
||||
- break;
|
||||
- case AT_EGID:
|
||||
- gid ^= av->a_un.a_val;
|
||||
- seen |= 8;
|
||||
- break;
|
||||
- case AT_SECURE:
|
||||
- seen = -1;
|
||||
- __libc_enable_secure = av->a_un.a_val;
|
||||
- __libc_enable_secure_decided = 1;
|
||||
- break;
|
||||
- case AT_RANDOM:
|
||||
- _dl_random = (void *) av->a_un.a_val;
|
||||
- break;
|
||||
- case AT_MINSIGSTKSZ:
|
||||
- _dl_minsigstacksize = av->a_un.a_val;
|
||||
- break;
|
||||
- DL_PLATFORM_AUXV
|
||||
- }
|
||||
- if (seen == 0xf)
|
||||
- {
|
||||
- __libc_enable_secure = uid != 0 || gid != 0;
|
||||
- __libc_enable_secure_decided = 1;
|
||||
- }
|
||||
+ dl_parse_auxv_t auxv_values = { 0, };
|
||||
+ _dl_parse_auxv (av, auxv_values);
|
||||
}
|
||||
#endif
|
||||
|
||||
diff --git a/sysdeps/unix/sysv/linux/alpha/dl-auxv.h b/sysdeps/unix/sysv/linux/alpha/dl-auxv.h
|
||||
index 1aa9dca80d189ebe..8c99e776a0af9cef 100644
|
||||
--- a/sysdeps/unix/sysv/linux/alpha/dl-auxv.h
|
||||
+++ b/sysdeps/unix/sysv/linux/alpha/dl-auxv.h
|
||||
@@ -20,16 +20,8 @@
|
||||
|
||||
extern long __libc_alpha_cache_shape[4];
|
||||
|
||||
-#define DL_PLATFORM_AUXV \
|
||||
- case AT_L1I_CACHESHAPE: \
|
||||
- __libc_alpha_cache_shape[0] = av->a_un.a_val; \
|
||||
- break; \
|
||||
- case AT_L1D_CACHESHAPE: \
|
||||
- __libc_alpha_cache_shape[1] = av->a_un.a_val; \
|
||||
- break; \
|
||||
- case AT_L2_CACHESHAPE: \
|
||||
- __libc_alpha_cache_shape[2] = av->a_un.a_val; \
|
||||
- break; \
|
||||
- case AT_L3_CACHESHAPE: \
|
||||
- __libc_alpha_cache_shape[3] = av->a_un.a_val; \
|
||||
- break;
|
||||
+#define DL_PLATFORM_AUXV \
|
||||
+ __libc_alpha_cache_shape[0] = auxv_values[AT_L1I_CACHESHAPE]; \
|
||||
+ __libc_alpha_cache_shape[1] = auxv_values[AT_L1D_CACHESHAPE]; \
|
||||
+ __libc_alpha_cache_shape[2] = auxv_values[AT_L2_CACHESHAPE]; \
|
||||
+ __libc_alpha_cache_shape[3] = auxv_values[AT_L3_CACHESHAPE];
|
||||
diff --git a/sysdeps/unix/sysv/linux/dl-parse_auxv.h b/sysdeps/unix/sysv/linux/dl-parse_auxv.h
|
||||
new file mode 100644
|
||||
index 0000000000000000..b3d82f69946d6d2c
|
||||
--- /dev/null
|
||||
+++ b/sysdeps/unix/sysv/linux/dl-parse_auxv.h
|
||||
@@ -0,0 +1,61 @@
|
||||
+/* Parse the Linux auxiliary vector.
|
||||
+ Copyright (C) 1995-2022 Free Software Foundation, Inc.
|
||||
+ This file is part of the GNU C Library.
|
||||
+
|
||||
+ The GNU C Library is free software; you can redistribute it and/or
|
||||
+ modify it under the terms of the GNU Lesser General Public
|
||||
+ License as published by the Free Software Foundation; either
|
||||
+ version 2.1 of the License, or (at your option) any later version.
|
||||
+
|
||||
+ The GNU C Library is distributed in the hope that it will be useful,
|
||||
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
|
||||
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
|
||||
+ Lesser General Public License for more details.
|
||||
+
|
||||
+ You should have received a copy of the GNU Lesser General Public
|
||||
+ License along with the GNU C Library; if not, see
|
||||
+ <https://www.gnu.org/licenses/>. */
|
||||
+
|
||||
+#include <elf.h>
|
||||
+#include <entry.h>
|
||||
+#include <fpu_control.h>
|
||||
+#include <ldsodefs.h>
|
||||
+#include <link.h>
|
||||
+
|
||||
+typedef ElfW(Addr) dl_parse_auxv_t[AT_MINSIGSTKSZ + 1];
|
||||
+
|
||||
+/* Copy the auxiliary vector into AUX_VALUES and set up GLRO
|
||||
+ variables. */
|
||||
+static inline
|
||||
+void _dl_parse_auxv (ElfW(auxv_t) *av, dl_parse_auxv_t auxv_values)
|
||||
+{
|
||||
+ auxv_values[AT_ENTRY] = (ElfW(Addr)) ENTRY_POINT;
|
||||
+ auxv_values[AT_PAGESZ] = EXEC_PAGESIZE;
|
||||
+ auxv_values[AT_FPUCW] = _FPU_DEFAULT;
|
||||
+
|
||||
+ /* NB: Default to a constant CONSTANT_MINSIGSTKSZ. */
|
||||
+ _Static_assert (__builtin_constant_p (CONSTANT_MINSIGSTKSZ),
|
||||
+ "CONSTANT_MINSIGSTKSZ is constant");
|
||||
+ auxv_values[AT_MINSIGSTKSZ] = CONSTANT_MINSIGSTKSZ;
|
||||
+
|
||||
+ for (; av->a_type != AT_NULL; av++)
|
||||
+ if (av->a_type <= AT_MINSIGSTKSZ)
|
||||
+ auxv_values[av->a_type] = av->a_un.a_val;
|
||||
+
|
||||
+ GLRO(dl_pagesize) = auxv_values[AT_PAGESZ];
|
||||
+ __libc_enable_secure = auxv_values[AT_SECURE];
|
||||
+ GLRO(dl_platform) = (void *) auxv_values[AT_PLATFORM];
|
||||
+ GLRO(dl_hwcap) = auxv_values[AT_HWCAP];
|
||||
+ GLRO(dl_hwcap2) = auxv_values[AT_HWCAP2];
|
||||
+ GLRO(dl_clktck) = auxv_values[AT_CLKTCK];
|
||||
+ GLRO(dl_fpu_control) = auxv_values[AT_FPUCW];
|
||||
+ _dl_random = (void *) auxv_values[AT_RANDOM];
|
||||
+ GLRO(dl_minsigstacksize) = auxv_values[AT_MINSIGSTKSZ];
|
||||
+ GLRO(dl_sysinfo_dso) = (void *) auxv_values[AT_SYSINFO_EHDR];
|
||||
+#ifdef NEED_DL_SYSINFO
|
||||
+ if (GLRO(dl_sysinfo_dso) != NULL)
|
||||
+ GLRO(dl_sysinfo) = auxv_values[AT_SYSINFO];
|
||||
+#endif
|
||||
+
|
||||
+ DL_PLATFORM_AUXV
|
||||
+}
|
||||
diff --git a/sysdeps/unix/sysv/linux/dl-sysdep.c b/sysdeps/unix/sysv/linux/dl-sysdep.c
|
||||
index 1829dab4f38b560c..80aa9f6f4acb7e3c 100644
|
||||
--- a/sysdeps/unix/sysv/linux/dl-sysdep.c
|
||||
+++ b/sysdeps/unix/sysv/linux/dl-sysdep.c
|
||||
@@ -21,13 +21,12 @@
|
||||
#include <dl-auxv.h>
|
||||
#include <dl-hwcap-check.h>
|
||||
#include <dl-osinfo.h>
|
||||
+#include <dl-parse_auxv.h>
|
||||
#include <dl-procinfo.h>
|
||||
#include <dl-tunables.h>
|
||||
#include <elf.h>
|
||||
-#include <entry.h>
|
||||
#include <errno.h>
|
||||
#include <fcntl.h>
|
||||
-#include <fpu_control.h>
|
||||
#include <ldsodefs.h>
|
||||
#include <libc-internal.h>
|
||||
#include <libintl.h>
|
||||
@@ -63,24 +62,24 @@ void *_dl_random attribute_relro = NULL;
|
||||
# define DL_STACK_END(cookie) ((void *) (cookie))
|
||||
#endif
|
||||
|
||||
-ElfW(Addr)
|
||||
-_dl_sysdep_start (void **start_argptr,
|
||||
- void (*dl_main) (const ElfW(Phdr) *phdr, ElfW(Word) phnum,
|
||||
- ElfW(Addr) *user_entry, ElfW(auxv_t) *auxv))
|
||||
+/* Arguments passed to dl_main. */
|
||||
+struct dl_main_arguments
|
||||
{
|
||||
- const ElfW(Phdr) *phdr = NULL;
|
||||
- ElfW(Word) phnum = 0;
|
||||
+ const ElfW(Phdr) *phdr;
|
||||
+ ElfW(Word) phnum;
|
||||
ElfW(Addr) user_entry;
|
||||
- ElfW(auxv_t) *av;
|
||||
-#ifdef NEED_DL_SYSINFO
|
||||
- uintptr_t new_sysinfo = 0;
|
||||
-#endif
|
||||
+};
|
||||
|
||||
- __libc_stack_end = DL_STACK_END (start_argptr);
|
||||
+/* Separate function, so that dl_main can be called without the large
|
||||
+ array on the stack. */
|
||||
+static void
|
||||
+_dl_sysdep_parse_arguments (void **start_argptr,
|
||||
+ struct dl_main_arguments *args)
|
||||
+{
|
||||
_dl_argc = (intptr_t) *start_argptr;
|
||||
_dl_argv = (char **) (start_argptr + 1); /* Necessary aliasing violation. */
|
||||
_environ = _dl_argv + _dl_argc + 1;
|
||||
- for (char **tmp = _environ + 1; ; ++tmp)
|
||||
+ for (char **tmp = _environ; ; ++tmp)
|
||||
if (*tmp == NULL)
|
||||
{
|
||||
/* Another necessary aliasing violation. */
|
||||
@@ -88,74 +87,25 @@ _dl_sysdep_start (void **start_argptr,
|
||||
break;
|
||||
}
|
||||
|
||||
- user_entry = (ElfW(Addr)) ENTRY_POINT;
|
||||
- GLRO(dl_platform) = NULL; /* Default to nothing known about the platform. */
|
||||
+ dl_parse_auxv_t auxv_values = { 0, };
|
||||
+ _dl_parse_auxv (GLRO(dl_auxv), auxv_values);
|
||||
|
||||
- /* NB: Default to a constant CONSTANT_MINSIGSTKSZ. */
|
||||
- _Static_assert (__builtin_constant_p (CONSTANT_MINSIGSTKSZ),
|
||||
- "CONSTANT_MINSIGSTKSZ is constant");
|
||||
- GLRO(dl_minsigstacksize) = CONSTANT_MINSIGSTKSZ;
|
||||
+ args->phdr = (const ElfW(Phdr) *) auxv_values[AT_PHDR];
|
||||
+ args->phnum = auxv_values[AT_PHNUM];
|
||||
+ args->user_entry = auxv_values[AT_ENTRY];
|
||||
+}
|
||||
|
||||
- for (av = GLRO(dl_auxv); av->a_type != AT_NULL; av++)
|
||||
- switch (av->a_type)
|
||||
- {
|
||||
- case AT_PHDR:
|
||||
- phdr = (void *) av->a_un.a_val;
|
||||
- break;
|
||||
- case AT_PHNUM:
|
||||
- phnum = av->a_un.a_val;
|
||||
- break;
|
||||
- case AT_PAGESZ:
|
||||
- GLRO(dl_pagesize) = av->a_un.a_val;
|
||||
- break;
|
||||
- case AT_ENTRY:
|
||||
- user_entry = av->a_un.a_val;
|
||||
- break;
|
||||
- case AT_SECURE:
|
||||
- __libc_enable_secure = av->a_un.a_val;
|
||||
- break;
|
||||
- case AT_PLATFORM:
|
||||
- GLRO(dl_platform) = (void *) av->a_un.a_val;
|
||||
- break;
|
||||
- case AT_HWCAP:
|
||||
- GLRO(dl_hwcap) = (unsigned long int) av->a_un.a_val;
|
||||
- break;
|
||||
- case AT_HWCAP2:
|
||||
- GLRO(dl_hwcap2) = (unsigned long int) av->a_un.a_val;
|
||||
- break;
|
||||
- case AT_CLKTCK:
|
||||
- GLRO(dl_clktck) = av->a_un.a_val;
|
||||
- break;
|
||||
- case AT_FPUCW:
|
||||
- GLRO(dl_fpu_control) = av->a_un.a_val;
|
||||
- break;
|
||||
-#ifdef NEED_DL_SYSINFO
|
||||
- case AT_SYSINFO:
|
||||
- new_sysinfo = av->a_un.a_val;
|
||||
- break;
|
||||
-#endif
|
||||
- case AT_SYSINFO_EHDR:
|
||||
- GLRO(dl_sysinfo_dso) = (void *) av->a_un.a_val;
|
||||
- break;
|
||||
- case AT_RANDOM:
|
||||
- _dl_random = (void *) av->a_un.a_val;
|
||||
- break;
|
||||
- case AT_MINSIGSTKSZ:
|
||||
- GLRO(dl_minsigstacksize) = av->a_un.a_val;
|
||||
- break;
|
||||
- DL_PLATFORM_AUXV
|
||||
- }
|
||||
+ElfW(Addr)
|
||||
+_dl_sysdep_start (void **start_argptr,
|
||||
+ void (*dl_main) (const ElfW(Phdr) *phdr, ElfW(Word) phnum,
|
||||
+ ElfW(Addr) *user_entry, ElfW(auxv_t) *auxv))
|
||||
+{
|
||||
+ __libc_stack_end = DL_STACK_END (start_argptr);
|
||||
|
||||
- dl_hwcap_check ();
|
||||
+ struct dl_main_arguments dl_main_args;
|
||||
+ _dl_sysdep_parse_arguments (start_argptr, &dl_main_args);
|
||||
|
||||
-#ifdef NEED_DL_SYSINFO
|
||||
- if (new_sysinfo != 0)
|
||||
- {
|
||||
- /* Only set the sysinfo value if we also have the vsyscall DSO. */
|
||||
- if (GLRO(dl_sysinfo_dso) != 0)
|
||||
- GLRO(dl_sysinfo) = new_sysinfo;
|
||||
- }
|
||||
-#endif
|
||||
+ dl_hwcap_check ();
|
||||
|
||||
__tunables_init (_environ);
|
||||
|
||||
@@ -187,8 +137,9 @@ _dl_sysdep_start (void **start_argptr,
|
||||
if (__builtin_expect (__libc_enable_secure, 0))
|
||||
__libc_check_standard_fds ();
|
||||
|
||||
- (*dl_main) (phdr, phnum, &user_entry, GLRO(dl_auxv));
|
||||
- return user_entry;
|
||||
+ (*dl_main) (dl_main_args.phdr, dl_main_args.phnum,
|
||||
+ &dl_main_args.user_entry, GLRO(dl_auxv));
|
||||
+ return dl_main_args.user_entry;
|
||||
}
|
||||
|
||||
void
|
||||
diff --git a/sysdeps/unix/sysv/linux/powerpc/dl-auxv.h b/sysdeps/unix/sysv/linux/powerpc/dl-auxv.h
|
||||
index 36ba0f3e9e45f3e2..7f35fb531ba22098 100644
|
||||
--- a/sysdeps/unix/sysv/linux/powerpc/dl-auxv.h
|
||||
+++ b/sysdeps/unix/sysv/linux/powerpc/dl-auxv.h
|
||||
@@ -16,15 +16,5 @@
|
||||
License along with the GNU C Library; if not, see
|
||||
<https://www.gnu.org/licenses/>. */
|
||||
|
||||
-#include <ldsodefs.h>
|
||||
-
|
||||
-#if IS_IN (libc) && !defined SHARED
|
||||
-int GLRO(dl_cache_line_size);
|
||||
-#endif
|
||||
-
|
||||
-/* Scan the Aux Vector for the "Data Cache Block Size" entry and assign it
|
||||
- to dl_cache_line_size. */
|
||||
-#define DL_PLATFORM_AUXV \
|
||||
- case AT_DCACHEBSIZE: \
|
||||
- GLRO(dl_cache_line_size) = av->a_un.a_val; \
|
||||
- break;
|
||||
+#define DL_PLATFORM_AUXV \
|
||||
+ GLRO(dl_cache_line_size) = auxv_values[AT_DCACHEBSIZE];
|
||||
diff --git a/sysdeps/unix/sysv/linux/powerpc/dl-support.c b/sysdeps/unix/sysv/linux/powerpc/dl-support.c
|
||||
new file mode 100644
|
||||
index 0000000000000000..abe68a704946b90f
|
||||
--- /dev/null
|
||||
+++ b/sysdeps/unix/sysv/linux/powerpc/dl-support.c
|
||||
@@ -0,0 +1,4 @@
|
||||
+#include <elf/dl-support.c>
|
||||
+
|
||||
+/* Populated from the auxiliary vector. */
|
||||
+int _dl_cache_line_size;
|
399
glibc-upstream-2.34-242.patch
Normal file
@ -0,0 +1,399 @@
commit 1cc4ddfeebdb68e0b6de7e4878eef94d3438706f
Author: Florian Weimer <fweimer@redhat.com>
Date: Fri Feb 11 16:01:19 2022 +0100

Revert "Linux: Consolidate auxiliary vector parsing"

This reverts commit 8c8510ab2790039e58995ef3a22309582413d3ff. The
revert is not perfect because the commit included a bug fix for
_dl_sysdep_start with an empty argv, introduced in commit
2d47fa68628e831a692cba8fc9050cef435afc5e ("Linux: Remove
DL_FIND_ARG_COMPONENTS"), and this bug fix is kept.

The revert is necessary because the reverted commit introduced an
early memset call on aarch64, which leads to crash due to lack of TCB
initialization.

(cherry picked from commit d96d2995c1121d3310102afda2deb1f35761b5e6)
|
||||
|
||||
diff --git a/elf/dl-support.c b/elf/dl-support.c
|
||||
index 40ef07521336857d..f29dc965f4d10648 100644
|
||||
--- a/elf/dl-support.c
|
||||
+++ b/elf/dl-support.c
|
||||
@@ -241,21 +241,93 @@ __rtld_lock_define_initialized_recursive (, _dl_load_tls_lock)
|
||||
|
||||
|
||||
#ifdef HAVE_AUX_VECTOR
|
||||
-#include <dl-parse_auxv.h>
|
||||
-
|
||||
int _dl_clktck;
|
||||
|
||||
void
|
||||
_dl_aux_init (ElfW(auxv_t) *av)
|
||||
{
|
||||
+ int seen = 0;
|
||||
+ uid_t uid = 0;
|
||||
+ gid_t gid = 0;
|
||||
+
|
||||
#ifdef NEED_DL_SYSINFO
|
||||
/* NB: Avoid RELATIVE relocation in static PIE. */
|
||||
GL(dl_sysinfo) = DL_SYSINFO_DEFAULT;
|
||||
#endif
|
||||
|
||||
_dl_auxv = av;
|
||||
- dl_parse_auxv_t auxv_values = { 0, };
|
||||
- _dl_parse_auxv (av, auxv_values);
|
||||
+ for (; av->a_type != AT_NULL; ++av)
|
||||
+ switch (av->a_type)
|
||||
+ {
|
||||
+ case AT_PAGESZ:
|
||||
+ if (av->a_un.a_val != 0)
|
||||
+ GLRO(dl_pagesize) = av->a_un.a_val;
|
||||
+ break;
|
||||
+ case AT_CLKTCK:
|
||||
+ GLRO(dl_clktck) = av->a_un.a_val;
|
||||
+ break;
|
||||
+ case AT_PHDR:
|
||||
+ GL(dl_phdr) = (const void *) av->a_un.a_val;
|
||||
+ break;
|
||||
+ case AT_PHNUM:
|
||||
+ GL(dl_phnum) = av->a_un.a_val;
|
||||
+ break;
|
||||
+ case AT_PLATFORM:
|
||||
+ GLRO(dl_platform) = (void *) av->a_un.a_val;
|
||||
+ break;
|
||||
+ case AT_HWCAP:
|
||||
+ GLRO(dl_hwcap) = (unsigned long int) av->a_un.a_val;
|
||||
+ break;
|
||||
+ case AT_HWCAP2:
|
||||
+ GLRO(dl_hwcap2) = (unsigned long int) av->a_un.a_val;
|
||||
+ break;
|
||||
+ case AT_FPUCW:
|
||||
+ GLRO(dl_fpu_control) = av->a_un.a_val;
|
||||
+ break;
|
||||
+#ifdef NEED_DL_SYSINFO
|
||||
+ case AT_SYSINFO:
|
||||
+ GL(dl_sysinfo) = av->a_un.a_val;
|
||||
+ break;
|
||||
+#endif
|
||||
+#ifdef NEED_DL_SYSINFO_DSO
|
||||
+ case AT_SYSINFO_EHDR:
|
||||
+ GL(dl_sysinfo_dso) = (void *) av->a_un.a_val;
|
||||
+ break;
|
||||
+#endif
|
||||
+ case AT_UID:
|
||||
+ uid ^= av->a_un.a_val;
|
||||
+ seen |= 1;
|
||||
+ break;
|
||||
+ case AT_EUID:
|
||||
+ uid ^= av->a_un.a_val;
|
||||
+ seen |= 2;
|
||||
+ break;
|
||||
+ case AT_GID:
|
||||
+ gid ^= av->a_un.a_val;
|
||||
+ seen |= 4;
|
||||
+ break;
|
||||
+ case AT_EGID:
|
||||
+ gid ^= av->a_un.a_val;
|
||||
+ seen |= 8;
|
||||
+ break;
|
||||
+ case AT_SECURE:
|
||||
+ seen = -1;
|
||||
+ __libc_enable_secure = av->a_un.a_val;
|
||||
+ __libc_enable_secure_decided = 1;
|
||||
+ break;
|
||||
+ case AT_RANDOM:
|
||||
+ _dl_random = (void *) av->a_un.a_val;
|
||||
+ break;
|
||||
+ case AT_MINSIGSTKSZ:
|
||||
+ _dl_minsigstacksize = av->a_un.a_val;
|
||||
+ break;
|
||||
+ DL_PLATFORM_AUXV
|
||||
+ }
|
||||
+ if (seen == 0xf)
|
||||
+ {
|
||||
+ __libc_enable_secure = uid != 0 || gid != 0;
|
||||
+ __libc_enable_secure_decided = 1;
|
||||
+ }
|
||||
}
|
||||
#endif
|
||||
|
||||
diff --git a/sysdeps/unix/sysv/linux/alpha/dl-auxv.h b/sysdeps/unix/sysv/linux/alpha/dl-auxv.h
|
||||
index 8c99e776a0af9cef..1aa9dca80d189ebe 100644
|
||||
--- a/sysdeps/unix/sysv/linux/alpha/dl-auxv.h
|
||||
+++ b/sysdeps/unix/sysv/linux/alpha/dl-auxv.h
|
||||
@@ -20,8 +20,16 @@
|
||||
|
||||
extern long __libc_alpha_cache_shape[4];
|
||||
|
||||
-#define DL_PLATFORM_AUXV \
|
||||
- __libc_alpha_cache_shape[0] = auxv_values[AT_L1I_CACHESHAPE]; \
|
||||
- __libc_alpha_cache_shape[1] = auxv_values[AT_L1D_CACHESHAPE]; \
|
||||
- __libc_alpha_cache_shape[2] = auxv_values[AT_L2_CACHESHAPE]; \
|
||||
- __libc_alpha_cache_shape[3] = auxv_values[AT_L3_CACHESHAPE];
|
||||
+#define DL_PLATFORM_AUXV \
|
||||
+ case AT_L1I_CACHESHAPE: \
|
||||
+ __libc_alpha_cache_shape[0] = av->a_un.a_val; \
|
||||
+ break; \
|
||||
+ case AT_L1D_CACHESHAPE: \
|
||||
+ __libc_alpha_cache_shape[1] = av->a_un.a_val; \
|
||||
+ break; \
|
||||
+ case AT_L2_CACHESHAPE: \
|
||||
+ __libc_alpha_cache_shape[2] = av->a_un.a_val; \
|
||||
+ break; \
|
||||
+ case AT_L3_CACHESHAPE: \
|
||||
+ __libc_alpha_cache_shape[3] = av->a_un.a_val; \
|
||||
+ break;
|
||||
diff --git a/sysdeps/unix/sysv/linux/dl-parse_auxv.h b/sysdeps/unix/sysv/linux/dl-parse_auxv.h
|
||||
deleted file mode 100644
|
||||
index b3d82f69946d6d2c..0000000000000000
|
||||
--- a/sysdeps/unix/sysv/linux/dl-parse_auxv.h
|
||||
+++ /dev/null
|
||||
@@ -1,61 +0,0 @@
|
||||
-/* Parse the Linux auxiliary vector.
|
||||
- Copyright (C) 1995-2022 Free Software Foundation, Inc.
|
||||
- This file is part of the GNU C Library.
|
||||
-
|
||||
- The GNU C Library is free software; you can redistribute it and/or
|
||||
- modify it under the terms of the GNU Lesser General Public
|
||||
- License as published by the Free Software Foundation; either
|
||||
- version 2.1 of the License, or (at your option) any later version.
|
||||
-
|
||||
- The GNU C Library is distributed in the hope that it will be useful,
|
||||
- but WITHOUT ANY WARRANTY; without even the implied warranty of
|
||||
- MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
|
||||
- Lesser General Public License for more details.
|
||||
-
|
||||
- You should have received a copy of the GNU Lesser General Public
|
||||
- License along with the GNU C Library; if not, see
|
||||
- <https://www.gnu.org/licenses/>. */
|
||||
-
|
||||
-#include <elf.h>
|
||||
-#include <entry.h>
|
||||
-#include <fpu_control.h>
|
||||
-#include <ldsodefs.h>
|
||||
-#include <link.h>
|
||||
-
|
||||
-typedef ElfW(Addr) dl_parse_auxv_t[AT_MINSIGSTKSZ + 1];
|
||||
-
|
||||
-/* Copy the auxiliary vector into AUX_VALUES and set up GLRO
|
||||
- variables. */
|
||||
-static inline
|
||||
-void _dl_parse_auxv (ElfW(auxv_t) *av, dl_parse_auxv_t auxv_values)
|
||||
-{
|
||||
- auxv_values[AT_ENTRY] = (ElfW(Addr)) ENTRY_POINT;
|
||||
- auxv_values[AT_PAGESZ] = EXEC_PAGESIZE;
|
||||
- auxv_values[AT_FPUCW] = _FPU_DEFAULT;
|
||||
-
|
||||
- /* NB: Default to a constant CONSTANT_MINSIGSTKSZ. */
|
||||
- _Static_assert (__builtin_constant_p (CONSTANT_MINSIGSTKSZ),
|
||||
- "CONSTANT_MINSIGSTKSZ is constant");
|
||||
- auxv_values[AT_MINSIGSTKSZ] = CONSTANT_MINSIGSTKSZ;
|
||||
-
|
||||
- for (; av->a_type != AT_NULL; av++)
|
||||
- if (av->a_type <= AT_MINSIGSTKSZ)
|
||||
- auxv_values[av->a_type] = av->a_un.a_val;
|
||||
-
|
||||
- GLRO(dl_pagesize) = auxv_values[AT_PAGESZ];
|
||||
- __libc_enable_secure = auxv_values[AT_SECURE];
|
||||
- GLRO(dl_platform) = (void *) auxv_values[AT_PLATFORM];
|
||||
- GLRO(dl_hwcap) = auxv_values[AT_HWCAP];
|
||||
- GLRO(dl_hwcap2) = auxv_values[AT_HWCAP2];
|
||||
- GLRO(dl_clktck) = auxv_values[AT_CLKTCK];
|
||||
- GLRO(dl_fpu_control) = auxv_values[AT_FPUCW];
|
||||
- _dl_random = (void *) auxv_values[AT_RANDOM];
|
||||
- GLRO(dl_minsigstacksize) = auxv_values[AT_MINSIGSTKSZ];
|
||||
- GLRO(dl_sysinfo_dso) = (void *) auxv_values[AT_SYSINFO_EHDR];
|
||||
-#ifdef NEED_DL_SYSINFO
|
||||
- if (GLRO(dl_sysinfo_dso) != NULL)
|
||||
- GLRO(dl_sysinfo) = auxv_values[AT_SYSINFO];
|
||||
-#endif
|
||||
-
|
||||
- DL_PLATFORM_AUXV
|
||||
-}
|
||||
diff --git a/sysdeps/unix/sysv/linux/dl-sysdep.c b/sysdeps/unix/sysv/linux/dl-sysdep.c
|
||||
index 80aa9f6f4acb7e3c..facaaba3b9d091b3 100644
|
||||
--- a/sysdeps/unix/sysv/linux/dl-sysdep.c
|
||||
+++ b/sysdeps/unix/sysv/linux/dl-sysdep.c
|
||||
@@ -21,12 +21,13 @@
|
||||
#include <dl-auxv.h>
|
||||
#include <dl-hwcap-check.h>
|
||||
#include <dl-osinfo.h>
|
||||
-#include <dl-parse_auxv.h>
|
||||
#include <dl-procinfo.h>
|
||||
#include <dl-tunables.h>
|
||||
#include <elf.h>
|
||||
+#include <entry.h>
|
||||
#include <errno.h>
|
||||
#include <fcntl.h>
|
||||
+#include <fpu_control.h>
|
||||
#include <ldsodefs.h>
|
||||
#include <libc-internal.h>
|
||||
#include <libintl.h>
|
||||
@@ -62,20 +63,20 @@ void *_dl_random attribute_relro = NULL;
|
||||
# define DL_STACK_END(cookie) ((void *) (cookie))
|
||||
#endif
|
||||
|
||||
-/* Arguments passed to dl_main. */
|
||||
-struct dl_main_arguments
|
||||
+ElfW(Addr)
|
||||
+_dl_sysdep_start (void **start_argptr,
|
||||
+ void (*dl_main) (const ElfW(Phdr) *phdr, ElfW(Word) phnum,
|
||||
+ ElfW(Addr) *user_entry, ElfW(auxv_t) *auxv))
|
||||
{
|
||||
- const ElfW(Phdr) *phdr;
|
||||
- ElfW(Word) phnum;
|
||||
+ const ElfW(Phdr) *phdr = NULL;
|
||||
+ ElfW(Word) phnum = 0;
|
||||
ElfW(Addr) user_entry;
|
||||
-};
|
||||
+ ElfW(auxv_t) *av;
|
||||
+#ifdef NEED_DL_SYSINFO
|
||||
+ uintptr_t new_sysinfo = 0;
|
||||
+#endif
|
||||
|
||||
-/* Separate function, so that dl_main can be called without the large
|
||||
- array on the stack. */
|
||||
-static void
|
||||
-_dl_sysdep_parse_arguments (void **start_argptr,
|
||||
- struct dl_main_arguments *args)
|
||||
-{
|
||||
+ __libc_stack_end = DL_STACK_END (start_argptr);
|
||||
_dl_argc = (intptr_t) *start_argptr;
|
||||
_dl_argv = (char **) (start_argptr + 1); /* Necessary aliasing violation. */
|
||||
_environ = _dl_argv + _dl_argc + 1;
|
||||
@@ -87,26 +88,75 @@ _dl_sysdep_parse_arguments (void **start_argptr,
|
||||
break;
|
||||
}
|
||||
|
||||
- dl_parse_auxv_t auxv_values = { 0, };
|
||||
- _dl_parse_auxv (GLRO(dl_auxv), auxv_values);
|
||||
+ user_entry = (ElfW(Addr)) ENTRY_POINT;
|
||||
+ GLRO(dl_platform) = NULL; /* Default to nothing known about the platform. */
|
||||
|
||||
- args->phdr = (const ElfW(Phdr) *) auxv_values[AT_PHDR];
|
||||
- args->phnum = auxv_values[AT_PHNUM];
|
||||
- args->user_entry = auxv_values[AT_ENTRY];
|
||||
-}
|
||||
+ /* NB: Default to a constant CONSTANT_MINSIGSTKSZ. */
|
||||
+ _Static_assert (__builtin_constant_p (CONSTANT_MINSIGSTKSZ),
|
||||
+ "CONSTANT_MINSIGSTKSZ is constant");
|
||||
+ GLRO(dl_minsigstacksize) = CONSTANT_MINSIGSTKSZ;
|
||||
|
||||
-ElfW(Addr)
|
||||
-_dl_sysdep_start (void **start_argptr,
|
||||
- void (*dl_main) (const ElfW(Phdr) *phdr, ElfW(Word) phnum,
|
||||
- ElfW(Addr) *user_entry, ElfW(auxv_t) *auxv))
|
||||
-{
|
||||
- __libc_stack_end = DL_STACK_END (start_argptr);
|
||||
-
|
||||
- struct dl_main_arguments dl_main_args;
|
||||
- _dl_sysdep_parse_arguments (start_argptr, &dl_main_args);
|
||||
+ for (av = GLRO(dl_auxv); av->a_type != AT_NULL; av++)
|
||||
+ switch (av->a_type)
|
||||
+ {
|
||||
+ case AT_PHDR:
|
||||
+ phdr = (void *) av->a_un.a_val;
|
||||
+ break;
|
||||
+ case AT_PHNUM:
|
||||
+ phnum = av->a_un.a_val;
|
||||
+ break;
|
||||
+ case AT_PAGESZ:
|
||||
+ GLRO(dl_pagesize) = av->a_un.a_val;
|
||||
+ break;
|
||||
+ case AT_ENTRY:
|
||||
+ user_entry = av->a_un.a_val;
|
||||
+ break;
|
||||
+ case AT_SECURE:
|
||||
+ __libc_enable_secure = av->a_un.a_val;
|
||||
+ break;
|
||||
+ case AT_PLATFORM:
|
||||
+ GLRO(dl_platform) = (void *) av->a_un.a_val;
|
||||
+ break;
|
||||
+ case AT_HWCAP:
|
||||
+ GLRO(dl_hwcap) = (unsigned long int) av->a_un.a_val;
|
||||
+ break;
|
||||
+ case AT_HWCAP2:
|
||||
+ GLRO(dl_hwcap2) = (unsigned long int) av->a_un.a_val;
|
||||
+ break;
|
||||
+ case AT_CLKTCK:
|
||||
+ GLRO(dl_clktck) = av->a_un.a_val;
|
||||
+ break;
|
||||
+ case AT_FPUCW:
|
||||
+ GLRO(dl_fpu_control) = av->a_un.a_val;
|
||||
+ break;
|
||||
+#ifdef NEED_DL_SYSINFO
|
||||
+ case AT_SYSINFO:
|
||||
+ new_sysinfo = av->a_un.a_val;
|
||||
+ break;
|
||||
+#endif
|
||||
+ case AT_SYSINFO_EHDR:
|
||||
+ GLRO(dl_sysinfo_dso) = (void *) av->a_un.a_val;
|
||||
+ break;
|
||||
+ case AT_RANDOM:
|
||||
+ _dl_random = (void *) av->a_un.a_val;
|
||||
+ break;
|
||||
+ case AT_MINSIGSTKSZ:
|
||||
+ GLRO(dl_minsigstacksize) = av->a_un.a_val;
|
||||
+ break;
|
||||
+ DL_PLATFORM_AUXV
|
||||
+ }
|
||||
|
||||
dl_hwcap_check ();
|
||||
|
||||
+#ifdef NEED_DL_SYSINFO
|
||||
+ if (new_sysinfo != 0)
|
||||
+ {
|
||||
+ /* Only set the sysinfo value if we also have the vsyscall DSO. */
|
||||
+ if (GLRO(dl_sysinfo_dso) != 0)
|
||||
+ GLRO(dl_sysinfo) = new_sysinfo;
|
||||
+ }
|
||||
+#endif
|
||||
+
|
||||
__tunables_init (_environ);
|
||||
|
||||
/* Initialize DSO sorting algorithm after tunables. */
|
||||
@@ -137,9 +187,8 @@ _dl_sysdep_start (void **start_argptr,
|
||||
if (__builtin_expect (__libc_enable_secure, 0))
|
||||
__libc_check_standard_fds ();
|
||||
|
||||
- (*dl_main) (dl_main_args.phdr, dl_main_args.phnum,
|
||||
- &dl_main_args.user_entry, GLRO(dl_auxv));
|
||||
- return dl_main_args.user_entry;
|
||||
+ (*dl_main) (phdr, phnum, &user_entry, GLRO(dl_auxv));
|
||||
+ return user_entry;
|
||||
}
|
||||
|
||||
void
|
||||
diff --git a/sysdeps/unix/sysv/linux/powerpc/dl-auxv.h b/sysdeps/unix/sysv/linux/powerpc/dl-auxv.h
|
||||
index 7f35fb531ba22098..36ba0f3e9e45f3e2 100644
|
||||
--- a/sysdeps/unix/sysv/linux/powerpc/dl-auxv.h
|
||||
+++ b/sysdeps/unix/sysv/linux/powerpc/dl-auxv.h
|
||||
@@ -16,5 +16,15 @@
|
||||
License along with the GNU C Library; if not, see
|
||||
<https://www.gnu.org/licenses/>. */
|
||||
|
||||
-#define DL_PLATFORM_AUXV \
|
||||
- GLRO(dl_cache_line_size) = auxv_values[AT_DCACHEBSIZE];
|
||||
+#include <ldsodefs.h>
|
||||
+
|
||||
+#if IS_IN (libc) && !defined SHARED
|
||||
+int GLRO(dl_cache_line_size);
|
||||
+#endif
|
||||
+
|
||||
+/* Scan the Aux Vector for the "Data Cache Block Size" entry and assign it
|
||||
+ to dl_cache_line_size. */
|
||||
+#define DL_PLATFORM_AUXV \
|
||||
+ case AT_DCACHEBSIZE: \
|
||||
+ GLRO(dl_cache_line_size) = av->a_un.a_val; \
|
||||
+ break;
|
||||
diff --git a/sysdeps/unix/sysv/linux/powerpc/dl-support.c b/sysdeps/unix/sysv/linux/powerpc/dl-support.c
|
||||
deleted file mode 100644
|
||||
index abe68a704946b90f..0000000000000000
|
||||
--- a/sysdeps/unix/sysv/linux/powerpc/dl-support.c
|
||||
+++ /dev/null
|
||||
@@ -1,4 +0,0 @@
|
||||
-#include <elf/dl-support.c>
|
||||
-
|
||||
-/* Populated from the auxiliary vector. */
|
||||
-int _dl_cache_line_size;
|
36
glibc-upstream-2.34-243.patch
Normal file
@@ -0,0 +1,36 @@
commit 28bdb03b1b2bdb2d2dc62a9beeaa7d9bd2b10679
Author: Florian Weimer <fweimer@redhat.com>
Date:   Fri Feb 11 19:03:04 2022 +0100

    Linux: Include <dl-auxv.h> in dl-sysdep.c only for SHARED

    Otherwise, <dl-auxv.h> on POWER ends up being included twice,
    once in dl-sysdep.c, once in dl-support.c. That leads to a linker
    failure due to multiple definitions of _dl_cache_line_size.

    Fixes commit d96d2995c1121d3310102afda2deb1f35761b5e6
    ("Revert "Linux: Consolidate auxiliary vector parsing").

    (cherry picked from commit 098c795e85fbd05c5ef59c2d0ce59529331bea27)

diff --git a/sysdeps/unix/sysv/linux/dl-sysdep.c b/sysdeps/unix/sysv/linux/dl-sysdep.c
|
||||
index facaaba3b9d091b3..3487976b06ad7f58 100644
|
||||
--- a/sysdeps/unix/sysv/linux/dl-sysdep.c
|
||||
+++ b/sysdeps/unix/sysv/linux/dl-sysdep.c
|
||||
@@ -18,7 +18,6 @@
|
||||
|
||||
#include <_itoa.h>
|
||||
#include <assert.h>
|
||||
-#include <dl-auxv.h>
|
||||
#include <dl-hwcap-check.h>
|
||||
#include <dl-osinfo.h>
|
||||
#include <dl-procinfo.h>
|
||||
@@ -46,6 +45,8 @@
|
||||
#include <dl-machine.h>
|
||||
|
||||
#ifdef SHARED
|
||||
+# include <dl-auxv.h>
|
||||
+
|
||||
extern char **_environ attribute_hidden;
|
||||
extern char _end[] attribute_hidden;
|
||||
|
439
glibc-upstream-2.34-244.patch
Normal file
@@ -0,0 +1,439 @@
commit ff900fad89df7fa12750c018993a12cc02474646
Author: Florian Weimer <fweimer@redhat.com>
Date:   Mon Feb 28 11:50:41 2022 +0100

    Linux: Consolidate auxiliary vector parsing (redo)

    And optimize it slightly.

    This is commit 8c8510ab2790039e58995ef3a22309582413d3ff revised.

    In _dl_aux_init in elf/dl-support.c, use an explicit loop
    and -fno-tree-loop-distribute-patterns to avoid memset.

    Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
    (cherry picked from commit 73fc4e28b9464f0e13edc719a5372839970e7ddb)

diff --git a/elf/Makefile b/elf/Makefile
|
||||
index c89a6a58690646ee..6423ebbdd7708a14 100644
|
||||
--- a/elf/Makefile
|
||||
+++ b/elf/Makefile
|
||||
@@ -148,6 +148,11 @@ ifeq (yes,$(have-loop-to-function))
|
||||
CFLAGS-rtld.c += -fno-tree-loop-distribute-patterns
|
||||
endif
|
||||
|
||||
+ifeq (yes,$(have-loop-to-function))
|
||||
+# Likewise, during static library startup, memset is not yet available.
|
||||
+CFLAGS-dl-support.c = -fno-tree-loop-distribute-patterns
|
||||
+endif
|
||||
+
|
||||
# Compile rtld itself without stack protection.
|
||||
# Also compile all routines in the static library that are elided from
|
||||
# the shared libc because they are in libc.a in the same way.
|
||||
diff --git a/elf/dl-support.c b/elf/dl-support.c
|
||||
index f29dc965f4d10648..a2e45e7b14e3a6b9 100644
|
||||
--- a/elf/dl-support.c
|
||||
+++ b/elf/dl-support.c
|
||||
@@ -43,6 +43,7 @@
|
||||
#include <dl-vdso.h>
|
||||
#include <dl-vdso-setup.h>
|
||||
#include <dl-auxv.h>
|
||||
+#include <array_length.h>
|
||||
|
||||
extern char *__progname;
|
||||
char **_dl_argv = &__progname; /* This is checked for some error messages. */
|
||||
@@ -241,93 +242,25 @@ __rtld_lock_define_initialized_recursive (, _dl_load_tls_lock)
|
||||
|
||||
|
||||
#ifdef HAVE_AUX_VECTOR
|
||||
+#include <dl-parse_auxv.h>
|
||||
+
|
||||
int _dl_clktck;
|
||||
|
||||
void
|
||||
_dl_aux_init (ElfW(auxv_t) *av)
|
||||
{
|
||||
- int seen = 0;
|
||||
- uid_t uid = 0;
|
||||
- gid_t gid = 0;
|
||||
-
|
||||
#ifdef NEED_DL_SYSINFO
|
||||
/* NB: Avoid RELATIVE relocation in static PIE. */
|
||||
GL(dl_sysinfo) = DL_SYSINFO_DEFAULT;
|
||||
#endif
|
||||
|
||||
_dl_auxv = av;
|
||||
- for (; av->a_type != AT_NULL; ++av)
|
||||
- switch (av->a_type)
|
||||
- {
|
||||
- case AT_PAGESZ:
|
||||
- if (av->a_un.a_val != 0)
|
||||
- GLRO(dl_pagesize) = av->a_un.a_val;
|
||||
- break;
|
||||
- case AT_CLKTCK:
|
||||
- GLRO(dl_clktck) = av->a_un.a_val;
|
||||
- break;
|
||||
- case AT_PHDR:
|
||||
- GL(dl_phdr) = (const void *) av->a_un.a_val;
|
||||
- break;
|
||||
- case AT_PHNUM:
|
||||
- GL(dl_phnum) = av->a_un.a_val;
|
||||
- break;
|
||||
- case AT_PLATFORM:
|
||||
- GLRO(dl_platform) = (void *) av->a_un.a_val;
|
||||
- break;
|
||||
- case AT_HWCAP:
|
||||
- GLRO(dl_hwcap) = (unsigned long int) av->a_un.a_val;
|
||||
- break;
|
||||
- case AT_HWCAP2:
|
||||
- GLRO(dl_hwcap2) = (unsigned long int) av->a_un.a_val;
|
||||
- break;
|
||||
- case AT_FPUCW:
|
||||
- GLRO(dl_fpu_control) = av->a_un.a_val;
|
||||
- break;
|
||||
-#ifdef NEED_DL_SYSINFO
|
||||
- case AT_SYSINFO:
|
||||
- GL(dl_sysinfo) = av->a_un.a_val;
|
||||
- break;
|
||||
-#endif
|
||||
-#ifdef NEED_DL_SYSINFO_DSO
|
||||
- case AT_SYSINFO_EHDR:
|
||||
- GL(dl_sysinfo_dso) = (void *) av->a_un.a_val;
|
||||
- break;
|
||||
-#endif
|
||||
- case AT_UID:
|
||||
- uid ^= av->a_un.a_val;
|
||||
- seen |= 1;
|
||||
- break;
|
||||
- case AT_EUID:
|
||||
- uid ^= av->a_un.a_val;
|
||||
- seen |= 2;
|
||||
- break;
|
||||
- case AT_GID:
|
||||
- gid ^= av->a_un.a_val;
|
||||
- seen |= 4;
|
||||
- break;
|
||||
- case AT_EGID:
|
||||
- gid ^= av->a_un.a_val;
|
||||
- seen |= 8;
|
||||
- break;
|
||||
- case AT_SECURE:
|
||||
- seen = -1;
|
||||
- __libc_enable_secure = av->a_un.a_val;
|
||||
- __libc_enable_secure_decided = 1;
|
||||
- break;
|
||||
- case AT_RANDOM:
|
||||
- _dl_random = (void *) av->a_un.a_val;
|
||||
- break;
|
||||
- case AT_MINSIGSTKSZ:
|
||||
- _dl_minsigstacksize = av->a_un.a_val;
|
||||
- break;
|
||||
- DL_PLATFORM_AUXV
|
||||
- }
|
||||
- if (seen == 0xf)
|
||||
- {
|
||||
- __libc_enable_secure = uid != 0 || gid != 0;
|
||||
- __libc_enable_secure_decided = 1;
|
||||
- }
|
||||
+ dl_parse_auxv_t auxv_values;
|
||||
+ /* Use an explicit initialization loop here because memset may not
|
||||
+ be available yet. */
|
||||
+ for (int i = 0; i < array_length (auxv_values); ++i)
|
||||
+ auxv_values[i] = 0;
|
||||
+ _dl_parse_auxv (av, auxv_values);
|
||||
}
|
||||
#endif
|
||||
|
||||
diff --git a/sysdeps/unix/sysv/linux/alpha/dl-auxv.h b/sysdeps/unix/sysv/linux/alpha/dl-auxv.h
|
||||
index 1aa9dca80d189ebe..8c99e776a0af9cef 100644
|
||||
--- a/sysdeps/unix/sysv/linux/alpha/dl-auxv.h
|
||||
+++ b/sysdeps/unix/sysv/linux/alpha/dl-auxv.h
|
||||
@@ -20,16 +20,8 @@
|
||||
|
||||
extern long __libc_alpha_cache_shape[4];
|
||||
|
||||
-#define DL_PLATFORM_AUXV \
|
||||
- case AT_L1I_CACHESHAPE: \
|
||||
- __libc_alpha_cache_shape[0] = av->a_un.a_val; \
|
||||
- break; \
|
||||
- case AT_L1D_CACHESHAPE: \
|
||||
- __libc_alpha_cache_shape[1] = av->a_un.a_val; \
|
||||
- break; \
|
||||
- case AT_L2_CACHESHAPE: \
|
||||
- __libc_alpha_cache_shape[2] = av->a_un.a_val; \
|
||||
- break; \
|
||||
- case AT_L3_CACHESHAPE: \
|
||||
- __libc_alpha_cache_shape[3] = av->a_un.a_val; \
|
||||
- break;
|
||||
+#define DL_PLATFORM_AUXV \
|
||||
+ __libc_alpha_cache_shape[0] = auxv_values[AT_L1I_CACHESHAPE]; \
|
||||
+ __libc_alpha_cache_shape[1] = auxv_values[AT_L1D_CACHESHAPE]; \
|
||||
+ __libc_alpha_cache_shape[2] = auxv_values[AT_L2_CACHESHAPE]; \
|
||||
+ __libc_alpha_cache_shape[3] = auxv_values[AT_L3_CACHESHAPE];
|
||||
diff --git a/sysdeps/unix/sysv/linux/dl-parse_auxv.h b/sysdeps/unix/sysv/linux/dl-parse_auxv.h
|
||||
new file mode 100644
|
||||
index 0000000000000000..bf9374371eb217fc
|
||||
--- /dev/null
|
||||
+++ b/sysdeps/unix/sysv/linux/dl-parse_auxv.h
|
||||
@@ -0,0 +1,61 @@
|
||||
+/* Parse the Linux auxiliary vector.
|
||||
+ Copyright (C) 1995-2022 Free Software Foundation, Inc.
|
||||
+ This file is part of the GNU C Library.
|
||||
+
|
||||
+ The GNU C Library is free software; you can redistribute it and/or
|
||||
+ modify it under the terms of the GNU Lesser General Public
|
||||
+ License as published by the Free Software Foundation; either
|
||||
+ version 2.1 of the License, or (at your option) any later version.
|
||||
+
|
||||
+ The GNU C Library is distributed in the hope that it will be useful,
|
||||
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
|
||||
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
|
||||
+ Lesser General Public License for more details.
|
||||
+
|
||||
+ You should have received a copy of the GNU Lesser General Public
|
||||
+ License along with the GNU C Library; if not, see
|
||||
+ <https://www.gnu.org/licenses/>. */
|
||||
+
|
||||
+#include <elf.h>
|
||||
+#include <entry.h>
|
||||
+#include <fpu_control.h>
|
||||
+#include <ldsodefs.h>
|
||||
+#include <link.h>
|
||||
+
|
||||
+typedef ElfW(Addr) dl_parse_auxv_t[AT_MINSIGSTKSZ + 1];
|
||||
+
|
||||
+/* Copy the auxiliary vector into AUXV_VALUES and set up GLRO
|
||||
+ variables. */
|
||||
+static inline
|
||||
+void _dl_parse_auxv (ElfW(auxv_t) *av, dl_parse_auxv_t auxv_values)
|
||||
+{
|
||||
+ auxv_values[AT_ENTRY] = (ElfW(Addr)) ENTRY_POINT;
|
||||
+ auxv_values[AT_PAGESZ] = EXEC_PAGESIZE;
|
||||
+ auxv_values[AT_FPUCW] = _FPU_DEFAULT;
|
||||
+
|
||||
+ /* NB: Default to a constant CONSTANT_MINSIGSTKSZ. */
|
||||
+ _Static_assert (__builtin_constant_p (CONSTANT_MINSIGSTKSZ),
|
||||
+ "CONSTANT_MINSIGSTKSZ is constant");
|
||||
+ auxv_values[AT_MINSIGSTKSZ] = CONSTANT_MINSIGSTKSZ;
|
||||
+
|
||||
+ for (; av->a_type != AT_NULL; av++)
|
||||
+ if (av->a_type <= AT_MINSIGSTKSZ)
|
||||
+ auxv_values[av->a_type] = av->a_un.a_val;
|
||||
+
|
||||
+ GLRO(dl_pagesize) = auxv_values[AT_PAGESZ];
|
||||
+ __libc_enable_secure = auxv_values[AT_SECURE];
|
||||
+ GLRO(dl_platform) = (void *) auxv_values[AT_PLATFORM];
|
||||
+ GLRO(dl_hwcap) = auxv_values[AT_HWCAP];
|
||||
+ GLRO(dl_hwcap2) = auxv_values[AT_HWCAP2];
|
||||
+ GLRO(dl_clktck) = auxv_values[AT_CLKTCK];
|
||||
+ GLRO(dl_fpu_control) = auxv_values[AT_FPUCW];
|
||||
+ _dl_random = (void *) auxv_values[AT_RANDOM];
|
||||
+ GLRO(dl_minsigstacksize) = auxv_values[AT_MINSIGSTKSZ];
|
||||
+ GLRO(dl_sysinfo_dso) = (void *) auxv_values[AT_SYSINFO_EHDR];
|
||||
+#ifdef NEED_DL_SYSINFO
|
||||
+ if (GLRO(dl_sysinfo_dso) != NULL)
|
||||
+ GLRO(dl_sysinfo) = auxv_values[AT_SYSINFO];
|
||||
+#endif
|
||||
+
|
||||
+ DL_PLATFORM_AUXV
|
||||
+}
|
||||
diff --git a/sysdeps/unix/sysv/linux/dl-sysdep.c b/sysdeps/unix/sysv/linux/dl-sysdep.c
|
||||
index 3487976b06ad7f58..56db828fc6985de6 100644
|
||||
--- a/sysdeps/unix/sysv/linux/dl-sysdep.c
|
||||
+++ b/sysdeps/unix/sysv/linux/dl-sysdep.c
|
||||
@@ -18,15 +18,14 @@
|
||||
|
||||
#include <_itoa.h>
|
||||
#include <assert.h>
|
||||
-#include <dl-hwcap-check.h>
|
||||
+#include <dl-auxv.h>
|
||||
#include <dl-osinfo.h>
|
||||
+#include <dl-parse_auxv.h>
|
||||
#include <dl-procinfo.h>
|
||||
#include <dl-tunables.h>
|
||||
#include <elf.h>
|
||||
-#include <entry.h>
|
||||
#include <errno.h>
|
||||
#include <fcntl.h>
|
||||
-#include <fpu_control.h>
|
||||
#include <ldsodefs.h>
|
||||
#include <libc-internal.h>
|
||||
#include <libintl.h>
|
||||
@@ -43,10 +42,9 @@
|
||||
#include <unistd.h>
|
||||
|
||||
#include <dl-machine.h>
|
||||
+#include <dl-hwcap-check.h>
|
||||
|
||||
#ifdef SHARED
|
||||
-# include <dl-auxv.h>
|
||||
-
|
||||
extern char **_environ attribute_hidden;
|
||||
extern char _end[] attribute_hidden;
|
||||
|
||||
@@ -64,20 +62,20 @@ void *_dl_random attribute_relro = NULL;
|
||||
# define DL_STACK_END(cookie) ((void *) (cookie))
|
||||
#endif
|
||||
|
||||
-ElfW(Addr)
|
||||
-_dl_sysdep_start (void **start_argptr,
|
||||
- void (*dl_main) (const ElfW(Phdr) *phdr, ElfW(Word) phnum,
|
||||
- ElfW(Addr) *user_entry, ElfW(auxv_t) *auxv))
|
||||
+/* Arguments passed to dl_main. */
|
||||
+struct dl_main_arguments
|
||||
{
|
||||
- const ElfW(Phdr) *phdr = NULL;
|
||||
- ElfW(Word) phnum = 0;
|
||||
+ const ElfW(Phdr) *phdr;
|
||||
+ ElfW(Word) phnum;
|
||||
ElfW(Addr) user_entry;
|
||||
- ElfW(auxv_t) *av;
|
||||
-#ifdef NEED_DL_SYSINFO
|
||||
- uintptr_t new_sysinfo = 0;
|
||||
-#endif
|
||||
+};
|
||||
|
||||
- __libc_stack_end = DL_STACK_END (start_argptr);
|
||||
+/* Separate function, so that dl_main can be called without the large
|
||||
+ array on the stack. */
|
||||
+static void
|
||||
+_dl_sysdep_parse_arguments (void **start_argptr,
|
||||
+ struct dl_main_arguments *args)
|
||||
+{
|
||||
_dl_argc = (intptr_t) *start_argptr;
|
||||
_dl_argv = (char **) (start_argptr + 1); /* Necessary aliasing violation. */
|
||||
_environ = _dl_argv + _dl_argc + 1;
|
||||
@@ -89,74 +87,25 @@ _dl_sysdep_start (void **start_argptr,
|
||||
break;
|
||||
}
|
||||
|
||||
- user_entry = (ElfW(Addr)) ENTRY_POINT;
|
||||
- GLRO(dl_platform) = NULL; /* Default to nothing known about the platform. */
|
||||
+ dl_parse_auxv_t auxv_values = { 0, };
|
||||
+ _dl_parse_auxv (GLRO(dl_auxv), auxv_values);
|
||||
|
||||
- /* NB: Default to a constant CONSTANT_MINSIGSTKSZ. */
|
||||
- _Static_assert (__builtin_constant_p (CONSTANT_MINSIGSTKSZ),
|
||||
- "CONSTANT_MINSIGSTKSZ is constant");
|
||||
- GLRO(dl_minsigstacksize) = CONSTANT_MINSIGSTKSZ;
|
||||
+ args->phdr = (const ElfW(Phdr) *) auxv_values[AT_PHDR];
|
||||
+ args->phnum = auxv_values[AT_PHNUM];
|
||||
+ args->user_entry = auxv_values[AT_ENTRY];
|
||||
+}
|
||||
|
||||
- for (av = GLRO(dl_auxv); av->a_type != AT_NULL; av++)
|
||||
- switch (av->a_type)
|
||||
- {
|
||||
- case AT_PHDR:
|
||||
- phdr = (void *) av->a_un.a_val;
|
||||
- break;
|
||||
- case AT_PHNUM:
|
||||
- phnum = av->a_un.a_val;
|
||||
- break;
|
||||
- case AT_PAGESZ:
|
||||
- GLRO(dl_pagesize) = av->a_un.a_val;
|
||||
- break;
|
||||
- case AT_ENTRY:
|
||||
- user_entry = av->a_un.a_val;
|
||||
- break;
|
||||
- case AT_SECURE:
|
||||
- __libc_enable_secure = av->a_un.a_val;
|
||||
- break;
|
||||
- case AT_PLATFORM:
|
||||
- GLRO(dl_platform) = (void *) av->a_un.a_val;
|
||||
- break;
|
||||
- case AT_HWCAP:
|
||||
- GLRO(dl_hwcap) = (unsigned long int) av->a_un.a_val;
|
||||
- break;
|
||||
- case AT_HWCAP2:
|
||||
- GLRO(dl_hwcap2) = (unsigned long int) av->a_un.a_val;
|
||||
- break;
|
||||
- case AT_CLKTCK:
|
||||
- GLRO(dl_clktck) = av->a_un.a_val;
|
||||
- break;
|
||||
- case AT_FPUCW:
|
||||
- GLRO(dl_fpu_control) = av->a_un.a_val;
|
||||
- break;
|
||||
-#ifdef NEED_DL_SYSINFO
|
||||
- case AT_SYSINFO:
|
||||
- new_sysinfo = av->a_un.a_val;
|
||||
- break;
|
||||
-#endif
|
||||
- case AT_SYSINFO_EHDR:
|
||||
- GLRO(dl_sysinfo_dso) = (void *) av->a_un.a_val;
|
||||
- break;
|
||||
- case AT_RANDOM:
|
||||
- _dl_random = (void *) av->a_un.a_val;
|
||||
- break;
|
||||
- case AT_MINSIGSTKSZ:
|
||||
- GLRO(dl_minsigstacksize) = av->a_un.a_val;
|
||||
- break;
|
||||
- DL_PLATFORM_AUXV
|
||||
- }
|
||||
+ElfW(Addr)
|
||||
+_dl_sysdep_start (void **start_argptr,
|
||||
+ void (*dl_main) (const ElfW(Phdr) *phdr, ElfW(Word) phnum,
|
||||
+ ElfW(Addr) *user_entry, ElfW(auxv_t) *auxv))
|
||||
+{
|
||||
+ __libc_stack_end = DL_STACK_END (start_argptr);
|
||||
|
||||
- dl_hwcap_check ();
|
||||
+ struct dl_main_arguments dl_main_args;
|
||||
+ _dl_sysdep_parse_arguments (start_argptr, &dl_main_args);
|
||||
|
||||
-#ifdef NEED_DL_SYSINFO
|
||||
- if (new_sysinfo != 0)
|
||||
- {
|
||||
- /* Only set the sysinfo value if we also have the vsyscall DSO. */
|
||||
- if (GLRO(dl_sysinfo_dso) != 0)
|
||||
- GLRO(dl_sysinfo) = new_sysinfo;
|
||||
- }
|
||||
-#endif
|
||||
+ dl_hwcap_check ();
|
||||
|
||||
__tunables_init (_environ);
|
||||
|
||||
@@ -188,8 +137,9 @@ _dl_sysdep_start (void **start_argptr,
|
||||
if (__builtin_expect (__libc_enable_secure, 0))
|
||||
__libc_check_standard_fds ();
|
||||
|
||||
- (*dl_main) (phdr, phnum, &user_entry, GLRO(dl_auxv));
|
||||
- return user_entry;
|
||||
+ (*dl_main) (dl_main_args.phdr, dl_main_args.phnum,
|
||||
+ &dl_main_args.user_entry, GLRO(dl_auxv));
|
||||
+ return dl_main_args.user_entry;
|
||||
}
|
||||
|
||||
void
|
||||
diff --git a/sysdeps/unix/sysv/linux/powerpc/dl-auxv.h b/sysdeps/unix/sysv/linux/powerpc/dl-auxv.h
|
||||
index 36ba0f3e9e45f3e2..7f35fb531ba22098 100644
|
||||
--- a/sysdeps/unix/sysv/linux/powerpc/dl-auxv.h
|
||||
+++ b/sysdeps/unix/sysv/linux/powerpc/dl-auxv.h
|
||||
@@ -16,15 +16,5 @@
|
||||
License along with the GNU C Library; if not, see
|
||||
<https://www.gnu.org/licenses/>. */
|
||||
|
||||
-#include <ldsodefs.h>
|
||||
-
|
||||
-#if IS_IN (libc) && !defined SHARED
|
||||
-int GLRO(dl_cache_line_size);
|
||||
-#endif
|
||||
-
|
||||
-/* Scan the Aux Vector for the "Data Cache Block Size" entry and assign it
|
||||
- to dl_cache_line_size. */
|
||||
-#define DL_PLATFORM_AUXV \
|
||||
- case AT_DCACHEBSIZE: \
|
||||
- GLRO(dl_cache_line_size) = av->a_un.a_val; \
|
||||
- break;
|
||||
+#define DL_PLATFORM_AUXV \
|
||||
+ GLRO(dl_cache_line_size) = auxv_values[AT_DCACHEBSIZE];
|
||||
diff --git a/sysdeps/unix/sysv/linux/powerpc/dl-support.c b/sysdeps/unix/sysv/linux/powerpc/dl-support.c
|
||||
new file mode 100644
|
||||
index 0000000000000000..abe68a704946b90f
|
||||
--- /dev/null
|
||||
+++ b/sysdeps/unix/sysv/linux/powerpc/dl-support.c
|
||||
@@ -0,0 +1,4 @@
|
||||
+#include <elf/dl-support.c>
|
||||
+
|
||||
+/* Populated from the auxiliary vector. */
|
||||
+int _dl_cache_line_size;
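
The array-indexed parsing that the patch above introduces in `_dl_parse_auxv` can be sketched outside glibc roughly as follows. This is a minimal illustration, not glibc code: the `ex_` names, the reduced tag set, and the 4096 page-size default are hypothetical stand-ins for `ElfW(auxv_t)`, the `AT_*` constants, and `EXEC_PAGESIZE`.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, reduced tag set.  The numeric values match the usual
   AT_NULL, AT_PHNUM and AT_PAGESZ constants, but only these are modelled.  */
enum { EX_AT_NULL = 0, EX_AT_PHNUM = 5, EX_AT_PAGESZ = 6, EX_AT_MAX = 6 };

/* Stand-in for one auxiliary vector entry (tag plus value).  */
struct ex_auxv
{
  uintptr_t type;
  uintptr_t value;
};

/* One slot per tag, indexed directly by the tag number.  */
typedef uintptr_t ex_auxv_values[EX_AT_MAX + 1];

/* Zero the table, seed defaults, then let in-range vector entries
   overwrite their slots.  Out-of-range tags are ignored.  */
void
ex_parse_auxv (const struct ex_auxv *av, ex_auxv_values values)
{
  /* Explicit loop instead of memset, as in the patch.  The real build
     additionally passes -fno-tree-loop-distribute-patterns so that the
     compiler does not turn this loop back into a memset call.  */
  for (size_t i = 0; i <= EX_AT_MAX; ++i)
    values[i] = 0;

  /* Assumed default, used only if the vector has no page-size entry.  */
  values[EX_AT_PAGESZ] = 4096;

  for (; av->type != EX_AT_NULL; ++av)
    if (av->type <= EX_AT_MAX)
      values[av->type] = av->value;
}
```

Compared with the `switch`-based loop that this patch removes, the table form does one bounds check per entry and reads each field afterwards by index, at the cost of a fixed-size array on the stack.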
|
197
glibc-upstream-2.34-245.patch
Normal file
@@ -0,0 +1,197 @@
commit be9240c84c67de44959905a829141576965a0588
Author: Fangrui Song <maskray@google.com>
Date:   Tue Apr 19 15:52:27 2022 -0700

    elf: Remove __libc_init_secure

    After 73fc4e28b9464f0e13edc719a5372839970e7ddb,
    __libc_enable_secure_decided is always 0 and a statically linked
    executable may overwrite __libc_enable_secure without considering
    AT_SECURE.

    The __libc_enable_secure has been correctly initialized in _dl_aux_init,
    so just remove __libc_enable_secure_decided and __libc_init_secure.
    This allows us to remove some startup_get*id functions from
    22b79ed7f413cd980a7af0cf258da5bf82b6d5e5.

    Reviewed-by: Florian Weimer <fweimer@redhat.com>
    (cherry picked from commit 3e9acce8c50883b6cd8a3fb653363d9fa21e1608)

diff --git a/csu/libc-start.c b/csu/libc-start.c
|
||||
index d01e57ea59ceb880..a2fc2f6f9665a48f 100644
|
||||
--- a/csu/libc-start.c
|
||||
+++ b/csu/libc-start.c
|
||||
@@ -285,9 +285,6 @@ LIBC_START_MAIN (int (*main) (int, char **, char ** MAIN_AUXVEC_DECL),
|
||||
}
|
||||
}
|
||||
|
||||
- /* Initialize very early so that tunables can use it. */
|
||||
- __libc_init_secure ();
|
||||
-
|
||||
__tunables_init (__environ);
|
||||
|
||||
ARCH_INIT_CPU_FEATURES ();
|
||||
diff --git a/elf/enbl-secure.c b/elf/enbl-secure.c
|
||||
index 9e47526bd3e444e1..1208610bd0670c74 100644
|
||||
--- a/elf/enbl-secure.c
|
||||
+++ b/elf/enbl-secure.c
|
||||
@@ -26,15 +26,5 @@
|
||||
#include <startup.h>
|
||||
#include <libc-internal.h>
|
||||
|
||||
-/* If nonzero __libc_enable_secure is already set. */
|
||||
-int __libc_enable_secure_decided;
|
||||
/* Safest assumption, if somehow the initializer isn't run. */
|
||||
int __libc_enable_secure = 1;
|
||||
-
|
||||
-void
|
||||
-__libc_init_secure (void)
|
||||
-{
|
||||
- if (__libc_enable_secure_decided == 0)
|
||||
- __libc_enable_secure = (startup_geteuid () != startup_getuid ()
|
||||
- || startup_getegid () != startup_getgid ());
|
||||
-}
|
||||
diff --git a/include/libc-internal.h b/include/libc-internal.h
|
||||
index 749dfb919ce4a62d..44fcb6bdf8751c1c 100644
|
||||
--- a/include/libc-internal.h
|
||||
+++ b/include/libc-internal.h
|
||||
@@ -21,9 +21,6 @@
|
||||
|
||||
#include <hp-timing.h>
|
||||
|
||||
-/* Initialize the `__libc_enable_secure' flag. */
|
||||
-extern void __libc_init_secure (void);
|
||||
-
|
||||
/* Discover the tick frequency of the machine if something goes wrong,
|
||||
we return 0, an impossible hertz. */
|
||||
extern int __profile_frequency (void);
|
||||
diff --git a/include/unistd.h b/include/unistd.h
|
||||
index 7849562c4272e2c9..5824485629793ccb 100644
|
||||
--- a/include/unistd.h
|
||||
+++ b/include/unistd.h
|
||||
@@ -180,7 +180,6 @@ libc_hidden_proto (__sbrk)
|
||||
and some functions contained in the C library ignore various
|
||||
environment variables that normally affect them. */
|
||||
extern int __libc_enable_secure attribute_relro;
|
||||
-extern int __libc_enable_secure_decided;
|
||||
rtld_hidden_proto (__libc_enable_secure)
|
||||
|
||||
|
||||
diff --git a/sysdeps/generic/startup.h b/sysdeps/generic/startup.h
|
||||
index 04f20cde474cea89..c3be5430bd8bbaa6 100644
|
||||
--- a/sysdeps/generic/startup.h
|
||||
+++ b/sysdeps/generic/startup.h
|
||||
@@ -23,27 +23,3 @@
|
||||
|
||||
/* Use macro instead of inline function to avoid including <stdio.h>. */
|
||||
#define _startup_fatal(message) __libc_fatal ((message))
|
||||
-
|
||||
-static inline uid_t
|
||||
-startup_getuid (void)
|
||||
-{
|
||||
- return __getuid ();
|
||||
-}
|
||||
-
|
||||
-static inline uid_t
|
||||
-startup_geteuid (void)
|
||||
-{
|
||||
- return __geteuid ();
|
||||
-}
|
||||
-
|
||||
-static inline gid_t
|
||||
-startup_getgid (void)
|
||||
-{
|
||||
- return __getgid ();
|
||||
-}
|
||||
-
|
||||
-static inline gid_t
|
||||
-startup_getegid (void)
|
||||
-{
|
||||
- return __getegid ();
|
||||
-}
|
||||
diff --git a/sysdeps/mach/hurd/enbl-secure.c b/sysdeps/mach/hurd/enbl-secure.c
|
||||
deleted file mode 100644
|
||||
index 3e9a6b888d56754b..0000000000000000
|
||||
--- a/sysdeps/mach/hurd/enbl-secure.c
|
||||
+++ /dev/null
|
||||
@@ -1,30 +0,0 @@
|
||||
-/* Define and initialize the `__libc_enable_secure' flag. Hurd version.
|
||||
- Copyright (C) 1998-2021 Free Software Foundation, Inc.
|
||||
- This file is part of the GNU C Library.
|
||||
-
|
||||
- The GNU C Library is free software; you can redistribute it and/or
|
||||
- modify it under the terms of the GNU Lesser General Public
|
||||
- License as published by the Free Software Foundation; either
|
||||
- version 2.1 of the License, or (at your option) any later version.
|
||||
-
|
||||
- The GNU C Library is distributed in the hope that it will be useful,
|
||||
- but WITHOUT ANY WARRANTY; without even the implied warranty of
|
||||
- MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
|
||||
- Lesser General Public License for more details.
|
||||
-
|
||||
- You should have received a copy of the GNU Lesser General Public
|
||||
- License along with the GNU C Library; if not, see
|
||||
- <https://www.gnu.org/licenses/>. */
|
||||
-
|
||||
-/* There is no need for this file in the Hurd; it is just a placeholder
|
||||
- to prevent inclusion of the sysdeps/generic version.
|
||||
- In the shared library, the `__libc_enable_secure' variable is defined
|
||||
- by the dynamic linker in dl-sysdep.c and set there.
|
||||
- In the static library, it is defined in init-first.c and set there. */
|
||||
-
|
||||
-#include <libc-internal.h>
|
||||
-
|
||||
-void
|
||||
-__libc_init_secure (void)
|
||||
-{
|
||||
-}
|
||||
diff --git a/sysdeps/mach/hurd/i386/init-first.c b/sysdeps/mach/hurd/i386/init-first.c
|
||||
index a430aae085527163..4dc9017ec8754a1a 100644
|
||||
--- a/sysdeps/mach/hurd/i386/init-first.c
|
||||
+++ b/sysdeps/mach/hurd/i386/init-first.c
|
||||
@@ -38,10 +38,6 @@ extern void __init_misc (int, char **, char **);
|
||||
unsigned long int __hurd_threadvar_stack_offset;
|
||||
unsigned long int __hurd_threadvar_stack_mask;
|
||||
|
||||
-#ifndef SHARED
|
||||
-int __libc_enable_secure;
|
||||
-#endif
|
||||
-
|
||||
extern int __libc_argc attribute_hidden;
|
||||
extern char **__libc_argv attribute_hidden;
|
||||
extern char **_dl_argv;
|
||||
diff --git a/sysdeps/unix/sysv/linux/i386/startup.h b/sysdeps/unix/sysv/linux/i386/startup.h
|
||||
index dee7a4f1d3d420be..192c765361c17ed1 100644
|
||||
--- a/sysdeps/unix/sysv/linux/i386/startup.h
|
||||
+++ b/sysdeps/unix/sysv/linux/i386/startup.h
|
||||
@@ -32,30 +32,6 @@ _startup_fatal (const char *message __attribute__ ((unused)))
|
||||
ABORT_INSTRUCTION;
|
||||
__builtin_unreachable ();
|
||||
}
|
||||
-
|
||||
-static inline uid_t
|
||||
-startup_getuid (void)
|
||||
-{
|
||||
- return (uid_t) INTERNAL_SYSCALL_CALL (getuid32);
|
||||
-}
|
||||
-
|
||||
-static inline uid_t
|
||||
-startup_geteuid (void)
|
||||
-{
|
||||
- return (uid_t) INTERNAL_SYSCALL_CALL (geteuid32);
|
||||
-}
|
||||
-
|
||||
-static inline gid_t
|
||||
-startup_getgid (void)
|
||||
-{
|
||||
- return (gid_t) INTERNAL_SYSCALL_CALL (getgid32);
|
||||
-}
|
||||
-
|
||||
-static inline gid_t
|
||||
-startup_getegid (void)
|
||||
-{
|
||||
- return (gid_t) INTERNAL_SYSCALL_CALL (getegid32);
|
||||
-}
|
||||
#else
|
||||
# include_next <startup.h>
|
||||
#endif
|
31
glibc-upstream-2.34-246.patch
Normal file
@@ -0,0 +1,31 @@
commit 1e7b011f87c653ad109b34e675f64e7a5cc3805a
Author: Florian Weimer <fweimer@redhat.com>
Date:   Wed May 4 15:37:21 2022 +0200

    i386: Remove OPTIMIZE_FOR_GCC_5 from Linux libc-do-syscall.S

    After commit a78e6a10d0b50d0ca80309775980fc99944b1727
    ("i386: Remove broken CAN_USE_REGISTER_ASM_EBP (bug 28771)"),
    it is never defined.

    Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
    (cherry picked from commit 6e5c7a1e262961adb52443ab91bd2c9b72316402)

diff --git a/sysdeps/unix/sysv/linux/i386/libc-do-syscall.S b/sysdeps/unix/sysv/linux/i386/libc-do-syscall.S
|
||||
index c95f297d6f0217ef..404435f0123b23b3 100644
|
||||
--- a/sysdeps/unix/sysv/linux/i386/libc-do-syscall.S
|
||||
+++ b/sysdeps/unix/sysv/linux/i386/libc-do-syscall.S
|
||||
@@ -18,8 +18,6 @@
|
||||
|
||||
#include <sysdep.h>
|
||||
|
||||
-#ifndef OPTIMIZE_FOR_GCC_5
|
||||
-
|
||||
/* %eax, %ecx, %edx and %esi contain the values expected by the kernel.
|
||||
%edi points to a structure with the values of %ebx, %edi and %ebp. */
|
||||
|
||||
@@ -50,4 +48,3 @@ ENTRY (__libc_do_syscall)
|
||||
cfi_restore (ebx)
|
||||
ret
|
||||
END (__libc_do_syscall)
|
||||
-#endif
|
94
glibc-upstream-2.34-247.patch
Normal file
@@ -0,0 +1,94 @@
commit 1a5b9d1a231ae788aac3520dab07dc856e404c69
Author: Florian Weimer <fweimer@redhat.com>
Date:   Wed May 4 15:37:21 2022 +0200

    i386: Honor I386_USE_SYSENTER for 6-argument Linux system calls

    Introduce an int-80h-based version of __libc_do_syscall and use
    it if I386_USE_SYSENTER is defined as 0.

    Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
    (cherry picked from commit 60f0f2130d30cfd008ca39743027f1e200592dff)

diff --git a/sysdeps/unix/sysv/linux/i386/Makefile b/sysdeps/unix/sysv/linux/i386/Makefile
|
||||
index abd0009d58f06303..e379a2e767d96322 100644
|
||||
--- a/sysdeps/unix/sysv/linux/i386/Makefile
|
||||
+++ b/sysdeps/unix/sysv/linux/i386/Makefile
|
||||
@@ -14,7 +14,7 @@ install-bin += lddlibc4
|
||||
endif
|
||||
|
||||
ifeq ($(subdir),io)
|
||||
-sysdep_routines += libc-do-syscall
|
||||
+sysdep_routines += libc-do-syscall libc-do-syscall-int80
|
||||
endif
|
||||
|
||||
ifeq ($(subdir),stdlib)
|
||||
diff --git a/sysdeps/unix/sysv/linux/i386/libc-do-syscall-int80.S b/sysdeps/unix/sysv/linux/i386/libc-do-syscall-int80.S
|
||||
new file mode 100644
|
||||
index 0000000000000000..2c472f255734b357
|
||||
--- /dev/null
|
||||
+++ b/sysdeps/unix/sysv/linux/i386/libc-do-syscall-int80.S
|
||||
@@ -0,0 +1,25 @@
|
||||
+/* Out-of-line syscall stub for six-argument syscalls from C. For static PIE.
|
||||
+ Copyright (C) 2022 Free Software Foundation, Inc.
|
||||
+ This file is part of the GNU C Library.
|
||||
+
|
||||
+ The GNU C Library is free software; you can redistribute it and/or
|
||||
+ modify it under the terms of the GNU Lesser General Public
|
||||
+ License as published by the Free Software Foundation; either
|
||||
+ version 2.1 of the License, or (at your option) any later version.
|
||||
+
|
||||
+ The GNU C Library is distributed in the hope that it will be useful,
|
||||
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
|
||||
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
|
||||
+ Lesser General Public License for more details.
|
||||
+
|
||||
+ You should have received a copy of the GNU Lesser General Public
|
||||
+ License along with the GNU C Library; if not, see
|
||||
+ <https://www.gnu.org/licenses/>. */
|
||||
+
|
||||
+#ifndef SHARED
|
||||
+# define I386_USE_SYSENTER 0
|
||||
+# include <sysdep.h>
|
||||
+
|
||||
+# define __libc_do_syscall __libc_do_syscall_int80
|
||||
+# include "libc-do-syscall.S"
|
||||
+#endif
|
||||
diff --git a/sysdeps/unix/sysv/linux/i386/sysdep.h b/sysdeps/unix/sysv/linux/i386/sysdep.h
|
||||
index 39d6a3c13427abb5..4c6358c7fe43fe0b 100644
|
||||
--- a/sysdeps/unix/sysv/linux/i386/sysdep.h
|
||||
+++ b/sysdeps/unix/sysv/linux/i386/sysdep.h
|
||||
@@ -43,6 +43,15 @@
|
||||
# endif
|
||||
#endif
|
||||
|
||||
+#if !I386_USE_SYSENTER && IS_IN (libc) && !defined SHARED
|
||||
+/* Inside static libc, we have two versions. For compilation units
|
||||
+ with !I386_USE_SYSENTER, the vDSO entry mechanism cannot be
|
||||
+ used. */
|
||||
+# define I386_DO_SYSCALL_STRING "__libc_do_syscall_int80"
|
||||
+#else
|
||||
+# define I386_DO_SYSCALL_STRING "__libc_do_syscall"
|
||||
+#endif
|
||||
+
|
||||
#ifdef __ASSEMBLER__
|
||||
|
||||
/* Linux uses a negative return value to indicate syscall errors,
|
||||
@@ -302,7 +311,7 @@ struct libc_do_syscall_args
|
||||
}; \
|
||||
asm volatile ( \
|
||||
"movl %1, %%eax\n\t" \
|
||||
- "call __libc_do_syscall" \
|
||||
+ "call " I386_DO_SYSCALL_STRING \
|
||||
: "=a" (resultvar) \
|
||||
: "i" (__NR_##name), "c" (arg2), "d" (arg3), "S" (arg4), "D" (&_xv) \
|
||||
: "memory", "cc")
|
||||
@@ -316,7 +325,7 @@ struct libc_do_syscall_args
|
||||
}; \
|
||||
asm volatile ( \
|
||||
"movl %1, %%eax\n\t" \
|
||||
- "call __libc_do_syscall" \
|
||||
+ "call " I386_DO_SYSCALL_STRING \
|
||||
: "=a" (resultvar) \
|
||||
: "a" (name), "c" (arg2), "d" (arg3), "S" (arg4), "D" (&_xv) \
|
||||
: "memory", "cc")
|
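The `"call " I386_DO_SYSCALL_STRING` construct in the sysdep.h hunks above relies on C's concatenation of adjacent string literals, so the stub name is selected at preprocessing time and the assembler sees a single instruction string. A minimal sketch of that mechanism outside glibc might look like this. The names `USE_FAST_ENTRY`, `STUB_NAME_STRING`, and `call_instruction` are hypothetical and chosen only for illustration; they do not appear in the patch.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical flag macro, standing in for I386_USE_SYSENTER.  */
#ifndef USE_FAST_ENTRY
# define USE_FAST_ENTRY 0
#endif

/* Pick the stub name as a string literal, as the patch does with
   I386_DO_SYSCALL_STRING.  Both names here are made up.  */
#if !USE_FAST_ENTRY
# define STUB_NAME_STRING "do_syscall_int80"
#else
# define STUB_NAME_STRING "do_syscall"
#endif

/* Adjacent string literals are joined by the compiler, so this is
   one string: the instruction text followed by the chosen name.  */
static const char call_instruction[] = "call " STUB_NAME_STRING;
```

Because the selection happens in the preprocessor, a compilation unit that defines the flag as 0 before including the header gets the slow-path name without any run-time branch.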
93
glibc-upstream-2.34-248.patch
Normal file
@@ -0,0 +1,93 @@
commit b38c9cdb58061d357cdf9bca4f6967d487becb82
Author: Florian Weimer <fweimer@redhat.com>
Date:   Wed May 4 15:37:21 2022 +0200

    Linux: Define MMAP_CALL_INTERNAL

    Unlike MMAP_CALL, this avoids a TCB dependency for an errno update
    on failure.

    <mmap_internal.h> cannot be included as is on several architectures
    due to the definition of page_unit, so introduce a separate header
    file for the definition of MMAP_CALL and MMAP_CALL_INTERNAL,
    <mmap_call.h>.

    Reviewed-by: Stefan Liebler <stli@linux.ibm.com>
    (cherry picked from commit c1b68685d438373efe64e5f076f4215723004dfb)

diff --git a/sysdeps/unix/sysv/linux/mmap_call.h b/sysdeps/unix/sysv/linux/mmap_call.h
|
||||
new file mode 100644
|
||||
index 0000000000000000..3547c99e149e5064
|
||||
--- /dev/null
|
||||
+++ b/sysdeps/unix/sysv/linux/mmap_call.h
|
||||
@@ -0,0 +1,22 @@
|
||||
+/* Generic definition of MMAP_CALL and MMAP_CALL_INTERNAL.
|
||||
+ Copyright (C) 2017-2022 Free Software Foundation, Inc.
|
||||
+ This file is part of the GNU C Library.
|
||||
+
|
||||
+ The GNU C Library is free software; you can redistribute it and/or
|
||||
+ modify it under the terms of the GNU Lesser General Public
|
||||
+ License as published by the Free Software Foundation; either
|
||||
+ version 2.1 of the License, or (at your option) any later version.
|
||||
+
|
||||
+ The GNU C Library is distributed in the hope that it will be useful,
|
||||
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
|
||||
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
|
||||
+ Lesser General Public License for more details.
|
||||
+
|
||||
+ You should have received a copy of the GNU Lesser General Public
|
||||
+ License along with the GNU C Library; if not, see
|
||||
+ <https://www.gnu.org/licenses/>. */
|
||||
+
|
||||
+#define MMAP_CALL(__nr, __addr, __len, __prot, __flags, __fd, __offset) \
|
||||
+ INLINE_SYSCALL_CALL (__nr, __addr, __len, __prot, __flags, __fd, __offset)
|
||||
+#define MMAP_CALL_INTERNAL(__nr, __addr, __len, __prot, __flags, __fd, __offset) \
|
||||
+ INTERNAL_SYSCALL_CALL (__nr, __addr, __len, __prot, __flags, __fd, __offset)
|
||||
diff --git a/sysdeps/unix/sysv/linux/mmap_internal.h b/sysdeps/unix/sysv/linux/mmap_internal.h
|
||||
index 5ca6976191137f95..989eb0c7c6b57dc1 100644
|
||||
--- a/sysdeps/unix/sysv/linux/mmap_internal.h
|
||||
+++ b/sysdeps/unix/sysv/linux/mmap_internal.h
|
||||
@@ -40,10 +40,6 @@ static uint64_t page_unit;
|
||||
/* Do not accept offset not multiple of page size. */
|
||||
#define MMAP_OFF_LOW_MASK (MMAP2_PAGE_UNIT - 1)
|
||||
|
||||
-/* An architecture may override this. */
|
||||
-#ifndef MMAP_CALL
|
||||
-# define MMAP_CALL(__nr, __addr, __len, __prot, __flags, __fd, __offset) \
|
||||
- INLINE_SYSCALL_CALL (__nr, __addr, __len, __prot, __flags, __fd, __offset)
|
||||
-#endif
|
||||
+#include <mmap_call.h>
|
||||
|
||||
#endif /* MMAP_INTERNAL_LINUX_H */
|
||||
diff --git a/sysdeps/unix/sysv/linux/s390/mmap_internal.h b/sysdeps/unix/sysv/linux/s390/mmap_call.h
|
||||
similarity index 78%
|
||||
rename from sysdeps/unix/sysv/linux/s390/mmap_internal.h
|
||||
rename to sysdeps/unix/sysv/linux/s390/mmap_call.h
|
||||
index 46f1c3769d6b586a..bdd30cc83764c2c1 100644
|
||||
--- a/sysdeps/unix/sysv/linux/s390/mmap_internal.h
|
||||
+++ b/sysdeps/unix/sysv/linux/s390/mmap_call.h
|
||||
@@ -16,9 +16,6 @@
|
||||
License along with the GNU C Library; if not, see
|
||||
<https://www.gnu.org/licenses/>. */
|
||||
|
||||
-#ifndef MMAP_S390_INTERNAL_H
|
||||
-# define MMAP_S390_INTERNAL_H
|
||||
-
|
||||
#define MMAP_CALL(__nr, __addr, __len, __prot, __flags, __fd, __offset) \
|
||||
({ \
|
||||
long int __args[6] = { (long int) (__addr), (long int) (__len), \
|
||||
@@ -26,7 +23,10 @@
|
||||
(long int) (__fd), (long int) (__offset) }; \
|
||||
INLINE_SYSCALL_CALL (__nr, __args); \
|
||||
})
|
||||
-
|
||||
-#include_next <mmap_internal.h>
|
||||
-
|
||||
-#endif
|
||||
+#define MMAP_CALL_INTERNAL(__nr, __addr, __len, __prot, __flags, __fd, __offset) \
|
||||
+ ({ \
|
||||
+ long int __args[6] = { (long int) (__addr), (long int) (__len), \
|
||||
+ (long int) (__prot), (long int) (__flags), \
|
||||
+ (long int) (__fd), (long int) (__offset) }; \
|
||||
+ INTERNAL_SYSCALL_CALL (__nr, __args); \
|
||||
+ })
|
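The commit message for the MMAP_CALL_INTERNAL patch says the new macro avoids an errno update on failure. The difference between the two result conventions can be sketched roughly as follows, assuming the usual Linux rule that a raw system call result in the range -4095 to -1 encodes a negated error code. The functions `inline_style_result` and `internal_style_failed` are hypothetical; they imitate the behavior of the two macro families but are not glibc's INLINE_SYSCALL_CALL or INTERNAL_SYSCALL_CALL.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical: the "inline" convention.  A raw error result is
   turned into -1 and the error code is stored in errno, which lives
   in thread-local storage and therefore needs a working TCB.  */
static long int
inline_style_result (long int raw)
{
  if ((unsigned long int) raw > -4096UL)
    {
      errno = (int) -raw;
      return -1;
    }
  return raw;
}

/* Hypothetical: the "internal" convention.  The raw value is kept
   and only inspected, so nothing thread-local is touched.  */
static int
internal_style_failed (long int raw)
{
  return (unsigned long int) raw > -4096UL;
}
```

Code that runs before thread-local storage is set up can use the second style and decode the error itself, which appears to be the motivation stated in the commit message.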
88
glibc-upstream-2.34-249.patch
Normal file
@@ -0,0 +1,88 @@
commit b2387bea84560d286613257139aba6787f414594
Author: Florian Weimer <fweimer@redhat.com>
Date:   Mon May 9 18:15:16 2022 +0200

    ia64: Always define IA64_USE_NEW_STUB as a flag macro

    And keep the previous definition if it exists.  This allows
    disabling IA64_USE_NEW_STUB while keeping USE_DL_SYSINFO defined.

    Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
    (cherry picked from commit 18bd9c3d3b1b6a9182698c85354578d1d58e9d64)

diff --git a/sysdeps/unix/sysv/linux/ia64/brk.c b/sysdeps/unix/sysv/linux/ia64/brk.c
|
||||
index cf2c5bd667fb4432..61d8fa260eb59d1e 100644
|
||||
--- a/sysdeps/unix/sysv/linux/ia64/brk.c
|
||||
+++ b/sysdeps/unix/sysv/linux/ia64/brk.c
|
||||
@@ -16,7 +16,6 @@
|
||||
License along with the GNU C Library; if not, see
|
||||
<https://www.gnu.org/licenses/>. */
|
||||
|
||||
-#include <dl-sysdep.h>
|
||||
-/* brk is used by statup before TCB is properly set. */
|
||||
-#undef USE_DL_SYSINFO
|
||||
+/* brk is used by startup before TCB is properly set up. */
|
||||
+#define IA64_USE_NEW_STUB 0
|
||||
#include <sysdeps/unix/sysv/linux/brk.c>
|
||||
diff --git a/sysdeps/unix/sysv/linux/ia64/sysdep.h b/sysdeps/unix/sysv/linux/ia64/sysdep.h
|
||||
index 7198c192a03b7676..f1c81a66833941cc 100644
|
||||
--- a/sysdeps/unix/sysv/linux/ia64/sysdep.h
|
||||
+++ b/sysdeps/unix/sysv/linux/ia64/sysdep.h
|
||||
@@ -46,12 +46,15 @@
|
||||
#undef SYS_ify
|
||||
#define SYS_ify(syscall_name) __NR_##syscall_name
|
||||
|
||||
-#if defined USE_DL_SYSINFO \
|
||||
- && (IS_IN (libc) \
|
||||
- || IS_IN (libpthread) || IS_IN (librt))
|
||||
-# define IA64_USE_NEW_STUB
|
||||
-#else
|
||||
-# undef IA64_USE_NEW_STUB
|
||||
+#ifndef IA64_USE_NEW_STUB
|
||||
+# if defined USE_DL_SYSINFO && IS_IN (libc)
|
||||
+# define IA64_USE_NEW_STUB 1
|
||||
+# else
|
||||
+# define IA64_USE_NEW_STUB 0
|
||||
+# endif
|
||||
+#endif
|
||||
+#if IA64_USE_NEW_STUB && !USE_DL_SYSINFO
|
||||
+# error IA64_USE_NEW_STUB needs USE_DL_SYSINFO
|
||||
#endif
|
||||
|
||||
#ifdef __ASSEMBLER__
|
||||
@@ -103,7 +106,7 @@
|
||||
mov r15=num; \
|
||||
break __IA64_BREAK_SYSCALL
|
||||
|
||||
-#ifdef IA64_USE_NEW_STUB
|
||||
+#if IA64_USE_NEW_STUB
|
||||
# ifdef SHARED
|
||||
# define DO_CALL(num) \
|
||||
.prologue; \
|
||||
@@ -187,7 +190,7 @@
|
||||
(non-negative) errno on error or the return value on success.
|
||||
*/
|
||||
|
||||
-#ifdef IA64_USE_NEW_STUB
|
||||
+#if IA64_USE_NEW_STUB
|
||||
|
||||
# define INTERNAL_SYSCALL_NCS(name, nr, args...) \
|
||||
({ \
|
||||
@@ -279,7 +282,7 @@
|
||||
#define ASM_OUTARGS_5 ASM_OUTARGS_4, "=r" (_out4)
|
||||
#define ASM_OUTARGS_6 ASM_OUTARGS_5, "=r" (_out5)
|
||||
|
||||
-#ifdef IA64_USE_NEW_STUB
|
||||
+#if IA64_USE_NEW_STUB
|
||||
#define ASM_ARGS_0
|
||||
#define ASM_ARGS_1 ASM_ARGS_0, "4" (_out0)
|
||||
#define ASM_ARGS_2 ASM_ARGS_1, "5" (_out1)
|
||||
@@ -315,7 +318,7 @@
|
||||
/* Branch registers. */ \
|
||||
"b6"
|
||||
|
||||
-#ifdef IA64_USE_NEW_STUB
|
||||
+#if IA64_USE_NEW_STUB
|
||||
# define ASM_CLOBBERS_6 ASM_CLOBBERS_6_COMMON
|
||||
#else
|
||||
# define ASM_CLOBBERS_6 ASM_CLOBBERS_6_COMMON , "b7"
|
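The switch from `#ifdef IA64_USE_NEW_STUB` to `#if IA64_USE_NEW_STUB` matters because a macro defined as 0 still counts as defined. A small sketch of the difference, using the hypothetical macro `NEW_STUB_SKETCH` and two hypothetical helper functions (none of which exist in glibc), could look like this:

```c
#include <assert.h>

/* Hypothetical flag macro, always defined, with value 0 or 1.  */
#define NEW_STUB_SKETCH 0

/* Tests only whether the macro exists; the value is ignored.  */
static int
seen_by_ifdef (void)
{
#ifdef NEW_STUB_SKETCH
  return 1;
#else
  return 0;
#endif
}

/* Tests the macro's value, which is what a flag macro needs.  */
static int
seen_by_if (void)
{
#if NEW_STUB_SKETCH
  return 1;
#else
  return 0;
#endif
}
```

With the flag convention, a file such as the ia64 brk.c wrapper can define the macro as 0 before including the shared header, and every `#if` test then takes the disabled branch.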
121
glibc-upstream-2.34-250.patch
Normal file
@@ -0,0 +1,121 @@
commit e7ca2a475cf2e7ffc987b8d08e1a40337840b500
Author: Florian Weimer <fweimer@redhat.com>
Date:   Mon May 9 18:15:16 2022 +0200

    Linux: Implement a useful version of _startup_fatal

    On i386 and ia64, the TCB is not available at this point.

    Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
    (cherry picked from commit a2a6bce7d7e52c1c34369a7da62c501cc350bc31)

diff --git a/sysdeps/unix/sysv/linux/i386/startup.h b/sysdeps/unix/sysv/linux/i386/startup.h
|
||||
index 192c765361c17ed1..213805d7d2d459be 100644
|
||||
--- a/sysdeps/unix/sysv/linux/i386/startup.h
|
||||
+++ b/sysdeps/unix/sysv/linux/i386/startup.h
|
||||
@@ -1,5 +1,5 @@
|
||||
/* Linux/i386 definitions of functions used by static libc main startup.
|
||||
- Copyright (C) 2017-2021 Free Software Foundation, Inc.
|
||||
+ Copyright (C) 2022 Free Software Foundation, Inc.
|
||||
This file is part of the GNU C Library.
|
||||
|
||||
The GNU C Library is free software; you can redistribute it and/or
|
||||
@@ -16,22 +16,7 @@
|
||||
License along with the GNU C Library; if not, see
|
||||
<https://www.gnu.org/licenses/>. */
|
||||
|
||||
-#if BUILD_PIE_DEFAULT
|
||||
-/* Can't use "call *%gs:SYSINFO_OFFSET" during statup in static PIE. */
|
||||
-# define I386_USE_SYSENTER 0
|
||||
+/* Can't use "call *%gs:SYSINFO_OFFSET" during startup. */
|
||||
+#define I386_USE_SYSENTER 0
|
||||
|
||||
-# include <sysdep.h>
|
||||
-# include <abort-instr.h>
|
||||
-
|
||||
-__attribute__ ((__noreturn__))
|
||||
-static inline void
|
||||
-_startup_fatal (const char *message __attribute__ ((unused)))
|
||||
-{
|
||||
- /* This is only called very early during startup in static PIE.
|
||||
- FIXME: How can it be improved? */
|
||||
- ABORT_INSTRUCTION;
|
||||
- __builtin_unreachable ();
|
||||
-}
|
||||
-#else
|
||||
-# include_next <startup.h>
|
||||
-#endif
|
||||
+#include_next <startup.h>
|
||||
diff --git a/sysdeps/unix/sysv/linux/ia64/startup.h b/sysdeps/unix/sysv/linux/ia64/startup.h
|
||||
new file mode 100644
|
||||
index 0000000000000000..77f29f15a2103ed5
|
||||
--- /dev/null
|
||||
+++ b/sysdeps/unix/sysv/linux/ia64/startup.h
|
||||
@@ -0,0 +1,22 @@
|
||||
+/* Linux/ia64 definitions of functions used by static libc main startup.
|
||||
+ Copyright (C) 2022 Free Software Foundation, Inc.
|
||||
+ This file is part of the GNU C Library.
|
||||
+
|
||||
+ The GNU C Library is free software; you can redistribute it and/or
|
||||
+ modify it under the terms of the GNU Lesser General Public
|
||||
+ License as published by the Free Software Foundation; either
|
||||
+ version 2.1 of the License, or (at your option) any later version.
|
||||
+
|
||||
+ The GNU C Library is distributed in the hope that it will be useful,
|
||||
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
|
||||
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
|
||||
+ Lesser General Public License for more details.
|
||||
+
|
||||
+ You should have received a copy of the GNU Lesser General Public
|
||||
+ License along with the GNU C Library; if not, see
|
||||
+ <https://www.gnu.org/licenses/>. */
|
||||
+
|
||||
+/* This code is used before the TCB is set up. */
|
||||
+#define IA64_USE_NEW_STUB 0
|
||||
+
|
||||
+#include_next <startup.h>
|
||||
diff --git a/sysdeps/unix/sysv/linux/startup.h b/sysdeps/unix/sysv/linux/startup.h
|
||||
new file mode 100644
|
||||
index 0000000000000000..39859b404a84798b
|
||||
--- /dev/null
|
||||
+++ b/sysdeps/unix/sysv/linux/startup.h
|
||||
@@ -0,0 +1,39 @@
|
||||
+/* Linux definitions of functions used by static libc main startup.
|
||||
+ Copyright (C) 2017-2022 Free Software Foundation, Inc.
|
||||
+ This file is part of the GNU C Library.
|
||||
+
|
||||
+ The GNU C Library is free software; you can redistribute it and/or
|
||||
+ modify it under the terms of the GNU Lesser General Public
|
||||
+ License as published by the Free Software Foundation; either
|
||||
+ version 2.1 of the License, or (at your option) any later version.
|
||||
+
|
||||
+ The GNU C Library is distributed in the hope that it will be useful,
|
||||
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
|
||||
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
|
||||
+ Lesser General Public License for more details.
|
||||
+
|
||||
+ You should have received a copy of the GNU Lesser General Public
|
||||
+ License along with the GNU C Library; if not, see
|
||||
+ <https://www.gnu.org/licenses/>. */
|
||||
+
|
||||
+#ifdef SHARED
|
||||
+# include_next <startup.h>
|
||||
+#else
|
||||
+# include <sysdep.h>
|
||||
+
|
||||
+/* Avoid a run-time invocation of strlen. */
|
||||
+#define _startup_fatal(message) \
|
||||
+ do \
|
||||
+ { \
|
||||
+ size_t __message_length = __builtin_strlen (message); \
|
||||
+ if (! __builtin_constant_p (__message_length)) \
|
||||
+ { \
|
||||
+ extern void _startup_fatal_not_constant (void); \
|
||||
+ _startup_fatal_not_constant (); \
|
||||
+ } \
|
||||
+ INTERNAL_SYSCALL_CALL (write, STDERR_FILENO, (message), \
|
||||
+ __message_length); \
|
||||
+ INTERNAL_SYSCALL_CALL (exit_group, 127); \
|
||||
+ } \
|
||||
+ while (0)
|
||||
+#endif /* !SHARED */
|
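The new `_startup_fatal` macro computes the message length with `__builtin_strlen` so that, for a string literal, no run-time strlen call is needed. The core idea can be sketched in user space as follows, assuming a GCC-compatible compiler. The names `write_fatal_message` and `REPORT_FATAL_SKETCH` are hypothetical. Unlike the real macro, this sketch uses the ordinary `write` wrapper, does not terminate the process, and does not force a link failure for non-constant messages.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: write exactly LENGTH bytes of MESSAGE to FD.
   Returns 0 on a complete write and -1 otherwise.  */
static int
write_fatal_message (int fd, const char *message, size_t length)
{
  assert (message != NULL);
  ssize_t written = write (fd, message, length);
  return written == (ssize_t) length ? 0 : -1;
}

/* For a string literal argument, GCC-compatible compilers can
   usually fold __builtin_strlen into a constant at compile time.  */
#define REPORT_FATAL_SKETCH(fd, message) \
  write_fatal_message ((fd), (message), __builtin_strlen (message))
```

A caller would pass a literal, for example `REPORT_FATAL_SKETCH (STDERR_FILENO, "Fatal sketch error\n")`, and then exit with a fixed status of its own choosing.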
150
glibc-upstream-2.34-251.patch
Normal file
@@ -0,0 +1,150 @@
commit 43d77ef9b87533221890423e491eed1b8ca81f0c
Author: Florian Weimer <fweimer@redhat.com>
Date:   Mon May 16 18:41:43 2022 +0200

    Linux: Introduce __brk_call for invoking the brk system call

    Alpha and sparc can now use the generic implementation.

    Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
    (cherry picked from commit b57ab258c1140bc45464b4b9908713e3e0ee35aa)

diff --git a/sysdeps/unix/sysv/linux/alpha/brk_call.h b/sysdeps/unix/sysv/linux/alpha/brk_call.h
|
||||
new file mode 100644
|
||||
index 0000000000000000..b8088cf13f938c88
|
||||
--- /dev/null
|
||||
+++ b/sysdeps/unix/sysv/linux/alpha/brk_call.h
|
||||
@@ -0,0 +1,28 @@
|
||||
+/* Invoke the brk system call. Alpha version.
|
||||
+ Copyright (C) 2022 Free Software Foundation, Inc.
|
||||
+ This file is part of the GNU C Library.
|
||||
+
|
||||
+ The GNU C Library is free software; you can redistribute it and/or
|
||||
+ modify it under the terms of the GNU Lesser General Public
|
||||
+ License as published by the Free Software Foundation; either
|
||||
+ version 2.1 of the License, or (at your option) any later version.
|
||||
+
|
||||
+ The GNU C Library is distributed in the hope that it will be useful,
|
||||
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
|
||||
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
|
||||
+ Lesser General Public License for more details.
|
||||
+
|
||||
+ You should have received a copy of the GNU Lesser General Public
|
||||
+ License along with the GNU C Library. If not, see
|
||||
+ <https://www.gnu.org/licenses/>. */
|
||||
+
|
||||
+static inline void *
|
||||
+__brk_call (void *addr)
|
||||
+{
|
||||
+ unsigned long int result = INTERNAL_SYSCALL_CALL (brk, addr);
|
||||
+ if (result == -ENOMEM)
|
||||
+ /* Mimic the default error reporting behavior. */
|
||||
+ return addr;
|
||||
+ else
|
||||
+ return (void *) result;
|
||||
+}
|
||||
diff --git a/sysdeps/unix/sysv/linux/brk.c b/sysdeps/unix/sysv/linux/brk.c
|
||||
index 2d70d824fc72d32d..20b11c15caae148d 100644
|
||||
--- a/sysdeps/unix/sysv/linux/brk.c
|
||||
+++ b/sysdeps/unix/sysv/linux/brk.c
|
||||
@@ -19,6 +19,7 @@
|
||||
#include <errno.h>
|
||||
#include <unistd.h>
|
||||
#include <sysdep.h>
|
||||
+#include <brk_call.h>
|
||||
|
||||
/* This must be initialized data because commons can't have aliases. */
|
||||
void *__curbrk = 0;
|
||||
@@ -33,7 +34,7 @@ weak_alias (__curbrk, ___brk_addr)
|
||||
int
|
||||
__brk (void *addr)
|
||||
{
|
||||
- __curbrk = (void *) INTERNAL_SYSCALL_CALL (brk, addr);
|
||||
+ __curbrk = __brk_call (addr);
|
||||
if (__curbrk < addr)
|
||||
{
|
||||
__set_errno (ENOMEM);
|
||||
diff --git a/sysdeps/unix/sysv/linux/brk_call.h b/sysdeps/unix/sysv/linux/brk_call.h
|
||||
new file mode 100644
|
||||
index 0000000000000000..72370c25d785a9ab
|
||||
--- /dev/null
|
||||
+++ b/sysdeps/unix/sysv/linux/brk_call.h
|
||||
@@ -0,0 +1,25 @@
|
||||
+/* Invoke the brk system call. Generic Linux version.
|
||||
+ Copyright (C) 2022 Free Software Foundation, Inc.
|
||||
+ This file is part of the GNU C Library.
|
||||
+
|
||||
+ The GNU C Library is free software; you can redistribute it and/or
|
||||
+ modify it under the terms of the GNU Lesser General Public
|
||||
+ License as published by the Free Software Foundation; either
|
||||
+ version 2.1 of the License, or (at your option) any later version.
|
||||
+
|
||||
+ The GNU C Library is distributed in the hope that it will be useful,
|
||||
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
|
||||
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
|
||||
+ Lesser General Public License for more details.
|
||||
+
|
||||
+ You should have received a copy of the GNU Lesser General Public
|
||||
+ License along with the GNU C Library. If not, see
|
||||
+ <https://www.gnu.org/licenses/>. */
|
||||
+
|
||||
+static inline void *
|
||||
+__brk_call (void *addr)
|
||||
+{
|
||||
+ /* The default implementation reports errors through an unchanged
|
||||
+ break. */
|
||||
+ return (void *) INTERNAL_SYSCALL_CALL (brk, addr);
|
||||
+}
|
||||
diff --git a/sysdeps/unix/sysv/linux/alpha/brk.c b/sysdeps/unix/sysv/linux/sparc/brk_call.h
|
||||
similarity index 61%
|
||||
rename from sysdeps/unix/sysv/linux/alpha/brk.c
|
||||
rename to sysdeps/unix/sysv/linux/sparc/brk_call.h
|
||||
index 074c47e054bfeb11..59ce5216601143fb 100644
|
||||
--- a/sysdeps/unix/sysv/linux/alpha/brk.c
|
||||
+++ b/sysdeps/unix/sysv/linux/sparc/brk_call.h
|
||||
@@ -1,5 +1,5 @@
|
||||
-/* Change data segment size. Linux/Alpha.
|
||||
- Copyright (C) 2020-2021 Free Software Foundation, Inc.
|
||||
+/* Invoke the brk system call. Sparc version.
|
||||
+ Copyright (C) 2022 Free Software Foundation, Inc.
|
||||
This file is part of the GNU C Library.
|
||||
|
||||
The GNU C Library is free software; you can redistribute it and/or
|
||||
@@ -16,23 +16,20 @@
|
||||
License along with the GNU C Library. If not, see
|
||||
<https://www.gnu.org/licenses/>. */
|
||||
|
||||
-#include <errno.h>
|
||||
-#include <unistd.h>
|
||||
-#include <sysdep.h>
|
||||
+#ifdef __arch64__
|
||||
+# define SYSCALL_NUM "0x6d"
|
||||
+#else
|
||||
+# define SYSCALL_NUM "0x10"
|
||||
+#endif
|
||||
|
||||
-void *__curbrk = 0;
|
||||
-
|
||||
-int
|
||||
-__brk (void *addr)
|
||||
+static inline void *
|
||||
+__brk_call (void *addr)
|
||||
{
|
||||
- /* Alpha brk returns -ENOMEM in case of failure. */
|
||||
- __curbrk = (void *) INTERNAL_SYSCALL_CALL (brk, addr);
|
||||
- if ((unsigned long) __curbrk == -ENOMEM)
|
||||
- {
|
||||
- __set_errno (ENOMEM);
|
||||
- return -1;
|
||||
- }
|
||||
-
|
||||
- return 0;
|
||||
+ register long int g1 asm ("g1") = __NR_brk;
|
||||
+ register long int o0 asm ("o0") = (long int) addr;
|
||||
+ asm volatile ("ta " SYSCALL_NUM
|
||||
+ : "=r"(o0)
|
||||
+ : "r"(g1), "0"(o0)
|
||||
+ : "cc");
|
||||
+ return (void *) o0;
|
||||
}
|
||||
-weak_alias (__brk, brk)
|
510
glibc-upstream-2.34-252.patch
Normal file
@@ -0,0 +1,510 @@
commit ede8d94d154157d269b18f3601440ac576c1f96a
Author: Florian Weimer <fweimer@redhat.com>
Date:   Mon May 16 18:41:43 2022 +0200

    csu: Implement and use _dl_early_allocate during static startup

    This implements mmap fallback for a brk failure during TLS
    allocation.

    scripts/tst-elf-edit.py is updated to support the new patching method.
    The script no longer requires that the input object is of ET_DYN
    type.

    Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
    (cherry picked from commit f787e138aa0bf677bf74fa2a08595c446292f3d7)

    Conflicts:
            elf/Makefile
              (missing ld.so static execve backport upstream)
            sysdeps/generic/ldsodefs.h
              (missing ld.so dependency sorting optimization upstream)

diff --git a/csu/libc-tls.c b/csu/libc-tls.c
|
||||
index d83e69f6257ae981..738f59f46b62c31c 100644
|
||||
--- a/csu/libc-tls.c
|
||||
+++ b/csu/libc-tls.c
|
||||
@@ -145,11 +145,16 @@ __libc_setup_tls (void)
|
||||
_dl_allocate_tls_storage (in elf/dl-tls.c) does using __libc_memalign
|
||||
and dl_tls_static_align. */
|
||||
tcb_offset = roundup (memsz + GLRO(dl_tls_static_surplus), max_align);
|
||||
- tlsblock = __sbrk (tcb_offset + TLS_INIT_TCB_SIZE + max_align);
|
||||
+ tlsblock = _dl_early_allocate (tcb_offset + TLS_INIT_TCB_SIZE + max_align);
|
||||
+ if (tlsblock == NULL)
|
||||
+ _startup_fatal ("Fatal glibc error: Cannot allocate TLS block\n");
|
||||
#elif TLS_DTV_AT_TP
|
||||
tcb_offset = roundup (TLS_INIT_TCB_SIZE, align ?: 1);
|
||||
- tlsblock = __sbrk (tcb_offset + memsz + max_align
|
||||
- + TLS_PRE_TCB_SIZE + GLRO(dl_tls_static_surplus));
|
||||
+ tlsblock = _dl_early_allocate (tcb_offset + memsz + max_align
|
||||
+ + TLS_PRE_TCB_SIZE
|
||||
+ + GLRO(dl_tls_static_surplus));
|
||||
+ if (tlsblock == NULL)
|
||||
+ _startup_fatal ("Fatal glibc error: Cannot allocate TLS block\n");
|
||||
tlsblock += TLS_PRE_TCB_SIZE;
|
||||
#else
|
||||
/* In case a model with a different layout for the TCB and DTV
|
||||
diff --git a/elf/Makefile b/elf/Makefile
|
||||
index 6423ebbdd7708a14..ea1512549be3f628 100644
|
||||
--- a/elf/Makefile
|
||||
+++ b/elf/Makefile
|
||||
@@ -33,6 +33,7 @@ routines = \
|
||||
$(all-dl-routines) \
|
||||
dl-addr \
|
||||
dl-addr-obj \
|
||||
+ dl-early_allocate \
|
||||
dl-error \
|
||||
dl-iteratephdr \
|
||||
dl-libc \
|
||||
@@ -104,6 +105,7 @@ all-dl-routines = $(dl-routines) $(sysdep-dl-routines)
|
||||
# But they are absent from the shared libc, because that code is in ld.so.
|
||||
elide-routines.os = \
|
||||
$(all-dl-routines) \
|
||||
+ dl-early_allocate \
|
||||
dl-exception \
|
||||
dl-origin \
|
||||
dl-reloc-static-pie \
|
||||
@@ -264,6 +266,7 @@ tests-static-normal := \
|
||||
tst-linkall-static \
|
||||
tst-single_threaded-pthread-static \
|
||||
tst-single_threaded-static \
|
||||
+ tst-tls-allocation-failure-static \
|
||||
tst-tlsalign-extern-static \
|
||||
tst-tlsalign-static \
|
||||
# tests-static-normal
|
||||
@@ -1101,6 +1104,10 @@ $(objpfx)tst-glibcelf.out: tst-glibcelf.py elf.h $(..)/scripts/glibcelf.py \
|
||||
--cc="$(CC) $(patsubst -DMODULE_NAME=%,-DMODULE_NAME=testsuite,$(CPPFLAGS))" \
|
||||
< /dev/null > $@ 2>&1; $(evaluate-test)
|
||||
|
||||
+ifeq ($(run-built-tests),yes)
|
||||
+tests-special += $(objpfx)tst-tls-allocation-failure-static-patched.out
|
||||
+endif
|
||||
+
|
||||
# The test requires shared _and_ PIE because the executable
|
||||
# unit test driver must be able to link with the shared object
|
||||
# that is going to eventually go into an installed DSO.
|
||||
@@ -2637,3 +2644,15 @@ $(objpfx)tst-ro-dynamic-mod.so: $(objpfx)tst-ro-dynamic-mod.os \
|
||||
$(objpfx)tst-ro-dynamic-mod.os
|
||||
|
||||
$(objpfx)tst-rtld-run-static.out: $(objpfx)/ldconfig
|
||||
+
|
||||
+$(objpfx)tst-tls-allocation-failure-static-patched: \
|
||||
+ $(objpfx)tst-tls-allocation-failure-static $(..)scripts/tst-elf-edit.py
|
||||
+ cp $< $@
|
||||
+ $(PYTHON) $(..)scripts/tst-elf-edit.py --maximize-tls-size $@
|
||||
+
|
||||
+$(objpfx)tst-tls-allocation-failure-static-patched.out: \
|
||||
+ $(objpfx)tst-tls-allocation-failure-static-patched
|
||||
+ $< > $@ 2>&1; echo "status: $$?" >> $@
|
||||
+ grep -q '^Fatal glibc error: Cannot allocate TLS block$$' $@ \
|
||||
+ && grep -q '^status: 127$$' $@; \
|
||||
+ $(evaluate-test)
|
||||
diff --git a/elf/dl-early_allocate.c b/elf/dl-early_allocate.c
|
||||
new file mode 100644
|
||||
index 0000000000000000..61677aaa0364c209
|
||||
--- /dev/null
|
||||
+++ b/elf/dl-early_allocate.c
|
||||
@@ -0,0 +1,30 @@
|
||||
+/* Early memory allocation for the dynamic loader. Generic version.
|
||||
+ Copyright (C) 2022 Free Software Foundation, Inc.
|
||||
+ This file is part of the GNU C Library.
|
||||
+
|
||||
+ The GNU C Library is free software; you can redistribute it and/or
|
||||
+ modify it under the terms of the GNU Lesser General Public
|
||||
+ License as published by the Free Software Foundation; either
|
||||
+ version 2.1 of the License, or (at your option) any later version.
|
||||
+
|
||||
+ The GNU C Library is distributed in the hope that it will be useful,
|
||||
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
|
||||
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
|
||||
+ Lesser General Public License for more details.
|
||||
+
|
||||
+ You should have received a copy of the GNU Lesser General Public
|
||||
+ License along with the GNU C Library; if not, see
|
||||
+ <https://www.gnu.org/licenses/>. */
|
||||
+
|
||||
+#include <ldsodefs.h>
|
||||
+#include <stddef.h>
|
||||
+#include <unistd.h>
|
||||
+
|
||||
+void *
|
||||
+_dl_early_allocate (size_t size)
|
||||
+{
|
||||
+ void *result = __sbrk (size);
|
||||
+ if (result == (void *) -1)
|
||||
+ result = NULL;
|
||||
+ return result;
|
||||
+}
|
||||
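The generic `_dl_early_allocate` above only maps the `sbrk` failure value to NULL, while the commit message describes an mmap fallback for brk failures. One way such a fallback could be written in ordinary user-space C is sketched here. This is an illustration under stated assumptions, not the Linux-specific implementation from the patch (which is not shown in this excerpt): the name `early_allocate_sketch` is hypothetical, and the sketch uses the public `sbrk` and `mmap` wrappers, which may set errno, instead of glibc-internal calls.

```c
#ifndef _DEFAULT_SOURCE
# define _DEFAULT_SOURCE 1
#endif
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical sketch: try to extend the break first, then fall
   back to an anonymous private mapping.  Returns NULL on failure.  */
static void *
early_allocate_sketch (size_t size)
{
  /* sbrk takes a signed increment, so skip it for sizes that would
     turn negative and shrink the break instead.  */
  if (size <= (size_t) INTPTR_MAX)
    {
      void *result = sbrk ((intptr_t) size);
      if (result != (void *) -1)
        return result;
    }

  void *mapped = mmap (NULL, size, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  return mapped == MAP_FAILED ? NULL : mapped;
}
```

Returning NULL instead of `(void *) -1` or `MAP_FAILED` gives callers a single failure value to test, which matches the `tlsblock == NULL` check added to csu/libc-tls.c.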
diff --git a/elf/tst-tls-allocation-failure-static.c b/elf/tst-tls-allocation-failure-static.c
|
||||
new file mode 100644
|
||||
index 0000000000000000..8de831b2469ba390
|
||||
--- /dev/null
|
||||
+++ b/elf/tst-tls-allocation-failure-static.c
|
||||
@@ -0,0 +1,31 @@
|
||||
+/* Base for test program with impossibly large PT_TLS segment.
|
||||
+ Copyright (C) 2022 Free Software Foundation, Inc.
|
||||
+ This file is part of the GNU C Library.
|
||||
+
|
||||
+ The GNU C Library is free software; you can redistribute it and/or
|
||||
+ modify it under the terms of the GNU Lesser General Public
|
||||
+ License as published by the Free Software Foundation; either
|
||||
+ version 2.1 of the License, or (at your option) any later version.
|
||||
+
|
||||
+ The GNU C Library is distributed in the hope that it will be useful,
|
||||
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
|
||||
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
|
||||
+ Lesser General Public License for more details.
|
||||
+
|
||||
+ You should have received a copy of the GNU Lesser General Public
|
||||
+ License along with the GNU C Library; if not, see
|
||||
+ <https://www.gnu.org/licenses/>. */
|
||||
+
|
||||
+/* The actual test binary is patched using scripts/tst-elf-edit.py
|
||||
+ --maximize-tls-size, and this introduces the expected test
|
||||
+ allocation failure due to an excessive PT_TLS p_memsz value.
|
||||
+
|
||||
+ Patching the binary is required because on some 64-bit targets, TLS
|
||||
+ relocations can only cover a 32-bit range, and glibc-internal TLS
|
||||
+ variables such as errno end up outside that range. */
|
||||
+
|
||||
+int
|
||||
+main (void)
|
||||
+{
|
||||
+ return 0;
|
||||
+}
|
||||
diff --git a/scripts/tst-elf-edit.py b/scripts/tst-elf-edit.py
|
||||
new file mode 100644
|
||||
index 0000000000000000..0e19ce1e7392f3ca
|
||||
--- /dev/null
|
||||
+++ b/scripts/tst-elf-edit.py
|
||||
@@ -0,0 +1,226 @@
|
||||
+#!/usr/bin/python3
|
||||
+# ELF editor for load align tests.
|
||||
+# Copyright (C) 2022 Free Software Foundation, Inc.
|
||||
+# Copyright The GNU Toolchain Authors.
|
||||
+# This file is part of the GNU C Library.
|
||||
+#
|
||||
+# The GNU C Library is free software; you can redistribute it and/or
|
||||
+# modify it under the terms of the GNU Lesser General Public
|
||||
+# License as published by the Free Software Foundation; either
|
||||
+# version 2.1 of the License, or (at your option) any later version.
|
||||
+#
|
||||
+# The GNU C Library is distributed in the hope that it will be useful,
|
||||
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
|
||||
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
|
||||
+# Lesser General Public License for more details.
|
||||
+#
|
||||
+# You should have received a copy of the GNU Lesser General Public
|
||||
+# License along with the GNU C Library; if not, see
|
||||
+# <https://www.gnu.org/licenses/>.
|
||||
+
|
||||
+import argparse
|
||||
+import os
|
||||
+import sys
|
||||
+import struct
|
||||
+
|
||||
+EI_NIDENT=16
|
||||
+
|
||||
+EI_MAG0=0
|
||||
+ELFMAG0=b'\x7f'
|
||||
+EI_MAG1=1
|
||||
+ELFMAG1=b'E'
|
||||
+EI_MAG2=2
|
||||
+ELFMAG2=b'L'
|
||||
+EI_MAG3=3
|
||||
+ELFMAG3=b'F'
|
||||
+
|
||||
+EI_CLASS=4
|
||||
+ELFCLASSNONE=b'\x00'
|
||||
+ELFCLASS32=b'\x01'
|
||||
+ELFCLASS64=b'\x02'
|
||||
+
|
||||
+EI_DATA=5
|
||||
+ELFDATA2LSB=b'\x01'
|
||||
+ELFDATA2MSB=b'\x02'
|
||||
+
|
||||
+ET_EXEC=2
|
||||
+ET_DYN=3
|
||||
+
|
||||
+PT_LOAD=1
|
||||
+PT_TLS=7
|
||||
+
|
||||
+def elf_types_fmts(e_ident):
|
||||
+ endian = '<' if e_ident[EI_DATA] == ELFDATA2LSB else '>'
|
||||
+ addr = 'I' if e_ident[EI_CLASS] == ELFCLASS32 else 'Q'
|
||||
+ off = 'I' if e_ident[EI_CLASS] == ELFCLASS32 else 'Q'
|
||||
+ return (endian, addr, off)
|
||||
+
|
||||
+class Elf_Ehdr:
|
||||
+ def __init__(self, e_ident):
|
||||
+ endian, addr, off = elf_types_fmts(e_ident)
|
||||
+ self.fmt = '{0}HHI{1}{2}{2}IHHHHHH'.format(endian, addr, off)
|
||||
+ self.len = struct.calcsize(self.fmt)
|
||||
+
|
||||
+ def read(self, f):
|
||||
+ buf = f.read(self.len)
|
||||
+ if len(buf) < self.len:
|
||||
+ error('{}: header too small'.format(f.name))
|
||||
+ data = struct.unpack(self.fmt, buf)
|
||||
+ self.e_type = data[0]
|
||||
+ self.e_machine = data[1]
|
||||
+ self.e_version = data[2]
|
||||
+ self.e_entry = data[3]
|
||||
+ self.e_phoff = data[4]
|
||||
+ self.e_shoff = data[5]
|
||||
+ self.e_flags = data[6]
|
||||
+ self.e_ehsize = data[7]
|
||||
+ self.e_phentsize= data[8]
|
||||
+ self.e_phnum = data[9]
|
||||
+ self.e_shstrndx = data[12]
|
||||
+
|
||||
+
|
||||
+class Elf_Phdr:
|
||||
+ def __init__(self, e_ident):
|
||||
+ endian, addr, off = elf_types_fmts(e_ident)
|
||||
+ self.ei_class = e_ident[EI_CLASS]
|
||||
+ if self.ei_class == ELFCLASS32:
|
||||
+ self.fmt = '{0}I{2}{1}{1}IIII'.format(endian, addr, off)
|
||||
+ else:
|
||||
+ self.fmt = '{0}II{2}{1}{1}QQQ'.format(endian, addr, off)
|
||||
+ self.len = struct.calcsize(self.fmt)
|
||||
+
|
||||
+ def read(self, f):
|
||||
+ buf = f.read(self.len)
|
||||
+ if len(buf) < self.len:
|
||||
+ error('{}: program header too small'.format(f.name))
|
||||
+ data = struct.unpack(self.fmt, buf)
|
||||
+ if self.ei_class == ELFCLASS32:
|
||||
+ self.p_type = data[0]
|
||||
+ self.p_offset = data[1]
|
||||
+ self.p_vaddr = data[2]
|
||||
+ self.p_paddr = data[3]
|
||||
+ self.p_filesz = data[4]
|
||||
+ self.p_memsz = data[5]
|
||||
+ self.p_flags = data[6]
|
||||
+ self.p_align = data[7]
|
||||
+ else:
|
||||
+ self.p_type = data[0]
|
||||
+ self.p_flags = data[1]
|
||||
+ self.p_offset = data[2]
|
||||
+ self.p_vaddr = data[3]
|
||||
+ self.p_paddr = data[4]
|
||||
+ self.p_filesz = data[5]
|
||||
+ self.p_memsz = data[6]
|
||||
+ self.p_align = data[7]
|
||||
+
|
||||
+ def write(self, f):
|
||||
+ if self.ei_class == ELFCLASS32:
|
||||
+ data = struct.pack(self.fmt,
|
||||
+ self.p_type,
|
||||
+ self.p_offset,
|
||||
+ self.p_vaddr,
|
||||
+ self.p_paddr,
|
||||
+ self.p_filesz,
|
||||
+ self.p_memsz,
|
||||
+ self.p_flags,
|
||||
+ self.p_align)
|
||||
+ else:
|
||||
+ data = struct.pack(self.fmt,
|
||||
+ self.p_type,
|
||||
+ self.p_flags,
|
||||
+ self.p_offset,
|
||||
+ self.p_vaddr,
|
||||
+ self.p_paddr,
|
||||
+ self.p_filesz,
|
||||
+ self.p_memsz,
|
||||
+ self.p_align)
|
||||
+ f.write(data)
|
||||
+
|
||||
+
|
||||
+def error(msg):
|
||||
+ print(msg, file=sys.stderr)
|
||||
+ sys.exit(1)
|
||||
+
|
||||
+
|
||||
+def elf_edit_align(phdr, align):
|
||||
+ if align == 'half':
|
||||
+ phdr.p_align = phdr.p_align >> 1
|
||||
+ else:
|
||||
+ phdr.p_align = int(align)
|
||||
+
|
||||
+def elf_edit_maximize_tls_size(phdr, elfclass):
|
||||
+ if elfclass == ELFCLASS32:
|
||||
+ # It is possible that the kernel can allocate half of the
|
||||
+ # address space, so use something larger.
|
||||
+ phdr.p_memsz = 0xfff00000
|
||||
+ else:
|
||||
+ phdr.p_memsz = 1 << 63
|
||||
+
|
||||
+def elf_edit(f, opts):
|
||||
+ ei_nident_fmt = 'c' * EI_NIDENT
|
||||
+ ei_nident_len = struct.calcsize(ei_nident_fmt)
|
||||
+
|
||||
+ data = f.read(ei_nident_len)
|
||||
+ if len(data) < ei_nident_len:
|
||||
+ error('{}: e_ident too small'.format(f.name))
|
||||
+ e_ident = struct.unpack(ei_nident_fmt, data)
|
||||
+
|
||||
+ if e_ident[EI_MAG0] != ELFMAG0 \
|
||||
+ or e_ident[EI_MAG1] != ELFMAG1 \
|
||||
+ or e_ident[EI_MAG2] != ELFMAG2 \
|
||||
+ or e_ident[EI_MAG3] != ELFMAG3:
|
||||
+ error('{}: bad ELF header'.format(f.name))
|
||||
+
|
||||
+ if e_ident[EI_CLASS] != ELFCLASS32 \
|
||||
+ and e_ident[EI_CLASS] != ELFCLASS64:
|
||||
+ error('{}: unsupported ELF class: {}'.format(f.name, e_ident[EI_CLASS]))
|
||||
+
|
||||
+ if e_ident[EI_DATA] != ELFDATA2LSB \
|
||||
+ and e_ident[EI_DATA] != ELFDATA2MSB:
|
||||
+ error('{}: unsupported ELF data: {}'.format(f.name, e_ident[EI_DATA]))
|
||||
+
|
||||
+ ehdr = Elf_Ehdr(e_ident)
|
||||
+ ehdr.read(f)
|
||||
+ if ehdr.e_type not in (ET_EXEC, ET_DYN):
|
||||
+ error('{}: not an executable or shared library'.format(f.name))
|
||||
+
|
||||
+ phdr = Elf_Phdr(e_ident)
|
||||
+ maximize_tls_size_done = False
|
||||
+ for i in range(0, ehdr.e_phnum):
|
||||
+ f.seek(ehdr.e_phoff + i * phdr.len)
|
||||
+ phdr.read(f)
|
||||
+ if phdr.p_type == PT_LOAD and opts.align is not None:
|
||||
+ elf_edit_align(phdr, opts.align)
|
||||
+ f.seek(ehdr.e_phoff + i * phdr.len)
|
||||
+ phdr.write(f)
|
||||
+ break
|
||||
+ if phdr.p_type == PT_TLS and opts.maximize_tls_size:
|
||||
+ elf_edit_maximize_tls_size(phdr, e_ident[EI_CLASS])
|
||||
+ f.seek(ehdr.e_phoff + i * phdr.len)
|
||||
+ phdr.write(f)
|
||||
+ maximize_tls_size_done = True
|
||||
+ break
|
||||
+
|
||||
+ if opts.maximize_tls_size and not maximize_tls_size_done:
|
||||
+ error('{}: TLS maximum size was not updated'.format(f.name))
|
||||
+
|
||||
+def get_parser():
|
||||
+ parser = argparse.ArgumentParser(description=__doc__)
|
||||
+ parser.add_argument('-a', dest='align',
|
||||
+ help='How to set the LOAD alignment')
|
||||
+ parser.add_argument('--maximize-tls-size', action='store_true',
|
||||
+ help='Set maximum PT_TLS size')
|
||||
+ parser.add_argument('output',
|
||||
+ help='ELF file to edit')
|
||||
+ return parser
|
||||
+
|
||||
+
|
||||
+def main(argv):
|
||||
+ parser = get_parser()
|
||||
+ opts = parser.parse_args(argv)
|
||||
+ with open(opts.output, 'r+b') as fout:
|
||||
+ elf_edit(fout, opts)
|
||||
+
|
||||
+
|
||||
+if __name__ == '__main__':
|
||||
+ main(sys.argv[1:])
|
||||
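The field order in `Elf_Phdr` differs by ELF class: ELF64 moves `p_flags` up to the second slot, so `p_memsz` sits at a different index. A minimal sketch, assuming little-endian data by default and using only the standard `struct` module, shows the two layouts and the value that `--maximize-tls-size` stores in `p_memsz`. The helper names `phdr_fmt` and `maximize_tls` are hypothetical and are not part of tst-elf-edit.py.

```python
import struct

PT_TLS = 7


def phdr_fmt(is_64bit, endian='<'):
    """Return the struct format of one program header entry."""
    # ELF32: p_type, p_offset, p_vaddr, p_paddr, p_filesz, p_memsz, p_flags, p_align
    # ELF64: p_type, p_flags, p_offset, p_vaddr, p_paddr, p_filesz, p_memsz, p_align
    return endian + ('IIQQQQQQ' if is_64bit else 'IIIIIIII')


def maximize_tls(raw, is_64bit, endian='<'):
    """Return a copy of a packed PT_TLS header with p_memsz maximized."""
    fmt = phdr_fmt(is_64bit, endian)
    fields = list(struct.unpack(fmt, raw))
    if fields[0] != PT_TLS:
        raise ValueError('not a PT_TLS program header')
    # p_memsz is field 6 in the ELF64 layout and field 5 in the ELF32 layout.
    memsz_index = 6 if is_64bit else 5
    fields[memsz_index] = (1 << 63) if is_64bit else 0xfff00000
    return struct.pack(fmt, *fields)


if __name__ == '__main__':
    # Entry sizes for ELF32 and ELF64 program headers.
    print(struct.calcsize(phdr_fmt(False)), struct.calcsize(phdr_fmt(True)))
```

Keeping the format string and the field index next to each other makes it harder to patch the wrong slot when the class changes.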
diff --git a/sysdeps/generic/ldsodefs.h b/sysdeps/generic/ldsodefs.h
|
||||
index a38de94bf7ea8e93..87ad2f3f4d89eb7d 100644
|
||||
--- a/sysdeps/generic/ldsodefs.h
|
||||
+++ b/sysdeps/generic/ldsodefs.h
|
||||
@@ -1238,6 +1238,11 @@ extern struct link_map * _dl_get_dl_main_map (void)
|
||||
/* Initialize the DSO sort algorithm to use. */
|
||||
extern void _dl_sort_maps_init (void) attribute_hidden;
|
||||
|
||||
+/* Perform early memory allocation, avoiding a TCB dependency.
|
||||
+ Terminate the process if allocation fails. May attempt to use
|
||||
+ brk. */
|
||||
+void *_dl_early_allocate (size_t size) attribute_hidden;
|
||||
+
|
||||
/* Initialization of libpthread for statically linked applications.
|
||||
If libpthread is not linked in, this is an empty function. */
|
||||
void __pthread_initialize_minimal (void) weak_function;
|
||||
diff --git a/sysdeps/unix/sysv/linux/dl-early_allocate.c b/sysdeps/unix/sysv/linux/dl-early_allocate.c
|
||||
new file mode 100644
|
||||
index 0000000000000000..52c538e85afa8522
|
||||
--- /dev/null
|
||||
+++ b/sysdeps/unix/sysv/linux/dl-early_allocate.c
|
||||
@@ -0,0 +1,82 @@
|
||||
+/* Early memory allocation for the dynamic loader. Generic version.
|
||||
+ Copyright (C) 2022 Free Software Foundation, Inc.
|
||||
+ This file is part of the GNU C Library.
|
||||
+
|
||||
+ The GNU C Library is free software; you can redistribute it and/or
|
||||
+ modify it under the terms of the GNU Lesser General Public
|
||||
+ License as published by the Free Software Foundation; either
|
||||
+ version 2.1 of the License, or (at your option) any later version.
|
||||
+
|
||||
+ The GNU C Library is distributed in the hope that it will be useful,
|
||||
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
|
||||
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
|
||||
+ Lesser General Public License for more details.
|
||||
+
|
||||
+ You should have received a copy of the GNU Lesser General Public
|
||||
+ License along with the GNU C Library; if not, see
|
||||
+ <https://www.gnu.org/licenses/>. */
|
||||
+
|
||||
+/* Mark symbols hidden in static PIE for early self relocation to work. */
|
||||
+#if BUILD_PIE_DEFAULT
|
||||
+# pragma GCC visibility push(hidden)
|
||||
+#endif
|
||||
+#include <startup.h>
|
||||
+
|
||||
+#include <ldsodefs.h>
|
||||
+#include <stddef.h>
|
||||
+#include <string.h>
|
||||
+#include <sysdep.h>
|
||||
+#include <unistd.h>
|
||||
+
|
||||
+#include <brk_call.h>
|
||||
+#include <mmap_call.h>
|
||||
+
|
||||
+/* Defined in brk.c. */
|
||||
+extern void *__curbrk;
|
||||
+
|
||||
+void *
|
||||
+_dl_early_allocate (size_t size)
|
||||
+{
|
||||
+ void *result;
|
||||
+
|
||||
+ if (__curbrk != NULL)
|
||||
+ /* If the break has been initialized, brk must have run before,
|
||||
+ so just call it once more. */
|
||||
+ {
|
||||
+ result = __sbrk (size);
|
||||
+ if (result == (void *) -1)
|
||||
+ result = NULL;
|
||||
+ }
|
||||
+ else
|
||||
+ {
|
||||
+ /* If brk has not been invoked, there is no need to update
|
||||
+ __curbrk. The first call to brk will take care of that. */
|
||||
+ void *previous = __brk_call (0);
|
||||
+ result = __brk_call (previous + size);
|
||||
+ if (result == previous)
|
||||
+ result = NULL;
|
||||
+ else
|
||||
+ result = previous;
|
||||
+ }
|
||||
+
|
||||
+ /* If brk fails, fall back to mmap. This can happen due to
|
||||
+ unfortunate ASLR layout decisions and kernel bugs, particularly
|
||||
+ for static PIE. */
|
||||
+ if (result == NULL)
|
||||
+ {
|
||||
+ long int ret;
|
||||
+ int prot = PROT_READ | PROT_WRITE;
|
||||
+ int flags = MAP_PRIVATE | MAP_ANONYMOUS;
|
||||
+#ifdef __NR_mmap2
|
||||
+ ret = MMAP_CALL_INTERNAL (mmap2, 0, size, prot, flags, -1, 0);
|
||||
+#else
|
||||
+ ret = MMAP_CALL_INTERNAL (mmap, 0, size, prot, flags, -1, 0);
|
||||
+#endif
|
||||
+ if (INTERNAL_SYSCALL_ERROR_P (ret))
|
||||
+ result = NULL;
|
||||
+ else
|
||||
+ result = (void *) ret;
|
||||
+ }
|
||||
+
|
||||
+ return result;
|
||||
+}
|
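The allocation order in the Linux `_dl_early_allocate` above is: try to extend the program break, and only if that fails request an anonymous mapping. The following is a toy Python model of that control flow, not glibc code. `brk_call`, `mmap_call` and `make_brk` are hypothetical stand-ins for the kernel interfaces, and only the branch taken while `__curbrk` is still unset is modeled.

```python
def early_allocate(size, brk_call, mmap_call):
    """Model of the brk-then-mmap fallback.

    brk_call(addr) returns the resulting program break, which is unchanged
    when the request fails. mmap_call(size) returns an address, or None on
    failure. Both callables are assumptions made for this sketch.
    """
    previous = brk_call(0)                # query the current break
    current = brk_call(previous + size)   # try to move the break up
    if current != previous:
        return previous                   # old break is the start of the block
    return mmap_call(size)                # brk failed: fall back to mmap


def make_brk(start, limit):
    """Build a fake brk that refuses to move the break outside [start, limit]."""
    state = {'brk': start}

    def brk_call(addr):
        if start <= addr <= limit:
            state['brk'] = addr
        return state['brk']

    return brk_call
```

With a fake break that has room, the function returns the old break; when the limit is too low it returns whatever the mapping stand-in provides, which mirrors the fallback described in the comment in the C code.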
350
glibc-upstream-2.34-253.patch
Normal file
@ -0,0 +1,350 @@
|
||||
commit 89b638f48ac5c9af5b1fe9caa6287d70127b66a5
|
||||
Author: Stefan Liebler <stli@linux.ibm.com>
|
||||
Date: Tue May 17 16:12:18 2022 +0200
|
||||
|
||||
S390: Enable static PIE
|
||||
|
||||
This commit enables static PIE on 64bit. On 31bit, static PIE is
|
||||
not supported.
|
||||
|
||||
A new configure check in sysdeps/s390/s390-64/configure.ac also performs
|
||||
a minimal test for requirements in ld:
|
||||
Ensure you also have those patches for:
|
||||
- binutils (ld)
|
||||
- "[PR ld/22263] s390: Avoid dynamic TLS relocs in PIE"
|
||||
https://sourceware.org/git/?p=binutils-gdb.git;a=commit;h=26b1426577b5dcb32d149c64cca3e603b81948a9
|
||||
(Tested by configure check above)
|
||||
Otherwise there will be a R_390_TLS_TPOFF relocation, which fails to
|
||||
be processed in _dl_relocate_static_pie() as static TLS map is not setup.
|
||||
- "s390: Add DT_JMPREL pointing to .rela.[i]plt with static-pie"
|
||||
https://sourceware.org/git/?p=binutils-gdb.git;a=commit;h=d942d8db12adf4c9e5c7d9ed6496a779ece7149e
|
||||
(We can't test it in configure as we are not able to link a static PIE
|
||||
executable if the system glibc lacks static PIE support)
|
||||
Otherwise there won't be DT_JMPREL, DT_PLTRELA, DT_PLTRELASZ entries
|
||||
and the IFUNC symbols are not processed, which leads to crashes.
|
||||
|
||||
- kernel (the mentioned links to the commits belong to 5.19 merge window):
|
||||
- "s390/mmap: increase stack/mmap gap to 128MB"
|
||||
https://git.kernel.org/pub/scm/linux/kernel/git/s390/linux.git/commit/?h=features&id=f2f47d0ef72c30622e62471903ea19446ea79ee2
|
||||
- "s390/vdso: move vdso mapping to its own function"
|
||||
https://git.kernel.org/pub/scm/linux/kernel/git/s390/linux.git/commit/?h=features&id=57761da4dc5cd60bed2c81ba0edb7495c3c740b8
|
||||
- "s390/vdso: map vdso above stack"
|
||||
https://git.kernel.org/pub/scm/linux/kernel/git/s390/linux.git/commit/?h=features&id=9e37a2e8546f9e48ea76c839116fa5174d14e033
|
||||
- "s390/vdso: add vdso randomization"
|
||||
https://git.kernel.org/pub/scm/linux/kernel/git/s390/linux.git/commit/?h=features&id=41cd81abafdc4e58a93fcb677712a76885e3ca25
|
||||
(We can't test the kernel of the target system)
|
||||
Otherwise if /proc/sys/kernel/randomize_va_space is turned off (0),
|
||||
static PIE executables like ldconfig will crash. During startup, sbrk is
|
||||
used to enlarge the HEAP. Unfortunately the underlying brk syscall fails
|
||||
as there is not enough space after the HEAP. Then the address of the TLS
|
||||
image is invalid and the following memcpy in __libc_setup_tls() leads
|
||||
to a segfault.
|
||||
If /proc/sys/kernel/randomize_va_space is activated (default: 2), there
|
||||
is enough space after HEAP.
|
||||
|
||||
- glibc
|
||||
- "Linux: Define MMAP_CALL_INTERNAL"
|
||||
https://sourceware.org/git/?p=glibc.git;a=commit;h=c1b68685d438373efe64e5f076f4215723004dfb
|
||||
- "i386: Remove OPTIMIZE_FOR_GCC_5 from Linux libc-do-syscall.S"
|
||||
https://sourceware.org/git/?p=glibc.git;a=commit;h=6e5c7a1e262961adb52443ab91bd2c9b72316402
|
||||
- "i386: Honor I386_USE_SYSENTER for 6-argument Linux system calls"
|
||||
https://sourceware.org/git/?p=glibc.git;a=commit;h=60f0f2130d30cfd008ca39743027f1e200592dff
|
||||
- "ia64: Always define IA64_USE_NEW_STUB as a flag macro"
|
||||
https://sourceware.org/git/?p=glibc.git;a=commit;h=18bd9c3d3b1b6a9182698c85354578d1d58e9d64
|
||||
- "Linux: Implement a useful version of _startup_fatal"
|
||||
https://sourceware.org/git/?p=glibc.git;a=commit;h=a2a6bce7d7e52c1c34369a7da62c501cc350bc31
|
||||
- "Linux: Introduce __brk_call for invoking the brk system call"
|
||||
https://sourceware.org/git/?p=glibc.git;a=commit;h=b57ab258c1140bc45464b4b9908713e3e0ee35aa
|
||||
- "csu: Implement and use _dl_early_allocate during static startup"
|
||||
https://sourceware.org/git/?p=glibc.git;a=commit;h=f787e138aa0bf677bf74fa2a08595c446292f3d7
|
||||
The mentioned patch series by Florian Weimer avoids the mentioned failing
|
||||
sbrk syscall by falling back to mmap.
|
||||
|
||||
This commit also adjusts startup code in start.S to be ready for static PIE.
|
||||
We have to add a wrapper function for main as we are not allowed to use
|
||||
GOT relocations before __libc_start_main is called.
|
||||
(Compare also to:
|
||||
- commit 14d886edbd3d80b771e1c42fbd9217f9074de9c6
|
||||
"aarch64: fix start code for static pie"
|
||||
- commit 3d1d79283e6de4f7c434cb67fb53a4fd28359669
|
||||
"aarch64: fix static pie enabled libc when main is in a shared library"
|
||||
)
|
||||
|
||||
(cherry picked from commit 728894dba4a19578bd803906de184a8dd51ed13c)
|
||||
|
||||
diff --git a/sysdeps/s390/s390-64/configure b/sysdeps/s390/s390-64/configure
|
||||
new file mode 100644
|
||||
index 0000000000000000..101c570d2e62da25
|
||||
--- /dev/null
|
||||
+++ b/sysdeps/s390/s390-64/configure
|
||||
@@ -0,0 +1,122 @@
|
||||
+# This file is generated from configure.ac by Autoconf. DO NOT EDIT!
|
||||
+ # Local configure fragment for sysdeps/s390/s390-64.
|
||||
+
|
||||
+# Minimal checking for static PIE support in ld.
|
||||
+# Compare to ld testcase/bugzilla:
|
||||
+# <binutils-source>/ld/testsuite/ld-elf/pr22263-1.rd
|
||||
+{ $as_echo "$as_me:${as_lineno-$LINENO}: checking for s390-specific static PIE requirements" >&5
|
||||
+$as_echo_n "checking for s390-specific static PIE requirements... " >&6; }
|
||||
+if { as_var=\
|
||||
+libc_cv_s390x_staticpie_req; eval \${$as_var+:} false; }; then :
|
||||
+ $as_echo_n "(cached) " >&6
|
||||
+else
|
||||
+ cat > conftest1.c <<EOF
|
||||
+__thread int * foo;
|
||||
+
|
||||
+void
|
||||
+bar (void)
|
||||
+{
|
||||
+ *foo = 1;
|
||||
+}
|
||||
+EOF
|
||||
+ cat > conftest2.c <<EOF
|
||||
+extern __thread int *foo;
|
||||
+extern void bar (void);
|
||||
+static int x;
|
||||
+
|
||||
+int
|
||||
+main ()
|
||||
+{
|
||||
+ foo = &x;
|
||||
+ return 0;
|
||||
+}
|
||||
+EOF
|
||||
+ libc_cv_s390x_staticpie_req=no
|
||||
+ if { ac_try='${CC-cc} $CFLAGS $CPPFLAGS $LDFLAGS -fPIE -c conftest1.c -o conftest1.o'
|
||||
+ { { eval echo "\"\$as_me\":${as_lineno-$LINENO}: \"$ac_try\""; } >&5
|
||||
+ (eval $ac_try) 2>&5
|
||||
+ ac_status=$?
|
||||
+ $as_echo "$as_me:${as_lineno-$LINENO}: \$? = $ac_status" >&5
|
||||
+ test $ac_status = 0; }; } \
|
||||
+ && { ac_try='${CC-cc} $CFLAGS $CPPFLAGS $LDFLAGS -fPIE -c conftest2.c -o conftest2.o'
|
||||
+ { { eval echo "\"\$as_me\":${as_lineno-$LINENO}: \"$ac_try\""; } >&5
|
||||
+ (eval $ac_try) 2>&5
|
||||
+ ac_status=$?
|
||||
+ $as_echo "$as_me:${as_lineno-$LINENO}: \$? = $ac_status" >&5
|
||||
+ test $ac_status = 0; }; } \
|
||||
+ && { ac_try='${CC-cc} $CFLAGS $CPPFLAGS $LDFLAGS -pie -o conftest conftest1.o conftest2.o'
|
||||
+ { { eval echo "\"\$as_me\":${as_lineno-$LINENO}: \"$ac_try\""; } >&5
|
||||
+ (eval $ac_try) 2>&5
|
||||
+ ac_status=$?
|
||||
+ $as_echo "$as_me:${as_lineno-$LINENO}: \$? = $ac_status" >&5
|
||||
+ test $ac_status = 0; }; } \
|
||||
+ && { ac_try='! readelf -Wr conftest | grep R_390_TLS_TPOFF'
|
||||
+ { { eval echo "\"\$as_me\":${as_lineno-$LINENO}: \"$ac_try\""; } >&5
|
||||
+ (eval $ac_try) 2>&5
|
||||
+ ac_status=$?
|
||||
+ $as_echo "$as_me:${as_lineno-$LINENO}: \$? = $ac_status" >&5
|
||||
+ test $ac_status = 0; }; }
|
||||
+ then
|
||||
+ libc_cv_s390x_staticpie_req=yes
|
||||
+ fi
|
||||
+ rm -rf conftest.*
|
||||
+fi
|
||||
+eval ac_res=\$\
|
||||
+libc_cv_s390x_staticpie_req
|
||||
+ { $as_echo "$as_me:${as_lineno-$LINENO}: result: $ac_res" >&5
|
||||
+$as_echo "$ac_res" >&6; }
|
||||
+if test $libc_cv_s390x_staticpie_req = yes; then
|
||||
+ # Static PIE is supported only on 64bit.
|
||||
+ # Ensure you also have those patches for:
|
||||
+ # - binutils (ld)
|
||||
+ # - "[PR ld/22263] s390: Avoid dynamic TLS relocs in PIE"
|
||||
+ # https://sourceware.org/git/?p=binutils-gdb.git;a=commit;h=26b1426577b5dcb32d149c64cca3e603b81948a9
|
||||
+ # (Tested by configure check above)
|
||||
+ # Otherwise there will be a R_390_TLS_TPOFF relocation, which fails to
|
||||
+ # be processed in _dl_relocate_static_pie() as static TLS map is not setup.
|
||||
+ # - "s390: Add DT_JMPREL pointing to .rela.[i]plt with static-pie"
|
||||
+ # https://sourceware.org/git/?p=binutils-gdb.git;a=commit;h=d942d8db12adf4c9e5c7d9ed6496a779ece7149e
|
||||
+ # (We can't test it in configure as we are not able to link a static PIE
|
||||
+ # executable if the system glibc lacks static PIE support)
|
||||
+ # Otherwise there won't be DT_JMPREL, DT_PLTRELA, DT_PLTRELASZ entries
|
||||
+ # and the IFUNC symbols are not processed, which leads to crashes.
|
||||
+ #
|
||||
+ # - kernel (the mentioned links to the commits belong to 5.19 merge window):
|
||||
+ # - "s390/mmap: increase stack/mmap gap to 128MB"
|
||||
+ # https://git.kernel.org/pub/scm/linux/kernel/git/s390/linux.git/commit/?h=features&id=f2f47d0ef72c30622e62471903ea19446ea79ee2
|
||||
+ # - "s390/vdso: move vdso mapping to its own function"
|
||||
+ # https://git.kernel.org/pub/scm/linux/kernel/git/s390/linux.git/commit/?h=features&id=57761da4dc5cd60bed2c81ba0edb7495c3c740b8
|
||||
+ # - "s390/vdso: map vdso above stack"
|
||||
+ # https://git.kernel.org/pub/scm/linux/kernel/git/s390/linux.git/commit/?h=features&id=9e37a2e8546f9e48ea76c839116fa5174d14e033
|
||||
+ # - "s390/vdso: add vdso randomization"
|
||||
+ # https://git.kernel.org/pub/scm/linux/kernel/git/s390/linux.git/commit/?h=features&id=41cd81abafdc4e58a93fcb677712a76885e3ca25
|
||||
+ # (We can't test the kernel of the target system)
|
||||
+ # Otherwise if /proc/sys/kernel/randomize_va_space is turned off (0),
|
||||
+ # static PIE executables like ldconfig will crash. During startup, sbrk is
|
||||
+ # used to enlarge the HEAP. Unfortunately the underlying brk syscall fails
|
||||
+ # as there is not enough space after the HEAP. Then the address of the TLS
|
||||
+ # image is invalid and the following memcpy in __libc_setup_tls() leads
|
||||
+ # to a segfault.
|
||||
+ # If /proc/sys/kernel/randomize_va_space is activated (default: 2), there
|
||||
+ # is enough space after HEAP.
|
||||
+ #
|
||||
+ # - glibc
|
||||
+ # - "Linux: Define MMAP_CALL_INTERNAL"
|
||||
+ # https://sourceware.org/git/?p=glibc.git;a=commit;h=c1b68685d438373efe64e5f076f4215723004dfb
|
||||
+ # - "i386: Remove OPTIMIZE_FOR_GCC_5 from Linux libc-do-syscall.S"
|
||||
+ # https://sourceware.org/git/?p=glibc.git;a=commit;h=6e5c7a1e262961adb52443ab91bd2c9b72316402
|
||||
+ # - "i386: Honor I386_USE_SYSENTER for 6-argument Linux system calls"
|
||||
+ # https://sourceware.org/git/?p=glibc.git;a=commit;h=60f0f2130d30cfd008ca39743027f1e200592dff
|
||||
+ # - "ia64: Always define IA64_USE_NEW_STUB as a flag macro"
|
||||
+ # https://sourceware.org/git/?p=glibc.git;a=commit;h=18bd9c3d3b1b6a9182698c85354578d1d58e9d64
|
||||
+ # - "Linux: Implement a useful version of _startup_fatal"
|
||||
+ # https://sourceware.org/git/?p=glibc.git;a=commit;h=a2a6bce7d7e52c1c34369a7da62c501cc350bc31
|
||||
+ # - "Linux: Introduce __brk_call for invoking the brk system call"
|
||||
+ # https://sourceware.org/git/?p=glibc.git;a=commit;h=b57ab258c1140bc45464b4b9908713e3e0ee35aa
|
||||
+ # - "csu: Implement and use _dl_early_allocate during static startup"
|
||||
+ # https://sourceware.org/git/?p=glibc.git;a=commit;h=f787e138aa0bf677bf74fa2a08595c446292f3d7
|
||||
+ # The mentioned patch series by Florian Weimer avoids the mentioned failing
|
||||
+ # sbrk syscall by falling back to mmap.
|
||||
+ $as_echo "#define SUPPORT_STATIC_PIE 1" >>confdefs.h
|
||||
+
|
||||
+fi
|
||||
diff --git a/sysdeps/s390/s390-64/configure.ac b/sysdeps/s390/s390-64/configure.ac
|
||||
new file mode 100644
|
||||
index 0000000000000000..2583a4a3350ac11f
|
||||
--- /dev/null
|
||||
+++ b/sysdeps/s390/s390-64/configure.ac
|
||||
@@ -0,0 +1,92 @@
|
||||
+GLIBC_PROVIDES dnl See aclocal.m4 in the top level source directory.
|
||||
+# Local configure fragment for sysdeps/s390/s390-64.
|
||||
+
|
||||
+# Minimal checking for static PIE support in ld.
|
||||
+# Compare to ld testcase/bugzilla:
|
||||
+# <binutils-source>/ld/testsuite/ld-elf/pr22263-1.rd
|
||||
+AC_CACHE_CHECK([for s390-specific static PIE requirements], \
|
||||
+[libc_cv_s390x_staticpie_req], [dnl
|
||||
+ cat > conftest1.c <<EOF
|
||||
+__thread int * foo;
|
||||
+
|
||||
+void
|
||||
+bar (void)
|
||||
+{
|
||||
+ *foo = 1;
|
||||
+}
|
||||
+EOF
|
||||
+ cat > conftest2.c <<EOF
|
||||
+extern __thread int *foo;
|
||||
+extern void bar (void);
|
||||
+static int x;
|
||||
+
|
||||
+int
|
||||
+main ()
|
||||
+{
|
||||
+ foo = &x;
|
||||
+ return 0;
|
||||
+}
|
||||
+EOF
|
||||
+ libc_cv_s390x_staticpie_req=no
|
||||
+ if AC_TRY_COMMAND([${CC-cc} $CFLAGS $CPPFLAGS $LDFLAGS -fPIE -c conftest1.c -o conftest1.o]) \
|
||||
+ && AC_TRY_COMMAND([${CC-cc} $CFLAGS $CPPFLAGS $LDFLAGS -fPIE -c conftest2.c -o conftest2.o]) \
|
||||
+ && AC_TRY_COMMAND([${CC-cc} $CFLAGS $CPPFLAGS $LDFLAGS -pie -o conftest conftest1.o conftest2.o]) \
|
||||
+ && AC_TRY_COMMAND([! readelf -Wr conftest | grep R_390_TLS_TPOFF])
|
||||
+ then
|
||||
+ libc_cv_s390x_staticpie_req=yes
|
||||
+ fi
|
||||
+ rm -rf conftest.*])
|
||||
+if test $libc_cv_s390x_staticpie_req = yes; then
|
||||
+ # Static PIE is supported only on 64bit.
|
||||
+ # Ensure you also have those patches for:
|
||||
+ # - binutils (ld)
|
||||
+ # - "[PR ld/22263] s390: Avoid dynamic TLS relocs in PIE"
|
||||
+ # https://sourceware.org/git/?p=binutils-gdb.git;a=commit;h=26b1426577b5dcb32d149c64cca3e603b81948a9
|
||||
+ # (Tested by configure check above)
|
||||
+ # Otherwise there will be a R_390_TLS_TPOFF relocation, which fails to
|
||||
+ # be processed in _dl_relocate_static_pie() as static TLS map is not setup.
|
||||
+ # - "s390: Add DT_JMPREL pointing to .rela.[i]plt with static-pie"
|
||||
+ # https://sourceware.org/git/?p=binutils-gdb.git;a=commit;h=d942d8db12adf4c9e5c7d9ed6496a779ece7149e
|
||||
+ # (We can't test it in configure as we are not able to link a static PIE
|
||||
+ # executable if the system glibc lacks static PIE support)
|
||||
+ # Otherwise there won't be DT_JMPREL, DT_PLTRELA, DT_PLTRELASZ entries
|
||||
+ # and the IFUNC symbols are not processed, which leads to crashes.
|
||||
+ #
|
||||
+ # - kernel (the mentioned links to the commits belong to 5.19 merge window):
|
||||
+ # - "s390/mmap: increase stack/mmap gap to 128MB"
|
||||
+ # https://git.kernel.org/pub/scm/linux/kernel/git/s390/linux.git/commit/?h=features&id=f2f47d0ef72c30622e62471903ea19446ea79ee2
|
||||
+ # - "s390/vdso: move vdso mapping to its own function"
|
||||
+ # https://git.kernel.org/pub/scm/linux/kernel/git/s390/linux.git/commit/?h=features&id=57761da4dc5cd60bed2c81ba0edb7495c3c740b8
|
||||
+ # - "s390/vdso: map vdso above stack"
|
||||
+ # https://git.kernel.org/pub/scm/linux/kernel/git/s390/linux.git/commit/?h=features&id=9e37a2e8546f9e48ea76c839116fa5174d14e033
|
||||
+ # - "s390/vdso: add vdso randomization"
|
||||
+ # https://git.kernel.org/pub/scm/linux/kernel/git/s390/linux.git/commit/?h=features&id=41cd81abafdc4e58a93fcb677712a76885e3ca25
|
||||
+ # (We can't test the kernel of the target system)
|
||||
+ # Otherwise if /proc/sys/kernel/randomize_va_space is turned off (0),
|
||||
+ # static PIE executables like ldconfig will crash. During startup, sbrk is
|
||||
+ # used to enlarge the HEAP. Unfortunately the underlying brk syscall fails
|
||||
+ # as there is not enough space after the HEAP. Then the address of the TLS
|
||||
+ # image is invalid and the following memcpy in __libc_setup_tls() leads
|
||||
+ # to a segfault.
|
||||
+ # If /proc/sys/kernel/randomize_va_space is activated (default: 2), there
|
||||
+ # is enough space after HEAP.
|
||||
+ #
|
||||
+ # - glibc
|
||||
+ # - "Linux: Define MMAP_CALL_INTERNAL"
|
||||
+ # https://sourceware.org/git/?p=glibc.git;a=commit;h=c1b68685d438373efe64e5f076f4215723004dfb
|
||||
+ # - "i386: Remove OPTIMIZE_FOR_GCC_5 from Linux libc-do-syscall.S"
|
||||
+ # https://sourceware.org/git/?p=glibc.git;a=commit;h=6e5c7a1e262961adb52443ab91bd2c9b72316402
|
||||
+ # - "i386: Honor I386_USE_SYSENTER for 6-argument Linux system calls"
|
||||
+ # https://sourceware.org/git/?p=glibc.git;a=commit;h=60f0f2130d30cfd008ca39743027f1e200592dff
|
||||
+ # - "ia64: Always define IA64_USE_NEW_STUB as a flag macro"
|
||||
+ # https://sourceware.org/git/?p=glibc.git;a=commit;h=18bd9c3d3b1b6a9182698c85354578d1d58e9d64
|
||||
+ # - "Linux: Implement a useful version of _startup_fatal"
|
||||
+ # https://sourceware.org/git/?p=glibc.git;a=commit;h=a2a6bce7d7e52c1c34369a7da62c501cc350bc31
|
||||
+ # - "Linux: Introduce __brk_call for invoking the brk system call"
|
||||
+ # https://sourceware.org/git/?p=glibc.git;a=commit;h=b57ab258c1140bc45464b4b9908713e3e0ee35aa
|
||||
+ # - "csu: Implement and use _dl_early_allocate during static startup"
|
||||
+ # https://sourceware.org/git/?p=glibc.git;a=commit;h=f787e138aa0bf677bf74fa2a08595c446292f3d7
|
||||
+ # The mentioned patch series by Florian Weimer avoids the mentioned failing
|
||||
+ # sbrk syscall by falling back to mmap.
|
||||
+ AC_DEFINE(SUPPORT_STATIC_PIE)
|
||||
+fi
|
||||
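The last command in the configure check succeeds only when `readelf -Wr` prints no `R_390_TLS_TPOFF` relocation for the linked test program. A rough Python equivalent of that final filter is sketched here, assuming the readelf output has already been captured as text. The function names `has_tpoff_reloc` and `staticpie_req` are made up for illustration, and the compile and link steps that must also succeed are not modeled.

```python
def has_tpoff_reloc(readelf_output):
    """Return True if any output line mentions an R_390_TLS_TPOFF relocation."""
    return any('R_390_TLS_TPOFF' in line
               for line in readelf_output.splitlines())


def staticpie_req(readelf_output):
    """Mirror the cached result: 'yes' only if the relocation is absent."""
    return 'no' if has_tpoff_reloc(readelf_output) else 'yes'
```

A plain substring match is used because the shell check is a plain `grep` as well.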
diff --git a/sysdeps/s390/s390-64/start.S b/sysdeps/s390/s390-64/start.S
|
||||
index 4e6526308aee3c00..b4a66e4a97b83397 100644
|
||||
--- a/sysdeps/s390/s390-64/start.S
|
||||
+++ b/sysdeps/s390/s390-64/start.S
|
||||
@@ -85,10 +85,25 @@ _start:
|
||||
|
||||
/* Ok, now branch to the libc main routine. */
|
||||
#ifdef PIC
|
||||
+# ifdef SHARED
|
||||
+ /* Used for dynamic linked position independent executable.
|
||||
+ => Scrt1.o */
|
||||
larl %r2,main@GOTENT # load pointer to main
|
||||
lg %r2,0(%r2)
|
||||
+# else
|
||||
+ /* Used for dynamic linked position dependent executable.
|
||||
+ => crt1.o (glibc configured without --disable-default-pie:
|
||||
+ PIC is defined)
|
||||
+ Or for static linked position independent executable.
|
||||
+ => rcrt1.o (only available if glibc configured without
|
||||
+ --disable-default-pie: PIC is defined) */
|
||||
+ larl %r2,__wrap_main
|
||||
+# endif
|
||||
brasl %r14,__libc_start_main@plt
|
||||
#else
|
||||
+ /* Used for dynamic/static linked position dependent executable.
|
||||
+ => crt1.o (glibc configured with --disable-default-pie:
|
||||
+ PIC and SHARED are not defined) */
|
||||
larl %r2,main # load pointer to main
|
||||
brasl %r14,__libc_start_main
|
||||
#endif
|
||||
@@ -98,6 +113,19 @@ _start:
|
||||
|
||||
cfi_endproc
|
||||
|
||||
+#if defined PIC && !defined SHARED
|
||||
+ /* When main is not defined in the executable but in a shared library
|
||||
+ then a wrapper is needed in crt1.o of the static-pie enabled libc,
|
||||
+ because crt1.o and rcrt1.o share code and the latter must avoid the
|
||||
+ use of GOT relocations before __libc_start_main is called. */
|
||||
+__wrap_main:
|
||||
+ cfi_startproc
|
||||
+ larl %r1,main@GOTENT # load pointer to main
|
||||
+ lg %r1,0(%r1)
|
||||
+ br %r1
|
||||
+ cfi_endproc
|
||||
+#endif
|
||||
+
|
||||
/* Define a symbol for the first piece of initialized data. */
|
||||
.data
|
||||
.globl __data_start
|
301
glibc-upstream-2.34-254.patch
Normal file
@ -0,0 +1,301 @@
|
||||
commit c73c79af7d6f1124fbfa5d935b4f620217d6a2ec
|
||||
Author: Szabolcs Nagy <szabolcs.nagy@arm.com>
|
||||
Date: Fri Jun 15 16:14:58 2018 +0100
|
||||
|
||||
rtld: Use generic argv adjustment in ld.so [BZ #23293]
|
||||
|
||||
When an executable is invoked as

  ./ld.so [ld.so-args] ./exe [exe-args]

then the argv is adjusted in ld.so before calling the entry point of
the executable so ld.so args are not visible to it. On most targets
this requires moving argv, env and auxv on the stack to ensure correct
stack alignment at the entry point. This had several issues:

- The code for this adjustment on the stack is written in asm as part
  of the target specific ld.so _start code which is hard to maintain.

- The adjustment is done after _dl_start returns, where it's too late
  to update GLRO(dl_auxv), as it is already readonly, so it points to
  memory that was clobbered by the adjustment. This is bug 23293.

- _environ is also wrong in ld.so after the adjustment, but it is
  likely not used after _dl_start returns so this is not user visible.

- _dl_argv was updated, but for this it was moved out of relro, which
  changes security properties across targets unnecessarily.

This patch introduces a generic _dl_start_args_adjust function that
handles the argument adjustments after ld.so processed its own args
and before relro protection is applied.

The same algorithm is used on all targets, _dl_skip_args is now 0, so
existing target specific adjustment code is no longer used. The bug
affects aarch64, alpha, arc, arm, csky, ia64, nios2, s390-32 and sparc;
other targets don't need the change in principle, only for consistency.

The GNU Hurd start code relied on _dl_skip_args after dl_main returned,
now it checks directly if args were adjusted and fixes the Hurd startup
data accordingly.

Follow up patches can remove _dl_skip_args and DL_ARGV_NOT_RELRO.

Tested on aarch64-linux-gnu and cross tested on i686-gnu.
|
||||
|
||||
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
|
||||
(cherry picked from commit ad43cac44a6860eaefcadadfb2acb349921e96bf)
|
||||
|
||||
Conflicts:
|
||||
elf/rtld.c
|
||||
(Downstream-only backport of glibc-rh2023422-1.patch)
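
The shuffle described in the commit message can be illustrated on a simulated stack. This is only a minimal sketch, not the rtld code: the names drop_leading_args and demo_drop_leading_args, and the flat array standing in for the initial process stack, are assumptions made for this example, and the auxv part is left out for brevity.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not part of glibc).  STACK models the initial
   process stack: stack[0] is argc, followed by argv[], a NULL
   terminator, envp[], and another NULL terminator.  The first SKIP
   argv entries are dropped by moving everything after them down.  */
void
drop_leading_args (void **stack, int skip)
{
  if (skip == 0)
    return;

  /* Sanity check: at least one argument must remain.  */
  assert (skip > 0 && (intptr_t) stack[0] > skip);

  void **dst = stack;         /* Last slot written; starts at argc.  */
  void **src = stack + skip;  /* Last slot read.  */

  /* Adjust argc on the simulated stack.  */
  stack[0] = (void *) ((intptr_t) stack[0] - skip);

  /* Shuffle argv down, including its NULL terminator.  */
  do
    *++dst = *++src;
  while (*src != NULL);

  /* Shuffle envp down, including its NULL terminator.  */
  do
    *++dst = *++src;
  while (*src != NULL);
}

/* Self-check for the sketch: returns 1 if the layout is as expected.  */
int
demo_drop_leading_args (void)
{
  static char ldso[] = "ld.so", opt[] = "--inhibit-cache";
  static char exe[] = "exe", arg[] = "arg", env[] = "A=1";
  void *stack[] = { (void *) (intptr_t) 4, ldso, opt, exe, arg, NULL,
                    env, NULL };

  drop_leading_args (stack, 2);

  return (intptr_t) stack[0] == 2
         && stack[1] == exe && stack[2] == arg && stack[3] == NULL
         && stack[4] == env && stack[5] == NULL;
}
```

Because both loops copy the terminating NULL before testing it, an empty environment is handled without a special case.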
|
||||
|
||||
diff --git a/elf/rtld.c b/elf/rtld.c
|
||||
index 434fbeddd5cce74d..9de53ccaed420a57 100644
|
||||
--- a/elf/rtld.c
|
||||
+++ b/elf/rtld.c
|
||||
@@ -1121,6 +1121,62 @@ rtld_chain_load (struct link_map *main_map, char *argv0)
|
||||
rtld_soname, pathname, errcode);
|
||||
}
|
||||
|
||||
+/* Adjusts the contents of the stack and related globals for the user
|
||||
+ entry point. The ld.so processed skip_args arguments and bumped
|
||||
+ _dl_argv and _dl_argc accordingly. Those arguments are removed from
|
||||
+ argv here. */
|
||||
+static void
|
||||
+_dl_start_args_adjust (int skip_args)
|
||||
+{
|
||||
+ void **sp = (void **) (_dl_argv - skip_args - 1);
|
||||
+ void **p = sp + skip_args;
|
||||
+
|
||||
+ if (skip_args == 0)
|
||||
+ return;
|
||||
+
|
||||
+ /* Sanity check. */
|
||||
+ intptr_t argc = (intptr_t) sp[0] - skip_args;
|
||||
+ assert (argc == _dl_argc);
|
||||
+
|
||||
+ /* Adjust argc on stack. */
|
||||
+ sp[0] = (void *) (intptr_t) _dl_argc;
|
||||
+
|
||||
+ /* Update globals in rtld. */
|
||||
+ _dl_argv -= skip_args;
|
||||
+ _environ -= skip_args;
|
||||
+
|
||||
+ /* Shuffle argv down. */
|
||||
+ do
|
||||
+ *++sp = *++p;
|
||||
+ while (*p != NULL);
|
||||
+
|
||||
+ assert (_environ == (char **) (sp + 1));
|
||||
+
|
||||
+ /* Shuffle envp down. */
|
||||
+ do
|
||||
+ *++sp = *++p;
|
||||
+ while (*p != NULL);
|
||||
+
|
||||
+#ifdef HAVE_AUX_VECTOR
|
||||
+ void **auxv = (void **) GLRO(dl_auxv) - skip_args;
|
||||
+ GLRO(dl_auxv) = (ElfW(auxv_t) *) auxv; /* Aliasing violation. */
|
||||
+ assert (auxv == sp + 1);
|
||||
+
|
||||
+ /* Shuffle auxv down. */
|
||||
+ ElfW(auxv_t) ax;
|
||||
+ char *oldp = (char *) (p + 1);
|
||||
+ char *newp = (char *) (sp + 1);
|
||||
+ do
|
||||
+ {
|
||||
+ memcpy (&ax, oldp, sizeof (ax));
|
||||
+ memcpy (newp, &ax, sizeof (ax));
|
||||
+ oldp += sizeof (ax);
|
||||
+ newp += sizeof (ax);
|
||||
+ }
|
||||
+ while (ax.a_type != AT_NULL);
|
||||
+#endif
|
||||
+}
|
||||
+
|
||||
static void
|
||||
dl_main (const ElfW(Phdr) *phdr,
|
||||
ElfW(Word) phnum,
|
||||
@@ -1177,6 +1233,7 @@ dl_main (const ElfW(Phdr) *phdr,
|
||||
rtld_is_main = true;
|
||||
|
||||
char *argv0 = NULL;
|
||||
+ char **orig_argv = _dl_argv;
|
||||
|
||||
/* Note the place where the dynamic linker actually came from. */
|
||||
GL(dl_rtld_map).l_name = rtld_progname;
|
||||
@@ -1191,7 +1248,6 @@ dl_main (const ElfW(Phdr) *phdr,
|
||||
GLRO(dl_lazy) = -1;
|
||||
}
|
||||
|
||||
- ++_dl_skip_args;
|
||||
--_dl_argc;
|
||||
++_dl_argv;
|
||||
}
|
||||
@@ -1200,14 +1256,12 @@ dl_main (const ElfW(Phdr) *phdr,
|
||||
if (state.mode != rtld_mode_help)
|
||||
state.mode = rtld_mode_verify;
|
||||
|
||||
- ++_dl_skip_args;
|
||||
--_dl_argc;
|
||||
++_dl_argv;
|
||||
}
|
||||
else if (! strcmp (_dl_argv[1], "--inhibit-cache"))
|
||||
{
|
||||
GLRO(dl_inhibit_cache) = 1;
|
||||
- ++_dl_skip_args;
|
||||
--_dl_argc;
|
||||
++_dl_argv;
|
||||
}
|
||||
@@ -1217,7 +1271,6 @@ dl_main (const ElfW(Phdr) *phdr,
|
||||
state.library_path = _dl_argv[2];
|
||||
state.library_path_source = "--library-path";
|
||||
|
||||
- _dl_skip_args += 2;
|
||||
_dl_argc -= 2;
|
||||
_dl_argv += 2;
|
||||
}
|
||||
@@ -1226,7 +1279,6 @@ dl_main (const ElfW(Phdr) *phdr,
|
||||
{
|
||||
GLRO(dl_inhibit_rpath) = _dl_argv[2];
|
||||
|
||||
- _dl_skip_args += 2;
|
||||
_dl_argc -= 2;
|
||||
_dl_argv += 2;
|
||||
}
|
||||
@@ -1234,14 +1286,12 @@ dl_main (const ElfW(Phdr) *phdr,
|
||||
{
|
||||
audit_list_add_string (&state.audit_list, _dl_argv[2]);
|
||||
|
||||
- _dl_skip_args += 2;
|
||||
_dl_argc -= 2;
|
||||
_dl_argv += 2;
|
||||
}
|
||||
else if (! strcmp (_dl_argv[1], "--preload") && _dl_argc > 2)
|
||||
{
|
||||
state.preloadarg = _dl_argv[2];
|
||||
- _dl_skip_args += 2;
|
||||
_dl_argc -= 2;
|
||||
_dl_argv += 2;
|
||||
}
|
||||
@@ -1249,7 +1299,6 @@ dl_main (const ElfW(Phdr) *phdr,
|
||||
{
|
||||
argv0 = _dl_argv[2];
|
||||
|
||||
- _dl_skip_args += 2;
|
||||
_dl_argc -= 2;
|
||||
_dl_argv += 2;
|
||||
}
|
||||
@@ -1257,7 +1306,6 @@ dl_main (const ElfW(Phdr) *phdr,
|
||||
&& _dl_argc > 2)
|
||||
{
|
||||
state.glibc_hwcaps_prepend = _dl_argv[2];
|
||||
- _dl_skip_args += 2;
|
||||
_dl_argc -= 2;
|
||||
_dl_argv += 2;
|
||||
}
|
||||
@@ -1265,7 +1313,6 @@ dl_main (const ElfW(Phdr) *phdr,
|
||||
&& _dl_argc > 2)
|
||||
{
|
||||
state.glibc_hwcaps_mask = _dl_argv[2];
|
||||
- _dl_skip_args += 2;
|
||||
_dl_argc -= 2;
|
||||
_dl_argv += 2;
|
||||
}
|
||||
@@ -1274,7 +1321,6 @@ dl_main (const ElfW(Phdr) *phdr,
|
||||
{
|
||||
state.mode = rtld_mode_list_tunables;
|
||||
|
||||
- ++_dl_skip_args;
|
||||
--_dl_argc;
|
||||
++_dl_argv;
|
||||
}
|
||||
@@ -1283,7 +1329,6 @@ dl_main (const ElfW(Phdr) *phdr,
|
||||
{
|
||||
state.mode = rtld_mode_list_diagnostics;
|
||||
|
||||
- ++_dl_skip_args;
|
||||
--_dl_argc;
|
||||
++_dl_argv;
|
||||
}
|
||||
@@ -1329,7 +1374,6 @@ dl_main (const ElfW(Phdr) *phdr,
|
||||
_dl_usage (ld_so_name, NULL);
|
||||
}
|
||||
|
||||
- ++_dl_skip_args;
|
||||
--_dl_argc;
|
||||
++_dl_argv;
|
||||
|
||||
@@ -1428,6 +1472,9 @@ dl_main (const ElfW(Phdr) *phdr,
|
||||
/* Set the argv[0] string now that we've processed the executable. */
|
||||
if (argv0 != NULL)
|
||||
_dl_argv[0] = argv0;
|
||||
+
|
||||
+ /* Adjust arguments for the application entry point. */
|
||||
+ _dl_start_args_adjust (_dl_argv - orig_argv);
|
||||
}
|
||||
else
|
||||
{
|
||||
diff --git a/sysdeps/mach/hurd/dl-sysdep.c b/sysdeps/mach/hurd/dl-sysdep.c
|
||||
index 4b2072e5d5e3bfd2..5c0f8e46bfbd4753 100644
|
||||
--- a/sysdeps/mach/hurd/dl-sysdep.c
|
||||
+++ b/sysdeps/mach/hurd/dl-sysdep.c
|
||||
@@ -106,6 +106,7 @@ _dl_sysdep_start (void **start_argptr,
|
||||
{
|
||||
void go (intptr_t *argdata)
|
||||
{
|
||||
+ char *orig_argv0;
|
||||
char **p;
|
||||
|
||||
/* Cache the information in various global variables. */
|
||||
@@ -114,6 +115,8 @@ _dl_sysdep_start (void **start_argptr,
|
||||
_environ = &_dl_argv[_dl_argc + 1];
|
||||
for (p = _environ; *p++;); /* Skip environ pointers and terminator. */
|
||||
|
||||
+ orig_argv0 = _dl_argv[0];
|
||||
+
|
||||
if ((void *) p == _dl_argv[0])
|
||||
{
|
||||
static struct hurd_startup_data nodata;
|
||||
@@ -204,30 +207,23 @@ unfmh(); /* XXX */
|
||||
|
||||
/* The call above might screw a few things up.
|
||||
|
||||
- First of all, if _dl_skip_args is nonzero, we are ignoring
|
||||
- the first few arguments. However, if we have no Hurd startup
|
||||
- data, it is the magical convention that ARGV[0] == P. The
|
||||
+ P is the location after the terminating NULL of the list of
|
||||
+ environment variables. It has to point to the Hurd startup
|
||||
+ data or if that's missing then P == ARGV[0] must hold. The
|
||||
startup code in init-first.c will get confused if this is not
|
||||
the case, so we must rearrange things to make it so. We'll
|
||||
- overwrite the origional ARGV[0] at P with ARGV[_dl_skip_args].
|
||||
+ recompute P and move the Hurd data or the new ARGV[0] there.
|
||||
|
||||
- Secondly, if we need to be secure, it removes some dangerous
|
||||
- environment variables. If we have no Hurd startup date this
|
||||
- changes P (since that's the location after the terminating
|
||||
- NULL in the list of environment variables). We do the same
|
||||
- thing as in the first case but make sure we recalculate P.
|
||||
- If we do have Hurd startup data, we have to move the data
|
||||
- such that it starts just after the terminating NULL in the
|
||||
- environment list.
|
||||
+ Note: directly invoked ld.so can move arguments and env vars.
|
||||
|
||||
We use memmove, since the locations might overlap. */
|
||||
- if (__libc_enable_secure || _dl_skip_args)
|
||||
- {
|
||||
- char **newp;
|
||||
|
||||
- for (newp = _environ; *newp++;);
|
||||
+ char **newp;
|
||||
+ for (newp = _environ; *newp++;);
|
||||
|
||||
- if (_dl_argv[-_dl_skip_args] == (char *) p)
|
||||
+ if (newp != p || _dl_argv[0] != orig_argv0)
|
||||
+ {
|
||||
+ if (orig_argv0 == (char *) p)
|
||||
{
|
||||
if ((char *) newp != _dl_argv[0])
|
||||
{
|
105
glibc-upstream-2.34-255.patch
Normal file
@ -0,0 +1,105 @@
|
||||
commit b2585cae2854d7d2868fb2e51e2796042c5e0679
|
||||
Author: Szabolcs Nagy <szabolcs.nagy@arm.com>
|
||||
Date: Tue May 3 13:18:04 2022 +0100
|
||||
|
||||
linux: Add a getauxval test [BZ #23293]
|
||||
|
||||
This is for bug 23293 and it relies on the glibc test system running
tests via explicit ld.so invocation by default.
|
||||
|
||||
Reviewed-by: Florian Weimer <fweimer@redhat.com>
|
||||
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
|
||||
(cherry picked from commit 9faf5262c77487c96da8a3e961b88c0b1879e186)
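
For readers unfamiliar with getauxval, the following is a small sketch of the kind of lookup the test relies on. The helper names auxv_page_size and auxv_page_size_matches_sysconf are hypothetical and are not part of the patch; only getauxval, AT_PAGESZ and sysconf are real interfaces.

```c
#include <assert.h>
#include <sys/auxv.h>
#include <unistd.h>

/* Hypothetical helper: page size as reported by the auxiliary vector.
   getauxval returns 0 if the entry is not present.  */
unsigned long
auxv_page_size (void)
{
  return getauxval (AT_PAGESZ);
}

/* Hypothetical helper: returns 1 if the auxv page size agrees with the
   value reported by sysconf.  */
int
auxv_page_size_matches_sysconf (void)
{
  long want = sysconf (_SC_PAGESIZE);
  assert (want > 0);
  return auxv_page_size () == (unsigned long) want;
}
```

If GLRO(dl_auxv) pointed at clobbered memory, as described for bug 23293, a lookup like this could return 0 or a wrong value when the program is started through an explicit ld.so invocation.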
|
||||
|
||||
diff --git a/sysdeps/unix/sysv/linux/Makefile b/sysdeps/unix/sysv/linux/Makefile
|
||||
index 0657f4003e7116c6..5c772f69d1b1f1f1 100644
|
||||
--- a/sysdeps/unix/sysv/linux/Makefile
|
||||
+++ b/sysdeps/unix/sysv/linux/Makefile
|
||||
@@ -123,6 +123,7 @@ tests += tst-clone tst-clone2 tst-clone3 tst-fanotify tst-personality \
|
||||
tst-close_range \
|
||||
tst-prctl \
|
||||
tst-scm_rights \
|
||||
+ tst-getauxval \
|
||||
# tests
|
||||
|
||||
# Test for the symbol version of fcntl that was replaced in glibc 2.28.
|
||||
diff --git a/sysdeps/unix/sysv/linux/tst-getauxval.c b/sysdeps/unix/sysv/linux/tst-getauxval.c
|
||||
new file mode 100644
|
||||
index 0000000000000000..c4b619574369f4c5
|
||||
--- /dev/null
|
||||
+++ b/sysdeps/unix/sysv/linux/tst-getauxval.c
|
||||
@@ -0,0 +1,74 @@
|
||||
+/* Basic test for getauxval.
|
||||
+ Copyright (C) 2022 Free Software Foundation, Inc.
|
||||
+ This file is part of the GNU C Library.
|
||||
+
|
||||
+ The GNU C Library is free software; you can redistribute it and/or
|
||||
+ modify it under the terms of the GNU Lesser General Public
|
||||
+ License as published by the Free Software Foundation; either
|
||||
+ version 2.1 of the License, or (at your option) any later version.
|
||||
+
|
||||
+ The GNU C Library is distributed in the hope that it will be useful,
|
||||
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
|
||||
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
|
||||
+ Lesser General Public License for more details.
|
||||
+
|
||||
+ You should have received a copy of the GNU Lesser General Public
|
||||
+ License along with the GNU C Library; if not, see
|
||||
+ <https://www.gnu.org/licenses/>. */
|
||||
+
|
||||
+#include <unistd.h>
|
||||
+#include <stdio.h>
|
||||
+#include <support/check.h>
|
||||
+#include <sys/auxv.h>
|
||||
+
|
||||
+static int missing;
|
||||
+static int mismatch;
|
||||
+
|
||||
+static void
|
||||
+check_nonzero (unsigned long t, const char *s)
|
||||
+{
|
||||
+ unsigned long v = getauxval (t);
|
||||
+ printf ("%s: %lu (0x%lx)\n", s, v, v);
|
||||
+ if (v == 0)
|
||||
+ missing++;
|
||||
+}
|
||||
+
|
||||
+static void
|
||||
+check_eq (unsigned long t, const char *s, unsigned long want)
|
||||
+{
|
||||
+ unsigned long v = getauxval (t);
|
||||
+ printf ("%s: %lu want: %lu\n", s, v, want);
|
||||
+ if (v != want)
|
||||
+ mismatch++;
|
||||
+}
|
||||
+
|
||||
+#define NZ(x) check_nonzero (x, #x)
|
||||
+#define EQ(x, want) check_eq (x, #x, want)
|
||||
+
|
||||
+static int
|
||||
+do_test (void)
|
||||
+{
|
||||
+ /* These auxv entries should be non-zero on Linux. */
|
||||
+ NZ (AT_PHDR);
|
||||
+ NZ (AT_PHENT);
|
||||
+ NZ (AT_PHNUM);
|
||||
+ NZ (AT_PAGESZ);
|
||||
+ NZ (AT_ENTRY);
|
||||
+ NZ (AT_CLKTCK);
|
||||
+ NZ (AT_RANDOM);
|
||||
+ NZ (AT_EXECFN);
|
||||
+ if (missing)
|
||||
+ FAIL_EXIT1 ("Found %d missing auxv entries.\n", missing);
|
||||
+
|
||||
+ /* Check against syscalls. */
|
||||
+ EQ (AT_UID, getuid ());
|
||||
+ EQ (AT_EUID, geteuid ());
|
||||
+ EQ (AT_GID, getgid ());
|
||||
+ EQ (AT_EGID, getegid ());
|
||||
+ if (mismatch)
|
||||
+ FAIL_EXIT1 ("Found %d mismatching auxv entries.\n", mismatch);
|
||||
+
|
||||
+ return 0;
|
||||
+}
|
||||
+
|
||||
+#include <support/test-driver.c>
|
39
glibc-upstream-2.34-256.patch
Normal file
@ -0,0 +1,39 @@
|
||||
commit 14770f3e0462721b317f138197e1fbf4db542c94
|
||||
Author: Sergei Trofimovich <slyich@gmail.com>
|
||||
Date: Mon May 23 13:56:43 2022 +0530
|
||||
|
||||
string.h: fix __fortified_attr_access macro call [BZ #29162]
|
||||
|
||||
commit e938c0274 "Don't add access size hints to fortifiable functions"
converted a few '__attr_access ((...))' into '__fortified_attr_access (...)'
calls.

But one of the conversions kept the double parentheses in its
'__fortified_attr_access (...)' call.

Noticed as a gnat6 build failure:

  /<<NIX>>-glibc-2.34-210-dev/include/bits/string_fortified.h:110:50: error: macro "__fortified_attr_access" requires 3 arguments, but only 1 given

The change fixes the parentheses.

This is seen when using compilers that do not support
__builtin___stpncpy_chk, e.g. gcc older than 4.7, clang older than 2.6
or some compiler not derived from gcc or clang.
|
||||
|
||||
Signed-off-by: Sergei Trofimovich <slyich@gmail.com>
|
||||
Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
|
||||
(cherry picked from commit 5a5f94af0542f9a35aaa7992c18eb4e2403a29b9)
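
The preprocessor behaviour behind this failure can be shown with a stand-in macro. This is a sketch only: SUM3 and sum3_demo are hypothetical names used in place of __fortified_attr_access, whose real definition is not reproduced here.

```c
#include <assert.h>

/* Hypothetical three-parameter macro standing in for
   __fortified_attr_access.  */
#define SUM3(a, b, c) ((a) + (b) + (c))

/* Correct call: three separate macro arguments.  */
int
sum3_demo (void)
{
  int s = SUM3 (1, 2, 3);
  assert (s == 6);
  return s;
}

/* By contrast, SUM3 ((1, 2, 3)) passes the single argument "(1, 2, 3)":
   commas inside the inner parentheses do not separate macro arguments,
   so the preprocessor reports that the macro requires 3 arguments but
   only 1 was given.  */
```

A one-parameter macro such as __attr_access needs the extra parentheses for exactly the opposite reason, which is presumably how the double parentheses survived the conversion.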
|
||||
|
||||
diff --git a/string/bits/string_fortified.h b/string/bits/string_fortified.h
|
||||
index 218006c9ba882d9c..4e66e0bd1ebb572a 100644
|
||||
--- a/string/bits/string_fortified.h
|
||||
+++ b/string/bits/string_fortified.h
|
||||
@@ -107,7 +107,7 @@ __NTH (stpncpy (char *__dest, const char *__src, size_t __n))
|
||||
# else
|
||||
extern char *__stpncpy_chk (char *__dest, const char *__src, size_t __n,
|
||||
size_t __destlen) __THROW
|
||||
- __fortified_attr_access ((__write_only__, 1, 3))
|
||||
+ __fortified_attr_access (__write_only__, 1, 3)
|
||||
__attr_access ((__read_only__, 2));
|
||||
extern char *__REDIRECT_NTH (__stpncpy_alias, (char *__dest, const char *__src,
|
||||
size_t __n), stpncpy);
|
51
glibc-upstream-2.34-257.patch
Normal file
@ -0,0 +1,51 @@
|
||||
commit 83ae8287c1c3009459ff29241b647ff61363b22c
|
||||
Author: Noah Goldstein <goldstein.w.n@gmail.com>
|
||||
Date: Tue Feb 15 08:18:15 2022 -0600
|
||||
|
||||
x86: Fallback {str|wcs}cmp RTM in the ncmp overflow case [BZ #29127]
|
||||
|
||||
Re-cherry-pick commit c627209832 for the strcmp-avx2.S change, which was
omitted in the initial cherry-pick because at the time this bug was not
present on the release branch.

Fixes BZ #29127.

In the overflow fallback strncmp-avx2-rtm and wcsncmp-avx2-rtm would
call strcmp-avx2 and wcscmp-avx2 respectively. These fallbacks have
no checks around vzeroupper and would trigger spurious
aborts. This commit fixes that.

test-strcmp, test-strncmp, test-wcscmp, and test-wcsncmp all pass on
AVX2 machines with and without RTM.
|
||||
|
||||
Co-authored-by: H.J. Lu <hjl.tools@gmail.com>
|
||||
(cherry picked from commit c6272098323153db373f2986c67786ea8c85f1cf)
|
||||
|
||||
diff --git a/sysdeps/x86_64/multiarch/strcmp-avx2.S b/sysdeps/x86_64/multiarch/strcmp-avx2.S
|
||||
index aa91f6e48a0e1ce5..a9806daadbbfd18b 100644
|
||||
--- a/sysdeps/x86_64/multiarch/strcmp-avx2.S
|
||||
+++ b/sysdeps/x86_64/multiarch/strcmp-avx2.S
|
||||
@@ -345,10 +345,10 @@ L(one_or_less):
|
||||
movq %LOCALE_REG, %rdx
|
||||
# endif
|
||||
jb L(ret_zero)
|
||||
-# ifdef USE_AS_WCSCMP
|
||||
/* 'nbe' covers the case where length is negative (large
|
||||
unsigned). */
|
||||
- jnbe __wcscmp_avx2
|
||||
+ jnbe OVERFLOW_STRCMP
|
||||
+# ifdef USE_AS_WCSCMP
|
||||
movl (%rdi), %edx
|
||||
xorl %eax, %eax
|
||||
cmpl (%rsi), %edx
|
||||
@@ -357,10 +357,6 @@ L(one_or_less):
|
||||
negl %eax
|
||||
orl $1, %eax
|
||||
# else
|
||||
- /* 'nbe' covers the case where length is negative (large
|
||||
- unsigned). */
|
||||
-
|
||||
- jnbe __strcmp_avx2
|
||||
movzbl (%rdi), %eax
|
||||
movzbl (%rsi), %ecx
|
||||
TOLOWER_gpr (%rax, %eax)
|
737
glibc-upstream-2.34-258.patch
Normal file
@ -0,0 +1,737 @@
|
||||
commit ff450cdbdee0b8cb6b9d653d6d2fa892de29be31
|
||||
Author: Arjun Shankar <arjun@redhat.com>
|
||||
Date: Tue May 24 17:57:36 2022 +0200
|
||||
|
||||
Fix deadlock when pthread_atfork handler calls pthread_atfork or dlclose
|
||||
|
||||
In multi-threaded programs, registering via pthread_atfork,
de-registering implicitly via dlclose, or running pthread_atfork
handlers during fork was protected by an internal lock. This meant
that a pthread_atfork handler attempting to register another handler or
dlclose a dynamically loaded library would lead to a deadlock.

This commit fixes the deadlock in the following way:

During the execution of handlers at fork time, the atfork lock is
released prior to the execution of each handler and taken again upon its
return. Any handler registrations or de-registrations that occurred
during the execution of the handler are accounted for before proceeding
with further handler execution.

If a handler that hasn't been executed yet gets de-registered by another
handler during fork, it will not be executed. If a handler gets
registered by another handler during fork, it will not be executed
during that particular fork.

Because handlers may now be registered or de-registered during handler
execution, identifying the next handler to run after a given handler
returns requires some bookkeeping. The fork_handler struct has an
additional field, 'id', which is assigned sequentially during
registration. Thus, handlers are executed in descending order of 'id'
during 'prepare', and ascending order of 'id' during parent/child
handler execution after the fork.

Two tests are included:

* tst-atfork3: Adhemerval Zanella <adhemerval.zanella@linaro.org>
  This test exercises calling dlclose from prepare, parent, and child
  handlers.

* tst-atfork4: This test exercises calling pthread_atfork and dlclose
  from the prepare handler.

[BZ #24595, BZ #27054]
|
||||
|
||||
Co-authored-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
|
||||
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
|
||||
(cherry picked from commit 52a103e237329b9f88a28513fe7506ffc3bd8ced)
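
The 'id' bookkeeping for the prepare phase can be modelled on a plain array of IDs. This is a hedged sketch rather than the glibc implementation: next_prepare_pos and demo_next_prepare_pos are hypothetical names, the array stands in for the fork_handler_list dynarray, and the clamp to the list length is a safety measure added for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model.  IDS holds the handler IDs in ascending order as
   they are after a prepare handler returned; LEN is the list length.
   POS is the 1-based position that handler had before it ran and ID is
   its ID.  Returns the 1-based position of the next prepare handler to
   run, or 0 if none is left.  */
size_t
next_prepare_pos (const uint64_t *ids, size_t len, size_t pos, uint64_t id)
{
  assert (pos > 0);

  /* A. The next handler has an earlier position than the current one.  */
  size_t i = pos - 1;
  if (i > len)
    i = len;  /* Sketch-only clamp in case the list shrank.  */

  /* B. The next handler has a lower ID; skip the just-run handler if it
     moved down, as well as anything registered after it.  */
  while (i > 0 && ids[i - 1] >= id)
    i--;

  return i;
}

/* Self-check for the sketch: returns 1 if all cases behave as expected.  */
int
demo_next_prepare_pos (void)
{
  const uint64_t unchanged[] = { 1, 2, 3 };  /* Nothing changed.  */
  const uint64_t shuffled[] = { 1, 3, 4 };   /* 2 removed, 4 added.  */
  const uint64_t only_new[] = { 3, 4 };      /* 1 and 2 removed.  */

  return next_prepare_pos (unchanged, 3, 3, 3) == 2
         && next_prepare_pos (shuffled, 3, 3, 3) == 1
         && next_prepare_pos (only_new, 2, 3, 3) == 0;
}
```

In the second case the handler with ID 3 ran at position 3; afterwards it sits at position 2, so the walk skips it and the newly registered ID 4 and lands on the handler with ID 1.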
|
||||
|
||||
diff --git a/include/register-atfork.h b/include/register-atfork.h
|
||||
index fadde14700947ac6..6d7bfd87688d6530 100644
|
||||
--- a/include/register-atfork.h
|
||||
+++ b/include/register-atfork.h
|
||||
@@ -26,6 +26,7 @@ struct fork_handler
|
||||
void (*parent_handler) (void);
|
||||
void (*child_handler) (void);
|
||||
void *dso_handle;
|
||||
+ uint64_t id;
|
||||
};
|
||||
|
||||
/* Function to call to unregister fork handlers. */
|
||||
@@ -39,19 +40,18 @@ enum __run_fork_handler_type
|
||||
atfork_run_parent
|
||||
};
|
||||
|
||||
-/* Run the atfork handlers and lock/unlock the internal lock depending
|
||||
- of the WHO argument:
|
||||
-
|
||||
- - atfork_run_prepare: run all the PREPARE_HANDLER in reverse order of
|
||||
- insertion and locks the internal lock.
|
||||
- - atfork_run_child: run all the CHILD_HANDLER and unlocks the internal
|
||||
- lock.
|
||||
- - atfork_run_parent: run all the PARENT_HANDLER and unlocks the internal
|
||||
- lock.
|
||||
-
|
||||
- Perform locking only if DO_LOCKING. */
|
||||
-extern void __run_fork_handlers (enum __run_fork_handler_type who,
|
||||
- _Bool do_locking) attribute_hidden;
|
||||
+/* Run the atfork prepare handlers in the reverse order of registration and
|
||||
+ return the ID of the last registered handler. If DO_LOCKING is true, the
|
||||
+ internal lock is held locked upon return. */
|
||||
+extern uint64_t __run_prefork_handlers (_Bool do_locking) attribute_hidden;
|
||||
+
|
||||
+/* Given a handler type (parent or child), run all the atfork handlers in
|
||||
+ the order of registration up to and including the handler with id equal
|
||||
+ to LASTRUN. If DO_LOCKING is true, the internal lock is unlocked prior
|
||||
+ to return. */
|
||||
+extern void __run_postfork_handlers (enum __run_fork_handler_type who,
|
||||
+ _Bool do_locking,
|
||||
+ uint64_t lastrun) attribute_hidden;
|
||||
|
||||
/* C library side function to register new fork handlers. */
|
||||
extern int __register_atfork (void (*__prepare) (void),
|
||||
diff --git a/posix/fork.c b/posix/fork.c
|
||||
index 021691b9b7441f15..890b806eb48cb75a 100644
|
||||
--- a/posix/fork.c
|
||||
+++ b/posix/fork.c
|
||||
@@ -46,8 +46,9 @@ __libc_fork (void)
|
||||
best effort to make is async-signal-safe at least for single-thread
|
||||
case. */
|
||||
bool multiple_threads = __libc_single_threaded == 0;
|
||||
+ uint64_t lastrun;
|
||||
|
||||
- __run_fork_handlers (atfork_run_prepare, multiple_threads);
|
||||
+ lastrun = __run_prefork_handlers (multiple_threads);
|
||||
|
||||
struct nss_database_data nss_database_data;
|
||||
|
||||
@@ -105,7 +106,7 @@ __libc_fork (void)
|
||||
reclaim_stacks ();
|
||||
|
||||
/* Run the handlers registered for the child. */
|
||||
- __run_fork_handlers (atfork_run_child, multiple_threads);
|
||||
+ __run_postfork_handlers (atfork_run_child, multiple_threads, lastrun);
|
||||
}
|
||||
else
|
||||
{
|
||||
@@ -123,7 +124,7 @@ __libc_fork (void)
|
||||
}
|
||||
|
||||
/* Run the handlers registered for the parent. */
|
||||
- __run_fork_handlers (atfork_run_parent, multiple_threads);
|
||||
+ __run_postfork_handlers (atfork_run_parent, multiple_threads, lastrun);
|
||||
|
||||
if (pid < 0)
|
||||
__set_errno (save_errno);
|
||||
diff --git a/posix/register-atfork.c b/posix/register-atfork.c
|
||||
index 6fd9e4c56aafd7cc..6370437aa68e039e 100644
|
||||
--- a/posix/register-atfork.c
|
||||
+++ b/posix/register-atfork.c
|
||||
@@ -19,6 +19,8 @@
|
||||
#include <libc-lock.h>
|
||||
#include <stdbool.h>
|
||||
#include <register-atfork.h>
|
||||
+#include <intprops.h>
|
||||
+#include <stdio.h>
|
||||
|
||||
#define DYNARRAY_ELEMENT struct fork_handler
|
||||
#define DYNARRAY_STRUCT fork_handler_list
|
||||
@@ -27,7 +29,7 @@
|
||||
#include <malloc/dynarray-skeleton.c>
|
||||
|
||||
static struct fork_handler_list fork_handlers;
|
||||
-static bool fork_handler_init = false;
|
||||
+static uint64_t fork_handler_counter;
|
||||
|
||||
static int atfork_lock = LLL_LOCK_INITIALIZER;
|
||||
|
||||
@@ -37,11 +39,8 @@ __register_atfork (void (*prepare) (void), void (*parent) (void),
|
||||
{
|
||||
lll_lock (atfork_lock, LLL_PRIVATE);
|
||||
|
||||
- if (!fork_handler_init)
|
||||
- {
|
||||
- fork_handler_list_init (&fork_handlers);
|
||||
- fork_handler_init = true;
|
||||
- }
|
||||
+ if (fork_handler_counter == 0)
|
||||
+ fork_handler_list_init (&fork_handlers);
|
||||
|
||||
struct fork_handler *newp = fork_handler_list_emplace (&fork_handlers);
|
||||
if (newp != NULL)
|
||||
@@ -50,6 +49,13 @@ __register_atfork (void (*prepare) (void), void (*parent) (void),
|
||||
newp->parent_handler = parent;
|
||||
newp->child_handler = child;
|
||||
newp->dso_handle = dso_handle;
|
||||
+
|
||||
+ /* IDs assigned to handlers start at 1 and increment with handler
|
||||
+ registration. Un-registering a handler discards the corresponding
|
||||
+ ID. It is not reused in future registrations. */
|
||||
+ if (INT_ADD_OVERFLOW (fork_handler_counter, 1))
|
||||
+ __libc_fatal ("fork handler counter overflow");
|
||||
+ newp->id = ++fork_handler_counter;
|
||||
}
|
||||
|
||||
/* Release the lock. */
|
||||
@@ -104,37 +110,111 @@ __unregister_atfork (void *dso_handle)
|
||||
lll_unlock (atfork_lock, LLL_PRIVATE);
|
||||
}
|
||||
|
||||
-void
|
||||
-__run_fork_handlers (enum __run_fork_handler_type who, _Bool do_locking)
|
||||
+uint64_t
|
||||
+__run_prefork_handlers (_Bool do_locking)
|
||||
{
|
||||
- struct fork_handler *runp;
|
||||
+ uint64_t lastrun;
|
||||
|
||||
- if (who == atfork_run_prepare)
|
||||
+ if (do_locking)
|
||||
+ lll_lock (atfork_lock, LLL_PRIVATE);
|
||||
+
|
||||
+ /* We run prepare handlers from last to first. After fork, only
|
||||
+ handlers up to the last handler found here (pre-fork) will be run.
|
||||
+ Handlers registered during __run_prefork_handlers or
|
||||
+ __run_postfork_handlers will be positioned after this last handler, and
|
||||
+ since their prepare handlers won't be run now, their parent/child
|
||||
+ handlers should also be ignored. */
|
||||
+ lastrun = fork_handler_counter;
|
||||
+
|
||||
+ size_t sl = fork_handler_list_size (&fork_handlers);
|
||||
+ for (size_t i = sl; i > 0;)
|
||||
{
|
||||
- if (do_locking)
|
||||
- lll_lock (atfork_lock, LLL_PRIVATE);
|
||||
- size_t sl = fork_handler_list_size (&fork_handlers);
|
||||
- for (size_t i = sl; i > 0; i--)
|
||||
- {
|
||||
- runp = fork_handler_list_at (&fork_handlers, i - 1);
|
||||
- if (runp->prepare_handler != NULL)
|
||||
- runp->prepare_handler ();
|
||||
- }
|
||||
+ struct fork_handler *runp
|
||||
+ = fork_handler_list_at (&fork_handlers, i - 1);
|
||||
+
|
||||
+ uint64_t id = runp->id;
|
||||
+
|
||||
+ if (runp->prepare_handler != NULL)
|
||||
+ {
|
||||
+ if (do_locking)
|
||||
+ lll_unlock (atfork_lock, LLL_PRIVATE);
|
||||
+
|
||||
+ runp->prepare_handler ();
|
||||
+
|
||||
+ if (do_locking)
|
||||
+ lll_lock (atfork_lock, LLL_PRIVATE);
|
||||
+ }
|
||||
+
|
||||
+ /* We unlocked, ran the handler, and locked again. In the
|
||||
+ meanwhile, one or more deregistrations could have occurred leading
|
||||
+ to the current (just run) handler being moved up the list or even
|
||||
+ removed from the list itself. Since handler IDs are guaranteed to
|
||||
+ be in increasing order, the next handler has to have: */
|
||||
+
|
||||
+ /* A. An earlier position than the current one has. */
|
||||
+ i--;
|
||||
+
|
||||
+ /* B. A lower ID than the current one does. The code below skips
|
||||
+ any newly added handlers with higher IDs. */
|
||||
+ while (i > 0
|
||||
+ && fork_handler_list_at (&fork_handlers, i - 1)->id >= id)
|
||||
+ i--;
|
||||
}
|
||||
- else
|
||||
+
|
||||
+ return lastrun;
|
||||
+}
|
||||
+
|
||||
+void
|
||||
+__run_postfork_handlers (enum __run_fork_handler_type who, _Bool do_locking,
|
||||
+ uint64_t lastrun)
|
||||
+{
|
||||
+ size_t sl = fork_handler_list_size (&fork_handlers);
|
||||
+ for (size_t i = 0; i < sl;)
|
||||
{
|
||||
- size_t sl = fork_handler_list_size (&fork_handlers);
|
||||
- for (size_t i = 0; i < sl; i++)
|
||||
- {
|
||||
- runp = fork_handler_list_at (&fork_handlers, i);
|
||||
- if (who == atfork_run_child && runp->child_handler)
|
||||
- runp->child_handler ();
|
||||
- else if (who == atfork_run_parent && runp->parent_handler)
|
||||
- runp->parent_handler ();
|
||||
- }
|
||||
+ struct fork_handler *runp = fork_handler_list_at (&fork_handlers, i);
|
||||
+ uint64_t id = runp->id;
|
||||
+
|
||||
+ /* prepare handlers were not run for handlers with ID > LASTRUN.
|
||||
+ Thus, parent/child handlers will also not be run. */
|
||||
+ if (id > lastrun)
|
||||
+ break;
|
||||
+
|
||||
if (do_locking)
|
||||
- lll_unlock (atfork_lock, LLL_PRIVATE);
|
||||
+ lll_unlock (atfork_lock, LLL_PRIVATE);
|
||||
+
|
||||
+ if (who == atfork_run_child && runp->child_handler)
|
||||
+ runp->child_handler ();
|
||||
+ else if (who == atfork_run_parent && runp->parent_handler)
|
||||
+ runp->parent_handler ();
|
||||
+
|
||||
+ if (do_locking)
|
||||
+ lll_lock (atfork_lock, LLL_PRIVATE);
|
||||
+
|
||||
+ /* We unlocked, ran the handler, and locked again. In the meanwhile,
|
||||
+         one or more [de]registrations could have occurred. Due to this,
+         the list size must be updated. */
+      sl = fork_handler_list_size (&fork_handlers);
+
+      /* The just-run handler could also have moved up the list. */
+
+      if (sl > i && fork_handler_list_at (&fork_handlers, i)->id == id)
+        /* The position of the recently run handler hasn't changed. The
+           next handler to be run is an easy increment away. */
+        i++;
+      else
+        {
+          /* The next handler to be run is the first handler in the list
+             to have an ID higher than the current one. */
+          for (i = 0; i < sl; i++)
+            {
+              if (fork_handler_list_at (&fork_handlers, i)->id > id)
+                break;
+            }
+        }
     }
+
+  if (do_locking)
+    lll_unlock (atfork_lock, LLL_PRIVATE);
 }


diff --git a/sysdeps/pthread/Makefile b/sysdeps/pthread/Makefile
index 00419c4d199df912..5147588c130c9415 100644
--- a/sysdeps/pthread/Makefile
+++ b/sysdeps/pthread/Makefile
@@ -154,16 +154,36 @@ tests += tst-cancelx2 tst-cancelx3 tst-cancelx6 tst-cancelx8 tst-cancelx9 \
          tst-cleanupx0 tst-cleanupx1 tst-cleanupx2 tst-cleanupx3

 ifeq ($(build-shared),yes)
-tests += tst-atfork2 tst-pt-tls4 tst-_res1 tst-fini1 tst-create1
+tests += \
+  tst-atfork2 \
+  tst-pt-tls4 \
+  tst-_res1 \
+  tst-fini1 \
+  tst-create1 \
+  tst-atfork3 \
+  tst-atfork4 \
+# tests
+
 tests-nolibpthread += tst-fini1
 endif

-modules-names += tst-atfork2mod tst-tls4moda tst-tls4modb \
-                 tst-_res1mod1 tst-_res1mod2 tst-fini1mod \
-                 tst-create1mod
+modules-names += \
+  tst-atfork2mod \
+  tst-tls4moda \
+  tst-tls4modb \
+  tst-_res1mod1 \
+  tst-_res1mod2 \
+  tst-fini1mod \
+  tst-create1mod \
+  tst-atfork3mod \
+  tst-atfork4mod \
+# module-names
+
 test-modules = $(addprefix $(objpfx),$(addsuffix .so,$(modules-names)))

 tst-atfork2mod.so-no-z-defs = yes
+tst-atfork3mod.so-no-z-defs = yes
+tst-atfork4mod.so-no-z-defs = yes
 tst-create1mod.so-no-z-defs = yes

 ifeq ($(build-shared),yes)
@@ -226,8 +246,18 @@ tst-atfork2-ENV = MALLOC_TRACE=$(objpfx)tst-atfork2.mtrace \
                   LD_PRELOAD=$(common-objpfx)/malloc/libc_malloc_debug.so
 $(objpfx)tst-atfork2mod.so: $(shared-thread-library)

+$(objpfx)tst-atfork3: $(shared-thread-library)
+LDFLAGS-tst-atfork3 = -rdynamic
+$(objpfx)tst-atfork3mod.so: $(shared-thread-library)
+
+$(objpfx)tst-atfork4: $(shared-thread-library)
+LDFLAGS-tst-atfork4 = -rdynamic
+$(objpfx)tst-atfork4mod.so: $(shared-thread-library)
+
 ifeq ($(build-shared),yes)
 $(objpfx)tst-atfork2.out: $(objpfx)tst-atfork2mod.so
+$(objpfx)tst-atfork3.out: $(objpfx)tst-atfork3mod.so
+$(objpfx)tst-atfork4.out: $(objpfx)tst-atfork4mod.so
 endif

 ifeq ($(build-shared),yes)
diff --git a/sysdeps/pthread/tst-atfork3.c b/sysdeps/pthread/tst-atfork3.c
new file mode 100644
index 0000000000000000..bb2250e432ab79ad
--- /dev/null
+++ b/sysdeps/pthread/tst-atfork3.c
@@ -0,0 +1,118 @@
+/* Check if pthread_atfork handler can call dlclose (BZ#24595).
+   Copyright (C) 2022 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>. */
+
+#include <stdio.h>
+#include <pthread.h>
+#include <unistd.h>
+#include <stdlib.h>
+#include <stdbool.h>
+
+#include <support/check.h>
+#include <support/xthread.h>
+#include <support/capture_subprocess.h>
+#include <support/xdlfcn.h>
+
+/* Check if pthread_atfork handlers do not deadlock when calling a function
+   that might alter the internal fork handle list, such as dlclose.
+
+   The test registers a callback set with pthread_atfork(), dlopen() a shared
+   library (nptl/tst-atfork3mod.c), calls an exported symbol from the library
+   (which in turn also registers atfork handlers), and calls fork to trigger
+   the callbacks. */
+
+static void *handler;
+static bool run_dlclose_prepare;
+static bool run_dlclose_parent;
+static bool run_dlclose_child;
+
+static void
+prepare (void)
+{
+  if (run_dlclose_prepare)
+    xdlclose (handler);
+}
+
+static void
+parent (void)
+{
+  if (run_dlclose_parent)
+    xdlclose (handler);
+}
+
+static void
+child (void)
+{
+  if (run_dlclose_child)
+    xdlclose (handler);
+}
+
+static void
+proc_func (void *closure)
+{
+}
+
+static void
+do_test_generic (bool dlclose_prepare, bool dlclose_parent, bool dlclose_child)
+{
+  run_dlclose_prepare = dlclose_prepare;
+  run_dlclose_parent = dlclose_parent;
+  run_dlclose_child = dlclose_child;
+
+  handler = xdlopen ("tst-atfork3mod.so", RTLD_NOW);
+
+  int (*atfork3mod_func)(void);
+  atfork3mod_func = xdlsym (handler, "atfork3mod_func");
+
+  atfork3mod_func ();
+
+  struct support_capture_subprocess proc
+    = support_capture_subprocess (proc_func, NULL);
+  support_capture_subprocess_check (&proc, "tst-atfork3", 0, sc_allow_none);
+
+  handler = atfork3mod_func = NULL;
+
+  support_capture_subprocess_free (&proc);
+}
+
+static void *
+thread_func (void *closure)
+{
+  return NULL;
+}
+
+static int
+do_test (void)
+{
+  {
+    /* Make the process acts as multithread. */
+    pthread_attr_t attr;
+    xpthread_attr_init (&attr);
+    xpthread_attr_setdetachstate (&attr, PTHREAD_CREATE_DETACHED);
+    xpthread_create (&attr, thread_func, NULL);
+  }
+
+  TEST_COMPARE (pthread_atfork (prepare, parent, child), 0);
+
+  do_test_generic (true /* prepare */, false /* parent */, false /* child */);
+  do_test_generic (false /* prepare */, true /* parent */, false /* child */);
+  do_test_generic (false /* prepare */, false /* parent */, true /* child */);
+
+  return 0;
+}
+
+#include <support/test-driver.c>
diff --git a/sysdeps/pthread/tst-atfork3mod.c b/sysdeps/pthread/tst-atfork3mod.c
new file mode 100644
index 0000000000000000..6d0658cb9efdecbc
--- /dev/null
+++ b/sysdeps/pthread/tst-atfork3mod.c
@@ -0,0 +1,44 @@
+/* Copyright (C) 2022 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>. */
+
+#include <unistd.h>
+#include <stdlib.h>
+#include <pthread.h>
+
+#include <support/check.h>
+
+static void
+mod_prepare (void)
+{
+}
+
+static void
+mod_parent (void)
+{
+}
+
+static void
+mod_child (void)
+{
+}
+
+int atfork3mod_func (void)
+{
+  TEST_COMPARE (pthread_atfork (mod_prepare, mod_parent, mod_child), 0);
+
+  return 0;
+}
diff --git a/sysdeps/pthread/tst-atfork4.c b/sysdeps/pthread/tst-atfork4.c
new file mode 100644
index 0000000000000000..52dc87e73b846ab9
--- /dev/null
+++ b/sysdeps/pthread/tst-atfork4.c
@@ -0,0 +1,128 @@
+/* pthread_atfork supports handlers that call pthread_atfork or dlclose.
+   Copyright (C) 2022 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>. */
+
+#include <support/xdlfcn.h>
+#include <stdio.h>
+#include <support/xthread.h>
+#include <sys/types.h>
+#include <sys/wait.h>
+#include <support/xunistd.h>
+#include <support/check.h>
+#include <stdlib.h>
+
+static void *
+thread_func (void *x)
+{
+  return NULL;
+}
+
+static unsigned int second_atfork_handler_runcount = 0;
+
+static void
+second_atfork_handler (void)
+{
+  second_atfork_handler_runcount++;
+}
+
+static void *h = NULL;
+
+static unsigned int atfork_handler_runcount = 0;
+
+static void
+prepare (void)
+{
+  /* These atfork handlers are registered while atfork handlers are being
+     executed and thus will not be executed during the corresponding
+     fork. */
+  TEST_VERIFY_EXIT (pthread_atfork (second_atfork_handler,
+                                    second_atfork_handler,
+                                    second_atfork_handler) == 0);
+
+  /* This will de-register the atfork handlers registered by the dlopen'd
+     library and so they will not be executed. */
+  if (h != NULL)
+    {
+      xdlclose (h);
+      h = NULL;
+    }
+
+  atfork_handler_runcount++;
+}
+
+static void
+after (void)
+{
+  atfork_handler_runcount++;
+}
+
+static int
+do_test (void)
+{
+  /* Make sure __libc_single_threaded is 0. */
+  pthread_attr_t attr;
+  xpthread_attr_init (&attr);
+  xpthread_attr_setdetachstate (&attr, PTHREAD_CREATE_DETACHED);
+  xpthread_create (&attr, thread_func, NULL);
+
+  void (*reg_atfork_handlers) (void);
+
+  h = xdlopen ("tst-atfork4mod.so", RTLD_LAZY);
+
+  reg_atfork_handlers = xdlsym (h, "reg_atfork_handlers");
+
+  reg_atfork_handlers ();
+
+  /* We register our atfork handlers *after* loading the module so that our
+     prepare handler is called first at fork, where we then dlclose the
+     module before its prepare handler has a chance to be called. */
+  TEST_VERIFY_EXIT (pthread_atfork (prepare, after, after) == 0);
+
+  pid_t pid = xfork ();
+
+  /* Both the parent and the child processes should observe this. */
+  TEST_VERIFY_EXIT (atfork_handler_runcount == 2);
+  TEST_VERIFY_EXIT (second_atfork_handler_runcount == 0);
+
+  if (pid > 0)
+    {
+      int childstat;
+
+      xwaitpid (-1, &childstat, 0);
+      TEST_VERIFY_EXIT (WIFEXITED (childstat)
+                        && WEXITSTATUS (childstat) == 0);
+
+      /* This time, the second set of atfork handlers should also be called
+         since the handlers are already in place before fork is called. */
+
+      pid = xfork ();
+
+      TEST_VERIFY_EXIT (atfork_handler_runcount == 4);
+      TEST_VERIFY_EXIT (second_atfork_handler_runcount == 2);
+
+      if (pid > 0)
+        {
+          xwaitpid (-1, &childstat, 0);
+          TEST_VERIFY_EXIT (WIFEXITED (childstat)
+                            && WEXITSTATUS (childstat) == 0);
+        }
+    }
+
+  return 0;
+}
+
+#include <support/test-driver.c>
diff --git a/sysdeps/pthread/tst-atfork4mod.c b/sysdeps/pthread/tst-atfork4mod.c
new file mode 100644
index 0000000000000000..e111efeb185916e0
--- /dev/null
+++ b/sysdeps/pthread/tst-atfork4mod.c
@@ -0,0 +1,48 @@
+/* pthread_atfork supports handlers that call pthread_atfork or dlclose.
+   Copyright (C) 2022 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>. */
+
+#include <pthread.h>
+#include <stdlib.h>
+
+/* This dynamically loaded library simply registers its atfork handlers when
+   asked to. The atfork handlers should never be executed because the
+   library is unloaded before fork is called by the test program. */
+
+static void
+prepare (void)
+{
+  abort ();
+}
+
+static void
+parent (void)
+{
+  abort ();
+}
+
+static void
+child (void)
+{
+  abort ();
+}
+
+void
+reg_atfork_handlers (void)
+{
+  pthread_atfork (prepare, parent, child);
+}
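The resume step at the end of the register-atfork.c hunk above (choosing which handler to run next once the lock has been dropped and re-taken) can be sketched in isolation. This is a minimal illustration, not the glibc code: `struct handler` and `next_index` are hypothetical names, a plain array stands in for the `fork_handlers` list, and the sketch assumes, as the patch comments imply, that the list is ordered by increasing ID.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a registered fork handler; only the ID
   matters for this sketch.  */
struct handler
{
  uint64_t id;
};

/* LIST holds SIZE handlers ordered by increasing ID (an assumption of
   this sketch).  The handler with ID `id` sat at index I before the lock
   was dropped.  Return the index of the next handler to run, or SIZE if
   none is left.  */
static size_t
next_index (const struct handler *list, size_t size, size_t i, uint64_t id)
{
  /* Fast path: the handler that just ran is still at the same position.  */
  if (size > i && list[i].id == id)
    return i + 1;

  /* Otherwise the list changed underneath us: rescan for the first
     handler whose ID is higher than the one that just ran.  */
  for (i = 0; i < size; i++)
    if (list[i].id > id)
      break;
  return i;
}
```

The rescan covers both cases named in the patch comments: an earlier handler was deregistered, so the just-run handler moved up the list, or the just-run handler itself was removed (for example by `dlclose`).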
150
glibc.spec
@@ -148,7 +148,7 @@ end \
 Summary: The GNU libc libraries
 Name: glibc
 Version: %{glibcversion}
-Release: 32%{?dist}
+Release: 35%{?dist}

 # In general, GPLv2+ is used by programs, LGPLv2+ is used for
 # libraries.
@@ -461,6 +461,74 @@ Patch253: glibc-upstream-2.34-187.patch
 Patch254: glibc-upstream-2.34-188.patch
 Patch255: glibc-upstream-2.34-189.patch
 Patch256: glibc-upstream-2.34-190.patch
+Patch257: glibc-upstream-2.34-191.patch
+Patch258: glibc-upstream-2.34-192.patch
+Patch259: glibc-upstream-2.34-193.patch
+Patch260: glibc-upstream-2.34-194.patch
+Patch261: glibc-upstream-2.34-195.patch
+Patch262: glibc-upstream-2.34-196.patch
+Patch263: glibc-upstream-2.34-197.patch
+Patch264: glibc-upstream-2.34-198.patch
+Patch265: glibc-upstream-2.34-199.patch
+Patch266: glibc-upstream-2.34-200.patch
+Patch267: glibc-upstream-2.34-201.patch
+Patch268: glibc-upstream-2.34-202.patch
+Patch269: glibc-upstream-2.34-203.patch
+Patch270: glibc-upstream-2.34-204.patch
+Patch271: glibc-upstream-2.34-205.patch
+Patch272: glibc-upstream-2.34-206.patch
+Patch273: glibc-upstream-2.34-207.patch
+Patch274: glibc-upstream-2.34-208.patch
+Patch275: glibc-upstream-2.34-209.patch
+Patch276: glibc-upstream-2.34-210.patch
+Patch277: glibc-upstream-2.34-211.patch
+Patch278: glibc-upstream-2.34-212.patch
+Patch279: glibc-upstream-2.34-213.patch
+Patch280: glibc-upstream-2.34-214.patch
+Patch281: glibc-upstream-2.34-215.patch
+Patch282: glibc-upstream-2.34-216.patch
+Patch283: glibc-upstream-2.34-217.patch
+Patch284: glibc-upstream-2.34-218.patch
+Patch285: glibc-upstream-2.34-219.patch
+Patch286: glibc-upstream-2.34-220.patch
+Patch287: glibc-upstream-2.34-221.patch
+Patch288: glibc-upstream-2.34-222.patch
+Patch289: glibc-upstream-2.34-223.patch
+Patch290: glibc-upstream-2.34-224.patch
+Patch291: glibc-upstream-2.34-225.patch
+Patch292: glibc-upstream-2.34-226.patch
+Patch293: glibc-upstream-2.34-227.patch
+Patch294: glibc-upstream-2.34-228.patch
+Patch295: glibc-upstream-2.34-229.patch
+Patch296: glibc-upstream-2.34-230.patch
+Patch297: glibc-upstream-2.34-231.patch
+Patch298: glibc-upstream-2.34-232.patch
+Patch299: glibc-upstream-2.34-233.patch
+Patch300: glibc-upstream-2.34-234.patch
+Patch301: glibc-upstream-2.34-235.patch
+Patch302: glibc-upstream-2.34-236.patch
+Patch303: glibc-upstream-2.34-237.patch
+Patch304: glibc-upstream-2.34-238.patch
+Patch305: glibc-upstream-2.34-239.patch
+Patch306: glibc-upstream-2.34-240.patch
+Patch307: glibc-upstream-2.34-241.patch
+Patch308: glibc-upstream-2.34-242.patch
+Patch309: glibc-upstream-2.34-243.patch
+Patch310: glibc-upstream-2.34-244.patch
+Patch311: glibc-upstream-2.34-245.patch
+Patch312: glibc-upstream-2.34-246.patch
+Patch313: glibc-upstream-2.34-247.patch
+Patch314: glibc-upstream-2.34-248.patch
+Patch315: glibc-upstream-2.34-249.patch
+Patch316: glibc-upstream-2.34-250.patch
+Patch317: glibc-upstream-2.34-251.patch
+Patch318: glibc-upstream-2.34-252.patch
+Patch319: glibc-upstream-2.34-253.patch
+Patch320: glibc-upstream-2.34-254.patch
+Patch321: glibc-upstream-2.34-255.patch
+Patch322: glibc-upstream-2.34-256.patch
+Patch323: glibc-upstream-2.34-257.patch
+Patch324: glibc-upstream-2.34-258.patch

 ##############################################################################
 # Continued list of core "glibc" package information:
@@ -2517,6 +2585,86 @@ fi
 %files -f compat-libpthread-nonshared.filelist -n compat-libpthread-nonshared

 %changelog
+* Tue May 31 2022 Arjun Shankar <arjun@redhat.com> - 2.34-35
+- Sync with upstream branch release/2.34/master,
+  commit ff450cdbdee0b8cb6b9d653d6d2fa892de29be31:
+- Fix deadlock when pthread_atfork handler calls pthread_atfork or dlclose
+- x86: Fallback {str|wcs}cmp RTM in the ncmp overflow case [BZ #29127]
+- string.h: fix __fortified_attr_access macro call [BZ #29162]
+- linux: Add a getauxval test [BZ #23293]
+- rtld: Use generic argv adjustment in ld.so [BZ #23293]
+- S390: Enable static PIE
+
+* Thu May 19 2022 Florian Weimer <fweimer@redhat.com> - 2.34-34
+- Sync with upstream branch release/2.34/master,
+  commit ede8d94d154157d269b18f3601440ac576c1f96a:
+- csu: Implement and use _dl_early_allocate during static startup
+- Linux: Introduce __brk_call for invoking the brk system call
+- Linux: Implement a useful version of _startup_fatal
+- ia64: Always define IA64_USE_NEW_STUB as a flag macro
+- Linux: Define MMAP_CALL_INTERNAL
+- i386: Honor I386_USE_SYSENTER for 6-argument Linux system calls
+- i386: Remove OPTIMIZE_FOR_GCC_5 from Linux libc-do-syscall.S
+- elf: Remove __libc_init_secure
+- Linux: Consolidate auxiliary vector parsing (redo)
+- Linux: Include <dl-auxv.h> in dl-sysdep.c only for SHARED
+- Revert "Linux: Consolidate auxiliary vector parsing"
+- Linux: Consolidate auxiliary vector parsing
+- Linux: Assume that NEED_DL_SYSINFO_DSO is always defined
+- Linux: Remove DL_FIND_ARG_COMPONENTS
+- Linux: Remove HAVE_AUX_SECURE, HAVE_AUX_XID, HAVE_AUX_PAGESIZE
+- elf: Merge dl-sysdep.c into the Linux version
+- elf: Remove unused NEED_DL_BASE_ADDR and _dl_base_addr
+- x86: Optimize {str|wcs}rchr-evex
+- x86: Optimize {str|wcs}rchr-avx2
+- x86: Optimize {str|wcs}rchr-sse2
+- x86: Cleanup page cross code in memcmp-avx2-movbe.S
+- x86: Remove memcmp-sse4.S
+- x86: Small improvements for wcslen
+- x86: Remove AVX str{n}casecmp
+- x86: Add EVEX optimized str{n}casecmp
+- x86: Add AVX2 optimized str{n}casecmp
+- x86: Optimize str{n}casecmp TOLOWER logic in strcmp-sse42.S
+- x86: Optimize str{n}casecmp TOLOWER logic in strcmp.S
+- x86: Remove strspn-sse2.S and use the generic implementation
+- x86: Remove strpbrk-sse2.S and use the generic implementation
+- x86: Remove strcspn-sse2.S and use the generic implementation
+- x86: Optimize strspn in strspn-c.c
+- x86: Optimize strcspn and strpbrk in strcspn-c.c
+- x86: Code cleanup in strchr-evex and comment justifying branch
+- x86: Code cleanup in strchr-avx2 and comment justifying branch
+- x86_64: Remove bcopy optimizations
+- x86-64: Remove bzero weak alias in SS2 memset
+- x86_64/multiarch: Sort sysdep_routines and put one entry per line
+- x86: Improve L to support L(XXX_SYMBOL (YYY, ZZZ))
+- fortify: Ensure that __glibc_fortify condition is a constant [BZ #29141]
+
+* Thu May 12 2022 Florian Weimer <fweimer@redhat.com> - 2.34-33
+- Sync with upstream branch release/2.34/master,
+  commit 91c2e6c3db44297bf4cb3a2e3c40236c5b6a0b23:
+- dlfcn: Implement the RTLD_DI_PHDR request type for dlinfo
+- manual: Document the dlinfo function
+- x86: Fix fallback for wcsncmp_avx2 in strcmp-avx2.S [BZ #28896]
+- x86: Fix bug in strncmp-evex and strncmp-avx2 [BZ #28895]
+- x86: Set .text section in memset-vec-unaligned-erms
+- x86-64: Optimize bzero
+- x86: Remove SSSE3 instruction for broadcast in memset.S (SSE2 Only)
+- x86: Improve vec generation in memset-vec-unaligned-erms.S
+- x86-64: Fix strcmp-evex.S
+- x86-64: Fix strcmp-avx2.S
+- x86: Optimize strcmp-evex.S
+- x86: Optimize strcmp-avx2.S
+- manual: Clarify that abbreviations of long options are allowed
+- Add HWCAP2_AFP, HWCAP2_RPRES from Linux 5.17 to AArch64 bits/hwcap.h
+- aarch64: Add HWCAP2_ECV from Linux 5.16
+- Add SOL_MPTCP, SOL_MCTP from Linux 5.16 to bits/socket.h
+- Update kernel version to 5.17 in tst-mman-consts.py
+- Update kernel version to 5.16 in tst-mman-consts.py
+- Update syscall lists for Linux 5.17
+- Add ARPHRD_CAN, ARPHRD_MCTP to net/if_arp.h
+- Update kernel version to 5.15 in tst-mman-consts.py
+- Add PF_MCTP, AF_MCTP from Linux 5.15 to bits/socket.h
+
 * Thu Apr 28 2022 Carlos O'Donell <carlos@redhat.com> - 2.34-32
 - Sync with upstream branch release/2.34/master,
   commit c66c92181ddbd82306537a608e8c0282587131de: