Relevant commits already backported; skipped from this sync:

- elf: handle addition overflow in _dl_find_object_update_1 [BZ #32245]
  (glibc-RHEL-119398.patch)
- Avoid uninitialized result in sem_open when file does not exist
  (glibc-RHEL-119392-1.patch)
- Rename new tst-sem17 test to tst-sem18
  (glibc-RHEL-119392-2.patch)
- nss: Group merge does not react to ERANGE during merge (bug 33361)
  (glibc-RHEL-114265.patch)
- AArch64: Fix instability in AdvSIMD tan
  (glibc-RHEL-118273-44.patch)

RPM-Changelog:
- Sync with upstream branch release/2.39/master (RHEL-126766)
- Upstream commit: ce65d944e38a20cb70af2a48a4b8aa5d8fabe1cc
- posix: Reset wordexp_t fields with WRDE_REUSE (CVE-2025-15281 / BZ 33814)
- resolv: Fix NSS DNS backend for getnetbyaddr (CVE-2026-0915)
- memalign: reinstate alignment overflow check (CVE-2026-0861)
- support: Exit on consistency check failure in resolv_response_add_name
- support: Fix FILE * leak in check_for_unshare_hints in test-container
- sprof: fix -Wformat warnings on 32-bit hosts
- sprof: check pread size and offset for overflow
- getaddrinfo.c: Avoid uninitialized pointer access [BZ #32465]
- nptl: Optimize trylock for high cache contention workloads (BZ #33704)
- ppc64le: Power 10 rawmemchr clobbers v20 (bug #33091)
- ppc64le: Restore optimized strncmp for power10
- ppc64le: Restore optimized strcmp for power10
- AArch64: Optimise SVE scalar callbacks
- aarch64: fix includes in SME tests
- aarch64: fix cfi directives around __libc_arm_za_disable
- aarch64: tests for SME
- aarch64: clear ZA state of SME before clone and clone3 syscalls
- aarch64: define macro for calling __libc_arm_za_disable
- aarch64: update tests for SME
- aarch64: Disable ZA state of SME in setjmp and sigsetjmp
- linux: Also check pkey_get for ENOSYS on tst-pkey (BZ 31996)
- aarch64: Do not link conform tests with -Wl,-z,force-bti (bug 33601)
- x86: fix wmemset ifunc stray '!' (bug 33542)
- x86: Detect Intel Nova Lake Processor
- x86: Detect Intel Wildcat Lake Processor

Resolves: RHEL-126766
Resolves: RHEL-45143
Resolves: RHEL-45145
Resolves: RHEL-142786
Resolves: RHEL-141852
Resolves: RHEL-141733

222 lines
11 KiB
Diff
commit d1d0d09e9e5e086d3de9217a9572b634ea74857a
Author: Yury Khrustalev <yury.khrustalev@arm.com>
Date: Thu Sep 25 15:54:36 2025 +0100

aarch64: clear ZA state of SME before clone and clone3 syscalls

This change adds a call to the __arm_za_disable() function immediately
before the SVC instruction inside clone() and clone3() wrappers. It also
adds a macro for inline clone() used in fork() and adds the same call to
the vfork implementation. This sets the ZA state of SME to "off" on return
from these functions (for both the child and the parent).

The __arm_za_disable() function is described in [1] (8.1.3). Note that
the internal Glibc name for this function is __libc_arm_za_disable().

When this change was originally proposed [2,3], it generated a long
discussion in which several questions and concerns were raised. Here we
address these concerns and explain why this change is useful and, in
fact, necessary.

In a nutshell, a C library that conforms to the AAPCS64 spec [1] (for this
change, mainly chapters 6.2 and 6.6) should call the __arm_za_disable()
function in its clone() and clone3() wrappers. The following explains in
detail why this is the case.

When we consider using the __arm_za_disable() function inside the clone()
and clone3() libc wrappers, we mean the C library subroutines clone()
and clone3() rather than the syscalls with similar names. In the current
version of Glibc, clone() is public and clone3() is private, but clone3()
being private is not pertinent to this discussion.

We begin by stating that this change is NOT a bug fix for something
in the kernel. The requirement to call __arm_za_disable() does NOT come from
the kernel. It is also NOT needed to satisfy a contract between the kernel
and userspace. This is why it is not for the kernel documentation to describe
this requirement. Instead, the requirement comes from a pure userspace
scheme outlined in [1]. Meeting it ensures that software that uses Glibc (or
any other C library that handles SME states correctly (see below)) conforms
to [1] without having to become SME-aware unnecessarily and lose portability.

To recap (see [1] (6.2)), the SME extension defines SME state, which is part
of the processor state. Part of this SME state is the ZA state, which is
needed to manage the ZA storage register in the context of the ZA lazy saving
scheme [1] (6.6). This scheme exists because it would be challenging to handle
the ZA storage of SME in either a callee-saved or a caller-saved manner.

There are 3 kinds of ZA state that are defined in terms of the PSTATE.ZA
bit and the TPIDR2_EL0 register (see [1] (6.6.3)):

- "off": PSTATE.ZA == 0
- "active": PSTATE.ZA == 1, TPIDR2_EL0 == null
- "dormant": PSTATE.ZA == 1, TPIDR2_EL0 != null

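For illustration, the three definitions above can be written as a small C
classifier. This is a sketch only: the enum and function names are
hypothetical and appear in neither Glibc nor [1], the inputs are plain values
rather than real register reads, and null is modelled as 0.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names, used for illustration only.  */
enum za_state { ZA_OFF, ZA_ACTIVE, ZA_DORMANT };

/* Classify the ZA state from the two inputs named in the list above:
   the PSTATE.ZA bit and the value of TPIDR2_EL0 (0 stands for null).  */
static enum za_state
classify_za_state (bool pstate_za, uint64_t tpidr2_el0)
{
  if (!pstate_za)
    return ZA_OFF;              /* PSTATE.ZA == 0 */
  if (tpidr2_el0 == 0)
    return ZA_ACTIVE;           /* PSTATE.ZA == 1, TPIDR2_EL0 == null */
  return ZA_DORMANT;            /* PSTATE.ZA == 1, TPIDR2_EL0 != null */
}
```

In this model, classify_za_state (true, 0) yields ZA_ACTIVE, and any non-zero
second argument with PSTATE.ZA set yields ZA_DORMANT.
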
As [1] (6.7.2) outlines, every subroutine has exactly one ZA-interface,
determined by the permitted ZA-states on entry and on normal return from
a call to this subroutine. Callers of a subroutine must know and respect
the ZA-interface of the subroutines they are using. Using a subroutine
in a way that is not permitted by its ZA-interface is undefined behaviour.

In particular, clone() and clone3() (the C library functions) have the
private-ZA interface. This means that the permitted ZA-states on entry
are "off" and "dormant", and that the permitted ZA-states on return are "off"
or "dormant" (the latter only if the state was "dormant" on entry).

This means that both functions in question should correctly handle both
"off" and "dormant" ZA-states on entry. The conforming states on return
are "off" and "dormant" (if the inbound state was already "dormant").

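The entry and return rules just stated can be expressed as two predicates.
This is a minimal sketch of the rules as worded in this message, not code from
Glibc or [1]; all identifiers below are hypothetical.

```c
#include <stdbool.h>

/* Hypothetical names, used for illustration only.  */
enum za_kind { KIND_OFF, KIND_ACTIVE, KIND_DORMANT };

/* A private-ZA subroutine may be entered in "off" or "dormant".  */
static bool
private_za_entry_ok (enum za_kind entry)
{
  return entry == KIND_OFF || entry == KIND_DORMANT;
}

/* On return, "off" is always permitted; "dormant" is permitted only
   if the state was already "dormant" on entry.  */
static bool
private_za_return_ok (enum za_kind entry, enum za_kind ret)
{
  if (!private_za_entry_ok (entry))
    return false;
  if (ret == KIND_OFF)
    return true;
  return ret == KIND_DORMANT && entry == KIND_DORMANT;
}
```

Under these predicates, returning "off" is accepted for every permitted entry
state, which is the property this change relies on.
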
This change ensures that the ZA-state on return is always "off". Note
that, in the context of clone() and clone3(), "on return" means the point
at which execution resumes at a certain address after control transfers from
clone() or clone3(). For the caller (we may refer to it as the "parent") this
is the return address in the link register, to which the RET instruction
jumps. For the "child", this is the target branch address.

So, the "off" state on return is permitted and conformant. Why can't we
retain the "dormant" state? In theory we can, but we shouldn't; here is
why.

Every subroutine with a private-ZA interface, including clone() and clone3(),
must comply with the lazy saving scheme [1] (6.7.2). This puts additional
responsibility on a subroutine if the ZA-state on return is "dormant", because
this state has a special meaning. The "caller" (that is, the place in the code
to which execution is transferred, so this includes both the "parent" and the
"child") may check the ZA-state and use it as per the spec of the "dormant"
state that is outlined in [1] (6.6.6 and 6.6.7).

Conforming to this would require more code inside clone() and clone3(),
which is hardly desirable.

For the return to the "parent" this could be achieved in theory, but given
that neither clone() nor clone3() is supposed to be used in the middle of an
SME operation, it wouldn't be useful. For the "return" to the "child" this
would be particularly difficult to achieve given the complexity of these
functions and their interfaces. Most importantly, it would be illegal
and somewhat meaningless to allow a "child" to start execution in the
"dormant" ZA-state, because the very essence of the "dormant" state implies
that there is a place to return to and that there is some outer context that
we are allowed to interact with.

To sum up, calling __arm_za_disable() to ensure the "off" ZA-state when
execution resumes after a call to clone() or clone3() is correct and also
the simplest way to conform to [1].

Can there be situations when we can avoid calling __arm_za_disable()?

Calling __arm_za_disable() implies a certain (sufficiently small) overhead,
so one might rightly ponder skipping the call when we can afford to. The
most trivial such cases (e.g. when the calling thread doesn't have access
to SME or to the TPIDR2_EL0 register) are already handled by this function
(see [1] (8.1.3 and 8.1.2)). Reasoning about other possible cases would
make the code inside clone() and clone3() more complicated, which would
defeat the purpose of optimising away the call to __arm_za_disable().

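To make the "already handled" point concrete, here is a rough software model
of the effect of the call on the state described earlier. It is a sketch under
assumptions, not the real routine: the struct, its fields and the function
name are hypothetical, the early return stands in for the trivial cases
mentioned above, and the model assumes that a pending lazy save is committed
before ZA is turned off. The authoritative description is [1] (8.1.3).

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of per-thread state; these are not real registers.  */
struct za_model
{
  bool has_sme;          /* thread may access SME and TPIDR2_EL0 */
  bool pstate_za;        /* models the PSTATE.ZA bit */
  uint64_t tpidr2_el0;   /* models TPIDR2_EL0; 0 stands for null */
  unsigned lazy_saves;   /* number of lazy saves committed in the model */
};

/* Model of the observable effect of disabling ZA.  */
static void
model_za_disable (struct za_model *m)
{
  if (!m->has_sme)
    return;                     /* trivial case: nothing to do */
  if (m->pstate_za && m->tpidr2_el0 != 0)
    m->lazy_saves++;            /* assumption: "dormant" commits a lazy save */
  m->tpidr2_el0 = 0;            /* TPIDR2_EL0 == null */
  m->pstate_za = false;         /* PSTATE.ZA == 0, i.e. the "off" state */
}
```

In this model the caller never needs to inspect the state first: a thread
without SME is left untouched, and every other starting state ends up "off".
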
Why can't the kernel do this instead?

The handling of SME state by the kernel is described in [4]. In short,
the kernel must not impose a specific ZA-interface onto a userspace function.
Interaction with the kernel happens (among other things) via system calls.
In Glibc many of the system calls (notably, including SYS_clone and
SYS_clone3) are used via wrappers. The kernel has no control over them
and, moreover, it cannot dictate how these wrappers should behave because
that is simply outside the kernel's remit.

However, in certain cases, the kernel may ensure that a "child" doesn't
start in an incorrect state. This is what is done by the recent change
included in the 6.16 kernel [5]. This is not enough to ensure that code that
uses the clone() and clone3() functions conforms to [1] when it runs on a
system that provides SME, hence this change.

[1]: https://github.com/ARM-software/abi-aa/blob/main/aapcs64/aapcs64.rst
[2]: https://inbox.sourceware.org/libc-alpha/20250522114828.2291047-1-yury.khrustalev@arm.com
[3]: https://inbox.sourceware.org/libc-alpha/20250609121407.3316070-1-yury.khrustalev@arm.com
[4]: https://www.kernel.org/doc/html/v6.16/arch/arm64/sme.html
[5]: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=cde5c32db55740659fca6d56c09b88800d88fd29

Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
(cherry picked from commit 27effb3d50424fb9634be77a2acd614b0386ff25)
(cherry picked from commit 256030b9842a10b1f22851b1de0c119761417544)
(cherry picked from commit 889ae4bdbb4a6fbf37c2303da8cdae3d18880d9e)
(cherry picked from commit 899ebf35691f01c357fc582b3e88db87accb5ee1)

diff --git a/sysdeps/unix/sysv/linux/aarch64/clone.S b/sysdeps/unix/sysv/linux/aarch64/clone.S
index fed19acc2f78351f..585f312a3f7319ed 100644
--- a/sysdeps/unix/sysv/linux/aarch64/clone.S
+++ b/sysdeps/unix/sysv/linux/aarch64/clone.S
@@ -45,6 +45,9 @@ ENTRY(__clone)
 	and	x1, x1, -16
 	cbz	x1, .Lsyscall_error

+	/* Clear ZA state of SME. */
+	CALL_LIBC_ARM_ZA_DISABLE
+
 	/* Do the system call. */
 	/* X0:flags, x1:newsp, x2:parenttidptr, x3:newtls, x4:childtid. */
 	mov	x0, x2 /* flags */
diff --git a/sysdeps/unix/sysv/linux/aarch64/clone3.S b/sysdeps/unix/sysv/linux/aarch64/clone3.S
index 9b00b6b8853e9b8b..ec6874830ae98676 100644
--- a/sysdeps/unix/sysv/linux/aarch64/clone3.S
+++ b/sysdeps/unix/sysv/linux/aarch64/clone3.S
@@ -46,6 +46,9 @@ ENTRY(__clone3)
 	cbz	x10, .Lsyscall_error /* No NULL cl_args pointer. */
 	cbz	x2, .Lsyscall_error /* No NULL function pointer. */

+	/* Clear ZA state of SME. */
+	CALL_LIBC_ARM_ZA_DISABLE
+
 	/* Do the system call, the kernel expects:
 	   x8: system call number
 	   x0: cl_args
diff --git a/sysdeps/unix/sysv/linux/aarch64/sysdep.h b/sysdeps/unix/sysv/linux/aarch64/sysdep.h
index 2f039015190d7b24..19e66f77202add1a 100644
--- a/sysdeps/unix/sysv/linux/aarch64/sysdep.h
+++ b/sysdeps/unix/sysv/linux/aarch64/sysdep.h
@@ -247,6 +247,31 @@
 #undef HAVE_INTERNAL_BRK_ADDR_SYMBOL
 #define HAVE_INTERNAL_BRK_ADDR_SYMBOL 1

+/* Clear ZA state of SME (C version). */
+/* The __libc_arm_za_disable function has special calling convention
+   that allows to call it without stack manipulation and preserving
+   most of the registers. */
+#define CALL_LIBC_ARM_ZA_DISABLE() \
+({ \
+  unsigned long int __tmp; \
+  asm volatile ( \
+  " mov %0, x30\n" \
+  " .cfi_register x30, %0\n" \
+  " bl __libc_arm_za_disable\n" \
+  " mov x30, %0\n" \
+  " .cfi_register %0, x30\n" \
+  : "=r" (__tmp) \
+  : \
+  : "x14", "x15", "x16", "x17", "x18", "memory" ); \
+})
+
+/* Do clear ZA state of SME before making normal clone syscall. */
+#define INLINE_CLONE_SYSCALL(a0, a1, a2, a3, a4) \
+({ \
+  CALL_LIBC_ARM_ZA_DISABLE (); \
+  INLINE_SYSCALL_CALL (clone, a0, a1, a2, a3, a4); \
+})
+
 #endif /* __ASSEMBLER__ */

 #endif /* linux/aarch64/sysdep.h */
diff --git a/sysdeps/unix/sysv/linux/aarch64/vfork.S b/sysdeps/unix/sysv/linux/aarch64/vfork.S
index e71e492da339b25a..65fa85ae895b61a1 100644
--- a/sysdeps/unix/sysv/linux/aarch64/vfork.S
+++ b/sysdeps/unix/sysv/linux/aarch64/vfork.S
@@ -27,6 +27,9 @@

 ENTRY (__vfork)

+	/* Clear ZA state of SME. */
+	CALL_LIBC_ARM_ZA_DISABLE
+
 	mov	x0, #0x4111 /* CLONE_VM | CLONE_VFORK | SIGCHLD */
 	mov	x1, sp
 	DO_CALL (clone, 2)