.. SPDX-License-Identifier: GPL-2.0

Speculative Return Stack Overflow (SRSO)
========================================

This is a mitigation for the speculative return stack overflow (SRSO)
vulnerability found on AMD processors. The mechanism is by now the
well-known scenario of poisoning CPU functional units, in this case the
Branch Target Buffer (BTB) and the Return Address Predictor (RAP), and
then tricking the elevated privilege domain (the kernel) into leaking
sensitive data.

AMD CPUs predict RET instructions using a Return Address Predictor (aka
Return Address Stack/Return Stack Buffer). In some cases, a non-architectural
CALL instruction (i.e., an instruction which is predicted to be a CALL but
is not actually a CALL) can create an entry in the RAP which may be used
to predict the target of a subsequent RET instruction.

The specific circumstances that lead to this vary by microarchitecture,
but the concern is that an attacker can mis-train the CPU BTB to predict
non-architectural CALL instructions in kernel space and use this to
control the speculative target of a subsequent kernel RET, potentially
leading to information disclosure via a speculative side-channel.
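
The toy model below may help picture the mechanism just described. It is
a deliberately simplified sketch, not a description of real hardware: the
RAP is modelled as a small bounded stack with an arbitrary depth, a
"phantom" entry stands in for a non-architectural CALL predicted by a
mis-trained BTB, and addresses are plain integers. All names
(``ReturnAddressPredictor``, ``phantom_call``, ``simulate``) and the
addresses used are hypothetical.

```python
from __future__ import annotations

from collections import deque


class ReturnAddressPredictor:
    """Toy RAP: a bounded stack of predicted return addresses (hypothetical)."""

    def __init__(self, depth: int = 32) -> None:
        # Arbitrary depth; when full, the oldest entry silently falls off.
        self.entries: deque[int] = deque(maxlen=depth)

    def call(self, return_address: int) -> None:
        # An architectural CALL pushes its return address.
        self.entries.append(return_address)

    def phantom_call(self, injected_address: int) -> None:
        # A non-architectural CALL, predicted by a mis-trained BTB, also
        # creates an entry although no CALL instruction really executed.
        self.entries.append(injected_address)

    def predict_ret(self) -> int | None:
        # A RET speculates to the most recent entry, if there is one.
        return self.entries.pop() if self.entries else None


def simulate(with_phantom: bool) -> tuple[int | None, int]:
    """Return (speculative target, architectural target) of one kernel RET."""
    rap = ReturnAddressPredictor()
    real_return = 0x1000        # where the RET architecturally goes
    attacker_target = 0xBAD0    # hypothetical attacker-chosen address
    rap.call(real_return)       # the kernel executes a real CALL
    if with_phantom:
        rap.phantom_call(attacker_target)
    return rap.predict_ret(), real_return
```

In this model ``simulate(False)`` returns matching targets, while
``simulate(True)`` makes the RET speculate to ``0xBAD0`` even though it
architecturally returns to ``0x1000``; that mismatch is the window a
speculative side-channel could exploit.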

The issue is tracked under CVE-2023-20569.

Affected processors
-------------------

AMD Zen, generations 1-4. That is, all families 0x17 and 0x19. Older
processors have not been investigated.
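
As a rough illustration of the statement above, the sketch below checks
``/proc/cpuinfo``-style text for an AMD CPU (``vendor_id`` of
``AuthenticAMD``) whose ``cpu family`` is 0x17 or 0x19. Note that
``/proc/cpuinfo`` prints the family in decimal (23 and 25). This is only
a heuristic that mirrors this section; the kernel's own verdict is what the
sysfs file described in the next section reports, and the function name
is hypothetical.

```python
from __future__ import annotations

# Families named in this document: 0x17 (Zen1/Zen2) and 0x19 (Zen3/Zen4).
AFFECTED_FAMILIES = {0x17, 0x19}


def srso_family_affected(cpuinfo_text: str) -> bool:
    """Heuristic check of the first processor block in /proc/cpuinfo text."""
    fields: dict[str, str] = {}
    for line in cpuinfo_text.splitlines():
        if not line.strip():
            if fields:
                break  # a blank line ends the first processor block
            continue  # skip leading blank lines
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    if fields.get("vendor_id") != "AuthenticAMD":
        return False
    family = fields.get("cpu family", "")
    # /proc/cpuinfo prints the family in decimal (e.g. 25 for 0x19).
    return family.isdigit() and int(family) in AFFECTED_FAMILIES
```

Typical use would be ``srso_family_affected(open("/proc/cpuinfo").read())``.
A ``False`` result for an older AMD part only means "not investigated",
not "not affected".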

System information and options
------------------------------

First of all, it is required that the latest microcode be loaded for
mitigations to be effective.

The sysfs file showing SRSO mitigation status is:

  /sys/devices/system/cpu/vulnerabilities/spec_rstack_overflow

The possible values in this file are:

 * 'Not affected':

   The processor is not vulnerable.

 * 'Vulnerable':

   The processor is vulnerable and no mitigations have been applied.

 * 'Vulnerable: No microcode':

   The processor is vulnerable and no microcode extending IBPB
   functionality to address the vulnerability has been applied.

 * 'Vulnerable: Safe RET, no microcode':

   The "Safe RET" mitigation (see below) has been applied to protect the
   kernel, but the IBPB-extending microcode has not been applied.  User
   space tasks may still be vulnerable.

 * 'Vulnerable: Microcode, no safe RET':

   The extended IBPB functionality microcode patch has been applied. It
   does not protect the User->Kernel and Guest->Host transitions, but it
   does address the User->User and VM->VM attack vectors.

   Note that User->User mitigation is controlled by how the IBPB aspect of
   the Spectre v2 mitigation is selected:

    * conditional IBPB:

      where each process can select whether it needs an IBPB issued
      around it (PR_SPEC_DISABLE/_ENABLE etc.), see :doc:`spectre`

    * strict:

      i.e., always on, by supplying spectre_v2_user=on on the kernel
      command line

   (spec_rstack_overflow=microcode)

 * 'Mitigation: Safe RET':

   Combined microcode/software mitigation. It complements the
   extended IBPB microcode patch functionality by protecting the
   User->Kernel and Guest->Host transitions.

   Selected by default or by spec_rstack_overflow=safe-ret.

 * 'Mitigation: IBPB':

   Similar protection as "Safe RET" above, but employs an IBPB barrier on
   privilege domain crossings (User->Kernel, Guest->Host).

   (spec_rstack_overflow=ibpb)

 * 'Mitigation: IBPB on VMEXIT':

   Mitigation addressing the cloud provider scenario: the Guest->Host
   transitions only.

   (spec_rstack_overflow=ibpb-vmexit)
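
To make the list above easier to act on, the sketch below maps each status
string to the transitions it is described as protecting. The mapping is a
reading of the descriptions above, not something the kernel exports: the
entry for 'Vulnerable: Safe RET, no microcode' is inferred from Safe RET
being the software half of the combined mitigation, and User->User under
'Vulnerable: Microcode, no safe RET' further depends on the Spectre v2 IBPB
mode. Helper names are hypothetical, and exact strings may differ between
kernel versions, so unknown values return ``None`` rather than a guess.

```python
from __future__ import annotations

SRSO_SYSFS = "/sys/devices/system/cpu/vulnerabilities/spec_rstack_overflow"

ALL_TRANSITIONS = frozenset({"User->Kernel", "Guest->Host", "User->User", "VM->VM"})

# Status -> transitions this document describes as protected (our reading).
PROTECTED: dict[str, frozenset[str]] = {
    "Not affected": ALL_TRANSITIONS,  # nothing to protect against
    "Vulnerable": frozenset(),
    "Vulnerable: No microcode": frozenset(),
    # Software half of Safe RET only (inferred from 'Mitigation: Safe RET').
    "Vulnerable: Safe RET, no microcode": frozenset({"User->Kernel", "Guest->Host"}),
    # User->User additionally depends on conditional vs. strict IBPB.
    "Vulnerable: Microcode, no safe RET": frozenset({"User->User", "VM->VM"}),
    "Mitigation: Safe RET": ALL_TRANSITIONS,
    "Mitigation: IBPB": ALL_TRANSITIONS,  # "similar protection as Safe RET"
    "Mitigation: IBPB on VMEXIT": frozenset({"Guest->Host"}),
}


def protected_transitions(status: str) -> frozenset[str] | None:
    """Look up a status string; None means it is not described here."""
    return PROTECTED.get(status.strip())


def read_srso_status(path: str = SRSO_SYSFS) -> str:
    """Read the raw status line from sysfs."""
    with open(path) as f:
        return f.read().strip()
```

On a running system, ``protected_transitions(read_srso_status())`` would
give the set for the current kernel.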

In order to exploit the vulnerability, an attacker needs to:

 - gain local access on the machine

 - break kASLR

 - find gadgets in the running kernel in order to use them in the exploit

 - potentially create and pin an additional workload on the sibling
   thread, depending on the microarchitecture (not necessary on family 0x19)

 - run the exploit

Considering the performance implications of each mitigation type, the
default one is 'Mitigation: Safe RET', which should take care of most
attack vectors, including the local User->Kernel one.

As always, the user is advised to keep their system up to date by
applying software updates regularly.

The default setting will be re-evaluated when needed, especially when
new attack vectors appear.

As one can surmise, 'Mitigation: Safe RET' does come at the cost of some
performance, depending on the workload. If one trusts one's userspace
and does not want to suffer the performance impact, one can always
disable the mitigation with spec_rstack_overflow=off.

Similarly, 'Mitigation: IBPB' is another full mitigation type employing
an indirect branch prediction barrier after having applied the required
microcode patch for one's system. This mitigation also comes at
a performance cost.
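
The options mentioned in this document can be summarised as a lookup from
the ``spec_rstack_overflow=`` value on the kernel command line (for
example, the contents of ``/proc/cmdline``) to the sysfs status it is
expected to produce. This is a sketch under stated assumptions: a missing
option means the default Safe RET, ``off`` is assumed to yield
'Vulnerable', the last occurrence of the option is assumed to win, quoted
arguments are not handled, and the microcode-dependent 'Vulnerable: ...'
variants are ignored, so a mismatch with sysfs should first prompt a
microcode check. Function and table names are hypothetical.

```python
from __future__ import annotations

# spec_rstack_overflow= values named in this document -> expected sysfs status.
EXPECTED_STATUS = {
    "off": "Vulnerable",  # assumption: no mitigation applied
    "microcode": "Vulnerable: Microcode, no safe RET",
    "safe-ret": "Mitigation: Safe RET",
    "ibpb": "Mitigation: IBPB",
    "ibpb-vmexit": "Mitigation: IBPB on VMEXIT",
}
DEFAULT_OPTION = "safe-ret"  # "Selected by default", per this document


def srso_option(cmdline: str) -> str:
    """Return the spec_rstack_overflow= value, or the default if absent."""
    option = DEFAULT_OPTION
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        if key == "spec_rstack_overflow" and sep:
            option = value  # assumption: the last occurrence wins
    return option


def expected_status(cmdline: str) -> str | None:
    """Expected sysfs status, or None for a value not described here."""
    return EXPECTED_STATUS.get(srso_option(cmdline))
```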

Mitigation: Safe RET
--------------------

The mitigation works by ensuring all RET instructions speculate to
a controlled location, similar to how speculation is controlled in the
retpoline sequence.  To accomplish this, the __x86_return_thunk forces
the CPU to mispredict every function return using a 'safe return'
sequence.

To ensure the safety of this mitigation, the kernel must ensure that the
safe return sequence is itself free from attacker interference.  In Zen3
and Zen4, this is accomplished by creating a BTB alias between the
untraining function srso_alias_untrain_ret() and the safe return
function srso_alias_safe_ret(), which results in evicting a potentially
poisoned BTB entry and using the safe one for all function returns.

In older Zen1 and Zen2, this is accomplished using a reinterpretation
technique similar to the Retbleed one: srso_untrain_ret() and
srso_safe_ret().
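
Since Zen1/Zen2 parts are family 0x17 and Zen3/Zen4 parts are family 0x19,
the choice between the two sequences above can be pictured as a lookup
keyed by CPU family. This is only an illustrative sketch of the
description, not the kernel's selection logic; the function name is
hypothetical, and other families are deliberately left unmapped.

```python
from __future__ import annotations

# CPU family -> (untraining function, safe return function) named above.
SAFE_RET_SEQUENCES = {
    0x17: ("srso_untrain_ret", "srso_safe_ret"),              # Zen1/Zen2: reinterpretation
    0x19: ("srso_alias_untrain_ret", "srso_alias_safe_ret"),  # Zen3/Zen4: BTB alias
}


def safe_ret_sequence(family: int) -> tuple[str, str] | None:
    """Return the (untrain, safe return) pair for a family, or None if unmapped."""
    return SAFE_RET_SEQUENCES.get(family)
```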

Checking the safe RET mitigation actually works
-----------------------------------------------

In case one wants to validate whether the SRSO safe RET mitigation works
on a kernel, one could use two performance counters:

* PMC_0xc8 - Count of RET/RET lw retired
* PMC_0xc9 - Count of RET/RET lw retired mispredicted

and compare the number of RETs retired properly vs those retired
mispredicted, in kernel mode. Another way of specifying those events
is::

        # perf list ex_ret_near_ret

        List of pre-defined events (to be used in -e or -M):

        core:
          ex_ret_near_ret
               [Retired Near Returns]
          ex_ret_near_ret_mispred
               [Retired Near Returns Mispredicted]

Either the command using the event mnemonics::

        # perf stat -e ex_ret_near_ret:k -e ex_ret_near_ret_mispred:k sleep 10s

or using the raw PMC numbers::

        # perf stat -e cpu/event=0xc8,umask=0/k -e cpu/event=0xc9,umask=0/k sleep 10s

should show the same count for both events, i.e., every RET retired should
be mispredicted::

        [root@brent: ~/kernel/linux/tools/perf> ./perf stat -e cpu/event=0xc8,umask=0/k -e cpu/event=0xc9,umask=0/k sleep 10s

         Performance counter stats for 'sleep 10s':

                   137,167      cpu/event=0xc8,umask=0/k
                   137,173      cpu/event=0xc9,umask=0/k

              10.004110303 seconds time elapsed

               0.000000000 seconds user
               0.004462000 seconds sys

vs the case when the mitigation is disabled (spec_rstack_overflow=off)
or not functioning properly, which usually shows a much smaller number of
mispredicted retired RETs compared to the overall count of retired RETs
during a workload::

       [root@brent: ~/kernel/linux/tools/perf> ./perf stat -e cpu/event=0xc8,umask=0/k -e cpu/event=0xc9,umask=0/k sleep 10s

        Performance counter stats for 'sleep 10s':

                  201,627      cpu/event=0xc8,umask=0/k
                    4,074      cpu/event=0xc9,umask=0/k

             10.003267252 seconds time elapsed

              0.002729000 seconds user
              0.000000000 seconds sys
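
The comparison above can be automated. The sketch below assumes the
human-readable ``perf stat`` output shown in the examples (a count, with or
without thousands separators, followed by the event name), extracts the
two raw counters and computes the ratio of mispredicted to retired RETs.
The 0.99 threshold for "every RET is mispredicted" is an arbitrary
assumption, and the function names are hypothetical; when the event
mnemonics are used instead, pass those names as ``ret_event`` and
``mispred_event``.

```python
from __future__ import annotations

import re

RET_EVENT = "cpu/event=0xc8,umask=0/k"      # RETs retired, kernel mode
MISPRED_EVENT = "cpu/event=0xc9,umask=0/k"  # RETs retired mispredicted

# A counter line: a count (optionally with thousands separators), then the event.
COUNTER_LINE = re.compile(r"^\s*(\d[\d,]*)\s+(\S+)")


def parse_counts(perf_output: str) -> dict[str, int]:
    """Map event name -> count for each counter line of perf stat output."""
    counts: dict[str, int] = {}
    for line in perf_output.splitlines():
        match = COUNTER_LINE.match(line)
        if match:
            counts[match.group(2)] = int(match.group(1).replace(",", ""))
    return counts


def mispredict_ratio(perf_output: str, ret_event: str = RET_EVENT,
                     mispred_event: str = MISPRED_EVENT) -> float:
    """Mispredicted / retired RETs; raises KeyError if a counter is missing."""
    counts = parse_counts(perf_output)
    return counts[mispred_event] / counts[ret_event]


def safe_ret_looks_active(perf_output: str, threshold: float = 0.99) -> bool:
    # With safe RET every kernel RET should mispredict, so the ratio is ~1.
    return mispredict_ratio(perf_output) >= threshold
```

Fed the two outputs shown above, the first gives a ratio of about 1.00 and
the second about 0.02.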

Also, there is a selftest which performs the above: go to
tools/testing/selftests/x86/ and do::

        make srso
        ./srso