5b6a4a1f9e
- Include fixes to build in RHEL 9 environment (bz#1906468) - Resolves: bz#1906468 ([RHEL9][FTBFS] edk2 FTBFS on Red Hat Enterprise Linux 9.0.0 Alpha)
106 lines
3.7 KiB
Diff
From 70c9d989107c6ac964bb437c5a4ea6ffe3214e45 Mon Sep 17 00:00:00 2001
From: Miroslav Rezanina <mrezanin@redhat.com>
Date: Mon, 10 Aug 2020 07:52:28 +0200
Subject: UefiCpuPkg/PiSmmCpuDxeSmm: pause in WaitForSemaphore() before
 re-fetch
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

RH-Author: Laszlo Ersek <lersek@redhat.com>
Message-id: <20200731141037.1941-2-lersek@redhat.com>
Patchwork-id: 98121
O-Subject: [RHEL-8.3.0 edk2 PATCH 1/1] UefiCpuPkg/PiSmmCpuDxeSmm: pause in WaitForSemaphore() before re-fetch
Bugzilla: 1861718
RH-Acked-by: Vitaly Kuznetsov <vkuznets@redhat.com>
RH-Acked-by: Eduardo Habkost <ehabkost@redhat.com>

Most busy waits (spinlocks) in "UefiCpuPkg/PiSmmCpuDxeSmm/MpService.c"
already call CpuPause() in their loop bodies; see SmmWaitForApArrival(),
APHandler(), and SmiRendezvous(). However, the "main wait" within
APHandler():

> //
> // Wait for something to happen
> //
> WaitForSemaphore (mSmmMpSyncData->CpuData[CpuIndex].Run);

doesn't do so, as WaitForSemaphore() keeps trying to acquire the semaphore
without pausing.

The performance impact is especially notable in QEMU/KVM + OVMF
virtualization with CPU overcommit (that is, when the guest has
significantly more VCPUs than the host has physical CPUs). The guest BSP
is working heavily in:

  BSPHandler()                  [MpService.c]
    PerformRemainingTasks()     [PiSmmCpuDxeSmm.c]
      SetUefiMemMapAttributes() [SmmCpuMemoryManagement.c]

while the many guest APs are spinning in the "Wait for something to
happen" semaphore acquisition, in APHandler(). The guest APs are
generating useless memory traffic and saturating host CPUs, hindering the
guest BSP's progress in SetUefiMemMapAttributes().

Rework the loop in WaitForSemaphore(): call CpuPause() in every iteration
after the first check fails. Due to Pause Loop Exiting (known as Pause
Filter on AMD), the host scheduler can favor the guest BSP over the guest
APs.

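For readers outside edk2, the reworked loop can be sketched in portable
C11. This is only an illustration under stated assumptions, not the edk2
code: cpu_pause() and wait_for_semaphore() are hypothetical names that
stand in for CpuPause() and WaitForSemaphore(), and C11 atomics stand in
for InterlockedCompareExchange32().

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for CpuPause(): emits PAUSE on x86 when built
 * with GCC or Clang, and does nothing on other targets. */
static inline void cpu_pause(void)
{
#if defined(__x86_64__) || defined(__i386__)
    __builtin_ia32_pause();
#endif
}

/* Hypothetical analogue of WaitForSemaphore(): spin until the counter is
 * nonzero, decrement it atomically, and return the decremented value. */
static uint32_t wait_for_semaphore(_Atomic uint32_t *sem)
{
    for (;;) {
        uint32_t value = atomic_load(sem);

        /* Attempt the decrement only if a count is available. On success
         * 'value' still holds the count that was observed. */
        if (value != 0 &&
            atomic_compare_exchange_strong(sem, &value, value - 1)) {
            return value - 1;
        }

        /* Either the count was zero or another CPU won the race: pause
         * before fetching the counter again. */
        cpu_pause();
    }
}
```

As in the patch, the pause sits on the retry path only, so a wait that
succeeds on the first check never executes it.
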
Running a 16 GB RAM + 512 VCPU guest on a 448 PCPU host, this patch
reduces OVMF boot time (counted until reaching grub) from 20-30 minutes to
less than 4 minutes.

The patch should benefit physical machines as well -- according to the
Intel SDM, PAUSE "Improves the performance of spin-wait loops". Adding
PAUSE to the generic WaitForSemaphore() function is considered a general
improvement.

Cc: Eric Dong <eric.dong@intel.com>
Cc: Philippe Mathieu-Daudé <philmd@redhat.com>
Cc: Rahul Kumar <rahul1.kumar@intel.com>
Cc: Ray Ni <ray.ni@intel.com>
Ref: https://bugzilla.redhat.com/show_bug.cgi?id=1861718
Signed-off-by: Laszlo Ersek <lersek@redhat.com>
Message-Id: <20200729185217.10084-1-lersek@redhat.com>
Reviewed-by: Eric Dong <eric.dong@intel.com>
(cherry picked from commit 9001b750df64b25b14ec45a2efa1361a7b96c00a)
Signed-off-by: Miroslav Rezanina <mrezanin@redhat.com>
---
 UefiCpuPkg/PiSmmCpuDxeSmm/MpService.c | 18 +++++++++++-------
 1 file changed, 11 insertions(+), 7 deletions(-)

diff --git a/UefiCpuPkg/PiSmmCpuDxeSmm/MpService.c b/UefiCpuPkg/PiSmmCpuDxeSmm/MpService.c
index 57e788c01b..4bcd217917 100644
--- a/UefiCpuPkg/PiSmmCpuDxeSmm/MpService.c
+++ b/UefiCpuPkg/PiSmmCpuDxeSmm/MpService.c
@@ -40,14 +40,18 @@ WaitForSemaphore (
 {
   UINT32 Value;
 
-  do {
+  for (;;) {
     Value = *Sem;
-  } while (Value == 0 ||
-           InterlockedCompareExchange32 (
-             (UINT32*)Sem,
-             Value,
-             Value - 1
-             ) != Value);
+    if (Value != 0 &&
+        InterlockedCompareExchange32 (
+          (UINT32*)Sem,
+          Value,
+          Value - 1
+          ) == Value) {
+      break;
+    }
+    CpuPause ();
+  }
   return Value - 1;
 }
 
--
2.18.4