From 70c9d989107c6ac964bb437c5a4ea6ffe3214e45 Mon Sep 17 00:00:00 2001
From: Miroslav Rezanina <mrezanin@redhat.com>
Date: Mon, 10 Aug 2020 07:52:28 +0200
Subject: [PATCH] UefiCpuPkg/PiSmmCpuDxeSmm: pause in WaitForSemaphore() before
 re-fetch
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

RH-Author: Laszlo Ersek <lersek@redhat.com>
Message-id: <20200731141037.1941-2-lersek@redhat.com>
Patchwork-id: 98121
O-Subject: [RHEL-8.3.0 edk2 PATCH 1/1] UefiCpuPkg/PiSmmCpuDxeSmm: pause in WaitForSemaphore() before re-fetch
Bugzilla: 1861718
RH-Acked-by: Vitaly Kuznetsov <vkuznets@redhat.com>
RH-Acked-by: Eduardo Habkost <ehabkost@redhat.com>

Most busy waits (spinlocks) in "UefiCpuPkg/PiSmmCpuDxeSmm/MpService.c"
already call CpuPause() in their loop bodies; see SmmWaitForApArrival(),
APHandler(), and SmiRendezvous(). However, the "main wait" within
APHandler():

> //
> // Wait for something to happen
> //
> WaitForSemaphore (mSmmMpSyncData->CpuData[CpuIndex].Run);

doesn't do so, as WaitForSemaphore() keeps trying to acquire the semaphore
without pausing.

The performance impact is especially notable in QEMU/KVM + OVMF
virtualization with CPU overcommit (that is, when the guest has
significantly more VCPUs than the host has physical CPUs). The guest BSP
is working heavily in:

  BSPHandler()                  [MpService.c]
    PerformRemainingTasks()     [PiSmmCpuDxeSmm.c]
      SetUefiMemMapAttributes() [SmmCpuMemoryManagement.c]

while the many guest APs are spinning in the "Wait for something to
happen" semaphore acquisition, in APHandler(). The guest APs are
generating useless memory traffic and saturating host CPUs, hindering the
guest BSP's progress in SetUefiMemMapAttributes().

Rework the loop in WaitForSemaphore(): call CpuPause() in every iteration
after the first check fails. Due to Pause Loop Exiting (known as Pause
Filter on AMD), the host scheduler can favor the guest BSP over the guest
APs.

Running a 16 GB RAM + 512 VCPU guest on a 448 PCPU host, this patch
reduces OVMF boot time (counted until reaching grub) from 20-30 minutes to
less than 4 minutes.

The patch should benefit physical machines as well -- according to the
Intel SDM, PAUSE "Improves the performance of spin-wait loops". Adding
PAUSE to the generic WaitForSemaphore() function is considered a general
improvement.

Cc: Eric Dong <eric.dong@intel.com>
Cc: Philippe Mathieu-Daudé <philmd@redhat.com>
Cc: Rahul Kumar <rahul1.kumar@intel.com>
Cc: Ray Ni <ray.ni@intel.com>
Ref: https://bugzilla.redhat.com/show_bug.cgi?id=1861718
Signed-off-by: Laszlo Ersek <lersek@redhat.com>
Message-Id: <20200729185217.10084-1-lersek@redhat.com>
Reviewed-by: Eric Dong <eric.dong@intel.com>
(cherry picked from commit 9001b750df64b25b14ec45a2efa1361a7b96c00a)
Signed-off-by: Miroslav Rezanina <mrezanin@redhat.com>
---
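Reviewer note (not part of the commit message): the reworked loop described
above can be tried outside the firmware tree. The following is a minimal
user-space sketch in C11, assuming <stdatomic.h> in place of edk2's
InterlockedCompareExchange32() and a compiler pause hint in place of
CpuPause(). The names wait_for_semaphore(), release_semaphore(), and
cpu_pause() are hypothetical and do not exist in edk2; release_semaphore()
is a simplified stand-in, not a copy of the edk2 counterpart.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for CpuPause(): emit a spin-wait hint if the
 * compiler and target offer one, otherwise do nothing. */
static inline void cpu_pause(void)
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
  __builtin_ia32_pause();          /* x86 PAUSE instruction */
#elif defined(__GNUC__) && defined(__aarch64__)
  __asm__ volatile("yield");       /* closest AArch64 hint */
#endif
}

/* Decrement *sem once it is nonzero and return the new value.
 * Mirrors the patched loop shape: fetch, try to acquire, and pause
 * only after an attempt has failed. */
static uint32_t wait_for_semaphore(_Atomic uint32_t *sem)
{
  uint32_t value;

  for (;;) {
    value = atomic_load_explicit(sem, memory_order_relaxed);
    if (value != 0 &&
        atomic_compare_exchange_strong(sem, &value, value - 1)) {
      break;                       /* acquired: value holds the old count */
    }
    cpu_pause();                   /* failed: yield pipeline before re-fetch */
  }
  return value - 1;
}

/* Assumed counterpart for the sketch: increment *sem and return the
 * new value. Overflow is treated as a programming error. */
static uint32_t release_semaphore(_Atomic uint32_t *sem)
{
  uint32_t prev = atomic_fetch_add(sem, 1);

  assert(prev != UINT32_MAX);
  return prev + 1;
}
```

With a semaphore initialized to 2, two calls to wait_for_semaphore() return
1 and 0 without pausing. A third call would spin, issuing the pause hint
after each failed attempt, until another thread calls release_semaphore().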
 UefiCpuPkg/PiSmmCpuDxeSmm/MpService.c | 18 +++++++++++-------
 1 file changed, 11 insertions(+), 7 deletions(-)

diff --git a/UefiCpuPkg/PiSmmCpuDxeSmm/MpService.c b/UefiCpuPkg/PiSmmCpuDxeSmm/MpService.c
index 57e788c..4bcd217 100644
--- a/UefiCpuPkg/PiSmmCpuDxeSmm/MpService.c
+++ b/UefiCpuPkg/PiSmmCpuDxeSmm/MpService.c
@@ -40,14 +40,18 @@ WaitForSemaphore (
 {
   UINT32 Value;

-  do {
+  for (;;) {
     Value = *Sem;
-  } while (Value == 0 ||
-           InterlockedCompareExchange32 (
-             (UINT32*)Sem,
-             Value,
-             Value - 1
-             ) != Value);
+    if (Value != 0 &&
+        InterlockedCompareExchange32 (
+          (UINT32*)Sem,
+          Value,
+          Value - 1
+          ) == Value) {
+      break;
+    }
+    CpuPause ();
+  }
   return Value - 1;
 }

--
1.8.3.1