qemu-kvm/kvm-vfio-helpers-Align-mmaps.patch
Jon Maloy add392f0f0 * Mon May 26 2025 Jon Maloy <jmaloy@redhat.com> - 9.1.0-21
- kvm-meson-configure-add-valgrind-option-en-dis-able-valg.patch [RHEL-88153]
- kvm-distro-add-an-explicit-valgrind-devel-build-dep.patch [RHEL-88153]
- kvm-hw-i386-Fix-machine-type-compatibility.patch [RHEL-91307]
- kvm-vfio-helpers-Refactor-vfio_region_mmap-error-handlin.patch [RHEL-88533]
- kvm-vfio-helpers-Align-mmaps.patch [RHEL-88533]
- Resolves: RHEL-88153
  ([s390x] valgrind not working with qemu-kvm for non-x86 builds)
- Resolves: RHEL-91307
  (Fix x86 M-type compats)
- Resolves: RHEL-88533
  (Improve VFIO mmapping performance with huge pfnmaps)
2025-05-26 16:19:15 -04:00


From 0e733c43122688a40b0bad9cf9af43ac3655fa30 Mon Sep 17 00:00:00 2001
From: Alex Williamson <alex.williamson@redhat.com>
Date: Tue, 22 Oct 2024 14:08:29 -0600
Subject: [PATCH 5/5] vfio/helpers: Align mmaps
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
RH-Author: Donald Dutile <None>
RH-MergeRequest: 366: Improve VFIO mmapping performance with huge pfnmaps
RH-Jira: RHEL-88533
RH-Acked-by: Cédric Le Goater <clg@redhat.com>
RH-Acked-by: Alex Williamson <None>
RH-Commit: [2/2] f0e99cf993f82796352376bc7280342729ea5624 (ddutile/qemu-kvm)

Thanks to work by Peter Xu, support is introduced in Linux v6.12 to
allow pfnmap insertions at the PMD and PUD levels of the page table.
This means that, given a properly aligned mmap, the vfio driver is able
to map MMIO at significantly larger intervals than PAGE_SIZE. For
example, on x86_64 (the only architecture currently supporting huge
pfnmaps for PUD), rather than 4KiB mappings, we can map device MMIO
using 2MiB and even 1GiB page table entries.

Typically mmap will already provide PMD-aligned mappings, so devices
with moderately sized MMIO ranges, even GPUs with standard 256MiB BARs,
will already take advantage of this support. However, to better support
devices exposing multi-GiB MMIO, such as 3D accelerators or GPUs with
resizable BARs enabled, we need to align the mmap manually.

There doesn't seem to be a way for userspace to easily learn the PMD
and PUD mapping level sizes; therefore this takes the simple approach
of aligning the mapping to the largest power of two that divides the
region size, up to 1GiB, which is currently the maximum alignment we
care about.

Cc: Peter Xu <peterx@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
(cherry picked from commit 00b519c0bca0e933ed22e2e6f8bca6b23f41f950)
Jira: https://issues.redhat.com/browse/RHEL-88533
Signed-off-by: Donald Dutile <ddutile@redhat.com>
---
 hw/vfio/helpers.c | 32 ++++++++++++++++++++++++++++++--
 1 file changed, 30 insertions(+), 2 deletions(-)

diff --git a/hw/vfio/helpers.c b/hw/vfio/helpers.c
index b9e606e364..913796f437 100644
--- a/hw/vfio/helpers.c
+++ b/hw/vfio/helpers.c
@@ -27,6 +27,7 @@
 #include "trace.h"
 #include "qapi/error.h"
 #include "qemu/error-report.h"
+#include "qemu/units.h"
 #include "monitor/monitor.h"
 
 /*
@@ -406,8 +407,35 @@ int vfio_region_mmap(VFIORegion *region)
     prot |= region->flags & VFIO_REGION_INFO_FLAG_WRITE ? PROT_WRITE : 0;
 
     for (i = 0; i < region->nr_mmaps; i++) {
-        region->mmaps[i].mmap = mmap(NULL, region->mmaps[i].size, prot,
-                                     MAP_SHARED, region->vbasedev->fd,
+        size_t align = MIN(1ULL << ctz64(region->mmaps[i].size), 1 * GiB);
+        void *map_base, *map_align;
+
+        /*
+         * Align the mmap for more efficient mapping in the kernel. Ideally
+         * we'd know the PMD and PUD mapping sizes to use as discrete alignment
+         * intervals, but we don't. As of Linux v6.12, the largest PUD size
+         * supporting huge pfnmap is 1GiB (ARCH_SUPPORTS_PUD_PFNMAP is only set
+         * on x86_64). Align by power-of-two size, capped at 1GiB.
+         *
+         * NB. qemu_memalign() and friends actually allocate memory, whereas
+         * the region size here can exceed host memory, therefore we manually
+         * create an oversized anonymous mapping and clean it up for alignment.
+         */
+        map_base = mmap(0, region->mmaps[i].size + align, PROT_NONE,
+                        MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
+        if (map_base == MAP_FAILED) {
+            ret = -errno;
+            goto no_mmap;
+        }
+
+        map_align = (void *)ROUND_UP((uintptr_t)map_base, (uintptr_t)align);
+        munmap(map_base, map_align - map_base);
+        munmap(map_align + region->mmaps[i].size,
+               align - (map_align - map_base));
+
+        region->mmaps[i].mmap = mmap(map_align, region->mmaps[i].size, prot,
+                                     MAP_SHARED | MAP_FIXED,
+                                     region->vbasedev->fd,
                                      region->fd_offset +
                                      region->mmaps[i].offset);
         if (region->mmaps[i].mmap == MAP_FAILED) {
--
2.48.1