* Mon Jun 02 2025 Jon Maloy <jmaloy@redhat.com> - 9.1.0-22
- kvm-migration-postcopy-Spatial-locality-page-hint-for-pr.patch [RHEL-85159]
- kvm-Allow-guest-network-get-route-guest-get-load-QGA-com.patch [RHEL-91605 RHEL-91606]
- Resolves: RHEL-85159
  (Video stuck about 1 min after switchover phase when play one video during postcopy-preempt migration)
- Resolves: RHEL-91605
  ([qemu-guest-agent] Add new api 'guest-network-get-route' to allow-rpc [RHEL-9])
- Resolves: RHEL-91606
  ([qemu-guest-agent] Enable 'guest-get-load' by default [RHEL-9])
parent add392f0f0
commit 6c856c29b8
kvm-migration-postcopy-Spatial-locality-page-hint-for-pr.patch (normal file, 237 lines added)
@@ -0,0 +1,237 @@
From 1fa31324da8ebba64a44c1e9b64f7e59c29f3d75 Mon Sep 17 00:00:00 2001
From: Peter Xu <peterx@redhat.com>
Date: Thu, 24 Apr 2025 18:07:05 -0400
Subject: [PATCH 1/2] migration/postcopy: Spatial locality page hint for
 preempt mode
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

RH-Author: Peter Xu <peterx@redhat.com>
RH-MergeRequest: 358: migration/postcopy: Spatial locality page hint for preempt mode
RH-Jira: RHEL-85159
RH-Acked-by: Juraj Marcin <None>
RH-Acked-by: Daniel P. Berrangé <berrange@redhat.com>
RH-Commit: [1/1] f5bce349c80f98428c73a3898f87d4d10ec2f4bd (peterx/qemu-kvm)

The preempt mode postcopy has been introduced for a while. From latency
POV, it should always win the vanilla postcopy.

However there's one thing missing when preempt mode is enabled right now,
which is the spatial locality hint when there're page requests from the
destination side.

In vanilla postcopy, as long as a page request was unqueued, it will update
the PSS of the precopy background stream, so that after a page request the
background thread will move the pages after whatever was requested. It's
pretty much a natural behavior when there's only one channel anyway, and
one scanner to send the pages.

Preempt mode didn't follow that, because preempt mode has its own channel
and its own PSS (which doesn't linearly scan the guest memory, but
dedicated to resolve page requested from destination). So the page request
process and the background migration process are completely separate.

This patch adds the hint explicitly for preempt mode. With that, whenever
the preempt mode receives a page request on the source, it will service the
remote page fault in the return path, then it'll provide a hint to the
background thread so that we'll start sending the pages right after the
requested ones in the background, assuming the follow up pages have a
higher chance to be accessed later.

NOTE: since the background migration thread and return path thread run
completely concurrently, it doesn't always mean the hint will be applied
every single time. For example, it's possible that the return path thread
receives multiple page requests in a row without the background thread
getting the chance to consume one. In such a case, the preempt thread only
provides the hint if the previous hint has been consumed. After all,
there's no point queuing hints when we only have one linear scanner.

This could measurably improve the simple sequential memory access pattern
during postcopy (when preempt is on). For random accesses, I can measure a
slight increase of remote page fault latency from ~500us -> ~600us, that
could be a trade-off to have such hint mechanism, and after all that's
still greatly improved compared to vanilla postcopy on random (~10ms).

The patch is verified by our QE team in a video streaming test case, to
reduce the pause of the video from ~1min to a few seconds when switching
over to postcopy with preempt mode.

Reported-by: Xiaohui Li <xiaohli@redhat.com>
Tested-by: Xiaohui Li <xiaohli@redhat.com>
Reviewed-by: Juraj Marcin <jmarcin@redhat.com>
Link: https://lore.kernel.org/r/20250424220705.195544-1-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
(cherry picked from commit 20d82622812d888478d04a2d0d8575d70eb5d749)
Signed-off-by: Peter Xu <peterx@redhat.com>
---
 migration/ram.c | 97 ++++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 96 insertions(+), 1 deletion(-)

diff --git a/migration/ram.c b/migration/ram.c
index edec1a2d07..0803f85b8a 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -112,6 +112,36 @@

 XBZRLECacheStats xbzrle_counters;

+/*
+ * This structure locates a specific location of a guest page. In QEMU,
+ * it's described in a tuple of (ramblock, offset).
+ */
+struct PageLocation {
+    RAMBlock *block;
+    unsigned long offset;
+};
+typedef struct PageLocation PageLocation;
+
+/**
+ * PageLocationHint: describes a hint to a page location
+ *
+ * @valid: set if the hint is valid and to be consumed
+ * @location: the hint content
+ *
+ * In postcopy preempt mode, the urgent channel may provide hints to the
+ * background channel, so that QEMU source can try to migrate whatever is
+ * right after the requested urgent pages.
+ *
+ * This is based on the assumption that the VM (already running on the
+ * destination side) tends to access the memory with spatial locality.
+ * This is also the default behavior of vanilla postcopy (preempt off).
+ */
+struct PageLocationHint {
+    bool valid;
+    PageLocation location;
+};
+typedef struct PageLocationHint PageLocationHint;
+
 /* used by the search for pages to send */
 struct PageSearchStatus {
     /* The migration channel used for a specific host page */
@@ -414,6 +444,13 @@ struct RAMState {
      * RAM migration.
      */
     unsigned int postcopy_bmap_sync_requested;
+    /*
+     * Page hint during postcopy when preempt mode is on. Return path
+     * thread sets it, while background migration thread consumes it.
+     *
+     * Protected by @bitmap_mutex.
+     */
+    PageLocationHint page_hint;
 };
 typedef struct RAMState RAMState;

@@ -2091,6 +2128,21 @@ static void pss_host_page_finish(PageSearchStatus *pss)
     pss->host_page_start = pss->host_page_end = 0;
 }

+static void ram_page_hint_update(RAMState *rs, PageSearchStatus *pss)
+{
+    PageLocationHint *hint = &rs->page_hint;
+
+    /* If there's a pending hint not consumed, don't bother */
+    if (hint->valid) {
+        return;
+    }
+
+    /* Provide a hint to the background stream otherwise */
+    hint->location.block = pss->block;
+    hint->location.offset = pss->page;
+    hint->valid = true;
+}
+
 /*
  * Send an urgent host page specified by `pss'. Need to be called with
  * bitmap_mutex held.
@@ -2136,6 +2188,7 @@ out:
     /* For urgent requests, flush immediately if sent */
     if (sent) {
         qemu_fflush(pss->pss_channel);
+        ram_page_hint_update(rs, pss);
     }
     return ret;
 }
@@ -2223,6 +2276,30 @@ static int ram_save_host_page(RAMState *rs, PageSearchStatus *pss)
     return (res < 0 ? res : pages);
 }

+static bool ram_page_hint_valid(RAMState *rs)
+{
+    /* There's only page hint during postcopy preempt mode */
+    if (!postcopy_preempt_active()) {
+        return false;
+    }
+
+    return rs->page_hint.valid;
+}
+
+static void ram_page_hint_collect(RAMState *rs, RAMBlock **block,
+                                  unsigned long *page)
+{
+    PageLocationHint *hint = &rs->page_hint;
+
+    assert(hint->valid);
+
+    *block = hint->location.block;
+    *page = hint->location.offset;
+
+    /* Mark the hint consumed */
+    hint->valid = false;
+}
+
 /**
  * ram_find_and_save_block: finds a dirty page and sends it to f
  *
@@ -2239,6 +2316,8 @@ static int ram_save_host_page(RAMState *rs, PageSearchStatus *pss)
 static int ram_find_and_save_block(RAMState *rs)
 {
     PageSearchStatus *pss = &rs->pss[RAM_CHANNEL_PRECOPY];
+    unsigned long next_page;
+    RAMBlock *next_block;
     int pages = 0;

     /* No dirty page as there is zero RAM */
@@ -2258,7 +2337,14 @@ static int ram_find_and_save_block(RAMState *rs)
         rs->last_page = 0;
     }

-    pss_init(pss, rs->last_seen_block, rs->last_page);
+    if (ram_page_hint_valid(rs)) {
+        ram_page_hint_collect(rs, &next_block, &next_page);
+    } else {
+        next_block = rs->last_seen_block;
+        next_page = rs->last_page;
+    }
+
+    pss_init(pss, next_block, next_page);

     while (true){
         if (!get_queued_page(rs, pss)) {
@@ -2392,6 +2478,13 @@ static void ram_save_cleanup(void *opaque)
     migration_ops = NULL;
 }

+static void ram_page_hint_reset(PageLocationHint *hint)
+{
+    hint->location.block = NULL;
+    hint->location.offset = 0;
+    hint->valid = false;
+}
+
 static void ram_state_reset(RAMState *rs)
 {
     int i;
@@ -2404,6 +2497,8 @@ static void ram_state_reset(RAMState *rs)
     rs->last_page = 0;
     rs->last_version = ram_list.version;
     rs->xbzrle_started = false;
+
+    ram_page_hint_reset(&rs->page_hint);
 }

 #define MAX_WAIT 50 /* ms, half buffered_file limit */
--
2.48.1

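Editorial note: the hand-off that the patch sets up between the return path thread and the background thread behaves like a single-slot mailbox, where a new hint is dropped while an older one is still pending. The following is only a rough standalone sketch of that behavior, not QEMU code. `ToyBlock`, `ToyHint`, `toy_hint_offer` and `toy_hint_take` are hypothetical names, and the caller is assumed to hold whatever lock plays the role of `bitmap_mutex`.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a RAM block; only its identity matters here. */
typedef struct ToyBlock {
    const char *name;
} ToyBlock;

/* One pending hint at most: (block, page) plus a "valid" flag. */
typedef struct ToyHint {
    bool valid;
    ToyBlock *block;
    unsigned long page;
} ToyHint;

/* Clear the slot, e.g. when migration state is reset. */
static void toy_hint_reset(ToyHint *h)
{
    h->valid = false;
    h->block = NULL;
    h->page = 0;
}

/*
 * Producer side (urgent path). Assumes the caller holds the lock.
 * Returns true if the hint was stored, false if an unconsumed hint
 * was already pending and this one was dropped.
 */
static bool toy_hint_offer(ToyHint *h, ToyBlock *block, unsigned long page)
{
    if (h->valid) {
        return false;
    }
    h->block = block;
    h->page = page;
    h->valid = true;
    return true;
}

/*
 * Consumer side (background scanner). Assumes the caller holds the lock.
 * Returns true and fills the outputs if a hint was pending, and marks
 * the slot consumed; returns false and leaves the outputs untouched
 * otherwise, so the caller can fall back to its last scan position.
 */
static bool toy_hint_take(ToyHint *h, ToyBlock **block, unsigned long *page)
{
    if (!h->valid) {
        return false;
    }
    *block = h->block;
    *page = h->page;
    h->valid = false;
    return true;
}
```

In this model a scanner would call `toy_hint_take()` before each search and resume from its previous position when it returns false, which corresponds to the fallback on `rs->last_seen_block` and `rs->last_page` in the hunk above.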
@@ -13,7 +13,7 @@
 #
 # You can get the list of RPC commands using "qemu-ga --allow-rpcs='?'".
 # There should be no spaces between commas and commands in the allow list.
-FILTER_RPC_ARGS="--allow-rpcs=guest-sync-delimited,guest-sync,guest-ping,guest-get-time,guest-set-time,guest-info,guest-shutdown,guest-fsfreeze-status,guest-fsfreeze-freeze,guest-fsfreeze-freeze-list,guest-fsfreeze-thaw,guest-fstrim,guest-suspend-disk,guest-suspend-ram,guest-suspend-hybrid,guest-network-get-interfaces,guest-get-vcpus,guest-set-vcpus,guest-get-disks,guest-get-fsinfo,guest-set-user-password,guest-get-memory-blocks,guest-set-memory-blocks,guest-get-memory-block-info,guest-get-host-name,guest-get-users,guest-get-timezone,guest-get-osinfo,guest-get-devices,guest-ssh-get-authorized-keys,guest-ssh-add-authorized-keys,guest-ssh-remove-authorized-keys,guest-get-diskstats,guest-get-cpustats"
+FILTER_RPC_ARGS="--allow-rpcs=guest-sync-delimited,guest-sync,guest-ping,guest-get-time,guest-set-time,guest-info,guest-shutdown,guest-fsfreeze-status,guest-fsfreeze-freeze,guest-fsfreeze-freeze-list,guest-fsfreeze-thaw,guest-fstrim,guest-suspend-disk,guest-suspend-ram,guest-suspend-hybrid,guest-network-get-interfaces,guest-get-vcpus,guest-set-vcpus,guest-get-disks,guest-get-fsinfo,guest-set-user-password,guest-get-memory-blocks,guest-set-memory-blocks,guest-get-memory-block-info,guest-get-host-name,guest-get-users,guest-get-timezone,guest-get-osinfo,guest-get-devices,guest-ssh-get-authorized-keys,guest-ssh-add-authorized-keys,guest-ssh-remove-authorized-keys,guest-get-diskstats,guest-get-cpustats,guest-network-get-route,guest-get-load"

 # Fsfreeze hook script specification.
 #
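Editorial note: the allow list above is a comma-separated string that, per its own comment, must not contain spaces. The helpers below are a minimal sketch for checking such a string by hand. They are hypothetical names (`allow_list_has`, `allow_list_has_no_spaces`), not part of qemu-ga, and they take only the list after the `--allow-rpcs=` prefix.

```c
#include <stdbool.h>
#include <string.h>

/*
 * Return true if `cmd` appears as a complete entry in the
 * comma-separated `list`. Prefix matches do not count.
 */
static bool allow_list_has(const char *list, const char *cmd)
{
    size_t len = strlen(cmd);
    const char *p = list;

    while (*p) {
        const char *end = strchr(p, ',');
        size_t n = end ? (size_t)(end - p) : strlen(p);

        if (n == len && strncmp(p, cmd, len) == 0) {
            return true;
        }
        if (!end) {
            break;
        }
        p = end + 1;   /* step past the comma to the next entry */
    }
    return false;
}

/* Return true if the list contains no space character at all. */
static bool allow_list_has_no_spaces(const char *list)
{
    return strchr(list, ' ') == NULL;
}
```

With the new value, `allow_list_has(list, "guest-network-get-route")` and `allow_list_has(list, "guest-get-load")` would both be expected to return true, while the old value would fail both checks.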
@@ -149,7 +149,7 @@ Obsoletes: %{name}-block-ssh <= %{epoch}:%{version} \
 Summary: QEMU is a machine emulator and virtualizer
 Name: qemu-kvm
 Version: 9.1.0
-Release: 21%{?rcrel}%{?dist}%{?cc_suffix}
+Release: 22%{?rcrel}%{?dist}%{?cc_suffix}
 # Epoch because we pushed a qemu-1.0 package. AIUI this can't ever be dropped
 # Epoch 15 used for RHEL 8
 # Epoch 17 used for RHEL 9 (due to release versioning offset in RHEL 8.5)
@@ -567,6 +567,8 @@ Patch197: kvm-hw-i386-Fix-machine-type-compatibility.patch
 Patch198: kvm-vfio-helpers-Refactor-vfio_region_mmap-error-handlin.patch
 # For RHEL-88533 - Improve VFIO mmapping performance with huge pfnmaps
 Patch199: kvm-vfio-helpers-Align-mmaps.patch
+# For RHEL-85159 - Video stuck about 1 min after switchover phase when play one video during postcopy-preempt migration
+Patch200: kvm-migration-postcopy-Spatial-locality-page-hint-for-pr.patch

 %if %{have_clang}
 BuildRequires: clang
@@ -1642,6 +1644,16 @@ useradd -r -u 107 -g qemu -G kvm -d / -s /sbin/nologin \
 %endif

 %changelog
+* Mon Jun 02 2025 Jon Maloy <jmaloy@redhat.com> - 9.1.0-22
+- kvm-migration-postcopy-Spatial-locality-page-hint-for-pr.patch [RHEL-85159]
+- kvm-Allow-guest-network-get-route-guest-get-load-QGA-com.patch [RHEL-91605 RHEL-91606]
+- Resolves: RHEL-85159
+  (Video stuck about 1 min after switchover phase when play one video during postcopy-preempt migration)
+- Resolves: RHEL-91605
+  ([qemu-guest-agent] Add new api 'guest-network-get-route' to allow-rpc [RHEL-9])
+- Resolves: RHEL-91606
+  ([qemu-guest-agent] Enable 'guest-get-load' by default [RHEL-9])
+
 * Mon May 26 2025 Jon Maloy <jmaloy@redhat.com> - 9.1.0-21
 - kvm-meson-configure-add-valgrind-option-en-dis-able-valg.patch [RHEL-88153]
 - kvm-distro-add-an-explicit-valgrind-devel-build-dep.patch [RHEL-88153]