* Tue Mar 25 2025 Miroslav Rezanina <mrezanin@redhat.com> - 9.1.0-16
- kvm-migration-Fix-UAF-for-incoming-migration-on-Migratio.patch [RHEL-69776] - kvm-scripts-improve-error-from-qemu-trace-stap-on-missin.patch [RHEL-83535] - kvm-Recommend-systemtap-client-from-qemu-tools.patch [RHEL-83535] - Resolves: RHEL-69776 ([rhel10]Guest crashed on the target host when the migration was canceled) - Resolves: RHEL-83535 ([Qemu RHEL-10] qemu-trace-stap should handle lack of stap more gracefully)
This commit is contained in:
parent
a583c50c4e
commit
c840a0db5d
180
kvm-migration-Fix-UAF-for-incoming-migration-on-Migratio.patch
Normal file
180
kvm-migration-Fix-UAF-for-incoming-migration-on-Migratio.patch
Normal file
@ -0,0 +1,180 @@
|
|||||||
|
From f479e7cd7cc4e48f7383c4ee78609bb7605b70c6 Mon Sep 17 00:00:00 2001
|
||||||
|
From: Peter Xu <peterx@redhat.com>
|
||||||
|
Date: Thu, 20 Feb 2025 08:24:59 -0500
|
||||||
|
Subject: [PATCH 1/3] migration: Fix UAF for incoming migration on
|
||||||
|
MigrationState
|
||||||
|
|
||||||
|
RH-Author: Peter Xu <peterx@redhat.com>
|
||||||
|
RH-MergeRequest: 342: migration: Fix UAF for incoming migration on MigrationState
|
||||||
|
RH-Jira: RHEL-69776
|
||||||
|
RH-Acked-by: Juraj Marcin <None>
|
||||||
|
RH-Acked-by: Miroslav Rezanina <mrezanin@redhat.com>
|
||||||
|
RH-Commit: [1/1] 65c90964461cf884167db18bedf62db7dc242573 (peterx/qemu-kvm)
|
||||||
|
|
||||||
|
On the incoming migration side, QEMU uses a coroutine to load all the VM
|
||||||
|
states. Inside, it may reference MigrationState on global states like
|
||||||
|
migration capabilities, parameters, error state, shared mutexes and more.
|
||||||
|
|
||||||
|
However there's nothing yet to make sure MigrationState won't get
|
||||||
|
destroyed (e.g. after migration_shutdown()). Meanwhile there's also no API
|
||||||
|
available to remove the incoming coroutine in migration_shutdown(),
|
||||||
|
avoiding it to access the freed elements.
|
||||||
|
|
||||||
|
There's a bug report showing this can happen and crash dest QEMU when
|
||||||
|
migration is cancelled on source.
|
||||||
|
|
||||||
|
When it happens, the dest main thread is trying to cleanup everything:
|
||||||
|
|
||||||
|
#0 qemu_aio_coroutine_enter
|
||||||
|
#1 aio_dispatch_handler
|
||||||
|
#2 aio_poll
|
||||||
|
#3 monitor_cleanup
|
||||||
|
#4 qemu_cleanup
|
||||||
|
#5 qemu_default_main
|
||||||
|
|
||||||
|
Then it found the migration incoming coroutine, schedule it (even after
|
||||||
|
migration_shutdown()), causing crash:
|
||||||
|
|
||||||
|
#0 __pthread_kill_implementation
|
||||||
|
#1 __pthread_kill_internal
|
||||||
|
#2 __GI_raise
|
||||||
|
#3 __GI_abort
|
||||||
|
#4 __assert_fail_base
|
||||||
|
#5 __assert_fail
|
||||||
|
#6 qemu_mutex_lock_impl
|
||||||
|
#7 qemu_lockable_mutex_lock
|
||||||
|
#8 qemu_lockable_lock
|
||||||
|
#9 qemu_lockable_auto_lock
|
||||||
|
#10 migrate_set_error
|
||||||
|
#11 process_incoming_migration_co
|
||||||
|
#12 coroutine_trampoline
|
||||||
|
|
||||||
|
To fix it, take a refcount after an incoming setup is properly done when
|
||||||
|
qmp_migrate_incoming() succeeded the 1st time. As it's during a QMP
|
||||||
|
handler which needs BQL, it means the main loop is still alive (without
|
||||||
|
going into cleanups, which also needs BQL).
|
||||||
|
|
||||||
|
Releasing the refcount now only until the incoming migration coroutine
|
||||||
|
finished or failed. Hence the refcount is valid for both (1) setup phase
|
||||||
|
of incoming ports, mostly IO watches (e.g. qio_channel_add_watch_full()),
|
||||||
|
and (2) the incoming coroutine itself (process_incoming_migration_co()).
|
||||||
|
|
||||||
|
Note that we can't unref in migration_incoming_state_destroy(), because
|
||||||
|
both qmp_xen_load_devices_state() and load_snapshot() will use it without
|
||||||
|
an incoming migration. Those hold BQL so they're not prone to this issue.
|
||||||
|
|
||||||
|
PS: I suspect nobody uses Xen's command at all, as it didn't register yank,
|
||||||
|
hence AFAIU the command should crash on master when trying to unregister
|
||||||
|
yank in migration_incoming_state_destroy().. but that's another story.
|
||||||
|
|
||||||
|
Also note that in some incoming failure cases we may not always unref the
|
||||||
|
MigrationState refcount, which is a trade-off to keep things simple. We
|
||||||
|
could make it accurate, but it can be an overkill. Some examples:
|
||||||
|
|
||||||
|
- Unlike most of the rest protocols, socket_start_incoming_migration()
|
||||||
|
may create net listener after incoming port setup sucessfully.
|
||||||
|
It means we can't unref in migration_channel_process_incoming() as a
|
||||||
|
generic path because socket protocol might keep using MigrationState.
|
||||||
|
|
||||||
|
- For either socket or file, multiple IO watches might be created, it
|
||||||
|
means logically each IO watch needs to take one refcount for
|
||||||
|
MigrationState so as to be 100% accurate on ownership of refcount taken.
|
||||||
|
|
||||||
|
In general, we at least need per-protocol handling to make it accurate,
|
||||||
|
which can be an overkill if we know incoming failed after all. Add a short
|
||||||
|
comment to explain that when taking the refcount in qmp_migrate_incoming().
|
||||||
|
|
||||||
|
Bugzilla: https://issues.redhat.com/browse/RHEL-69775
|
||||||
|
Tested-by: Yan Fu <yafu@redhat.com>
|
||||||
|
Signed-off-by: Peter Xu <peterx@redhat.com>
|
||||||
|
Reviewed-by: Fabiano Rosas <farosas@suse.de>
|
||||||
|
Message-ID: <20250220132459.512610-1-peterx@redhat.com>
|
||||||
|
Signed-off-by: Fabiano Rosas <farosas@suse.de>
|
||||||
|
(cherry picked from commit d657a14de5d597bbfe7b54e4c4f0646f440e98ad)
|
||||||
|
Signed-off-by: Peter Xu <peterx@redhat.com>
|
||||||
|
---
|
||||||
|
migration/migration.c | 40 ++++++++++++++++++++++++++++++++++++++--
|
||||||
|
1 file changed, 38 insertions(+), 2 deletions(-)
|
||||||
|
|
||||||
|
diff --git a/migration/migration.c b/migration/migration.c
|
||||||
|
index 999d4cac54..aabdc45c16 100644
|
||||||
|
--- a/migration/migration.c
|
||||||
|
+++ b/migration/migration.c
|
||||||
|
@@ -115,6 +115,27 @@ static void migration_downtime_start(MigrationState *s)
|
||||||
|
s->downtime_start = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
|
||||||
|
}
|
||||||
|
|
||||||
|
+/*
|
||||||
|
+ * This is unfortunate: incoming migration actually needs the outgoing
|
||||||
|
+ * migration state (MigrationState) to be there too, e.g. to query
|
||||||
|
+ * capabilities, parameters, using locks, setup errors, etc.
|
||||||
|
+ *
|
||||||
|
+ * NOTE: when calling this, making sure current_migration exists and not
|
||||||
|
+ * been freed yet! Otherwise trying to access the refcount is already
|
||||||
|
+ * an use-after-free itself..
|
||||||
|
+ *
|
||||||
|
+ * TODO: Move shared part of incoming / outgoing out into separate object.
|
||||||
|
+ * Then this is not needed.
|
||||||
|
+ */
|
||||||
|
+static void migrate_incoming_ref_outgoing_state(void)
|
||||||
|
+{
|
||||||
|
+ object_ref(migrate_get_current());
|
||||||
|
+}
|
||||||
|
+static void migrate_incoming_unref_outgoing_state(void)
|
||||||
|
+{
|
||||||
|
+ object_unref(migrate_get_current());
|
||||||
|
+}
|
||||||
|
+
|
||||||
|
static void migration_downtime_end(MigrationState *s)
|
||||||
|
{
|
||||||
|
int64_t now = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
|
||||||
|
@@ -821,7 +842,7 @@ process_incoming_migration_co(void *opaque)
|
||||||
|
* postcopy thread.
|
||||||
|
*/
|
||||||
|
trace_process_incoming_migration_co_postcopy_end_main();
|
||||||
|
- return;
|
||||||
|
+ goto out;
|
||||||
|
}
|
||||||
|
/* Else if something went wrong then just fall out of the normal exit */
|
||||||
|
}
|
||||||
|
@@ -837,7 +858,8 @@ process_incoming_migration_co(void *opaque)
|
||||||
|
}
|
||||||
|
|
||||||
|
migration_bh_schedule(process_incoming_migration_bh, mis);
|
||||||
|
- return;
|
||||||
|
+ goto out;
|
||||||
|
+
|
||||||
|
fail:
|
||||||
|
migrate_set_state(&mis->state, MIGRATION_STATUS_ACTIVE,
|
||||||
|
MIGRATION_STATUS_FAILED);
|
||||||
|
@@ -854,6 +876,9 @@ fail:
|
||||||
|
|
||||||
|
exit(EXIT_FAILURE);
|
||||||
|
}
|
||||||
|
+out:
|
||||||
|
+ /* Pairs with the refcount taken in qmp_migrate_incoming() */
|
||||||
|
+ migrate_incoming_unref_outgoing_state();
|
||||||
|
}
|
||||||
|
|
||||||
|
/**
|
||||||
|
@@ -1875,6 +1900,17 @@ void qmp_migrate_incoming(const char *uri, bool has_channels,
|
||||||
|
return;
|
||||||
|
}
|
||||||
|
|
||||||
|
+ /*
|
||||||
|
+ * Making sure MigrationState is available until incoming migration
|
||||||
|
+ * completes.
|
||||||
|
+ *
|
||||||
|
+ * NOTE: QEMU _might_ leak this refcount in some failure paths, but
|
||||||
|
+ * that's OK. This is the minimum change we need to at least making
|
||||||
|
+ * sure success case is clean on the refcount. We can try harder to
|
||||||
|
+ * make it accurate for any kind of failures, but it might be an
|
||||||
|
+ * overkill and doesn't bring us much benefit.
|
||||||
|
+ */
|
||||||
|
+ migrate_incoming_ref_outgoing_state();
|
||||||
|
once = false;
|
||||||
|
}
|
||||||
|
|
||||||
|
--
|
||||||
|
2.39.3
|
||||||
|
|
@ -0,0 +1,89 @@
|
|||||||
|
From 804d6d4fe3e5fde1fe5e21b5c4a33b6f279fe407 Mon Sep 17 00:00:00 2001
|
||||||
|
From: =?UTF-8?q?Daniel=20P=2E=20Berrang=C3=A9?= <berrange@redhat.com>
|
||||||
|
Date: Fri, 6 Dec 2024 11:45:24 +0000
|
||||||
|
Subject: [PATCH 2/3] scripts: improve error from qemu-trace-stap on missing
|
||||||
|
'stap'
|
||||||
|
MIME-Version: 1.0
|
||||||
|
Content-Type: text/plain; charset=UTF-8
|
||||||
|
Content-Transfer-Encoding: 8bit
|
||||||
|
|
||||||
|
RH-Author: Daniel P. Berrangé <berrange@redhat.com>
|
||||||
|
RH-MergeRequest: 346: scripts: improve error from qemu-trace-stap on missing 'stap'
|
||||||
|
RH-Jira: RHEL-83535
|
||||||
|
RH-Acked-by: Miroslav Rezanina <mrezanin@redhat.com>
|
||||||
|
RH-Commit: [1/2] 2b2425b5c8cd8d59e9d8d4b228d2d97156bba510 (berrange/centos-src-qemu)
|
||||||
|
|
||||||
|
If the 'stap' binary is missing in $PATH, a huge trace is thrown
|
||||||
|
|
||||||
|
$ qemu-trace-stap list /usr/bin/qemu-system-x86_64
|
||||||
|
Traceback (most recent call last):
|
||||||
|
File "/usr/bin/qemu-trace-stap", line 169, in <module>
|
||||||
|
main()
|
||||||
|
File "/usr/bin/qemu-trace-stap", line 165, in main
|
||||||
|
args.func(args)
|
||||||
|
File "/usr/bin/qemu-trace-stap", line 83, in cmd_run
|
||||||
|
subprocess.call(stapargs)
|
||||||
|
File "/usr/lib64/python3.12/subprocess.py", line 389, in call
|
||||||
|
with Popen(*popenargs, **kwargs) as p:
|
||||||
|
^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
||||||
|
File "/usr/lib64/python3.12/subprocess.py", line 1026, in {}init{}
|
||||||
|
self._execute_child(args, executable, preexec_fn, close_fds,
|
||||||
|
File "/usr/lib64/python3.12/subprocess.py", line 1955, in _execute_child
|
||||||
|
raise child_exception_type(errno_num, err_msg, err_filename)
|
||||||
|
FileNotFoundError: [Errno 2] No such file or directory: 'stap'
|
||||||
|
|
||||||
|
With this change the user now gets
|
||||||
|
|
||||||
|
$ qemu-trace-stap list /usr/bin/qemu-system-x86_64
|
||||||
|
Unable to find 'stap' in $PATH
|
||||||
|
|
||||||
|
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
|
||||||
|
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
|
||||||
|
Message-id: 20241206114524.1666664-1-berrange@redhat.com
|
||||||
|
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
|
||||||
|
(cherry picked from commit 9976be3911a2d0503f026ae37c17077273bf30ee)
|
||||||
|
---
|
||||||
|
scripts/qemu-trace-stap | 6 ++++--
|
||||||
|
1 file changed, 4 insertions(+), 2 deletions(-)
|
||||||
|
|
||||||
|
diff --git a/scripts/qemu-trace-stap b/scripts/qemu-trace-stap
|
||||||
|
index eb6e951ff2..e983460ee7 100755
|
||||||
|
--- a/scripts/qemu-trace-stap
|
||||||
|
+++ b/scripts/qemu-trace-stap
|
||||||
|
@@ -56,6 +56,7 @@ def tapset_dir(binary):
|
||||||
|
|
||||||
|
|
||||||
|
def cmd_run(args):
|
||||||
|
+ stap = which("stap")
|
||||||
|
prefix = probe_prefix(args.binary)
|
||||||
|
tapsets = tapset_dir(args.binary)
|
||||||
|
|
||||||
|
@@ -76,7 +77,7 @@ def cmd_run(args):
|
||||||
|
|
||||||
|
# We request an 8MB buffer, since the stap default 1MB buffer
|
||||||
|
# can be easily overflowed by frequently firing QEMU traces
|
||||||
|
- stapargs = ["stap", "-s", "8", "-I", tapsets ]
|
||||||
|
+ stapargs = [stap, "-s", "8", "-I", tapsets ]
|
||||||
|
if args.pid is not None:
|
||||||
|
stapargs.extend(["-x", args.pid])
|
||||||
|
stapargs.extend(["-e", script])
|
||||||
|
@@ -84,6 +85,7 @@ def cmd_run(args):
|
||||||
|
|
||||||
|
|
||||||
|
def cmd_list(args):
|
||||||
|
+ stap = which("stap")
|
||||||
|
tapsets = tapset_dir(args.binary)
|
||||||
|
|
||||||
|
if args.verbose:
|
||||||
|
@@ -96,7 +98,7 @@ def cmd_list(args):
|
||||||
|
|
||||||
|
if verbose:
|
||||||
|
print("Listing probes with name '%s'" % script)
|
||||||
|
- proc = subprocess.Popen(["stap", "-I", tapsets, "-l", script],
|
||||||
|
+ proc = subprocess.Popen([stap, "-I", tapsets, "-l", script],
|
||||||
|
stdout=subprocess.PIPE,
|
||||||
|
universal_newlines=True)
|
||||||
|
out, err = proc.communicate()
|
||||||
|
--
|
||||||
|
2.39.3
|
||||||
|
|
@ -143,7 +143,7 @@ Obsoletes: %{name}-block-ssh <= %{epoch}:%{version} \
|
|||||||
Summary: QEMU is a machine emulator and virtualizer
|
Summary: QEMU is a machine emulator and virtualizer
|
||||||
Name: qemu-kvm
|
Name: qemu-kvm
|
||||||
Version: 9.1.0
|
Version: 9.1.0
|
||||||
Release: 15%{?rcrel}%{?dist}%{?cc_suffix}
|
Release: 16%{?rcrel}%{?dist}%{?cc_suffix}
|
||||||
# Epoch because we pushed a qemu-1.0 package. AIUI this can't ever be dropped
|
# Epoch because we pushed a qemu-1.0 package. AIUI this can't ever be dropped
|
||||||
# Epoch 15 used for RHEL 8
|
# Epoch 15 used for RHEL 8
|
||||||
# Epoch 17 used for RHEL 9 (due to release versioning offset in RHEL 8.5)
|
# Epoch 17 used for RHEL 9 (due to release versioning offset in RHEL 8.5)
|
||||||
@ -482,6 +482,10 @@ Patch152: kvm-iotests-Add-filter_qtest.patch
|
|||||||
Patch153: kvm-iotests-Add-qsd-migrate-case.patch
|
Patch153: kvm-iotests-Add-qsd-migrate-case.patch
|
||||||
# For RHEL-54670 - Provide QMP command for block device reactivation after migration [rhel-10.0]
|
# For RHEL-54670 - Provide QMP command for block device reactivation after migration [rhel-10.0]
|
||||||
Patch154: kvm-iotests-Add-NBD-based-tests-for-inactive-nodes.patch
|
Patch154: kvm-iotests-Add-NBD-based-tests-for-inactive-nodes.patch
|
||||||
|
# For RHEL-69776 - [rhel10]Guest crashed on the target host when the migration was canceled
|
||||||
|
Patch155: kvm-migration-Fix-UAF-for-incoming-migration-on-Migratio.patch
|
||||||
|
# For RHEL-83535 - [Qemu RHEL-10] qemu-trace-stap should handle lack of stap more gracefully
|
||||||
|
Patch156: kvm-scripts-improve-error-from-qemu-trace-stap-on-missin.patch
|
||||||
|
|
||||||
%if %{have_clang}
|
%if %{have_clang}
|
||||||
BuildRequires: clang
|
BuildRequires: clang
|
||||||
@ -637,6 +641,7 @@ This package provides documentation and auxiliary programs used with %{name}.
|
|||||||
|
|
||||||
%package tools
|
%package tools
|
||||||
Summary: %{name} support tools
|
Summary: %{name} support tools
|
||||||
|
Recommends: systemtap-client
|
||||||
%description tools
|
%description tools
|
||||||
%{name}-tools provides various tools related to %{name} usage.
|
%{name}-tools provides various tools related to %{name} usage.
|
||||||
|
|
||||||
@ -1548,6 +1553,15 @@ useradd -r -u 107 -g qemu -G kvm -d / -s /sbin/nologin \
|
|||||||
%endif
|
%endif
|
||||||
|
|
||||||
%changelog
|
%changelog
|
||||||
|
* Tue Mar 25 2025 Miroslav Rezanina <mrezanin@redhat.com> - 9.1.0-16
|
||||||
|
- kvm-migration-Fix-UAF-for-incoming-migration-on-Migratio.patch [RHEL-69776]
|
||||||
|
- kvm-scripts-improve-error-from-qemu-trace-stap-on-missin.patch [RHEL-83535]
|
||||||
|
- kvm-Recommend-systemtap-client-from-qemu-tools.patch [RHEL-83535]
|
||||||
|
- Resolves: RHEL-69776
|
||||||
|
([rhel10]Guest crashed on the target host when the migration was canceled)
|
||||||
|
- Resolves: RHEL-83535
|
||||||
|
([Qemu RHEL-10] qemu-trace-stap should handle lack of stap more gracefully)
|
||||||
|
|
||||||
* Mon Feb 17 2025 Miroslav Rezanina <mrezanin@redhat.com> - 9.1.0-15
|
* Mon Feb 17 2025 Miroslav Rezanina <mrezanin@redhat.com> - 9.1.0-15
|
||||||
- kvm-migration-Add-helper-to-get-target-runstate.patch [RHEL-54670]
|
- kvm-migration-Add-helper-to-get-target-runstate.patch [RHEL-54670]
|
||||||
- kvm-qmp-cont-Only-activate-disks-if-migration-completed.patch [RHEL-54670]
|
- kvm-qmp-cont-Only-activate-disks-if-migration-completed.patch [RHEL-54670]
|
||||||
|
Loading…
Reference in New Issue
Block a user