- kvm-migration-multifd-move-macros-to-multifd-header.patch [RHEL-59697]
- kvm-migration-refactor-channel-discovery-mechanism.patch [RHEL-59697]
- kvm-migration-Add-save_postcopy_prepare-savevm-handler.patch [RHEL-59697]
- kvm-migration-ram-Implement-save_postcopy_prepare.patch [RHEL-59697]
- kvm-tests-qtest-migration-consolidate-set-capabilities.patch [RHEL-59697]
- kvm-migration-write-zero-pages-when-postcopy-enabled.patch [RHEL-59697]
- kvm-migration-enable-multifd-and-postcopy-together.patch [RHEL-59697]
- kvm-migration-Add-qtest-for-migration-over-RDMA.patch [RHEL-59697]
- kvm-qtest-migration-rdma-Enforce-RLIMIT_MEMLOCK-128MB-re.patch [RHEL-59697]
- kvm-qtest-migration-rdma-Add-test-for-rdma-migration-wit.patch [RHEL-59697]
- kvm-tests-qtest-migration-add-postcopy-tests-with-multif.patch [RHEL-59697]
- kvm-file-posix-Fix-aio-threads-performance-regression-af.patch [RHEL-96854]
- kvm-block-remove-outdated-comments-about-AioContext-lock.patch [RHEL-88561]
- kvm-block-move-drain-outside-of-read-locked-bdrv_reopen_.patch [RHEL-88561]
- kvm-block-snapshot-move-drain-outside-of-read-locked-bdr.patch [RHEL-88561]
- kvm-block-move-drain-outside-of-read-locked-bdrv_inactiv.patch [RHEL-88561]
- kvm-block-mark-bdrv_parent_change_aio_context-GRAPH_RDLO.patch [RHEL-88561]
- kvm-block-mark-change_aio_ctx-callback-and-instances-as-.patch [RHEL-88561]
- kvm-block-mark-bdrv_child_change_aio_context-GRAPH_RDLOC.patch [RHEL-88561]
- kvm-block-move-drain-outside-of-bdrv_change_aio_context-.patch [RHEL-88561]
- kvm-block-move-drain-outside-of-bdrv_try_change_aio_cont.patch [RHEL-88561]
- kvm-block-move-drain-outside-of-bdrv_attach_child_common.patch [RHEL-88561]
- kvm-block-move-drain-outside-of-bdrv_set_backing_hd_drai.patch [RHEL-88561]
- kvm-block-move-drain-outside-of-bdrv_root_attach_child.patch [RHEL-88561]
- kvm-block-move-drain-outside-of-bdrv_attach_child.patch [RHEL-88561]
- kvm-block-move-drain-outside-of-quorum_add_child.patch [RHEL-88561]
- kvm-block-move-drain-outside-of-bdrv_root_unref_child.patch [RHEL-88561]
- kvm-block-move-drain-outside-of-quorum_del_child.patch [RHEL-88561]
- kvm-blockdev-drain-while-unlocked-in-internal_snapshot_a.patch [RHEL-88561]
- kvm-blockdev-drain-while-unlocked-in-external_snapshot_a.patch [RHEL-88561]
- kvm-block-mark-bdrv_drained_begin-and-friends-as-GRAPH_U.patch [RHEL-88561]
- kvm-iotests-graph-changes-while-io-remove-image-file-aft.patch [RHEL-88561]
- kvm-iotests-graph-changes-while-io-add-test-case-with-re.patch [RHEL-88561]
- Resolves: RHEL-59697 (Allow multifd+postcopy features being enabled together, but only use multifd during precopy )
- Resolves: RHEL-96854 (Performance Degradation(aio=threads) between Upstream Commit b75c5f9 and 984a32f)
- Resolves: RHEL-88561 (qemu graph deadlock during job-dismiss)
From c5b0d95edea3d1e2ef47804e0c6fe19ade4dcaed Mon Sep 17 00:00:00 2001
From: Fiona Ebner <f.ebner@proxmox.com>
Date: Fri, 30 May 2025 17:10:45 +0200
Subject: [PATCH 20/33] block: move drain outside of bdrv_change_aio_context()
 and mark GRAPH_RDLOCK

RH-Author: Kevin Wolf <kwolf@redhat.com>
RH-MergeRequest: 393: block: do not drain while holding the graph lock
RH-Jira: RHEL-88561
RH-Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
RH-Acked-by: Hanna Czenczek <hreitz@redhat.com>
RH-Commit: [8/21] 26dc90de9b684044752ba2ad3831bd786e40fe72 (kmwolf/centos-qemu-kvm)

This is in preparation to mark bdrv_drained_begin() as GRAPH_UNLOCKED.

Note that even if bdrv_drained_begin() were already marked as
GRAPH_UNLOCKED, TSA would not complain about the instance in
bdrv_change_aio_context() before this change, because it is preceded
by a bdrv_graph_rdunlock_main_loop() call. It is not correct to
release the lock here, and in case the caller holds a write lock, it
wouldn't actually release the lock.

In combination with block-stream, there is a deadlock that can happen
because of this [0]. In particular, it can happen that
main thread              IO thread
1. acquires write lock
                         in blk_co_do_preadv_part():
                         2. have non-zero blk->in_flight
                         3. try to acquire read lock
4. begin drain

Steps 3 and 4 might be switched. Draining will poll and get stuck,
because it will see the non-zero in_flight counter. But the IO thread
will not make any progress either, because it cannot acquire the read
lock.
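
The interleaving described above can be modelled with a small
single-threaded sketch. Everything in it (ToyGraph, toy_io_step(),
toy_drain() and the bounded poll budget) is hypothetical and only mimics
the roles of the graph lock and blk->in_flight; it is not QEMU code. It
illustrates that polling cannot complete while the write lock is held,
and that it completes once the lock is not held.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy state; these are not QEMU's real data structures. */
typedef struct {
    bool write_lock_held;  /* "main thread" holds the graph write lock */
    int in_flight;         /* plays the role of blk->in_flight */
    bool reader_waiting;   /* "IO thread" is blocked on the read lock */
} ToyGraph;

/* IO thread side: a reader cannot enter while a writer holds the lock. */
bool toy_io_try_rdlock(ToyGraph *g)
{
    if (g->write_lock_held) {
        g->reader_waiting = true;
        return false;
    }
    g->reader_waiting = false;
    return true;
}

/* IO thread side: an in-flight request only completes under the read lock. */
void toy_io_step(ToyGraph *g)
{
    if (g->in_flight > 0 && toy_io_try_rdlock(g)) {
        g->in_flight--;
    }
}

/*
 * Main thread side: poll until in_flight drops to zero. A real drain has
 * no poll budget; max_polls exists only so that the model terminates.
 * Returns true if the drain completed.
 */
bool toy_drain(ToyGraph *g, int max_polls)
{
    for (int i = 0; i < max_polls; i++) {
        if (g->in_flight == 0) {
            return true;
        }
        toy_io_step(g);
    }
    return g->in_flight == 0;
}
```

With write_lock_held set and in_flight = 1, toy_drain() returns false for
any poll budget, which corresponds to the hang. With write_lock_held
cleared, the same call returns true.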

After this change, all paths to bdrv_change_aio_context() drain:
bdrv_change_aio_context() is called by:
1. bdrv_child_cb_change_aio_ctx() which is only called via the
   change_aio_ctx() callback, see below.
2. bdrv_child_change_aio_context(), see below.
3. bdrv_try_change_aio_context(), where a drained section is
   introduced.

The change_aio_ctx() callback is called by:
1. bdrv_attach_child_common_abort(), where a drained section is
   introduced.
2. bdrv_attach_child_common(), where a drained section is introduced.
3. bdrv_parent_change_aio_context(), see below.

bdrv_child_change_aio_context() is called by:
1. bdrv_change_aio_context(), i.e. recursive, so being in a drained
   section is invariant.
2. child_job_change_aio_ctx(), which is only called via the
   change_aio_ctx() callback, see above.

bdrv_parent_change_aio_context() is called by:
1. bdrv_change_aio_context(), i.e. recursive, so being in a drained
   section is invariant.

This resolves all code paths. Note that bdrv_attach_child_common()
and bdrv_attach_child_common_abort() hold the graph write lock and
callers of bdrv_try_change_aio_context() might too, so they are not
actually allowed to drain either. This will be addressed in the
following commits.

More granular draining is not trivially possible, because
bdrv_change_aio_context() can recursively call itself e.g. via
bdrv_child_change_aio_context().
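
The structure argued for above, where only the entry point opens the
drained section and the recursive worker merely asserts it, can be
sketched as follows. ToyNode and the toy_*() functions are hypothetical
stand-ins and not QEMU APIs. The model assumes a plain tree that is
walked through children only, whereas the real code also walks parents
and tracks visited BdrvChild objects in a hash table.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_CHILDREN 4

/* Hypothetical stand-in for a block node; not QEMU's BlockDriverState. */
typedef struct ToyNode {
    int quiesce_counter;   /* > 0 while the node is drained */
    int ctx;               /* stand-in for the node's AioContext */
    struct ToyNode *children[TOY_MAX_CHILDREN];
    size_t n_children;
} ToyNode;

/* Drain the whole subtree once, before any recursion starts. */
void toy_drain_begin(ToyNode *n)
{
    n->quiesce_counter++;
    for (size_t i = 0; i < n->n_children; i++) {
        toy_drain_begin(n->children[i]);
    }
}

/* Undo toy_drain_begin() for the whole subtree. */
void toy_drain_end(ToyNode *n)
{
    for (size_t i = 0; i < n->n_children; i++) {
        toy_drain_end(n->children[i]);
    }
    assert(n->quiesce_counter > 0);
    n->quiesce_counter--;
}

/* Recursive worker: it never drains, it only checks the invariant. */
void toy_change_ctx(ToyNode *n, int ctx)
{
    if (n->ctx == ctx) {
        return;
    }
    assert(n->quiesce_counter > 0);
    for (size_t i = 0; i < n->n_children; i++) {
        toy_change_ctx(n->children[i], ctx);
    }
    n->ctx = ctx;
}

/* Entry point: the only place that opens and closes the drained section. */
void toy_try_change_ctx(ToyNode *root, int ctx)
{
    toy_drain_begin(root);
    toy_change_ctx(root, ctx);
    toy_drain_end(root);
}
```

Calling toy_change_ctx() directly on an undrained node trips the
assertion, which is the toy counterpart of the
assert(bs->quiesce_counter > 0) check that this patch adds.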

[0]: https://lore.kernel.org/qemu-devel/73839c04-7616-407e-b057-80ca69e63f51@virtuozzo.com/

Reported-by: Andrey Drobyshev <andrey.drobyshev@virtuozzo.com>
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Message-ID: <20250530151125.955508-9-f.ebner@proxmox.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
(cherry picked from commit 91ba0e1c382bd4a4b9c6a200f8a175d6ff30ab99)
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 block.c                          | 57 +++++++++++++++++++++++---------
 include/block/block_int-common.h | 12 +++++++
 2 files changed, 53 insertions(+), 16 deletions(-)

diff --git a/block.c b/block.c
index f7b21d8f27..af438ae7ff 100644
--- a/block.c
+++ b/block.c
@@ -106,9 +106,9 @@ static void bdrv_reopen_abort(BDRVReopenState *reopen_state);

 static bool bdrv_backing_overridden(BlockDriverState *bs);

-static bool bdrv_change_aio_context(BlockDriverState *bs, AioContext *ctx,
-                                    GHashTable *visited, Transaction *tran,
-                                    Error **errp);
+static bool GRAPH_RDLOCK
+bdrv_change_aio_context(BlockDriverState *bs, AioContext *ctx,
+                        GHashTable *visited, Transaction *tran, Error **errp);

 /* If non-zero, use only whitelisted block drivers */
 static int use_bdrv_whitelist;
@@ -3040,8 +3040,10 @@ static void GRAPH_WRLOCK bdrv_attach_child_common_abort(void *opaque)

         /* No need to visit `child`, because it has been detached already */
         visited = g_hash_table_new(NULL, NULL);
+        bdrv_drain_all_begin();
         ret = s->child->klass->change_aio_ctx(s->child, s->old_parent_ctx,
                                               visited, tran, &error_abort);
+        bdrv_drain_all_end();
         g_hash_table_destroy(visited);

         /* transaction is supposed to always succeed */
@@ -3122,9 +3124,11 @@ bdrv_attach_child_common(BlockDriverState *child_bs,
             bool ret_child;

             g_hash_table_add(visited, new_child);
+            bdrv_drain_all_begin();
             ret_child = child_class->change_aio_ctx(new_child, child_ctx,
                                                     visited, aio_ctx_tran,
                                                     NULL);
+            bdrv_drain_all_end();
             if (ret_child == true) {
                 error_free(local_err);
                 ret = 0;
@@ -7576,6 +7580,17 @@ typedef struct BdrvStateSetAioContext {
     BlockDriverState *bs;
 } BdrvStateSetAioContext;

+/*
+ * Changes the AioContext of @child to @ctx and recursively for the associated
+ * block nodes and all their children and parents. Returns true if the change is
+ * possible and the transaction @tran can be continued. Returns false and sets
+ * @errp if not and the transaction must be aborted.
+ *
+ * @visited will accumulate all visited BdrvChild objects. The caller is
+ * responsible for freeing the list afterwards.
+ *
+ * Must be called with the affected block nodes drained.
+ */
 static bool GRAPH_RDLOCK
 bdrv_parent_change_aio_context(BdrvChild *c, AioContext *ctx,
                                GHashTable *visited, Transaction *tran,
@@ -7604,6 +7619,17 @@ bdrv_parent_change_aio_context(BdrvChild *c, AioContext *ctx,
     return true;
 }

+/*
+ * Changes the AioContext of @c->bs to @ctx and recursively for all its children
+ * and parents. Returns true if the change is possible and the transaction @tran
+ * can be continued. Returns false and sets @errp if not and the transaction
+ * must be aborted.
+ *
+ * @visited will accumulate all visited BdrvChild objects. The caller is
+ * responsible for freeing the list afterwards.
+ *
+ * Must be called with the affected block nodes drained.
+ */
 bool bdrv_child_change_aio_context(BdrvChild *c, AioContext *ctx,
                                    GHashTable *visited, Transaction *tran,
                                    Error **errp)
@@ -7619,10 +7645,6 @@ bool bdrv_child_change_aio_context(BdrvChild *c, AioContext *ctx,
 static void bdrv_set_aio_context_clean(void *opaque)
 {
     BdrvStateSetAioContext *state = (BdrvStateSetAioContext *) opaque;
-    BlockDriverState *bs = (BlockDriverState *) state->bs;
-
-    /* Paired with bdrv_drained_begin in bdrv_change_aio_context() */
-    bdrv_drained_end(bs);

     g_free(state);
 }
@@ -7650,10 +7672,12 @@ static TransactionActionDrv set_aio_context = {
  *
  * @visited will accumulate all visited BdrvChild objects. The caller is
  * responsible for freeing the list afterwards.
+ *
+ * @bs must be drained.
  */
-static bool bdrv_change_aio_context(BlockDriverState *bs, AioContext *ctx,
-                                    GHashTable *visited, Transaction *tran,
-                                    Error **errp)
+static bool GRAPH_RDLOCK
+bdrv_change_aio_context(BlockDriverState *bs, AioContext *ctx,
+                        GHashTable *visited, Transaction *tran, Error **errp)
 {
     BdrvChild *c;
     BdrvStateSetAioContext *state;
@@ -7664,21 +7688,17 @@ static bool bdrv_change_aio_context(BlockDriverState *bs, AioContext *ctx,
         return true;
     }

-    bdrv_graph_rdlock_main_loop();
     QLIST_FOREACH(c, &bs->parents, next_parent) {
         if (!bdrv_parent_change_aio_context(c, ctx, visited, tran, errp)) {
-            bdrv_graph_rdunlock_main_loop();
             return false;
         }
     }

     QLIST_FOREACH(c, &bs->children, next) {
         if (!bdrv_child_change_aio_context(c, ctx, visited, tran, errp)) {
-            bdrv_graph_rdunlock_main_loop();
             return false;
         }
     }
-    bdrv_graph_rdunlock_main_loop();

     state = g_new(BdrvStateSetAioContext, 1);
     *state = (BdrvStateSetAioContext) {
@@ -7686,8 +7706,7 @@ static bool bdrv_change_aio_context(BlockDriverState *bs, AioContext *ctx,
         .bs = bs,
     };

-    /* Paired with bdrv_drained_end in bdrv_set_aio_context_clean() */
-    bdrv_drained_begin(bs);
+    assert(bs->quiesce_counter > 0);

     tran_add(tran, &set_aio_context, state);

@@ -7720,6 +7739,8 @@ int bdrv_try_change_aio_context(BlockDriverState *bs, AioContext *ctx,
     if (ignore_child) {
         g_hash_table_add(visited, ignore_child);
     }
+    bdrv_drain_all_begin();
+    bdrv_graph_rdlock_main_loop();
     ret = bdrv_change_aio_context(bs, ctx, visited, tran, errp);
     g_hash_table_destroy(visited);

@@ -7733,10 +7754,14 @@ int bdrv_try_change_aio_context(BlockDriverState *bs, AioContext *ctx,
     if (!ret) {
         /* Just run clean() callbacks. No AioContext changed. */
         tran_abort(tran);
+        bdrv_graph_rdunlock_main_loop();
+        bdrv_drain_all_end();
         return -EPERM;
     }

     tran_commit(tran);
+    bdrv_graph_rdunlock_main_loop();
+    bdrv_drain_all_end();
     return 0;
 }

diff --git a/include/block/block_int-common.h b/include/block/block_int-common.h
index 307dc56ed8..5d34e6a510 100644
--- a/include/block/block_int-common.h
+++ b/include/block/block_int-common.h
@@ -987,6 +987,18 @@ struct BdrvChildClass {
                            bool backing_mask_protocol,
                            Error **errp);

+    /*
+     * Notifies the parent that the child is trying to change its AioContext.
+     * The parent may in turn change the AioContext of other nodes in the same
+     * transaction. Returns true if the change is possible and the transaction
+     * can be continued. Returns false and sets @errp if not and the transaction
+     * must be aborted.
+     *
+     * @visited will accumulate all visited BdrvChild objects. The caller is
+     * responsible for freeing the list afterwards.
+     *
+     * Must be called with the affected block nodes drained.
+     */
     bool GRAPH_RDLOCK_PTR (*change_aio_ctx)(BdrvChild *child, AioContext *ctx,
                                             GHashTable *visited,
                                             Transaction *tran, Error **errp);
--
2.39.3