qemu-kvm/kvm-block-linux-aio-bound-ioq_submit-recursion-depth.patch
Miroslav Rezanina 6c478a495d * Tue Jun 23 2026 Miroslav Rezanina <mrezanin@redhat.com> - 10.1.0-22
- kvm-blkdebug-Add-delay-ns-option.patch [RHEL-121686]
- kvm-block-Add-blk_co_start-end_request-and-BDRV_REQ_NO_Q.patch [RHEL-121686]
- kvm-block-Add-flags-parameter-to-blk_-_pdiscard.patch [RHEL-121686]
- kvm-ide-Minimal-fix-for-deadlock-between-TRIM-and-drain.patch [RHEL-121686]
- kvm-ide-Clean-up-ide_trim_co_entry-to-be-idiomatic-corou.patch [RHEL-121686]
- kvm-ide-test-Factor-out-wait_dma_completion.patch [RHEL-121686]
- kvm-ide-test-Test-reset-during-TRIM.patch [RHEL-121686]
- kvm-block-graph-lock-fix-missed-wakeup-in-bdrv_graph_co_.patch [RHEL-186384]
- kvm-block-curl-fix-curl-internal-handles-handling.patch [RHEL-186384]
- kvm-block-curl.c-Use-explicit-long-constants-in-curl_eas.patch [RHEL-186384]
- kvm-block-curl.c-Fix-CURLOPT_VERBOSE-parameter-type.patch [RHEL-186384]
- kvm-block-curl-fix-concurrent-completion-handling.patch [RHEL-186384]
- kvm-block-curl-free-s-password-in-cleanup-paths.patch [RHEL-186384]
- kvm-nvme-Kick-and-check-completions-in-BDS-context.patch [RHEL-186384]
- kvm-nvme-Note-in-which-AioContext-some-functions-run.patch [RHEL-186384]
- kvm-block-remove-detached-header-option-from-opts-after-.patch [RHEL-186384]
- kvm-block-fix-luks-amend-when-run-in-coroutine.patch [RHEL-186384]
- kvm-qed-Don-t-try-to-flush-during-incoming-migration.patch [RHEL-186384]
- kvm-block-vmdk-fix-OOB-read-in-vmdk_read_extent.patch [RHEL-186384]
- kvm-block-throttle-groups-fix-deadlock-with-iolimits-and.patch [RHEL-186384]
- kvm-throttle-group-Fix-race-condition-in-throttle_group_.patch [RHEL-186384]
- kvm-qemu-img-Fix-amend-option-parse-error-handling.patch [RHEL-186384]
- kvm-qemu-img-rebase-don-t-exceed-IO_BUF_SIZE-in-one-oper.patch [RHEL-186384]
- kvm-python-backport-drop-Python3.6-workarounds.patch [RHEL-186384]
- kvm-python-backport-Remove-deprecated-get_event_loop-cal.patch [RHEL-186384]
- kvm-python-backport-avoid-creating-additional-event-loop.patch [RHEL-186384]
- kvm-iotests-147-ensure-temporary-sockets-are-closed-befo.patch [RHEL-186384]
- kvm-iotests-151-ensure-subprocesses-are-cleaned-up.patch [RHEL-186384]
- kvm-tests-qemu-iotest-fix-iotest-024-with-qed-images.patch [RHEL-186384]
- kvm-tests-qemu-iotests-Fix-check-for-existing-file-in-_r.patch [RHEL-186384]
- kvm-async-access-bottom-half-flags-with-qatomic_read.patch [RHEL-186384]
- kvm-block-linux-aio-bound-ioq_submit-recursion-depth.patch [RHEL-186384]
- kvm-block-io-fallback-to-bounce-buffer-if-BLKZEROOUT-is-.patch [RHEL-186384]
- kvm-file-posix-populate-pwrite_zeroes_alignment.patch [RHEL-186384]
- kvm-block-use-pwrite_zeroes_alignment-when-writing-first.patch [RHEL-186384]
- kvm-iotests-add-Linux-loop-device-image-creation-test.patch [RHEL-186384]
- kvm-virtio-Fix-crash-when-sriov-pf-is-set-for-non-PCI-Ex.patch [RHEL-186384]
- kvm-virtio-scsi-pass-the-same-cdb_size-to-virtio_scsi_po.patch [RHEL-186384]
- kvm-hw-scsi-avoid-deadlock-upon-TMF-request-cancelling-w.patch [RHEL-186384]
- kvm-virtio-blk-fix-zone-report-buffer-out-of-memory-CVE-.patch [RHEL-186384]
- kvm-ide-Fix-potential-assertion-failure-on-VM-stop-for-P.patch [RHEL-186384]
- kvm-block-Create-DEFAULT_BLOCK_CONF-macro.patch [RHEL-186384]
- kvm-block-Add-more-defaults-to-DEFAULT_BLOCK_CONF.patch [RHEL-186384]
- kvm-block-mirror-check-range-when-setting-zero-bitmap-fo.patch [RHEL-186384]
- kvm-iotests-test-active-mirror-with-unaligned-small-writ.patch [RHEL-186384]
- kvm-block-mirror-fix-assertion-failure-upon-duplicate-co.patch [RHEL-186384]
- kvm-commit-Drain-nodes-across-all-of-bdrv_commit.patch [RHEL-186384]
- kvm-qemu-io-Add-aio_discard-command.patch [RHEL-186384]
- kvm-qcow2-Fix-corruption-on-discard-during-write-with-CO.patch [RHEL-186384]
- kvm-iotests-046-Test-that-discard-write_zeroes-wait-for-.patch [RHEL-186384]
- kvm-qcow2-Fix-data-loss-on-zero-write-with-detect-zeroes.patch [RHEL-186384]
- kvm-block-Fix-crash-after-setting-latency-historygram-wi.patch [RHEL-186384]
- Resolves: RHEL-121686
  (qemu-kvm hung during drain after double pause)
- Resolves: RHEL-186384
  (virt-storage: Backport stable branch fixes)
2026-06-23 09:22:28 +02:00

From 9a6c4bad7f575826796a4c690a0fa6bbcda1f5ad Mon Sep 17 00:00:00 2001
From: "Denis V. Lunev" <den@openvz.org>
Date: Sat, 13 Jun 2026 23:03:08 +0300
Subject: [PATCH 32/52] block/linux-aio: bound ioq_submit() recursion depth
RH-Author: Kevin Wolf <kwolf@redhat.com>
RH-MergeRequest: 504: virt-storage: Backport stable branch fixes
RH-Jira: RHEL-186384
RH-Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
RH-Acked-by: Miroslav Rezanina <mrezanin@redhat.com>
RH-Commit: [25/45] 04545c714dacb2571f169b3da3c7cf493dea31bd (kmwolf/centos-qemu-kvm)

qemu_laio_process_completions() wraps its body in defer_call_begin /
defer_call_end. Inside the section, completion callbacks wake coroutines
that queue new aiocbs; laio_do_submit() defers laio_deferred_fn. At the
bottom of qemu_laio_process_completions() the defer_call_end() fires
laio_deferred_fn, which calls ioq_submit(), closing the cycle:

    ioq_submit
      -> io_submit(2)                              // some sync completions
      -> qemu_laio_process_completions             // defer_call_begin
           -> aio_co_wake                          // resumes coroutine
                -> laio_do_submit
                     -> defer_call(laio_deferred_fn, s)  // enqueued
           -> defer_call_end                       // nesting drops to 0
                -> laio_deferred_fn
                     -> ioq_submit                 // +1 stack frame, loop

When io_submit(2) returns asynchronously (O_DIRECT) the cycle
terminates in one extra frame: the fresh aiocb is still in flight, no
completion is drained, no coroutine wakes, no new submission queues.

When submissions complete synchronously (non-O_DIRECT, or per-descriptor
drivers such as vmdk) each level enqueues more work for the next
defer_call_end() to drain, so recursion grows without bound and QEMU
crashes with SIGSEGV on the thread guard page.

The cycle was closed by two performance commits, each correct in
isolation:

  076682885d ("block/linux-aio: convert to blk_io_plug_call() API")
      -- introduced laio_deferred_fn and wired
         laio_do_submit -> defer_call(laio_deferred_fn, s).

  84d61e5f36 ("virtio: use defer_call() in virtio_irqfd_notify()")
      -- added defer_call_begin/end around qemu_laio_process_completions
         so virtio-irqfd notifications batch across a completion pass.

The supported aio=native + cache=none pairing keeps submissions
asynchronous, so the cycle stays bounded; nothing in the code enforces
that contract. Observed in production as a SIGSEGV during a backup job
configured with --cached + aio=native; reproducible on upstream with
qemu-io against vmdk.

Cap ioq_submit() recursion with a counter on LaioQueue, which is only
accessed from the AioContext home thread. On overflow, return without
submitting. The pending work is drained by s->completion_bh, which
qemu_laio_process_completions() has already scheduled on entry -- no
work is lost; one event-loop round-trip of latency is paid only when
the bound is hit, which cannot happen on a supported configuration.

Signed-off-by: Denis V. Lunev <den@openvz.org>
CC: Kevin Wolf <kwolf@redhat.com>
CC: Hanna Reitz <hreitz@redhat.com>
CC: Stefan Hajnoczi <stefanha@redhat.com>
CC: Paolo Bonzini <pbonzini@redhat.com>
Message-ID: <20260520142503.251959-2-den@openvz.org>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
(cherry picked from commit 6864bec553b2e37699739615e604fc3c7bae0e1d)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
Message-ID: <20260613200411.1808021-25-mjt@tls.msk.ru>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
block/linux-aio.c | 22 ++++++++++++++++++++++
1 file changed, 22 insertions(+)
diff --git a/block/linux-aio.c b/block/linux-aio.c
index 84397de54c..37de9b564b 100644
--- a/block/linux-aio.c
+++ b/block/linux-aio.c
@@ -36,6 +36,19 @@
/* Maximum number of requests in a batch. (default value) */
#define DEFAULT_MAX_BATCH 32
+/*
+ * Bound on how deep ioq_submit() may recurse on a single LaioQueue via the
+ * ioq_submit -> qemu_laio_process_completions -> defer_call_end ->
+ * laio_deferred_fn -> ioq_submit cycle. The cycle terminates naturally
+ * when io_submit(2) returns asynchronously (O_DIRECT), but can grow
+ * without bound when submissions complete synchronously. On overflow
+ * the caller returns without submitting; the outermost
+ * qemu_laio_process_completions() has already scheduled s->completion_bh
+ * (via qemu_bh_schedule() at the top of that function), which resumes
+ * submission from the next event-loop dispatch.
+ */
+#define IOQ_SUBMIT_MAX_DEPTH 8
+
struct qemu_laiocb {
Coroutine *co;
LinuxAioState *ctx;
@@ -61,6 +74,7 @@ typedef struct {
unsigned int in_queue;
unsigned int in_flight;
bool blocked;
+ unsigned int submit_depth;
QSIMPLEQ_HEAD(, qemu_laiocb) pending;
} LaioQueue;
@@ -331,6 +345,7 @@ static void ioq_init(LaioQueue *io_q)
io_q->in_queue = 0;
io_q->in_flight = 0;
io_q->blocked = false;
+ io_q->submit_depth = 0;
}
static void ioq_submit(LinuxAioState *s)
@@ -340,6 +355,11 @@ static void ioq_submit(LinuxAioState *s)
QEMU_UNINITIALIZED struct iocb *iocbs[MAX_EVENTS];
QSIMPLEQ_HEAD(, qemu_laiocb) completed;
+ if (s->io_q.submit_depth >= IOQ_SUBMIT_MAX_DEPTH) {
+ return;
+ }
+ s->io_q.submit_depth++;
+
do {
if (s->io_q.in_flight >= MAX_EVENTS) {
break;
@@ -385,6 +405,8 @@ static void ioq_submit(LinuxAioState *s)
* pended requests will be submitted from there.
*/
}
+
+ s->io_q.submit_depth--;
}
static uint64_t laio_max_batch(LinuxAioState *s, uint64_t dev_max_batch)
--
2.52.0
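
The depth guard described in the commit message can be illustrated with a
toy model. This is only a sketch under stated assumptions, not QEMU code:
ToyQueue, toy_submit(), toy_run() and toy_report() are hypothetical names,
every request is assumed to complete synchronously and to queue exactly one
follow-up request, and the loop in toy_run() stands in for the event loop
re-running the work that s->completion_bh would pick up. Only the bound of 8
and the submit_depth counter are taken from the patch.

```c
#include <assert.h>
#include <stdio.h>

/* Same value as IOQ_SUBMIT_MAX_DEPTH in the patch; the rest is hypothetical. */
#define TOY_MAX_DEPTH 8

typedef struct {
    unsigned int pending;        /* requests queued but not yet submitted */
    unsigned int remaining;      /* follow-up requests the workload still issues */
    unsigned int submitted;      /* requests submitted so far */
    unsigned int submit_depth;   /* current recursion depth, as in the patch */
    unsigned int max_depth_seen; /* deepest recursion observed */
    unsigned int deferred;       /* times the guard refused to recurse */
} ToyQueue;

/* Toy stand-in for ioq_submit(): submit one request, then re-enter. */
void toy_submit(ToyQueue *q)
{
    if (q->submit_depth >= TOY_MAX_DEPTH) {
        /* Bound hit: leave the work queued for the next event-loop pass. */
        q->deferred++;
        return;
    }
    q->submit_depth++;
    if (q->submit_depth > q->max_depth_seen) {
        q->max_depth_seen = q->submit_depth;
    }

    if (q->pending > 0) {
        q->pending--;
        q->submitted++;
        /*
         * Assumed synchronous completion: the woken "coroutine" queues a
         * follow-up request and the deferred function re-enters submit.
         */
        if (q->remaining > 0) {
            q->remaining--;
            q->pending++;
            toy_submit(q);
        }
    }

    assert(q->submit_depth > 0);
    q->submit_depth--;
}

/* Toy event loop: each iteration models one bottom-half dispatch. */
ToyQueue toy_run(unsigned int total)
{
    ToyQueue q = {0};

    if (total > 0) {
        q.pending = 1;
        q.remaining = total - 1;
    }
    while (q.pending > 0) {
        toy_submit(&q);
    }
    return q;
}

/* Print a short summary of one run. */
void toy_report(unsigned int total)
{
    ToyQueue q = toy_run(total);

    printf("submitted=%u max_depth=%u deferred=%u\n",
           q.submitted, q.max_depth_seen, q.deferred);
}
```

In this model, toy_run(1001) submits all 1001 requests while the recursion
never goes deeper than 8 frames; the guard refuses to recurse 125 times, and
each refusal costs one extra pass of the toy event loop rather than a lost
request.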