* Fri May 19 2023 Miroslav Rezanina <mrezanin@redhat.com> - 6.2.0-34

- kvm-migration-Handle-block-device-inactivation-failures-.patch [bz#2177957] - kvm-migration-Minor-control-flow-simplification.patch [bz#2177957] - kvm-migration-Attempt-disk-reactivation-in-more-failure-.patch [bz#2177957] - kvm-nbd-server-push-pending-frames-after-sending-reply.patch [bz#2035712] - kvm-nbd-server-Request-TCP_NODELAY.patch [bz#2035712] - Resolves: bz#2177957 (Qemu core dump if cut off nfs storage during migration) - Resolves: bz#2035712 ([qemu] Booting from Guest Image over NBD with TLS Is Slow)
2023-05-19 03:02:46 -04:00 · 2023-05-19 03:02:46 -04:00 · 0b6715de3c
commit 0b6715de3c
parent c5c2aa1409
6 changed files with 430 additions and 1 deletions
--- a/kvm-migration-Attempt-disk-reactivation-in-more-failure-.patch
+++ b/kvm-migration-Attempt-disk-reactivation-in-more-failure-.patch
@ -0,0 +1,111 @@
 From a1f2a51d1a789c46e806adb332236ca16d538bf9 Mon Sep 17 00:00:00 2001
 From: Eric Blake <eblake@redhat.com>
 Date: Tue, 2 May 2023 15:52:12 -0500
 Subject: [PATCH 3/5] migration: Attempt disk reactivation in more failure
 scenarios
 RH-Author: Eric Blake <eblake@redhat.com>
 RH-MergeRequest: 273: migration: prevent source core dump if NFS dies mid-migration
 RH-Bugzilla: 2177957
 RH-Acked-by: Miroslav Rezanina <mrezanin@redhat.com>
 RH-Acked-by: quintela1 <quintela@redhat.com>
 RH-Acked-by: Kevin Wolf <kwolf@redhat.com>
 RH-Commit: [3/3] e84bf1e7233c0273ca3136ecaa6b2cfc9c0efacb (ebblake/qemu-kvm)
 Commit fe904ea824 added a fail_inactivate label, which tries to
 reactivate disks on the source after a failure while s->state ==
 MIGRATION_STATUS_ACTIVE, but didn't actually use the label if
 qemu_savevm_state_complete_precopy() failed.  This failure to
 reactivate is also present in commit 6039dd5b1c (also covering the new
 s->state == MIGRATION_STATUS_DEVICE state) and 403d18ae (ensuring
 s->block_inactive is set more reliably).
 Consolidate the two labels back into one - no matter HOW migration is
 failed, if there is any chance we can reach vm_start() after having
 attempted inactivation, it is essential that we have tried to restart
 disks before then.  This also makes the cleanup more like
 migrate_fd_cancel().
 Suggested-by: Kevin Wolf <kwolf@redhat.com>
 Signed-off-by: Eric Blake <eblake@redhat.com>
 Message-Id: <20230502205212.134680-1-eblake@redhat.com>
 Acked-by: Peter Xu <peterx@redhat.com>
 Reviewed-by: Juan Quintela <quintela@redhat.com>
 Reviewed-by: Kevin Wolf <kwolf@redhat.com>
 Signed-off-by: Kevin Wolf <kwolf@redhat.com>
 (cherry picked from commit 6dab4c93ecfae48e2e67b984d1032c1e988d3005)
 [eblake: downstream migrate_colo() => migrate_colo_enabled()]
 Signed-off-by: Eric Blake <eblake@redhat.com>
 ---
 migration/migration.c | 24 ++++++++++++++----------
 1 file changed, 14 insertions(+), 10 deletions(-)
 diff --git a/migration/migration.c b/migration/migration.c
 index 6ba8eb0fdf..817170d52d 100644
 --- a/migration/migration.c
 +++ b/migration/migration.c
@@ -3255,6 +3255,11 @@ static void migration_completion(MigrationState *s)
                                             MIGRATION_STATUS_DEVICE);
             }
             if (ret >= 0) {
 +                /*
 +                 * Inactivate disks except in COLO, and track that we
 +                 * have done so in order to remember to reactivate
 +                 * them if migration fails or is cancelled.
 +                 */
                 s->block_inactive = !migrate_colo_enabled();
                 qemu_file_set_rate_limit(s->to_dst_file, INT64_MAX);
                 ret = qemu_savevm_state_complete_precopy(s->to_dst_file, false,
@@ -3290,13 +3295,13 @@ static void migration_completion(MigrationState *s)
         rp_error = await_return_path_close_on_source(s);
         trace_migration_return_path_end_after(rp_error);
         if (rp_error) {
 -            goto fail_invalidate;
 +            goto fail;
         }
     }
     if (qemu_file_get_error(s->to_dst_file)) {
         trace_migration_completion_file_err();
 -        goto fail_invalidate;
 +        goto fail;
     }
     if (!migrate_colo_enabled()) {
@@ -3306,26 +3311,25 @@ static void migration_completion(MigrationState *s)
     return;
 -fail_invalidate:
 -    /* If not doing postcopy, vm_start() will be called: let's regain
 -     * control on images.
 -     */
 -    if (s->state == MIGRATION_STATUS_ACTIVE ||
 -        s->state == MIGRATION_STATUS_DEVICE) {
 +fail:
 +    if (s->block_inactive && (s->state == MIGRATION_STATUS_ACTIVE ||
 +                              s->state == MIGRATION_STATUS_DEVICE)) {
 +        /*
 +         * If not doing postcopy, vm_start() will be called: let's
 +         * regain control on images.
 +         */
         Error *local_err = NULL;
         qemu_mutex_lock_iothread();
         bdrv_invalidate_cache_all(&local_err);
         if (local_err) {
             error_report_err(local_err);
 -            s->block_inactive = true;
         } else {
             s->block_inactive = false;
         }
         qemu_mutex_unlock_iothread();
     }
 -fail:
     migrate_set_state(&s->state, current_active_state,
                       MIGRATION_STATUS_FAILED);
 }
 -- 
 2.39.1
--- a/kvm-migration-Handle-block-device-inactivation-failures-.patch
+++ b/kvm-migration-Handle-block-device-inactivation-failures-.patch
@ -0,0 +1,117 @@
 From 1b07c7663b6a5c19c9303088d63c39dba7e3bb36 Mon Sep 17 00:00:00 2001
 From: Eric Blake <eblake@redhat.com>
 Date: Fri, 14 Apr 2023 10:33:58 -0500
 Subject: [PATCH 1/5] migration: Handle block device inactivation failures
 better
 RH-Author: Eric Blake <eblake@redhat.com>
 RH-MergeRequest: 273: migration: prevent source core dump if NFS dies mid-migration
 RH-Bugzilla: 2177957
 RH-Acked-by: Miroslav Rezanina <mrezanin@redhat.com>
 RH-Acked-by: quintela1 <quintela@redhat.com>
 RH-Acked-by: Kevin Wolf <kwolf@redhat.com>
 RH-Commit: [1/3] 5892c17ca0a21d824d176e7398d12f7cf991651d (ebblake/qemu-kvm)
 Consider what happens when performing a migration between two host
 machines connected to an NFS server serving multiple block devices to
 the guest, when the NFS server becomes unavailable.  The migration
 attempts to inactivate all block devices on the source (a necessary
 step before the destination can take over); but if the NFS server is
 non-responsive, the attempt to inactivate can itself fail.  When that
 happens, the destination fails to get the migrated guest (good,
 because the source wasn't able to flush everything properly):
  (qemu) qemu-kvm: load of migration failed: Input/output error
 at which point, our only hope for the guest is for the source to take
 back control.  With the current code base, the host outputs a message, but then appears to resume:
  (qemu) qemu-kvm: qemu_savevm_state_complete_precopy_non_iterable: bdrv_inactivate_all() failed (-1)
  (src qemu)info status
   VM status: running
 but a second migration attempt now asserts:
  (src qemu) qemu-kvm: ../block.c:6738: int bdrv_inactivate_recurse(BlockDriverState *): Assertion `!(bs->open_flags & BDRV_O_INACTIVE)' failed.
 Whether the guest is recoverable on the source after the first failure
 is debatable, but what we do not want is to have qemu itself fail due
 to an assertion.  It looks like the problem is as follows:
 In migration.c:migration_completion(), the source sets 'inactivate' to
 true (since COLO is not enabled), then tries
 savevm.c:qemu_savevm_state_complete_precopy() with a request to
 inactivate block devices.  In turn, this calls
 block.c:bdrv_inactivate_all(), which fails when flushing runs up
 against the non-responsive NFS server.  With savevm failing, we are
 now left in a state where some, but not all, of the block devices have
 been inactivated; but migration_completion() then jumps to 'fail'
 rather than 'fail_invalidate' and skips an attempt to reclaim those
 those disks by calling bdrv_activate_all().  Even if we do attempt to
 reclaim disks, we aren't taking note of failure there, either.
 Thus, we have reached a state where the migration engine has forgotten
 all state about whether a block device is inactive, because we did not
 set s->block_inactive in enough places; so migration allows the source
 to reach vm_start() and resume execution, violating the block layer
 invariant that the guest CPUs should not be restarted while a device
 is inactive.  Note that the code in migration.c:migrate_fd_cancel()
 will also try to reactivate all block devices if s->block_inactive was
 set, but because we failed to set that flag after the first failure,
 the source assumes it has reclaimed all devices, even though it still
 has remaining inactivated devices and does not try again.  Normally,
 qmp_cont() will also try to reactivate all disks (or correctly fail if
 the disks are not reclaimable because NFS is not yet back up), but the
 auto-resumption of the source after a migration failure does not go
 through qmp_cont().  And because we have left the block layer in an
 inconsistent state with devices still inactivated, the later migration
 attempt is hitting the assertion failure.
 Since it is important to not resume the source with inactive disks,
 this patch marks s->block_inactive before attempting inactivation,
 rather than after succeeding, in order to prevent any vm_start() until
 it has successfully reactivated all devices.
 See also https://bugzilla.redhat.com/show_bug.cgi?id=2058982
 Signed-off-by: Eric Blake <eblake@redhat.com>
 Reviewed-by: Juan Quintela <quintela@redhat.com>
 Acked-by: Lukas Straub <lukasstraub2@web.de>
 Tested-by: Lukas Straub <lukasstraub2@web.de>
 Signed-off-by: Juan Quintela <quintela@redhat.com>
 (cherry picked from commit 403d18ae384239876764bbfa111d6cc5dcb673d1)
 ---
 migration/migration.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)
 diff --git a/migration/migration.c b/migration/migration.c
 index 0885549de0..08e5e8f013 100644
 --- a/migration/migration.c
 +++ b/migration/migration.c
@@ -3256,13 +3256,11 @@ static void migration_completion(MigrationState *s)
                                             MIGRATION_STATUS_DEVICE);
             }
             if (ret >= 0) {
 +                s->block_inactive = inactivate;
                 qemu_file_set_rate_limit(s->to_dst_file, INT64_MAX);
                 ret = qemu_savevm_state_complete_precopy(s->to_dst_file, false,
                                                          inactivate);
             }
 -            if (inactivate && ret >= 0) {
 -                s->block_inactive = true;
 -            }
         }
         qemu_mutex_unlock_iothread();
@@ -3321,6 +3319,7 @@ fail_invalidate:
         bdrv_invalidate_cache_all(&local_err);
         if (local_err) {
             error_report_err(local_err);
 +            s->block_inactive = true;
         } else {
             s->block_inactive = false;
         }
 -- 
 2.39.1
--- a/kvm-migration-Minor-control-flow-simplification.patch
+++ b/kvm-migration-Minor-control-flow-simplification.patch
@ -0,0 +1,53 @@
 From e79d0506184e861350d2a3e62dd986aa03d30aa8 Mon Sep 17 00:00:00 2001
 From: Eric Blake <eblake@redhat.com>
 Date: Thu, 20 Apr 2023 09:35:51 -0500
 Subject: [PATCH 2/5] migration: Minor control flow simplification
 RH-Author: Eric Blake <eblake@redhat.com>
 RH-MergeRequest: 273: migration: prevent source core dump if NFS dies mid-migration
 RH-Bugzilla: 2177957
 RH-Acked-by: Miroslav Rezanina <mrezanin@redhat.com>
 RH-Acked-by: quintela1 <quintela@redhat.com>
 RH-Acked-by: Kevin Wolf <kwolf@redhat.com>
 RH-Commit: [2/3] f00b21b6ebd377af79af93ac18f103f8dc0309d6 (ebblake/qemu-kvm)
 No need to declare a temporary variable.
 Suggested-by: Juan Quintela <quintela@redhat.com>
 Fixes: 1df36e8c6289 ("migration: Handle block device inactivation failures better")
 Signed-off-by: Eric Blake <eblake@redhat.com>
 Reviewed-by: Juan Quintela <quintela@redhat.com>
 Signed-off-by: Juan Quintela <quintela@redhat.com>
 (cherry picked from commit 5d39f44d7ac5c63f53d4d0900ceba9521bc27e49)
 ---
 migration/migration.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)
 diff --git a/migration/migration.c b/migration/migration.c
 index 08e5e8f013..6ba8eb0fdf 100644
 --- a/migration/migration.c
 +++ b/migration/migration.c
@@ -3248,7 +3248,6 @@ static void migration_completion(MigrationState *s)
         ret = global_state_store();
         if (!ret) {
 -            bool inactivate = !migrate_colo_enabled();
             ret = vm_stop_force_state(RUN_STATE_FINISH_MIGRATE);
             trace_migration_completion_vm_stop(ret);
             if (ret >= 0) {
@@ -3256,10 +3255,10 @@ static void migration_completion(MigrationState *s)
                                             MIGRATION_STATUS_DEVICE);
             }
             if (ret >= 0) {
 -                s->block_inactive = inactivate;
 +                s->block_inactive = !migrate_colo_enabled();
                 qemu_file_set_rate_limit(s->to_dst_file, INT64_MAX);
                 ret = qemu_savevm_state_complete_precopy(s->to_dst_file, false,
 -                                                         inactivate);
 +                                                         s->block_inactive);
             }
         }
         qemu_mutex_unlock_iothread();
 -- 
 2.39.1
--- a/kvm-nbd-server-Request-TCP_NODELAY.patch
+++ b/kvm-nbd-server-Request-TCP_NODELAY.patch
@ -0,0 +1,55 @@
 From 17c5524ada3f2ca9a9c645f540bedc5575302059 Mon Sep 17 00:00:00 2001
 From: Eric Blake <eblake@redhat.com>
 Date: Mon, 3 Apr 2023 19:40:47 -0500
 Subject: [PATCH 5/5] nbd/server: Request TCP_NODELAY
 MIME-Version: 1.0
 Content-Type: text/plain; charset=UTF-8
 Content-Transfer-Encoding: 8bit
 RH-Author: Eric Blake <eblake@redhat.com>
 RH-MergeRequest: 274: nbd: improve TLS performance of NBD server
 RH-Bugzilla: 2035712
 RH-Acked-by: Miroslav Rezanina <mrezanin@redhat.com>
 RH-Acked-by: Kevin Wolf <kwolf@redhat.com>
 RH-Acked-by: Stefano Garzarella <sgarzare@redhat.com>
 RH-Commit: [2/2] 092145077756cda2a4f849c5911031b0fc4a2134 (ebblake/qemu-kvm)
 Nagle's algorithm adds latency in order to reduce network packet
 overhead on small packets.  But when we are already using corking to
 merge smaller packets into transactional requests, the extra delay
 from TCP defaults just gets in the way (see recent commit bd2cd4a4).
 For reference, qemu as an NBD client already requests TCP_NODELAY (see
 nbd_connect() in nbd/client-connection.c); as does libnbd as a client
 [1], and nbdkit as a server [2].  Furthermore, the NBD spec recommends
 the use of TCP_NODELAY [3].
 [1] https://gitlab.com/nbdkit/libnbd/-/blob/a48a1142/generator/states-connect.c#L39
 [2] https://gitlab.com/nbdkit/nbdkit/-/blob/45b72f5b/server/sockets.c#L430
 [3] https://github.com/NetworkBlockDevice/nbd/blob/master/doc/proto.md#protocol-phases
 CC: Florian Westphal <fw@strlen.de>
 Signed-off-by: Eric Blake <eblake@redhat.com>
 Message-Id: <20230404004047.142086-1-eblake@redhat.com>
 Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
 (cherry picked from commit f1426881a827a6d3f31b65616c4a8db1e9e7c45e)
 Signed-off-by: Eric Blake <eblake@redhat.com>
 ---
 nbd/server.c | 1 +
 1 file changed, 1 insertion(+)
 diff --git a/nbd/server.c b/nbd/server.c
 index a5edc7f681..6db124cf53 100644
 --- a/nbd/server.c
 +++ b/nbd/server.c
@@ -2738,6 +2738,7 @@ void nbd_client_new(QIOChannelSocket *sioc,
     }
     client->tlsauthz = g_strdup(tlsauthz);
     client->sioc = sioc;
 +    qio_channel_set_delay(QIO_CHANNEL(sioc), false);
     object_ref(OBJECT(client->sioc));
     client->ioc = QIO_CHANNEL(sioc);
     object_ref(OBJECT(client->ioc));
 -- 
 2.39.1
--- a/kvm-nbd-server-push-pending-frames-after-sending-reply.patch
+++ b/kvm-nbd-server-push-pending-frames-after-sending-reply.patch
@ -0,0 +1,72 @@
 From 170872370c6f3c916e741eb32d80431995d7a870 Mon Sep 17 00:00:00 2001
 From: Florian Westphal <fw@strlen.de>
 Date: Fri, 24 Mar 2023 11:47:20 +0100
 Subject: [PATCH 4/5] nbd/server: push pending frames after sending reply
 RH-Author: Eric Blake <eblake@redhat.com>
 RH-MergeRequest: 274: nbd: improve TLS performance of NBD server
 RH-Bugzilla: 2035712
 RH-Acked-by: Miroslav Rezanina <mrezanin@redhat.com>
 RH-Acked-by: Kevin Wolf <kwolf@redhat.com>
 RH-Acked-by: Stefano Garzarella <sgarzare@redhat.com>
 RH-Commit: [1/2] ab92c06c48810aa40380de0433dcac4c6e4be9a5 (ebblake/qemu-kvm)
 qemu-nbd doesn't set TCP_NODELAY on the tcp socket.
 Kernel waits for more data and avoids transmission of small packets.
 Without TLS this is barely noticeable, but with TLS this really shows.
 Booting a VM via qemu-nbd on localhost (with tls) takes more than
 2 minutes on my system.  tcpdump shows frequent wait periods, where no
 packets get sent for a 40ms period.
 Add explicit (un)corking when processing (and responding to) requests.
 "TCP_CORK, &zero" after earlier "CORK, &one" will flush pending data.
 VM Boot time:
 main:    no tls:  23s, with tls: 2m45s
 patched: no tls:  14s, with tls: 15s
 VM Boot time, qemu-nbd via network (same lan):
 main:    no tls:  18s, with tls: 1m50s
 patched: no tls:  17s, with tls: 18s
 Future optimization: if we could detect if there is another pending
 request we could defer the uncork operation because more data would be
 appended.
 Signed-off-by: Florian Westphal <fw@strlen.de>
 Message-Id: <20230324104720.2498-1-fw@strlen.de>
 Reviewed-by: Eric Blake <eblake@redhat.com>
 Reviewed-by: Kevin Wolf <kwolf@redhat.com>
 Signed-off-by: Kevin Wolf <kwolf@redhat.com>
 (cherry picked from commit bd2cd4a441ded163b62371790876f28a9b834317)
 Signed-off-by: Eric Blake <eblake@redhat.com>
 ---
 nbd/server.c | 3 +++
 1 file changed, 3 insertions(+)
 diff --git a/nbd/server.c b/nbd/server.c
 index 4630dd7322..a5edc7f681 100644
 --- a/nbd/server.c
 +++ b/nbd/server.c
@@ -2647,6 +2647,8 @@ static coroutine_fn void nbd_trip(void *opaque)
         goto disconnect;
     }
 +    qio_channel_set_cork(client->ioc, true);
 +
     if (ret < 0) {
         /* It wans't -EIO, so, according to nbd_co_receive_request()
          * semantics, we should return the error to the client. */
@@ -2672,6 +2674,7 @@ static coroutine_fn void nbd_trip(void *opaque)
         goto disconnect;
     }
 +    qio_channel_set_cork(client->ioc, false);
 done:
     nbd_request_put(req);
     nbd_client_put(client);
 -- 
 2.39.1
--- a/qemu-kvm.spec
+++ b/qemu-kvm.spec
@ -83,7 +83,7 @@ Obsoletes: %1-rhev <= %{epoch}:%{version}-%{release}
 Summary: QEMU is a machine emulator and virtualizer
 Name: qemu-kvm
 Version: 6.2.0
-Release: 33%{?rcrel}%{?dist}
+Release: 34%{?rcrel}%{?dist}
 # Epoch because we pushed a qemu-1.0 package. AIUI this can't ever be dropped
 Epoch: 15
 License: GPLv2 and GPLv2+ and CC-BY
@ -654,6 +654,16 @@ Patch256: kvm-dma-helpers-prevent-dma_blk_cb-vs-dma_aio_cancel-rac.patch
 Patch257: kvm-virtio-scsi-reset-SCSI-devices-from-main-loop-thread.patch
 # For bz#2187159 - RHEL8.8 - KVM - Secure Guest crashed during booting with 248 vcpus
 Patch258: kvm-s390x-pv-Implement-a-CGS-check-helper.patch
 # For bz#2177957 - Qemu core dump if cut off nfs storage during migration
 Patch259: kvm-migration-Handle-block-device-inactivation-failures-.patch
 # For bz#2177957 - Qemu core dump if cut off nfs storage during migration
 Patch260: kvm-migration-Minor-control-flow-simplification.patch
 # For bz#2177957 - Qemu core dump if cut off nfs storage during migration
 Patch261: kvm-migration-Attempt-disk-reactivation-in-more-failure-.patch
 # For bz#2035712 - [qemu] Booting from Guest Image over NBD with TLS Is Slow
 Patch262: kvm-nbd-server-push-pending-frames-after-sending-reply.patch
 # For bz#2035712 - [qemu] Booting from Guest Image over NBD with TLS Is Slow
 Patch263: kvm-nbd-server-Request-TCP_NODELAY.patch
 BuildRequires: wget
 BuildRequires: rpm-build
@ -1823,6 +1833,17 @@ sh %{_sysconfdir}/sysconfig/modules/kvm.modules &> /dev/null || :
 %changelog
 * Fri May 19 2023 Miroslav Rezanina <mrezanin@redhat.com> - 6.2.0-34
 - kvm-migration-Handle-block-device-inactivation-failures-.patch [bz#2177957]
 - kvm-migration-Minor-control-flow-simplification.patch [bz#2177957]
 - kvm-migration-Attempt-disk-reactivation-in-more-failure-.patch [bz#2177957]
 - kvm-nbd-server-push-pending-frames-after-sending-reply.patch [bz#2035712]
 - kvm-nbd-server-Request-TCP_NODELAY.patch [bz#2035712]
 - Resolves: bz#2177957
  (Qemu core dump if cut off nfs storage during migration)
 - Resolves: bz#2035712
  ([qemu] Booting from Guest Image over NBD with TLS Is Slow)
 * Tue Apr 25 2023 Miroslav Rezanina <mrezanin@redhat.com> - 6.2.0-33
 - kvm-s390x-pv-Implement-a-CGS-check-helper.patch [bz#2187159]
 - Resolves: bz#2187159