268 lines
9.7 KiB
Diff
268 lines
9.7 KiB
Diff
From 02714666a525ea4dd8756f66fae28163fb685d05 Mon Sep 17 00:00:00 2001
|
|
Message-Id: <02714666a525ea4dd8756f66fae28163fb685d05@dist-git>
|
|
From: Peter Krempa <pkrempa@redhat.com>
|
|
Date: Tue, 23 Jun 2020 12:24:06 +0200
|
|
Subject: [PATCH] kbase: Add document outlining internals of incremental backup
|
|
in qemu
|
|
MIME-Version: 1.0
|
|
Content-Type: text/plain; charset=UTF-8
|
|
Content-Transfer-Encoding: 8bit
|
|
|
|
Outline the basics and how to integrate with externally created
|
|
overlays. Other topics will continue later.
|
|
|
|
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
|
|
Reviewed-by: Eric Blake <eblake@redhat.com>
|
|
(cherry picked from commit da5e5a1e154836abe83077cf477c104b8f98b1d2)
|
|
https://bugzilla.redhat.com/show_bug.cgi?id=1804593
|
|
|
|
Conflicts: docs/kbase.html.in: real time kvm article not backported
|
|
Message-Id: <e0fea1e270856b642c34827eb3af1c0b01afd510.1592906423.git.pkrempa@redhat.com>
|
|
Reviewed-by: Ján Tomko <jtomko@redhat.com>
|
|
---
|
|
docs/kbase.html.in | 3 +
|
|
docs/kbase/incrementalbackupinternals.rst | 217 ++++++++++++++++++++++
|
|
2 files changed, 220 insertions(+)
|
|
create mode 100644 docs/kbase/incrementalbackupinternals.rst
|
|
|
|
diff --git a/docs/kbase.html.in b/docs/kbase.html.in
|
|
index 7d6caf3cb1..f2975960f6 100644
|
|
--- a/docs/kbase.html.in
|
|
+++ b/docs/kbase.html.in
|
|
@@ -32,6 +32,9 @@
|
|
|
|
<dt><a href="kbase/virtiofs.html">Virtio-FS</a></dt>
|
|
<dd>Share a filesystem between the guest and the host</dd>
|
|
+
|
|
+ <dt><a href="kbase/incrementalbackupinternals.html">Incremental backup internals</a></dt>
|
|
+ <dd>Incremental backup implementation details relevant for users</dd>
|
|
</dl>
|
|
</div>
|
|
|
|
diff --git a/docs/kbase/incrementalbackupinternals.rst b/docs/kbase/incrementalbackupinternals.rst
|
|
new file mode 100644
|
|
index 0000000000..0c4b4f7486
|
|
--- /dev/null
|
|
+++ b/docs/kbase/incrementalbackupinternals.rst
|
|
@@ -0,0 +1,217 @@
|
|
+================================================
|
|
+Internals of incremental backup handling in qemu
|
|
+================================================
|
|
+
|
|
+.. contents::
|
|
+
|
|
+Libvirt's implementation of incremental backups in the ``qemu`` driver uses
|
|
+qemu's ``block-dirty-bitmaps`` under the hood to track the guest visible disk
|
|
+state changes corresponding to the points in time described by a libvirt
|
|
+checkpoint.
|
|
+
|
|
+There are some semantica implications with how libvirt creates and manages the
|
|
+bitmaps which de-facto become API as they are written into the disk images, and
|
|
+this document will try to summarize them.
|
|
+
|
|
+Glossary
|
|
+========
|
|
+
|
|
+See the knowledge base article on
|
|
+`domain state capture <https://libvirt.org/kbase/domainstatecapture.html>`_ for
|
|
+a deeper explanation of some of the concepts.
|
|
+
|
|
+Checkpoint
|
|
+
|
|
+ A libvirt object which represents a named point in time of the life of the
|
|
+ vm where libvirt tracks writes the VM has done, thereby allowing a backup of
|
|
+ only the blocks which changed. Note that state of the VM memory is _not_
|
|
+ captured.
|
|
+
|
|
+ A checkpoint can be created either explicitly via the corresponding API
|
|
+ (although this isn't very useful on its own), or simultaneously with an
|
|
+ incremental or full backup of the VM using the ``virDomainBackupBegin`` API
|
|
+ which allows a next backup to only copy the differences.
|
|
+
|
|
+Backup
|
|
+
|
|
+ A copy of either all blocks of selected disks (full backup) or blocks changed
|
|
+ since a checkpoint (incremental backup) at the time the backup job was
|
|
+ started. (Blocks modified while the backup job is running are not part of the
|
|
+ backup!)
|
|
+
|
|
+Snapshot
|
|
+
|
|
+ Similarly to a checkpoint it's a point in time in the lifecycle of the VM
|
|
+ but the state of the VM including memory is captured at that point allowing
|
|
+ returning to the state later.
|
|
+
|
|
+Blockjob
|
|
+
|
|
+ A long running job which modifies the shape and/or location of the disk
|
|
+ backing chain (images storing the disk contents). Libvirt supports
|
|
+ ``block pull`` where data is moved up the chain towards the active layer,
|
|
+ ``block commit`` where data is moved down the chain towards the base/oldest
|
|
+ image. These blockjobs always remove images from the backing chain. Lastly
|
|
+ ``block copy`` where image is moved to a different location (and possibly
|
|
+ collapsed moving all of the data into the new location into the one image).
|
|
+
|
|
+block-dirty-bitmap (bitmap)
|
|
+
|
|
+ A data structure in qemu tracking which blocks were written by the guest
|
|
+ OS since the bitmap was created.
|
|
+
|
|
+Relationships of bitmaps, checkpoints and VM disks
|
|
+==================================================
|
|
+
|
|
+When a checkpoint is created libvirt creates a block-dirty-bitmap for every
|
|
+configured VM disk named the same way as the chcheckpoint. The bitmap is
|
|
+actively recording which blocks were changed by the guest OS from that point on.
|
|
+Other bitmaps are not impacted by any way as they are self-contained:
|
|
+
|
|
+::
|
|
+
|
|
+ +----------------+ +----------------+
|
|
+ | disk: vda | | disk: vdb |
|
|
+ +--------+-------+ +--------+-------+
|
|
+ | |
|
|
+ +--------v-------+ +--------v-------+
|
|
+ | vda-1.qcow2 | | vdb-1.qcow2 |
|
|
+ | | | |
|
|
+ | bitmaps: chk-a | | bitmaps: chk-a |
|
|
+ | chk-b | | chk-b |
|
|
+ | | | |
|
|
+ +----------------+ +----------------+
|
|
+
|
|
+Bitmaps are created at the same time to track changes to all disks in sync and
|
|
+are active and persisted in the QCOW2 image. Other formats currently don't
|
|
+support this feature.
|
|
+
|
|
+Modification of bitmaps outside of libvirt is not recommended, but when adhering
|
|
+to the same semantics which the document will describe it should be safe to do
|
|
+so, even if we obviously can't guarantee that.
|
|
+
|
|
+
|
|
+Integration with external snapshots
|
|
+===================================
|
|
+
|
|
+Handling of bitmaps
|
|
+-------------------
|
|
+
|
|
+Creating an external snapshot involves adding a new layer to the backing chain
|
|
+on top of the previous chain. In this step there are no new bitmaps created by
|
|
+default, which would mean that backups become impossible after this step.
|
|
+
|
|
+To prevent this from happening we need to re-create the active bitmaps in the
|
|
+new top/active layer of the backing chain which allows us to continue tracking
|
|
+the changes with same granularity as before and also allows libvirt to stitch
|
|
+together all the corresponding bitmaps to do a backup across snapshots.
|
|
+
|
|
+After taking a snapshot of the ``vda`` disk from the example above placed into
|
|
+``vda-2.qcow2`` the following topology will be created:
|
|
+
|
|
+::
|
|
+
|
|
+ +----------------+
|
|
+ | disk: vda |
|
|
+ +-------+--------+
|
|
+ |
|
|
+ +-------v--------+ +----------------+
|
|
+ | vda-2.qcow2 | | vda-1.qcow2 |
|
|
+ | | | |
|
|
+ | bitmaps: chk-a +----> bitmaps: chk-a |
|
|
+ | chk-b | | chk-b |
|
|
+ | | | |
|
|
+ +----------------+ +----------------+
|
|
+
|
|
+Checking bitmap health
|
|
+----------------------
|
|
+
|
|
+QEMU optimizes disk writes by only updating the bitmaps in certain cases. This
|
|
+also can cause problems in cases when e.g. QEMU crashes.
|
|
+
|
|
+For a chain of corresponding bitmaps in a backing chain to be considered valid
|
|
+and eligible for use with ``virDomainBackupBegin`` it must conform to the
|
|
+following rules:
|
|
+
|
|
+1) Top image must contain the bitmap
|
|
+2) If any of the backing images in the chain contain the bitmap too, all
|
|
+ contiguous images must have the bitmap (no gaps)
|
|
+3) all of the above bitmaps must be marked as active
|
|
+ (``auto`` flag in ``qemu-img`` output, ``recording`` in qemu)
|
|
+4) none of the above bitmaps can be inconsistent
|
|
+ (``in-use`` flag in ``qemu-img`` provided that it's not used on image which
|
|
+ is currently in use by a qemu instance, or ``inconsistent`` in qemu)
|
|
+
|
|
+::
|
|
+
|
|
+ # check that image has bitmaps
|
|
+ $ qemu-img info vda-1.qcow2
|
|
+ image: vda-1.qcow2
|
|
+ file format: qcow2
|
|
+ virtual size: 100 MiB (104857600 bytes)
|
|
+ disk size: 220 KiB
|
|
+ cluster_size: 65536
|
|
+ Format specific information:
|
|
+ compat: 1.1
|
|
+ compression type: zlib
|
|
+ lazy refcounts: false
|
|
+ bitmaps:
|
|
+ [0]:
|
|
+ flags:
|
|
+ [0]: in-use
|
|
+ [1]: auto
|
|
+ name: chk-a
|
|
+ granularity: 65536
|
|
+ [1]:
|
|
+ flags:
|
|
+ [0]: auto
|
|
+ name: chk-b
|
|
+ granularity: 65536
|
|
+ refcount bits: 16
|
|
+ corrupt: false
|
|
+
|
|
+(See also the ``qemuBlockBitmapChainIsValid`` helper method in
|
|
+``src/qemu/qemu_block.c``)
|
|
+
|
|
+Creating external snapshots manually
|
|
+--------------------------------------
|
|
+
|
|
+To create the same topology outside of libvirt (e.g when doing snapshots offline)
|
|
+a new ``qemu-img`` which supports the ``bitmap`` subcommand is recommended. The
|
|
+following algorithm then ensures that the new image after snapshot will work
|
|
+with backups (note that ``jq`` is a JSON processor):
|
|
+
|
|
+::
|
|
+
|
|
+ #!/bin/bash
|
|
+
|
|
+ # arguments
|
|
+ SNAP_IMG="vda-2.qcow2"
|
|
+ BACKING_IMG="vda-1.qcow2"
|
|
+
|
|
+ # constants - snapshots and bitmaps work only with qcow2
|
|
+ SNAP_FMT="qcow2"
|
|
+ BACKING_IMG_FMT="qcow2"
|
|
+
|
|
+ # create snapshot overlay
|
|
+ qemu-img create -f "$SNAP_FMT" -F "$BACKING_IMG_FMT" -b "$BACKING_IMG" "$SNAP_IMG"
|
|
+
|
|
+ BACKING_IMG_INFO=$(qemu-img info --output=json -f "$BACKING_IMG_FMT" "$BACKING_IMG")
|
|
+ BACKING_BITMAPS=$(jq '."format-specific".data.bitmaps' <<< "$BACKING_IMG_INFO")
|
|
+
|
|
+ if [ "x$BACKING_BITMAPS" = "xnull" ]; then
|
|
+ exit 0
|
|
+ fi
|
|
+
|
|
+ for BACKING_BITMAP_ in $(jq -c '.[]' <<< "$BACKING_BITMAPS"); do
|
|
+ BITMAP_FLAGS=$(jq -c -r '.flags[]' <<< "$BACKING_BITMAP_")
|
|
+ BITMAP_NAME=$(jq -r '.name' <<< "$BACKING_BITMAP_")
|
|
+
|
|
+ if grep 'in-use' <<< "$BITMAP_FLAGS" ||
|
|
+ grep -v 'auto' <<< "$BITMAP_FLAGS"; then
|
|
+ continue
|
|
+ fi
|
|
+
|
|
+ qemu-img bitmap -f "$SNAP_FMT" "$SNAP_IMG" --add "$BITMAP_NAME"
|
|
+
|
|
+ done
|
|
--
|
|
2.27.0
|
|
|