More efficient cache and cow filters.

Add nbdkit-cow-filter cow-on-read option.
Add nbdkit-cache-filter cache-on-read=/PATH.
Add nbdkit-cache-filter cache-min-block-size option.
Add nbdkit-delay-filter delay-open and delay-close options.
Reduce verbosity of debugging from virt-v2v.
Miscellaneous bugfixes
resolves: rhbz#1950632
Richard W.M. Jones 2021-07-26 19:44:01 +01:00
parent addc394e81
commit b4d42fa29b
23 changed files with 2589 additions and 174 deletions

@@ -1,7 +1,7 @@
 From 5a23c7cf3c5eccac6e6de775722bc1136a66be83 Mon Sep 17 00:00:00 2001
 From: "Richard W.M. Jones" <rjones@redhat.com>
 Date: Mon, 5 Jul 2021 17:54:45 +0100
-Subject: [PATCH 1/7] ocaml: Call caml_shutdown when unloading the plugin
+Subject: [PATCH] ocaml: Call caml_shutdown when unloading the plugin
 This has several useful effects (taken from the OCaml documentation):
@@ -72,5 +72,5 @@ index 00959cb6..9d7d72ad 100644
 static void
 --
-2.32.0
+2.31.1

@@ -1,7 +1,7 @@
 From 397b7b245aee178b2683de8a34847843f658b43d Mon Sep 17 00:00:00 2001
 From: "Richard W.M. Jones" <rjones@redhat.com>
 Date: Mon, 5 Jul 2021 18:00:28 +0100
-Subject: [PATCH 2/7] ocaml: Fix valgrinding by only ignoring caml_stat_alloc*
+Subject: [PATCH] ocaml: Fix valgrinding by only ignoring caml_stat_alloc*
 functions
 These are meant to be "static" so are not freed by design. Other
@@ -35,5 +35,5 @@ index f74b0943..a2b7fc60 100644
 + fun:caml_stat_alloc*
 }
 --
-2.32.0
+2.31.1

@@ -1,8 +1,7 @@
 From 4efeffd80a5e85abf5603f20631910b2ef180317 Mon Sep 17 00:00:00 2001
 From: "Richard W.M. Jones" <rjones@redhat.com>
 Date: Tue, 13 Jul 2021 16:50:16 +0100
-Subject: [PATCH 3/7] ocaml: tests: Actually call .get_ready method in test
-plugin
+Subject: [PATCH] ocaml: tests: Actually call .get_ready method in test plugin
 It was added in a previous commit, but never called.
@@ -25,5 +24,5 @@ index 2bbaa218..fee8528e 100644
 config = Some config;
 config_complete = Some config_complete;
 --
-2.32.0
+2.31.1

@@ -1,143 +1,18 @@
 From 1610c0865534819eccefec55fd2d751843bb6d64 Mon Sep 17 00:00:00 2001
 From: "Richard W.M. Jones" <rjones@redhat.com>
 Date: Tue, 13 Jul 2021 16:15:45 +0100
-Subject: [PATCH 4/7] ocaml: Rearrange the callbacks
+Subject: [PATCH] ocaml: Rearrange the callbacks
 Just refactoring.
 (cherry picked from commit e54e16e81c51dcbb16d70d83c5b0403babdf5f99)
 ---
-plugins/ocaml/NBDKit.ml | 68 ++++++++++++++++++--------------------
 plugins/ocaml/NBDKit.mli | 31 +++++++++--------
 plugins/ocaml/callbacks.h | 34 +++++++++----------
+plugins/ocaml/NBDKit.ml | 68 ++++++++++++++++++--------------------
 tests/test_ocaml_plugin.ml | 9 ++---
 4 files changed, 72 insertions(+), 70 deletions(-)
-diff --git a/plugins/ocaml/NBDKit.mli b/plugins/ocaml/NBDKit.mli
-index ac4b0cbc..cda09f44 100644
---- a/plugins/ocaml/NBDKit.mli
-+++ b/plugins/ocaml/NBDKit.mli
-@@ -69,33 +69,31 @@ type thread_model =
-The ['a] parameter is the handle type returned by your
-[open_connection] method and passed back to all connected calls. *)
-type 'a plugin = {
-- name : string; (* required *)
-+ (* Plugin description. *)
-+ name : string; (** required field *)
-longname : string;
-version : string;
-description : string;
-+ (* Plugin lifecycle. *)
-load : (unit -> unit) option;
-+ get_ready : (unit -> unit) option;
-+ after_fork : (unit -> unit) option;
-unload : (unit -> unit) option;
-- dump_plugin : (unit -> unit) option;
--
-+ (* Plugin configuration. *)
-config : (string -> string -> unit) option;
-config_complete : (unit -> unit) option;
-config_help : string;
-thread_model : (unit -> thread_model) option;
-- get_ready : (unit -> unit) option;
-- after_fork : (unit -> unit) option;
--
-+ (* Connection lifecycle. *)
-preconnect : (bool -> unit) option;
-- list_exports : (bool -> bool -> export list) option;
-- default_export : (bool -> bool -> string) option;
-- open_connection : (bool -> 'a) option; (* required *)
-+ open_connection : (bool -> 'a) option; (** required field *)
-close : ('a -> unit) option;
-- get_size : ('a -> int64) option; (* required *)
-- export_description : ('a -> string) option;
--
-+ (* NBD negotiation. *)
-+ get_size : ('a -> int64) option; (** required field *)
-can_cache : ('a -> cache_flag) option;
-can_extents : ('a -> bool) option;
-can_fast_zero : ('a -> bool) option;
-@@ -107,13 +105,20 @@ type 'a plugin = {
-can_zero : ('a -> bool) option;
-is_rotational : ('a -> bool) option;
-- pread : ('a -> int32 -> int64 -> flags -> string) option; (* required *)
-+ (* Serving data. *)
-+ pread : ('a -> int32 -> int64 -> flags -> string) option; (* required field *)
-pwrite : ('a -> string -> int64 -> flags -> unit) option;
-flush : ('a -> flags -> unit) option;
-trim : ('a -> int32 -> int64 -> flags -> unit) option;
-zero : ('a -> int32 -> int64 -> flags -> unit) option;
-extents : ('a -> int32 -> int64 -> flags -> extent list) option;
-cache : ('a -> int32 -> int64 -> flags -> unit) option;
-+
-+ (* Miscellaneous. *)
-+ dump_plugin : (unit -> unit) option;
-+ list_exports : (bool -> bool -> export list) option;
-+ default_export : (bool -> bool -> string) option;
-+ export_description : ('a -> string) option;
-}
-(** The plugin with all fields set to [None], so you can write
-diff --git a/plugins/ocaml/callbacks.h b/plugins/ocaml/callbacks.h
-index 7171ef21..4d29fb73 100644
---- a/plugins/ocaml/callbacks.h
-+++ b/plugins/ocaml/callbacks.h
-@@ -33,21 +33,8 @@
-/* This is not a header file. It is included at various places in
-* plugin.c as a convenient way to define per-callback things.
-*/
--CB(load)
--CB(unload)
--CB(dump_plugin)
--CB(config)
--CB(config_complete)
--CB(thread_model)
--CB(get_ready)
-CB(after_fork)
--CB(preconnect)
--CB(list_exports)
--CB(default_export)
--CB(open)
--CB(close)
--CB(get_size)
--CB(export_description)
-+CB(cache)
-CB(can_cache)
-CB(can_extents)
-CB(can_fast_zero)
-@@ -57,11 +44,24 @@ CB(can_multi_conn)
-CB(can_trim)
-CB(can_write)
-CB(can_zero)
-+CB(close)
-+CB(config)
-+CB(config_complete)
-+CB(default_export)
-+CB(dump_plugin)
-+CB(export_description)
-+CB(extents)
-+CB(flush)
-+CB(get_ready)
-+CB(get_size)
-CB(is_rotational)
-+CB(list_exports)
-+CB(load)
-+CB(open)
-CB(pread)
-+CB(preconnect)
-CB(pwrite)
--CB(flush)
-+CB(thread_model)
-CB(trim)
-+CB(unload)
-CB(zero)
--CB(extents)
--CB(cache)
 diff --git a/plugins/ocaml/NBDKit.ml b/plugins/ocaml/NBDKit.ml
 index cdc3bc58..529618d2 100644
 --- a/plugins/ocaml/NBDKit.ml
@@ -281,6 +156,131 @@ index cdc3bc58..529618d2 100644
 (* Bindings to nbdkit server functions. *)
+diff --git a/plugins/ocaml/NBDKit.mli b/plugins/ocaml/NBDKit.mli
+index ac4b0cbc..cda09f44 100644
+--- a/plugins/ocaml/NBDKit.mli
++++ b/plugins/ocaml/NBDKit.mli
+@@ -69,33 +69,31 @@ type thread_model =
+The ['a] parameter is the handle type returned by your
+[open_connection] method and passed back to all connected calls. *)
+type 'a plugin = {
+- name : string; (* required *)
++ (* Plugin description. *)
++ name : string; (** required field *)
+longname : string;
+version : string;
+description : string;
++ (* Plugin lifecycle. *)
+load : (unit -> unit) option;
++ get_ready : (unit -> unit) option;
++ after_fork : (unit -> unit) option;
+unload : (unit -> unit) option;
+- dump_plugin : (unit -> unit) option;
+-
++ (* Plugin configuration. *)
+config : (string -> string -> unit) option;
+config_complete : (unit -> unit) option;
+config_help : string;
+thread_model : (unit -> thread_model) option;
+- get_ready : (unit -> unit) option;
+- after_fork : (unit -> unit) option;
+-
++ (* Connection lifecycle. *)
+preconnect : (bool -> unit) option;
+- list_exports : (bool -> bool -> export list) option;
+- default_export : (bool -> bool -> string) option;
+- open_connection : (bool -> 'a) option; (* required *)
++ open_connection : (bool -> 'a) option; (** required field *)
+close : ('a -> unit) option;
+- get_size : ('a -> int64) option; (* required *)
+- export_description : ('a -> string) option;
+-
++ (* NBD negotiation. *)
++ get_size : ('a -> int64) option; (** required field *)
+can_cache : ('a -> cache_flag) option;
+can_extents : ('a -> bool) option;
+can_fast_zero : ('a -> bool) option;
+@@ -107,13 +105,20 @@ type 'a plugin = {
+can_zero : ('a -> bool) option;
+is_rotational : ('a -> bool) option;
+- pread : ('a -> int32 -> int64 -> flags -> string) option; (* required *)
++ (* Serving data. *)
++ pread : ('a -> int32 -> int64 -> flags -> string) option; (* required field *)
+pwrite : ('a -> string -> int64 -> flags -> unit) option;
+flush : ('a -> flags -> unit) option;
+trim : ('a -> int32 -> int64 -> flags -> unit) option;
+zero : ('a -> int32 -> int64 -> flags -> unit) option;
+extents : ('a -> int32 -> int64 -> flags -> extent list) option;
+cache : ('a -> int32 -> int64 -> flags -> unit) option;
++
++ (* Miscellaneous. *)
++ dump_plugin : (unit -> unit) option;
++ list_exports : (bool -> bool -> export list) option;
++ default_export : (bool -> bool -> string) option;
++ export_description : ('a -> string) option;
+}
+(** The plugin with all fields set to [None], so you can write
+diff --git a/plugins/ocaml/callbacks.h b/plugins/ocaml/callbacks.h
+index 7171ef21..4d29fb73 100644
+--- a/plugins/ocaml/callbacks.h
++++ b/plugins/ocaml/callbacks.h
+@@ -33,21 +33,8 @@
+/* This is not a header file. It is included at various places in
+* plugin.c as a convenient way to define per-callback things.
+*/
+-CB(load)
+-CB(unload)
+-CB(dump_plugin)
+-CB(config)
+-CB(config_complete)
+-CB(thread_model)
+-CB(get_ready)
+CB(after_fork)
+-CB(preconnect)
+-CB(list_exports)
+-CB(default_export)
+-CB(open)
+-CB(close)
+-CB(get_size)
+-CB(export_description)
++CB(cache)
+CB(can_cache)
+CB(can_extents)
+CB(can_fast_zero)
+@@ -57,11 +44,24 @@ CB(can_multi_conn)
+CB(can_trim)
+CB(can_write)
+CB(can_zero)
++CB(close)
++CB(config)
++CB(config_complete)
++CB(default_export)
++CB(dump_plugin)
++CB(export_description)
++CB(extents)
++CB(flush)
++CB(get_ready)
++CB(get_size)
+CB(is_rotational)
++CB(list_exports)
++CB(load)
++CB(open)
+CB(pread)
++CB(preconnect)
+CB(pwrite)
+-CB(flush)
++CB(thread_model)
+CB(trim)
++CB(unload)
+CB(zero)
+-CB(extents)
+-CB(cache)
 diff --git a/tests/test_ocaml_plugin.ml b/tests/test_ocaml_plugin.ml
 index fee8528e..00a65a75 100644
 --- a/tests/test_ocaml_plugin.ml
@@ -311,5 +311,5 @@ index fee8528e..00a65a75 100644
 let () = NBDKit.register_plugin plugin
 --
-2.32.0
+2.31.1

@@ -1,7 +1,7 @@
 From 229f106d65e2a54aa21afde9182b0e110a83b0df Mon Sep 17 00:00:00 2001
 From: "Richard W.M. Jones" <rjones@redhat.com>
 Date: Thu, 15 Jul 2021 20:41:03 +0100
-Subject: [PATCH 5/7] ocaml: Fix comment on plugin .pread field
+Subject: [PATCH] ocaml: Fix comment on plugin .pread field
 Incorrectly updated in earlier commit.
@@ -25,5 +25,5 @@ index cda09f44..0f7b87e9 100644
 flush : ('a -> flags -> unit) option;
 trim : ('a -> int32 -> int64 -> flags -> unit) option;
 --
-2.32.0
+2.31.1

@@ -1,7 +1,7 @@
 From 8c86f8bbc326ff1578989a03b3c98b06634f62c1 Mon Sep 17 00:00:00 2001
 From: "Richard W.M. Jones" <rjones@redhat.com>
 Date: Thu, 22 Jul 2021 16:31:34 +0100
-Subject: [PATCH 6/7] docs: Correct --selinux-label example
+Subject: [PATCH] docs: Correct --selinux-label example
 The actual label you should use for the internal socket is
 system_u:object_r:svirt_socket_t:s0 (not svirt_t).
@@ -36,5 +36,5 @@ index 68399eca..5b679895 100644
 =item B<--swap>
 --
-2.32.0
+2.31.1

@@ -1,7 +1,7 @@
 From c0c0728f40466cf4a8ab4868002e331df6d85b1e Mon Sep 17 00:00:00 2001
 From: "Richard W.M. Jones" <rjones@redhat.com>
 Date: Sat, 24 Jul 2021 13:30:55 +0100
-Subject: [PATCH 7/7] cow: Fix assert failure in cow_extents
+Subject: [PATCH] cow: Fix assert failure in cow_extents
 $ nbdkit sparse-random 4G --filter=cow --run 'nbdinfo --map $uri'
 nbdkit: cow.c:591: cow_extents: Assertion `count > 0' failed.
@@ -21,32 +21,12 @@ https://gitlab.com/nbdkit/libnbd/-/blob/c55c5d9960809efd27cd044d007a33ea1636f4b0
 (cherry picked from commit 4d66ab72b29fc56190c7a6368eff3a6ba94c0f9f)
 ---
-tests/Makefile.am | 2 ++
 filters/cow/cow.c | 16 +++++++++---
+tests/Makefile.am | 2 ++
 tests/test-cow-extents-large.sh | 46 +++++++++++++++++++++++++++++++++
 3 files changed, 61 insertions(+), 3 deletions(-)
 create mode 100755 tests/test-cow-extents-large.sh
-diff --git a/tests/Makefile.am b/tests/Makefile.am
-index e0b31ba9..9630205d 100644
---- a/tests/Makefile.am
-+++ b/tests/Makefile.am
-@@ -1402,6 +1402,7 @@ TESTS += \
-test-cow.sh \
-test-cow-extents1.sh \
-test-cow-extents2.sh \
-+ test-cow-extents-large.sh \
-test-cow-unaligned.sh \
-$(NULL)
-endif
-@@ -1410,6 +1411,7 @@ EXTRA_DIST += \
-test-cow.sh \
-test-cow-extents1.sh \
-test-cow-extents2.sh \
-+ test-cow-extents-large.sh \
-test-cow-null.sh \
-test-cow-unaligned.sh \
-$(NULL)
 diff --git a/filters/cow/cow.c b/filters/cow/cow.c
 index 83844845..3bd09399 100644
 --- a/filters/cow/cow.c
@@ -91,6 +71,26 @@ index 83844845..3bd09399 100644
 blknum++;
 offset += BLKSIZE;
 count -= BLKSIZE;
+diff --git a/tests/Makefile.am b/tests/Makefile.am
+index e0b31ba9..9630205d 100644
+--- a/tests/Makefile.am
++++ b/tests/Makefile.am
+@@ -1402,6 +1402,7 @@ TESTS += \
+test-cow.sh \
+test-cow-extents1.sh \
+test-cow-extents2.sh \
++ test-cow-extents-large.sh \
+test-cow-unaligned.sh \
+$(NULL)
+endif
+@@ -1410,6 +1411,7 @@ EXTRA_DIST += \
+test-cow.sh \
+test-cow-extents1.sh \
+test-cow-extents2.sh \
++ test-cow-extents-large.sh \
+test-cow-null.sh \
+test-cow-unaligned.sh \
+$(NULL)
 diff --git a/tests/test-cow-extents-large.sh b/tests/test-cow-extents-large.sh
 new file mode 100755
 index 00000000..ea981dcb
@@ -144,5 +144,5 @@ index 00000000..ea981dcb
 + nbdkit -U - sparse-random $size --filter=cow --run 'nbdinfo --map $uri'
 +done
 --
-2.32.0
+2.31.1

@@ -0,0 +1,79 @@
From b436ca6c69ef7d8d826be609820027f10134274d Mon Sep 17 00:00:00 2001
From: "Richard W.M. Jones" <rjones@redhat.com>
Date: Tue, 27 Jul 2021 21:28:48 +0100
Subject: [PATCH] cache: Fix misleading LRU diagram and comment
Only comment changes.
(cherry picked from commit 7b33c86e0910d941dc34bdb481d61806f31cdcef)
---
filters/cache/lru.c | 32 ++++++++++++++++++++------------
1 file changed, 20 insertions(+), 12 deletions(-)
diff --git a/filters/cache/lru.c b/filters/cache/lru.c
index 1c3c3e10..716b4984 100644
--- a/filters/cache/lru.c
+++ b/filters/cache/lru.c
@@ -53,12 +53,14 @@
/* LRU bitmaps. These bitmaps implement a simple, fast LRU structure.
*
- * bm[0] bm[1] blocks not in either bitmap
- * ┌─────────┬──────────────────┬─────────────────────────────┐
- * │ │ │ │
- * └─────────┴──────────────────┴─────────────────────────────┘
- * ↑ c1 bits set
- * c0 bits set
+ * bm[0]
+ * ┌───────────────────────┐
+ * │ X XX X XXX │ c0 bits set
+ * └───────────────────────┘
+ * bm[1]
+ * ┌───────────────────────┐
+ * │ X XX X X │ c1 bits set
+ * └───────────────────────┘
*
* The LRU structure keeps track of the [approx] last N distinct
* blocks which have been most recently accessed. It can answer in
@@ -69,8 +71,7 @@
*
* When a new block is accessed, we set the corresponding bit in bm[0]
* and increment c0 (c0 counts the number of bits set in bm[0]). If
- * c0 == N/2 then we swap the two bitmaps, clear bm[0], and reset c0
- * to 0.
+ * c0 == N/2 then we move bm[1] <- bm[0], clear bm[0] and set c0 <- 0.
*
* To check if a block has been accessed within the previous N
* distinct accesses, we simply have to check both bitmaps. If it is
@@ -78,9 +79,11 @@
* reclaimed.
*
* You'll note that in fact we only keep track of between N/2 and N
- * recently accessed blocks. We could make the estimate more accurate
- * by having more bitmaps, but as this is only a heuristic we choose
- * to keep the implementation simple and memory usage low instead.
+ * recently accessed blocks because the same block can appear in both
+ * bitmaps. bm[1] is a last chance to hold on to blocks which are
+ * soon to be reclaimed. We could make the estimate more accurate by
+ * having more bitmaps, but as this is only a heuristic we choose to
+ * keep the implementation simple and memory usage low instead.
*/
static struct bitmap bm[2];
static unsigned c0 = 0, c1 = 0;
@@ -129,7 +132,12 @@ lru_set_recently_accessed (uint64_t blknum)
bitmap_set_blk (&bm[0], blknum, true);
c0++;
- /* If we've reached N/2 then we need to swap over the bitmaps. */
+ /* If we've reached N/2 then we need to swap over the bitmaps. Note
+ * the purpose of swapping here is to ensure that we do not have to
+ * copy the dynamically allocated bm->bitmap field (the pointers are
+ * swapped instead). The bm[0].bitmap field is immediately zeroed
+ * after the swap.
+ */
if (c0 >= N/2) {
struct bitmap tmp;
--
2.31.1
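
Note: the two-bitmap scheme described in the lru.c comment above can be sketched in isolation. This is only an illustration under stated assumptions, not the filter's code: `struct lru`, `struct lru_bitmap`, `lru_init` and `lru_free` are hypothetical names, and one byte per block stands in for the real bitmap type. It shows the pointer swap at N/2 followed by clearing bm[0].

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the filter's bitmap: one byte per block. */
struct lru_bitmap {
  uint8_t *bits;
  size_t nr_blocks;
};

struct lru {
  struct lru_bitmap bm[2];
  unsigned c0;   /* number of blocks currently set in bm[0] */
  unsigned n;    /* N: approximate number of blocks to remember */
};

static int
lru_init (struct lru *l, size_t nr_blocks, unsigned n)
{
  l->bm[0].bits = calloc (nr_blocks, 1);
  l->bm[1].bits = calloc (nr_blocks, 1);
  if (l->bm[0].bits == NULL || l->bm[1].bits == NULL) {
    free (l->bm[0].bits);
    free (l->bm[1].bits);
    return -1;
  }
  l->bm[0].nr_blocks = l->bm[1].nr_blocks = nr_blocks;
  l->c0 = 0;
  l->n = n;
  return 0;
}

static void
lru_free (struct lru *l)
{
  free (l->bm[0].bits);
  free (l->bm[1].bits);
}

static void
lru_set_recently_accessed (struct lru *l, uint64_t blknum)
{
  assert (blknum < l->bm[0].nr_blocks);

  /* Assumption of this sketch: a block already in bm[0] is not recounted. */
  if (l->bm[0].bits[blknum])
    return;

  l->bm[0].bits[blknum] = 1;
  l->c0++;

  /* At N/2, bm[1] takes over the contents of bm[0].  Swapping the
   * structs swaps the pointers, so no bitmap data is copied; the new
   * bm[0] is then cleared.
   */
  if (l->c0 >= l->n / 2) {
    struct lru_bitmap tmp = l->bm[0];
    l->bm[0] = l->bm[1];
    l->bm[1] = tmp;
    memset (l->bm[0].bits, 0, l->bm[0].nr_blocks);
    l->c0 = 0;
  }
}

/* A block counts as recently accessed if it is in either bitmap. */
static bool
lru_has_been_recently_accessed (const struct lru *l, uint64_t blknum)
{
  assert (blknum < l->bm[0].nr_blocks);
  return l->bm[0].bits[blknum] || l->bm[1].bits[blknum];
}
```

With N = 4, accessing blocks 1 and 2 triggers the first swap, and accessing 3 and 4 triggers the second, after which blocks 1 and 2 are forgotten.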

@@ -0,0 +1,104 @@
From b6e1d14a052caf65dcc7e8fec2bf0d079e1f8a38 Mon Sep 17 00:00:00 2001
From: "Richard W.M. Jones" <rjones@redhat.com>
Date: Tue, 27 Jul 2021 22:42:52 +0100
Subject: [PATCH] docs: Improve documentation of .can_cache and .cache methods
(cherry picked from commit 0a6be5ae01a6079767e1fabd70cca73fc8520b1d)
---
docs/nbdkit-plugin.pod | 71 +++++++++++++++++++++++++-----------------
1 file changed, 43 insertions(+), 28 deletions(-)
diff --git a/docs/nbdkit-plugin.pod b/docs/nbdkit-plugin.pod
index 7a1fae8c..5e085e12 100644
--- a/docs/nbdkit-plugin.pod
+++ b/docs/nbdkit-plugin.pod
@@ -1047,19 +1047,29 @@ This callback is not required. If omitted, then we return false.
int can_cache (void *handle);
This is called during the option negotiation phase to find out if the
-plugin supports a cache operation. The nature of the caching is
-unspecified (including whether there are limits on how much can be
-cached at once, and whether writes to a cached region have
-write-through or write-back semantics), but the command exists to let
-clients issue a hint to the server that they will be accessing that
-region of the export.
-
-If this returns C<NBDKIT_CACHE_NONE>, cache support is not advertised
-to the client; if this returns C<NBDKIT_CACHE_EMULATE>, caching is
-emulated by the server calling C<.pread> and ignoring the results; if
-this returns C<NBDKIT_CACHE_NATIVE>, then the C<.cache> callback will
-be used. If there is an error, C<.can_cache> should call
-C<nbdkit_error> with an error message and return C<-1>.
+plugin supports a cache or prefetch operation.
+
+This can return:
+
+=over 4
+
+=item C<NBDKIT_CACHE_NONE>
+
+Cache support is not advertised to the client.
+
+=item C<NBDKIT_CACHE_EMULATE>
+
+Caching is emulated by the server calling C<.pread> and discarding the
+result.
+
+=item C<NBDKIT_CACHE_NATIVE>
+
+The C<.cache> callback will be called.
+
+=back
+
+If there is an error, C<.can_cache> should call C<nbdkit_error> with
+an error message and return C<-1>.
This callback is not required. If omitted, then we return
C<NBDKIT_CACHE_NONE> if the C<.cache> callback is missing, or
@@ -1284,23 +1294,28 @@ called. C<errno> will be set to a suitable value.
During the data serving phase, this callback is used to give the
plugin a hint that the client intends to make further accesses to the
-given region of the export. The nature of caching is not specified
-further by the NBD specification (for example, a server may place
-limits on how much may be cached at once, and there is no way to
-control if writes to a cached area have write-through or write-back
-semantics). In fact, the cache command can always fail and still be
-compliant, and success might not guarantee a performance gain. If
-this callback is omitted, then the results of C<.can_cache> determine
-whether nbdkit will reject cache requests, treat them as instant
-success, or emulate caching by calling C<.pread> over the same region
-and ignoring the results.
+given region of the export.
+
+The nature of caching/prefetching is not specified further by the NBD
+specification. For example, a server may place limits on how much may
+be cached at once, and there is no way to control if writes to a
+cached area have write-through or write-back semantics. In fact, the
+cache command can always fail and still be compliant, and success
+might not guarantee a performance gain.
+
+If this callback is omitted, then the results of C<.can_cache>
+determine whether nbdkit will reject cache requests, treat them as
+instant success, or emulate caching by calling C<.pread> over the same
+region and ignoring the results.
This function will not be called if C<.can_cache> did not return
-C<NBDKIT_CACHE_NATIVE>. The parameter C<flags> exists in case of
-future NBD protocol extensions; at this time, it will be 0 on input. A
-plugin must fail this function if C<flags> includes an unrecognized
-flag, as that may indicate a requirement that the plugin comply must
-with a specific caching semantic.
+C<NBDKIT_CACHE_NATIVE>.
+
+The C<flags> parameter exists in case of future NBD protocol
+extensions; at this time, it will be 0 on input. A plugin must fail
+this function if C<flags> includes an unrecognized flag, as that may
+indicate a requirement that the plugin comply must with a specific
+caching semantic.
If there is an error, C<.cache> should call C<nbdkit_error> with an
error message, and C<nbdkit_set_error> to record an appropriate error
--
2.31.1
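
Note: the three `.can_cache` return values documented in the patch above map onto three server behaviours. The following is a small model of that mapping, not nbdkit's server code. The enum names are hypothetical stand-ins for `NBDKIT_CACHE_NONE`, `NBDKIT_CACHE_EMULATE` and `NBDKIT_CACHE_NATIVE`, and `example_cache` omits the `handle` argument that a real `.cache` callback receives.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for the NBDKIT_CACHE_* constants. */
enum cache_mode { CACHE_NONE, CACHE_EMULATE, CACHE_NATIVE };

/* What happens to a client cache request in this model. */
enum cache_action {
  ACTION_NOT_ADVERTISED,  /* cache support is never offered to the client */
  ACTION_PREAD_DISCARD,   /* emulate: call .pread and discard the result */
  ACTION_CALL_CACHE,      /* native: call the plugin's .cache callback */
};

static enum cache_action
choose_cache_action (enum cache_mode mode)
{
  switch (mode) {
  case CACHE_EMULATE: return ACTION_PREAD_DISCARD;
  case CACHE_NATIVE:  return ACTION_CALL_CACHE;
  case CACHE_NONE:
  default:            return ACTION_NOT_ADVERTISED;
  }
}

/* Sketch of a .cache-style callback.  It recognises no flags, so any
 * non-zero flags value is rejected, as the documentation requires.
 * Returns 0 on success, -1 on error.
 */
static int
example_cache (uint32_t count, uint64_t offset, uint32_t flags)
{
  (void) count;
  (void) offset;

  if (flags != 0)
    return -1;

  /* A real plugin would begin prefetching the requested region here. */
  return 0;
}
```

A real plugin would also call `nbdkit_error` and `nbdkit_set_error` before returning -1; those calls are left out so the sketch compiles without the nbdkit headers.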

@@ -0,0 +1,37 @@
From ef0ee0166b0594b04c73376f84a729c2985ca064 Mon Sep 17 00:00:00 2001
From: "Richard W.M. Jones" <rjones@redhat.com>
Date: Tue, 27 Jul 2021 23:10:24 +0100
Subject: [PATCH] cow: Improve documentation of cow-on-cache option
(cherry picked from commit 9731e80d58c3aed2514d249e7925c2053d6eb0e8)
---
filters/cow/nbdkit-cow-filter.pod | 13 +++++++------
1 file changed, 7 insertions(+), 6 deletions(-)
diff --git a/filters/cow/nbdkit-cow-filter.pod b/filters/cow/nbdkit-cow-filter.pod
index 64df3fbd..2a693ebe 100644
--- a/filters/cow/nbdkit-cow-filter.pod
+++ b/filters/cow/nbdkit-cow-filter.pod
@@ -54,12 +54,13 @@ serve the same data to each client.
=item B<cow-on-cache=true>
-Treat a client cache request as a shortcut for copying unmodified data
-from the plugin to the overlay, rather than the default of passing
-cache requests on to the plugin. This parameter defaults to false
-(which leaves the overlay as small as possible), but setting it can be
-useful for converting cache commands into a form of copy-on-read
-behavior, in addition to the filter's normal copy-on-write semantics.
+When the client issues a cache (prefetch) request, preemptively save
+the data from the plugin into the overlay.
+
+=item B<cow-on-cache=false>
+
+Do not save data from cache (prefetch) requests in the overlay. This
+leaves the overlay as small as possible. This is the default.
=back
--
2.31.1

@@ -0,0 +1,40 @@
From a09b06f2c104c01d7b0ff5e657c0c64bf1c4cc41 Mon Sep 17 00:00:00 2001
From: "Richard W.M. Jones" <rjones@redhat.com>
Date: Thu, 29 Jul 2021 20:14:24 +0100
Subject: [PATCH] tests: cache: Simplify test-cache-on-read.sh
We can use the memory plugin instead of a backing file.
(cherry picked from commit 5527b28e323b7c9c35af8e1bb6b05e6468e68950)
---
tests/test-cache-on-read.sh | 8 ++------
1 file changed, 2 insertions(+), 6 deletions(-)
diff --git a/tests/test-cache-on-read.sh b/tests/test-cache-on-read.sh
index 3b3c7657..f8584dcd 100755
--- a/tests/test-cache-on-read.sh
+++ b/tests/test-cache-on-read.sh
@@ -38,18 +38,14 @@ requires_filter cache
requires_nbdsh_uri
sock=$(mktemp -u /tmp/nbdkit-test-sock.XXXXXX)
-files="cache-on-read.img $sock cache-on-read.pid"
+files="$sock cache-on-read.pid"
rm -f $files
cleanup_fn rm -f $files
-# Create an empty base image.
-truncate -s 128K cache-on-read.img
-
# Run nbdkit with the caching filter and cache-on-read set.
start_nbdkit -P cache-on-read.pid -U $sock \
--filter=cache \
- file cache-on-read.img \
- cache-on-read=true
+ memory 128K cache-on-read=true
nbdsh --connect "nbd+unix://?socket=$sock" \
-c '
--
2.31.1

@@ -0,0 +1,124 @@
From 37844f524c01b54b28755b77b68b7c1ec2b79512 Mon Sep 17 00:00:00 2001
From: "Richard W.M. Jones" <rjones@redhat.com>
Date: Mon, 26 Jul 2021 11:59:43 +0100
Subject: [PATCH] cache: Reduce verbosity of debugging
The cache filter is very verbose in its debugging. Reduce the default
level. Use -D cache.verbose=1 to restore original debugging.
Compare commit 745a0f13662031c2b9c9b69f62b4ae3a6b2f38f0.
(cherry picked from commit 6be735edf7d5fb3fb8350c72e6d9525badbab14d)
---
filters/cache/blk.c | 53 +++++++++++++++++++++++++++------------------
1 file changed, 32 insertions(+), 21 deletions(-)
diff --git a/filters/cache/blk.c b/filters/cache/blk.c
index 12e8407e..f52f30e3 100644
--- a/filters/cache/blk.c
+++ b/filters/cache/blk.c
@@ -93,6 +93,9 @@ enum bm_entry {
BLOCK_DIRTY = 3,
};
+/* Extra debugging (-D cache.verbose=1). */
+NBDKIT_DLL_PUBLIC int cache_debug_verbose = 0;
+
int
blk_init (void)
{
@@ -199,12 +202,14 @@ blk_read (nbdkit_next *next,
reclaim (fd, &bm);
- nbdkit_debug ("cache: blk_read block %" PRIu64 " (offset %" PRIu64 ") is %s",
- blknum, (uint64_t) offset,
- state == BLOCK_NOT_CACHED ? "not cached" :
- state == BLOCK_CLEAN ? "clean" :
- state == BLOCK_DIRTY ? "dirty" :
- "unknown");
+ if (cache_debug_verbose)
+ nbdkit_debug ("cache: blk_read block %" PRIu64
+ " (offset %" PRIu64 ") is %s",
+ blknum, (uint64_t) offset,
+ state == BLOCK_NOT_CACHED ? "not cached" :
+ state == BLOCK_CLEAN ? "clean" :
+ state == BLOCK_DIRTY ? "dirty" :
+ "unknown");
if (state == BLOCK_NOT_CACHED) { /* Read underlying plugin. */
unsigned n = blksize, tail = 0;
@@ -225,9 +230,10 @@ blk_read (nbdkit_next *next,
/* If cache-on-read, copy the block to the cache. */
if (cache_on_read) {
- nbdkit_debug ("cache: cache-on-read block %" PRIu64
- " (offset %" PRIu64 ")",
- blknum, (uint64_t) offset);
+ if (cache_debug_verbose)
+ nbdkit_debug ("cache: cache-on-read block %" PRIu64
+ " (offset %" PRIu64 ")",
+ blknum, (uint64_t) offset);
if (pwrite (fd, block, blksize, offset) == -1) {
*err = errno;
@@ -259,12 +265,14 @@ blk_cache (nbdkit_next *next,
reclaim (fd, &bm);
- nbdkit_debug ("cache: blk_cache block %" PRIu64 " (offset %" PRIu64 ") is %s",
- blknum, (uint64_t) offset,
- state == BLOCK_NOT_CACHED ? "not cached" :
- state == BLOCK_CLEAN ? "clean" :
- state == BLOCK_DIRTY ? "dirty" :
- "unknown");
+ if (cache_debug_verbose)
+ nbdkit_debug ("cache: blk_cache block %" PRIu64
+ " (offset %" PRIu64 ") is %s",
+ blknum, (uint64_t) offset,
+ state == BLOCK_NOT_CACHED ? "not cached" :
+ state == BLOCK_CLEAN ? "clean" :
+ state == BLOCK_DIRTY ? "dirty" :
+ "unknown");
if (state == BLOCK_NOT_CACHED) {
/* Read underlying plugin, copy to cache regardless of cache-on-read. */
@@ -284,8 +292,9 @@ blk_cache (nbdkit_next *next,
*/
memset (block + n, 0, tail);
- nbdkit_debug ("cache: cache block %" PRIu64 " (offset %" PRIu64 ")",
- blknum, (uint64_t) offset);
+ if (cache_debug_verbose)
+ nbdkit_debug ("cache: cache block %" PRIu64 " (offset %" PRIu64 ")",
+ blknum, (uint64_t) offset);
if (pwrite (fd, block, blksize, offset) == -1) {
*err = errno;
@@ -324,8 +333,9 @@ blk_writethrough (nbdkit_next *next,
reclaim (fd, &bm);
- nbdkit_debug ("cache: writethrough block %" PRIu64 " (offset %" PRIu64 ")",
- blknum, (uint64_t) offset);
+ if (cache_debug_verbose)
+ nbdkit_debug ("cache: writethrough block %" PRIu64 " (offset %" PRIu64 ")",
+ blknum, (uint64_t) offset);
if (pwrite (fd, block, blksize, offset) == -1) {
*err = errno;
@@ -357,8 +367,9 @@ blk_write (nbdkit_next *next,
reclaim (fd, &bm);
- nbdkit_debug ("cache: writeback block %" PRIu64 " (offset %" PRIu64 ")",
- blknum, (uint64_t) offset);
+ if (cache_debug_verbose)
+ nbdkit_debug ("cache: writeback block %" PRIu64 " (offset %" PRIu64 ")",
+ blknum, (uint64_t) offset);
if (pwrite (fd, block, blksize, offset) == -1) {
*err = errno;
--
2.31.1

@@ -0,0 +1,400 @@
From a118e05670659b3efd1ab191023cc0bc24cf29e7 Mon Sep 17 00:00:00 2001
From: "Richard W.M. Jones" <rjones@redhat.com>
Date: Mon, 26 Jul 2021 13:55:21 +0100
Subject: [PATCH] cache, cow: Add blk_read_multiple function
Currently the cache and cow filters break up large requests into many
single block-sized requests to the underlying plugin. For some
plugins (eg. curl) this is very inefficient and causes huge
slow-downs.
For example I tested nbdkit + curl vs nbdkit + cache + curl against a
slow, remote VMware server. A simple run of virt-inspector was at
least 6-7 times slower with the cache filter. (It was so slow that I
didn't actually let it run to completion - I am estimating the
slowdown multiple using interim debug messages).
Implement a new blk_read_multiple function in the cache filter. It
does not break up "runs" of blocks which all have the same cache
state. The cache .pread method uses the new function to read the
block-aligned part of the request.
(cherry picked from commit ab661ccef5b3369fa22c33d0289baddc251b73bf)
---
filters/cache/blk.c | 83 ++++++++++++++++++++++++++++++++-----------
filters/cache/blk.h | 6 ++++
filters/cache/cache.c | 21 +++++------
filters/cow/blk.c | 63 +++++++++++++++++++++++---------
filters/cow/blk.h | 6 ++++
filters/cow/cow.c | 21 +++++------
6 files changed, 138 insertions(+), 62 deletions(-)
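
Note: the "run" detection described in the commit message can be sketched on its own. This is an illustration under assumptions, not the filter's implementation: a plain `bool` array stands in for the cache bitmap, and `run_length` and `count_requests` are hypothetical helper names. The point is that a request spanning several blocks with the same cached/not-cached state costs one request to the underlying layer instead of one per block.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Return how many consecutive blocks, starting at blknum and limited
 * to nrblocks, share the cached/not-cached state of the first block.
 */
static uint64_t
run_length (const bool *cached, uint64_t blknum, uint64_t nrblocks)
{
  uint64_t runblocks;

  assert (nrblocks > 0);

  for (runblocks = 1; runblocks < nrblocks; ++runblocks) {
    if (cached[blknum + runblocks] != cached[blknum])
      break;
  }
  return runblocks;
}

/* Count the requests needed to cover nrblocks blocks when each run is
 * handled in one go (rather than block by block).
 */
static unsigned
count_requests (const bool *cached, uint64_t blknum, uint64_t nrblocks)
{
  unsigned requests = 0;

  while (nrblocks > 0) {
    uint64_t run = run_length (cached, blknum, nrblocks);

    requests++;
    blknum += run;
    nrblocks -= run;
  }
  return requests;
}
```

For a state array of three uncached blocks, two cached blocks and one uncached block, six blocks are covered in three requests rather than six.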
diff --git a/filters/cache/blk.c b/filters/cache/blk.c
index f52f30e3..f85ada35 100644
--- a/filters/cache/blk.c
+++ b/filters/cache/blk.c
@@ -44,6 +44,7 @@
#include <string.h>
#include <unistd.h>
#include <fcntl.h>
+#include <limits.h>
#include <errno.h>
#ifdef HAVE_SYS_STATVFS_H
@@ -193,26 +194,40 @@ blk_set_size (uint64_t new_size)
return 0;
}
-int
-blk_read (nbdkit_next *next,
- uint64_t blknum, uint8_t *block, int *err)
+static int
+_blk_read_multiple (nbdkit_next *next,
+ uint64_t blknum, uint64_t nrblocks,
+ uint8_t *block, int *err)
{
off_t offset = blknum * blksize;
- enum bm_entry state = bitmap_get_blk (&bm, blknum, BLOCK_NOT_CACHED);
+ bool not_cached =
+ bitmap_get_blk (&bm, blknum, BLOCK_NOT_CACHED) == BLOCK_NOT_CACHED;
+ uint64_t b, runblocks;
- reclaim (fd, &bm);
+ assert (nrblocks > 0);
if (cache_debug_verbose)
- nbdkit_debug ("cache: blk_read block %" PRIu64
+ nbdkit_debug ("cache: blk_read_multiple block %" PRIu64
" (offset %" PRIu64 ") is %s",
blknum, (uint64_t) offset,
- state == BLOCK_NOT_CACHED ? "not cached" :
- state == BLOCK_CLEAN ? "clean" :
- state == BLOCK_DIRTY ? "dirty" :
- "unknown");
+ not_cached ? "not cached" : "cached");
- if (state == BLOCK_NOT_CACHED) { /* Read underlying plugin. */
- unsigned n = blksize, tail = 0;
+ /* Find out how many of the following blocks form a "run" with the
+ * same cached/not-cached state. We can process that many blocks in
+ * one go.
+ */
+ for (b = 1, runblocks = 1; b < nrblocks; ++b, ++runblocks) {
+ bool s =
+ bitmap_get_blk (&bm, blknum + b, BLOCK_NOT_CACHED) == BLOCK_NOT_CACHED;
+ if (not_cached != s)
+ break;
+ }
+
+ if (not_cached) { /* Read underlying plugin. */
+ unsigned n, tail = 0;
+
+ assert (blksize * runblocks <= UINT_MAX);
+ n = blksize * runblocks;
if (offset + n > size) {
tail = offset + n - size;
@@ -228,32 +243,60 @@ blk_read (nbdkit_next *next,
*/
memset (block + n, 0, tail);
- /* If cache-on-read, copy the block to the cache. */
+ /* If cache-on-read, copy the blocks to the cache. */
if (cache_on_read) {
if (cache_debug_verbose)
nbdkit_debug ("cache: cache-on-read block %" PRIu64
" (offset %" PRIu64 ")",
blknum, (uint64_t) offset);
- if (pwrite (fd, block, blksize, offset) == -1) {
+ if (pwrite (fd, block, blksize * runblocks, offset) == -1) {
*err = errno;
nbdkit_error ("pwrite: %m");
return -1;
}
- bitmap_set_blk (&bm, blknum, BLOCK_CLEAN);
- lru_set_recently_accessed (blknum);
+ for (b = 0; b < runblocks; ++b) {
+ bitmap_set_blk (&bm, blknum + b, BLOCK_CLEAN);
+ lru_set_recently_accessed (blknum + b);
+ }
}
- return 0;
}
else { /* Read cache. */
- if (pread (fd, block, blksize, offset) == -1) {
+ if (pread (fd, block, blksize * runblocks, offset) == -1) {
*err = errno;
nbdkit_error ("pread: %m");
return -1;
}
- lru_set_recently_accessed (blknum);
- return 0;
+ for (b = 0; b < runblocks; ++b)
+ lru_set_recently_accessed (blknum + b);
}
+
+ /* If all done, return. */
+ if (runblocks == nrblocks)
+ return 0;
+
+ /* Recurse to read remaining blocks. */
+ return _blk_read_multiple (next,
+ blknum + runblocks,
+ nrblocks - runblocks,
+ block + blksize * runblocks,
+ err);
+}
+
+int
+blk_read_multiple (nbdkit_next *next,
+ uint64_t blknum, uint64_t nrblocks,
+ uint8_t *block, int *err)
+{
+ reclaim (fd, &bm);
+ return _blk_read_multiple (next, blknum, nrblocks, block, err);
+}
+
+int
+blk_read (nbdkit_next *next,
+ uint64_t blknum, uint8_t *block, int *err)
+{
+ return blk_read_multiple (next, blknum, 1, block, err);
}
int
diff --git a/filters/cache/blk.h b/filters/cache/blk.h
index 87c753e2..1ee33ed7 100644
--- a/filters/cache/blk.h
+++ b/filters/cache/blk.h
@@ -55,6 +55,12 @@ extern int blk_read (nbdkit_next *next,
uint64_t blknum, uint8_t *block, int *err)
__attribute__((__nonnull__ (1, 3, 4)));
+/* As above, but read multiple blocks. */
+extern int blk_read_multiple (nbdkit_next *next,
+ uint64_t blknum, uint64_t nrblocks,
+ uint8_t *block, int *err)
+ __attribute__((__nonnull__ (1, 4, 5)));
+
/* If a single block is not cached, copy it from the plugin. */
extern int blk_cache (nbdkit_next *next,
uint64_t blknum, uint8_t *block, int *err)
diff --git a/filters/cache/cache.c b/filters/cache/cache.c
index 499aec68..9c081948 100644
--- a/filters/cache/cache.c
+++ b/filters/cache/cache.c
@@ -313,7 +313,7 @@ cache_pread (nbdkit_next *next,
uint32_t flags, int *err)
{
CLEANUP_FREE uint8_t *block = NULL;
- uint64_t blknum, blkoffs;
+ uint64_t blknum, blkoffs, nrblocks;
int r;
assert (!flags);
@@ -348,22 +348,17 @@ cache_pread (nbdkit_next *next,
}
/* Aligned body */
- /* XXX This breaks up large read requests into smaller ones, which
- * is a problem for plugins which have a large, fixed per-request
- * overhead (hello, curl). We should try to keep large requests
- * together as much as possible, but that requires us to be much
- * smarter here.
- */
- while (count >= blksize) {
+ nrblocks = count / blksize;
+ if (nrblocks > 0) {
ACQUIRE_LOCK_FOR_CURRENT_SCOPE (&lock);
- r = blk_read (next, blknum, buf, err);
+ r = blk_read_multiple (next, blknum, nrblocks, buf, err);
if (r == -1)
return -1;
- buf += blksize;
- count -= blksize;
- offset += blksize;
- blknum++;
+ buf += nrblocks * blksize;
+ count -= nrblocks * blksize;
+ offset += nrblocks * blksize;
+ blknum += nrblocks;
}
/* Unaligned tail */
diff --git a/filters/cow/blk.c b/filters/cow/blk.c
index 0f12d510..9e6c8879 100644
--- a/filters/cow/blk.c
+++ b/filters/cow/blk.c
@@ -79,6 +79,7 @@
#include <inttypes.h>
#include <unistd.h>
#include <fcntl.h>
+#include <limits.h>
#include <errno.h>
#include <sys/types.h>
@@ -223,33 +224,48 @@ blk_status (uint64_t blknum, bool *present, bool *trimmed)
*trimmed = state == BLOCK_TRIMMED;
}
-/* These are the block operations. They always read or write a single
- * whole block of size blksize.
+/* These are the block operations. They always read or write whole
+ * blocks of size blksize.
*/
int
-blk_read (nbdkit_next *next,
- uint64_t blknum, uint8_t *block, int *err)
+blk_read_multiple (nbdkit_next *next,
+ uint64_t blknum, uint64_t nrblocks,
+ uint8_t *block, int *err)
{
off_t offset = blknum * BLKSIZE;
enum bm_entry state;
+ uint64_t b, runblocks;
- /* The state might be modified from another thread - for example
- * another thread might write (BLOCK_NOT_ALLOCATED ->
- * BLOCK_ALLOCATED) while we are reading from the plugin, returning
- * the old data. However a read issued after the write returns
- * should always return the correct data.
+ /* Find out how many of the following blocks form a "run" with the
+ * same state. We can process that many blocks in one go.
+ *
+ * About the locking: The state might be modified from another
+ * thread - for example another thread might write
+ * (BLOCK_NOT_ALLOCATED -> BLOCK_ALLOCATED) while we are reading
+ * from the plugin, returning the old data. However a read issued
+ * after the write returns should always return the correct data.
*/
{
ACQUIRE_LOCK_FOR_CURRENT_SCOPE (&lock);
state = bitmap_get_blk (&bm, blknum, BLOCK_NOT_ALLOCATED);
+
+ for (b = 1, runblocks = 1; b < nrblocks; ++b, ++runblocks) {
+ enum bm_entry s = bitmap_get_blk (&bm, blknum + b, BLOCK_NOT_ALLOCATED);
+ if (state != s)
+ break;
+ }
}
if (cow_debug_verbose)
- nbdkit_debug ("cow: blk_read block %" PRIu64 " (offset %" PRIu64 ") is %s",
+ nbdkit_debug ("cow: blk_read_multiple block %" PRIu64
+ " (offset %" PRIu64 ") is %s",
blknum, (uint64_t) offset, state_to_string (state));
if (state == BLOCK_NOT_ALLOCATED) { /* Read underlying plugin. */
- unsigned n = BLKSIZE, tail = 0;
+ unsigned n, tail = 0;
+
+ assert (BLKSIZE * runblocks <= UINT_MAX);
+ n = BLKSIZE * runblocks;
if (offset + n > size) {
tail = offset + n - size;
@@ -264,20 +280,35 @@ blk_read (nbdkit_next *next,
* zeroing the tail.
*/
memset (block + n, 0, tail);
- return 0;
}
else if (state == BLOCK_ALLOCATED) { /* Read overlay. */
- if (pread (fd, block, BLKSIZE, offset) == -1) {
+ if (pread (fd, block, BLKSIZE * runblocks, offset) == -1) {
*err = errno;
nbdkit_error ("pread: %m");
return -1;
}
- return 0;
}
else /* state == BLOCK_TRIMMED */ {
- memset (block, 0, BLKSIZE);
- return 0;
+ memset (block, 0, BLKSIZE * runblocks);
}
+
+ /* If all done, return. */
+ if (runblocks == nrblocks)
+ return 0;
+
+ /* Recurse to read remaining blocks. */
+ return blk_read_multiple (next,
+ blknum + runblocks,
+ nrblocks - runblocks,
+ block + BLKSIZE * runblocks,
+ err);
+}
+
+int
+blk_read (nbdkit_next *next,
+ uint64_t blknum, uint8_t *block, int *err)
+{
+ return blk_read_multiple (next, blknum, 1, block, err);
}
int
diff --git a/filters/cow/blk.h b/filters/cow/blk.h
index e6fd7417..b066c602 100644
--- a/filters/cow/blk.h
+++ b/filters/cow/blk.h
@@ -55,6 +55,12 @@ extern int blk_read (nbdkit_next *next,
uint64_t blknum, uint8_t *block, int *err)
__attribute__((__nonnull__ (1, 3, 4)));
+/* Read multiple blocks from the overlay or plugin. */
+extern int blk_read_multiple (nbdkit_next *next,
+ uint64_t blknum, uint64_t nrblocks,
+ uint8_t *block, int *err)
+ __attribute__((__nonnull__ (1, 4, 5)));
+
/* Cache mode for blocks not already in overlay */
enum cache_mode {
BLK_CACHE_IGNORE, /* Do nothing */
diff --git a/filters/cow/cow.c b/filters/cow/cow.c
index 3bd09399..f74c0a34 100644
--- a/filters/cow/cow.c
+++ b/filters/cow/cow.c
@@ -210,7 +210,7 @@ cow_pread (nbdkit_next *next,
uint32_t flags, int *err)
{
CLEANUP_FREE uint8_t *block = NULL;
- uint64_t blknum, blkoffs;
+ uint64_t blknum, blkoffs, nrblocks;
int r;
if (!IS_ALIGNED (count | offset, BLKSIZE)) {
@@ -243,21 +243,16 @@ cow_pread (nbdkit_next *next,
}
/* Aligned body */
- /* XXX This breaks up large read requests into smaller ones, which
- * is a problem for plugins which have a large, fixed per-request
- * overhead (hello, curl). We should try to keep large requests
- * together as much as possible, but that requires us to be much
- * smarter here.
- */
- while (count >= BLKSIZE) {
- r = blk_read (next, blknum, buf, err);
+ nrblocks = count / BLKSIZE;
+ if (nrblocks > 0) {
+ r = blk_read_multiple (next, blknum, nrblocks, buf, err);
if (r == -1)
return -1;
- buf += BLKSIZE;
- count -= BLKSIZE;
- offset += BLKSIZE;
- blknum++;
+ buf += nrblocks * BLKSIZE;
+ count -= nrblocks * BLKSIZE;
+ offset += nrblocks * BLKSIZE;
+ blknum += nrblocks;
}
/* Unaligned tail */
--
2.31.1


@ -0,0 +1,215 @@
From bf82947dabe08a0d51f87eb14619291900c65574 Mon Sep 17 00:00:00 2001
From: "Richard W.M. Jones" <rjones@redhat.com>
Date: Mon, 26 Jul 2021 15:21:18 +0100
Subject: [PATCH] cache, cow: Use full pread/pwrite operations

Although it probably cannot happen on Linux, POSIX allows pread/pwrite
to read or write fewer bytes than requested. The cache and cow
filters didn't handle this situation. Replace the raw
pread(2)/pwrite(2) syscalls with alternate versions which can handle
this.

(cherry picked from commit ce0db9d7736dd28dd0f10951ce65853e50b35e41)
---
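A minimal sketch of the general "read or write everything, or fail"
technique, under stated assumptions. The names pwrite_all, pread_all
and roundtrip_demo are hypothetical and are not the helpers added by
this patch. Unlike the loops in full-rw.c as shown here, the sketch
also advances the buffer pointer after a short transfer, so that a
retry continues with the bytes not yet transferred.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Write exactly count bytes at offset, or return -1 with errno set. */
ssize_t
pwrite_all (int fd, const void *buf, size_t count, off_t offset)
{
  const char *p = buf;
  ssize_t total = 0;

  assert (buf != NULL || count == 0);
  while (count > 0) {
    ssize_t r = pwrite (fd, p, count, offset);

    if (r == -1)
      return -1;
    p += r;                     /* skip the bytes already written */
    offset += r;
    count -= r;
    total += r;
  }
  return total;
}

/* Read exactly count bytes at offset.  An unexpected end-of-file is
 * reported as EIO.
 */
ssize_t
pread_all (int fd, void *buf, size_t count, off_t offset)
{
  char *p = buf;
  ssize_t total = 0;

  assert (buf != NULL || count == 0);
  while (count > 0) {
    ssize_t r = pread (fd, p, count, offset);

    if (r == -1)
      return -1;
    if (r == 0) {
      errno = EIO;
      return -1;
    }
    p += r;                     /* skip the bytes already read */
    offset += r;
    count -= r;
    total += r;
  }
  return total;
}

/* Round-trip a small buffer through a temporary file.  Returns 0 on
 * success, -1 on failure.
 */
int
roundtrip_demo (void)
{
  char path[] = "/tmp/full-rw-demo-XXXXXX";
  const char msg[] = "hello, nbdkit";
  char back[sizeof msg];
  int fd, ok;

  fd = mkstemp (path);
  if (fd == -1)
    return -1;
  unlink (path);

  ok = pwrite_all (fd, msg, sizeof msg, 4096) == (ssize_t) sizeof msg &&
       pread_all (fd, back, sizeof back, 4096) == (ssize_t) sizeof back &&
       memcmp (msg, back, sizeof msg) == 0 &&
       /* Reading past end-of-file must fail, not return short. */
       pread_all (fd, back, sizeof back, 1 << 20) == -1;
  close (fd);
  return ok ? 0 : -1;
}
```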
common/utils/Makefile.am | 1 +
common/utils/full-rw.c | 81 ++++++++++++++++++++++++++++++++++++++++
common/utils/utils.h | 2 +
filters/cache/blk.c | 10 ++---
filters/cow/blk.c | 6 +--
5 files changed, 92 insertions(+), 8 deletions(-)
create mode 100644 common/utils/full-rw.c
diff --git a/common/utils/Makefile.am b/common/utils/Makefile.am
index 1708a4c8..14e9dfc4 100644
--- a/common/utils/Makefile.am
+++ b/common/utils/Makefile.am
@@ -40,6 +40,7 @@ libutils_la_SOURCES = \
cleanup-nbdkit.c \
cleanup.h \
environ.c \
+ full-rw.c \
quote.c \
utils.c \
utils.h \
diff --git a/common/utils/full-rw.c b/common/utils/full-rw.c
new file mode 100644
index 00000000..55b32cdd
--- /dev/null
+++ b/common/utils/full-rw.c
@@ -0,0 +1,81 @@
+/* nbdkit
+ * Copyright (C) 2021 Red Hat Inc.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions are
+ * met:
+ *
+ * * Redistributions of source code must retain the above copyright
+ * notice, this list of conditions and the following disclaimer.
+ *
+ * * Redistributions in binary form must reproduce the above copyright
+ * notice, this list of conditions and the following disclaimer in the
+ * documentation and/or other materials provided with the distribution.
+ *
+ * * Neither the name of Red Hat nor the names of its contributors may be
+ * used to endorse or promote products derived from this software without
+ * specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY RED HAT AND CONTRIBUTORS ''AS IS'' AND
+ * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO,
+ * THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A
+ * PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL RED HAT OR
+ * CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF
+ * USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
+ * ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
+ * OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT
+ * OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
+ * SUCH DAMAGE.
+ */
+
+/* These functions are like pread(2)/pwrite(2) but they always read or
+ * write the full amount, or fail.
+ */
+
+#include <config.h>
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <unistd.h>
+#include <errno.h>
+
+ssize_t
+full_pread (int fd, void *buf, size_t count, off_t offset)
+{
+ ssize_t ret = 0, r;
+
+ while (count > 0) {
+ r = pread (fd, buf, count, offset);
+ if (r == -1) return -1;
+ if (r == 0) {
+ /* Presumably the caller wasn't expecting end-of-file here, so
+ * return an error.
+ */
+ errno = EIO;
+ return -1;
+ }
+ ret += r;
+ offset += r;
+ count -= r;
+ }
+
+ return ret;
+}
+
+ssize_t
+full_pwrite (int fd, const void *buf, size_t count, off_t offset)
+{
+ ssize_t ret = 0, r;
+
+ while (count > 0) {
+ r = pwrite (fd, buf, count, offset);
+ if (r == -1) return -1;
+ ret += r;
+ offset += r;
+ count -= r;
+ }
+
+ return ret;
+}
diff --git a/common/utils/utils.h b/common/utils/utils.h
index f8f70212..83397ae1 100644
--- a/common/utils/utils.h
+++ b/common/utils/utils.h
@@ -40,5 +40,7 @@ extern int set_cloexec (int fd);
extern int set_nonblock (int fd);
extern char **copy_environ (char **env, ...) __attribute__((__sentinel__));
extern char *make_temporary_directory (void);
+extern ssize_t full_pread (int fd, void *buf, size_t count, off_t offset);
+extern ssize_t full_pwrite (int fd, const void *buf, size_t count, off_t offset);
#endif /* NBDKIT_UTILS_H */
diff --git a/filters/cache/blk.c b/filters/cache/blk.c
index f85ada35..42bd3779 100644
--- a/filters/cache/blk.c
+++ b/filters/cache/blk.c
@@ -250,7 +250,7 @@ _blk_read_multiple (nbdkit_next *next,
" (offset %" PRIu64 ")",
blknum, (uint64_t) offset);
- if (pwrite (fd, block, blksize * runblocks, offset) == -1) {
+ if (full_pwrite (fd, block, blksize * runblocks, offset) == -1) {
*err = errno;
nbdkit_error ("pwrite: %m");
return -1;
@@ -262,7 +262,7 @@ _blk_read_multiple (nbdkit_next *next,
}
}
else { /* Read cache. */
- if (pread (fd, block, blksize * runblocks, offset) == -1) {
+ if (full_pread (fd, block, blksize * runblocks, offset) == -1) {
*err = errno;
nbdkit_error ("pread: %m");
return -1;
@@ -339,7 +339,7 @@ blk_cache (nbdkit_next *next,
nbdkit_debug ("cache: cache block %" PRIu64 " (offset %" PRIu64 ")",
blknum, (uint64_t) offset);
- if (pwrite (fd, block, blksize, offset) == -1) {
+ if (full_pwrite (fd, block, blksize, offset) == -1) {
*err = errno;
nbdkit_error ("pwrite: %m");
return -1;
@@ -380,7 +380,7 @@ blk_writethrough (nbdkit_next *next,
nbdkit_debug ("cache: writethrough block %" PRIu64 " (offset %" PRIu64 ")",
blknum, (uint64_t) offset);
- if (pwrite (fd, block, blksize, offset) == -1) {
+ if (full_pwrite (fd, block, blksize, offset) == -1) {
*err = errno;
nbdkit_error ("pwrite: %m");
return -1;
@@ -414,7 +414,7 @@ blk_write (nbdkit_next *next,
nbdkit_debug ("cache: writeback block %" PRIu64 " (offset %" PRIu64 ")",
blknum, (uint64_t) offset);
- if (pwrite (fd, block, blksize, offset) == -1) {
+ if (full_pwrite (fd, block, blksize, offset) == -1) {
*err = errno;
nbdkit_error ("pwrite: %m");
return -1;
diff --git a/filters/cow/blk.c b/filters/cow/blk.c
index 9e6c8879..cebd9454 100644
--- a/filters/cow/blk.c
+++ b/filters/cow/blk.c
@@ -282,7 +282,7 @@ blk_read_multiple (nbdkit_next *next,
memset (block + n, 0, tail);
}
else if (state == BLOCK_ALLOCATED) { /* Read overlay. */
- if (pread (fd, block, BLKSIZE * runblocks, offset) == -1) {
+ if (full_pread (fd, block, BLKSIZE * runblocks, offset) == -1) {
*err = errno;
nbdkit_error ("pread: %m");
return -1;
@@ -357,7 +357,7 @@ blk_cache (nbdkit_next *next,
memset (block + n, 0, tail);
if (mode == BLK_CACHE_COW) {
- if (pwrite (fd, block, BLKSIZE, offset) == -1) {
+ if (full_pwrite (fd, block, BLKSIZE, offset) == -1) {
*err = errno;
nbdkit_error ("pwrite: %m");
return -1;
@@ -376,7 +376,7 @@ blk_write (uint64_t blknum, const uint8_t *block, int *err)
nbdkit_debug ("cow: blk_write block %" PRIu64 " (offset %" PRIu64 ")",
blknum, (uint64_t) offset);
- if (pwrite (fd, block, BLKSIZE, offset) == -1) {
+ if (full_pwrite (fd, block, BLKSIZE, offset) == -1) {
*err = errno;
nbdkit_error ("pwrite: %m");
return -1;
--
2.31.1


@ -0,0 +1,151 @@
From b7fe9b7b6c6d317291c76f15910215828bbfd4ff Mon Sep 17 00:00:00 2001
From: "Richard W.M. Jones" <rjones@redhat.com>
Date: Mon, 26 Jul 2021 16:16:15 +0100
Subject: [PATCH] cache: Implement cache-on-read=/PATH

For virt-v2v we will need to be able to turn cache-on-read on while
performing inspection and modification of the guest, and off when
doing the bulk copy. To do that allow the cache-on-read parameter to
refer to a path where the existence of the path toggles the feature.

(We could restart nbdkit between these phases, but this change avoids
doing that.)

(cherry picked from commit c8b575241b15b3bf0adaf15313e67e5ed4270b5a)
---
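A minimal sketch of the toggle described above, assuming a
hypothetical helper flag_file_present and a throwaway file under
/tmp. It only demonstrates that creating and removing the path flips
the answer; it is not the filter's own code.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical helper: the feature is "on" exactly while path exists. */
bool
flag_file_present (const char *path)
{
  assert (path != NULL && path[0] == '/');  /* absolute paths only */
  return access (path, F_OK) == 0;
}

/* Create and then remove a flag file, checking the answer each time.
 * Returns 0 if the toggle behaved as expected, -1 otherwise.
 */
int
toggle_demo (void)
{
  char path[] = "/tmp/cor-flag-XXXXXX";
  bool on, off;
  int fd;

  fd = mkstemp (path);          /* file now exists: feature on */
  if (fd == -1)
    return -1;
  close (fd);
  on = flag_file_present (path);

  unlink (path);                /* file removed: feature off */
  off = !flag_file_present (path);

  return on && off ? 0 : -1;
}
```

Because the path is checked each time the question is asked, another
process can switch the behaviour by creating or deleting the file
while the server keeps running.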
filters/cache/blk.c | 2 +-
filters/cache/cache.c | 33 ++++++++++++++++++++-------
filters/cache/cache.h | 10 ++++++--
filters/cache/nbdkit-cache-filter.pod | 11 ++++++++-
4 files changed, 44 insertions(+), 12 deletions(-)
diff --git a/filters/cache/blk.c b/filters/cache/blk.c
index 42bd3779..19f79605 100644
--- a/filters/cache/blk.c
+++ b/filters/cache/blk.c
@@ -244,7 +244,7 @@ _blk_read_multiple (nbdkit_next *next,
memset (block + n, 0, tail);
/* If cache-on-read, copy the blocks to the cache. */
- if (cache_on_read) {
+ if (cache_on_read ()) {
if (cache_debug_verbose)
nbdkit_debug ("cache: cache-on-read block %" PRIu64
" (offset %" PRIu64 ")",
diff --git a/filters/cache/cache.c b/filters/cache/cache.c
index 9c081948..8af52106 100644
--- a/filters/cache/cache.c
+++ b/filters/cache/cache.c
@@ -74,7 +74,8 @@ unsigned blksize;
enum cache_mode cache_mode = CACHE_MODE_WRITEBACK;
int64_t max_size = -1;
unsigned hi_thresh = 95, lo_thresh = 80;
-bool cache_on_read = false;
+enum cor_mode cor_mode = COR_OFF;
+const char *cor_path;
static int cache_flush (nbdkit_next *next, void *handle, uint32_t flags,
int *err);
@@ -161,12 +162,16 @@ cache_config (nbdkit_next_config *next, nbdkit_backend *nxdata,
}
#endif /* !HAVE_CACHE_RECLAIM */
else if (strcmp (key, "cache-on-read") == 0) {
- int r;
-
- r = nbdkit_parse_bool (value);
- if (r == -1)
- return -1;
- cache_on_read = r;
+ if (value[0] == '/') {
+ cor_path = value;
+ cor_mode = COR_PATH;
+ }
+ else {
+ int r = nbdkit_parse_bool (value);
+ if (r == -1)
+ return -1;
+ cor_mode = r ? COR_ON : COR_OFF;
+ }
return 0;
}
else {
@@ -177,7 +182,7 @@ cache_config (nbdkit_next_config *next, nbdkit_backend *nxdata,
#define cache_config_help_common \
"cache=MODE Set cache MODE, one of writeback (default),\n" \
" writethrough, or unsafe.\n" \
- "cache-on-read=BOOL Set to true to cache on reads (default false).\n"
+ "cache-on-read=BOOL|/PATH Set to true to cache on reads (default false).\n"
#ifndef HAVE_CACHE_RECLAIM
#define cache_config_help cache_config_help_common
#else
@@ -187,6 +192,18 @@ cache_config (nbdkit_next_config *next, nbdkit_backend *nxdata,
"cache-low-threshold=PCT Percentage of max size where reclaim ends.\n"
#endif
+/* Decide if cache-on-read is currently on or off. */
+bool
+cache_on_read (void)
+{
+ switch (cor_mode) {
+ case COR_ON: return true;
+ case COR_OFF: return false;
+ case COR_PATH: return access (cor_path, F_OK) == 0;
+ default: abort ();
+ }
+}
+
static int
cache_config_complete (nbdkit_next_config_complete *next,
nbdkit_backend *nxdata)
diff --git a/filters/cache/cache.h b/filters/cache/cache.h
index 2b72221f..a559adef 100644
--- a/filters/cache/cache.h
+++ b/filters/cache/cache.h
@@ -49,7 +49,13 @@ extern unsigned blksize;
extern int64_t max_size;
extern unsigned hi_thresh, lo_thresh;
-/* Cache read requests. */
-extern bool cache_on_read;
+/* Cache on read mode. */
+extern enum cor_mode {
+ COR_OFF,
+ COR_ON,
+ COR_PATH,
+} cor_mode;
+extern const char *cor_path;
+extern bool cache_on_read (void);
#endif /* NBDKIT_CACHE_H */
diff --git a/filters/cache/nbdkit-cache-filter.pod b/filters/cache/nbdkit-cache-filter.pod
index 34fd0b29..2ac307e0 100644
--- a/filters/cache/nbdkit-cache-filter.pod
+++ b/filters/cache/nbdkit-cache-filter.pod
@@ -8,7 +8,7 @@ nbdkit-cache-filter - nbdkit caching filter
[cache-max-size=SIZE]
[cache-high-threshold=N]
[cache-low-threshold=N]
- [cache-on-read=true|false]
+ [cache-on-read=true|false|/PATH]
[plugin-args...]
=head1 DESCRIPTION
@@ -87,6 +87,15 @@ the plugin.
Do not cache read requests (this is the default).
+=item B<cache-on-read=/PATH>
+
+(nbdkit E<ge> 1.28)
+
+When F</PATH> (which must be an absolute path) exists, this behaves
+like C<cache-on-read=true>, and when it does not exist like
+C<cache-on-read=false>. This allows you to control the cache-on-read
+behaviour while nbdkit is running.
+
=back
=head1 CACHE MAXIMUM SIZE
--
2.31.1


@ -0,0 +1,278 @@
From 743b49ed9cd8d302d0274fc16ebc7783978b0c2e Mon Sep 17 00:00:00 2001
From: "Richard W.M. Jones" <rjones@redhat.com>
Date: Mon, 26 Jul 2021 16:30:26 +0100
Subject: [PATCH] cache: Add cache-min-block-size parameter

This allows you to choose a larger block size. I found experimentally
that this improves performance because of locality in access patterns.
The idea came from qcow2 which implicitly does the same thing because
of the relatively large cluster size (32K).

nbdkit + cache-filter with 4K block size + cache-on-read + curl
(to a very slow remote site):

  => virt-inspector took 22 mins

same with 64K block size:

  => virt-inspector took 19 mins

However compared to a qcow2 file using qemu's copy-on-read, backed
with nbdkit + curl we are still a lot slower, possibly because having
the cache inside virt-inspector greatly reduces round trip overhead:

  => virt-inspector took 13 mins

(cherry picked from commit 4ceacb6caa64e12bd78af5f90e86ee591e055944)
---
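A minimal sketch of the rule this parameter follows, assuming
hypothetical helper names valid_min_block_size and
effective_block_size. The requested minimum must be a power of 2, at
least 4096 and no larger than UINT_MAX, and the block size actually
used is the larger of that minimum and the filesystem block size.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stdint.h>

/* True if r is acceptable as a minimum block size: a power of 2 which
 * is at least 4096 and fits in an unsigned int.
 */
bool
valid_min_block_size (int64_t r)
{
  return r >= 4096 && r <= UINT_MAX && (r & (r - 1)) == 0;
}

/* Block size actually used: the larger of the configured minimum and
 * the block size reported by the filesystem holding the cache file.
 */
unsigned long
effective_block_size (unsigned min_block_size, unsigned long fs_bsize)
{
  assert (valid_min_block_size (min_block_size));
  return min_block_size > fs_bsize ? min_block_size : fs_bsize;
}
```

With a 64K minimum on a filesystem with 4K blocks the cache works in
64K blocks; with a 4K minimum on a filesystem with 128K blocks it
works in 128K blocks.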
filters/cache/blk.c | 2 +-
filters/cache/cache.c | 36 ++++++++++----
filters/cache/cache.h | 3 ++
filters/cache/nbdkit-cache-filter.pod | 9 ++++
tests/Makefile.am | 2 +
tests/test-cache-block-size.sh | 70 +++++++++++++++++++++++++++
6 files changed, 112 insertions(+), 10 deletions(-)
create mode 100755 tests/test-cache-block-size.sh
diff --git a/filters/cache/blk.c b/filters/cache/blk.c
index 19f79605..6276985f 100644
--- a/filters/cache/blk.c
+++ b/filters/cache/blk.c
@@ -149,7 +149,7 @@ blk_init (void)
nbdkit_error ("fstatvfs: %s: %m", tmpdir);
return -1;
}
- blksize = MAX (4096, statvfs.f_bsize);
+ blksize = MAX (min_block_size, statvfs.f_bsize);
nbdkit_debug ("cache: block size: %u", blksize);
bitmap_init (&bm, blksize, 2 /* bits per block */);
diff --git a/filters/cache/cache.c b/filters/cache/cache.c
index 8af52106..48a20c3b 100644
--- a/filters/cache/cache.c
+++ b/filters/cache/cache.c
@@ -40,6 +40,7 @@
#include <inttypes.h>
#include <unistd.h>
#include <fcntl.h>
+#include <limits.h>
#include <errno.h>
#include <assert.h>
#include <sys/types.h>
@@ -62,6 +63,7 @@
#include "blk.h"
#include "reclaim.h"
#include "isaligned.h"
+#include "ispowerof2.h"
#include "minmax.h"
#include "rounding.h"
@@ -70,7 +72,8 @@
*/
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
-unsigned blksize;
+unsigned blksize; /* actual block size (picked by blk.c) */
+unsigned min_block_size = 4096;
enum cache_mode cache_mode = CACHE_MODE_WRITEBACK;
int64_t max_size = -1;
unsigned hi_thresh = 95, lo_thresh = 80;
@@ -80,13 +83,6 @@ const char *cor_path;
static int cache_flush (nbdkit_next *next, void *handle, uint32_t flags,
int *err);
-static void
-cache_load (void)
-{
- if (blk_init () == -1)
- exit (EXIT_FAILURE);
-}
-
static void
cache_unload (void)
{
@@ -116,6 +112,19 @@ cache_config (nbdkit_next_config *next, nbdkit_backend *nxdata,
return -1;
}
}
+ else if (strcmp (key, "cache-min-block-size") == 0) {
+ int64_t r;
+
+ r = nbdkit_parse_size (value);
+ if (r == -1)
+ return -1;
+ if (r < 4096 || !is_power_of_2 (r) || r > UINT_MAX) {
+ nbdkit_error ("cache-min-block-size is not a power of 2, or is too small or too large");
+ return -1;
+ }
+ min_block_size = r;
+ return 0;
+ }
#ifdef HAVE_CACHE_RECLAIM
else if (strcmp (key, "cache-max-size") == 0) {
int64_t r;
@@ -220,6 +229,15 @@ cache_config_complete (nbdkit_next_config_complete *next,
return next (nxdata);
}
+static int
+cache_get_ready (int thread_model)
+{
+ if (blk_init () == -1)
+ return -1;
+
+ return 0;
+}
+
/* Get the file size, set the cache size. */
static int64_t
cache_get_size (nbdkit_next *next,
@@ -691,11 +709,11 @@ cache_cache (nbdkit_next *next,
static struct nbdkit_filter filter = {
.name = "cache",
.longname = "nbdkit caching filter",
- .load = cache_load,
.unload = cache_unload,
.config = cache_config,
.config_complete = cache_config_complete,
.config_help = cache_config_help,
+ .get_ready = cache_get_ready,
.prepare = cache_prepare,
.get_size = cache_get_size,
.can_cache = cache_can_cache,
diff --git a/filters/cache/cache.h b/filters/cache/cache.h
index a559adef..5c32c37c 100644
--- a/filters/cache/cache.h
+++ b/filters/cache/cache.h
@@ -45,6 +45,9 @@ extern enum cache_mode {
/* Size of a block in the cache. */
extern unsigned blksize;
+/* Minimum block size (cache-min-block-size parameter). */
+extern unsigned min_block_size;
+
/* Maximum size of the cache and high/low thresholds. */
extern int64_t max_size;
extern unsigned hi_thresh, lo_thresh;
diff --git a/filters/cache/nbdkit-cache-filter.pod b/filters/cache/nbdkit-cache-filter.pod
index 2ac307e0..9511e91b 100644
--- a/filters/cache/nbdkit-cache-filter.pod
+++ b/filters/cache/nbdkit-cache-filter.pod
@@ -5,6 +5,7 @@ nbdkit-cache-filter - nbdkit caching filter
=head1 SYNOPSIS
nbdkit --filter=cache plugin [cache=writeback|writethrough|unsafe]
+ [cache-min-block-size=SIZE]
[cache-max-size=SIZE]
[cache-high-threshold=N]
[cache-low-threshold=N]
@@ -59,6 +60,14 @@ This is dangerous and can cause data loss, but this may be acceptable
if you only use it for testing or with data that you don't care about
or can cheaply reconstruct.
+=item B<cache-min-block-size=>SIZE
+
+Set the minimum block size used by the cache. This must be a power of
+2 and E<ge> 4096.
+
+The default is 4096, or the block size of the filesystem which
+contains the temporary file storing the cache (whichever is larger).
+
=item B<cache-max-size=>SIZE
=item B<cache-high-threshold=>N
diff --git a/tests/Makefile.am b/tests/Makefile.am
index 9630205d..a038eabc 100644
--- a/tests/Makefile.am
+++ b/tests/Makefile.am
@@ -1371,12 +1371,14 @@ EXTRA_DIST += test-blocksize.sh test-blocksize-extents.sh
# cache filter test.
TESTS += \
test-cache.sh \
+ test-cache-block-size.sh \
test-cache-on-read.sh \
test-cache-max-size.sh \
test-cache-unaligned.sh \
$(NULL)
EXTRA_DIST += \
test-cache.sh \
+ test-cache-block-size.sh \
test-cache-on-read.sh \
test-cache-max-size.sh \
test-cache-unaligned.sh \
diff --git a/tests/test-cache-block-size.sh b/tests/test-cache-block-size.sh
new file mode 100755
index 00000000..a2a27407
--- /dev/null
+++ b/tests/test-cache-block-size.sh
@@ -0,0 +1,70 @@
+#!/usr/bin/env bash
+# nbdkit
+# Copyright (C) 2018-2021 Red Hat Inc.
+#
+# Redistribution and use in source and binary forms, with or without
+# modification, are permitted provided that the following conditions are
+# met:
+#
+# * Redistributions of source code must retain the above copyright
+# notice, this list of conditions and the following disclaimer.
+#
+# * Redistributions in binary form must reproduce the above copyright
+# notice, this list of conditions and the following disclaimer in the
+# documentation and/or other materials provided with the distribution.
+#
+# * Neither the name of Red Hat nor the names of its contributors may be
+# used to endorse or promote products derived from this software without
+# specific prior written permission.
+#
+# THIS SOFTWARE IS PROVIDED BY RED HAT AND CONTRIBUTORS ''AS IS'' AND
+# ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO,
+# THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A
+# PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL RED HAT OR
+# CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF
+# USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
+# ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
+# OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT
+# OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
+# SUCH DAMAGE.
+
+source ./functions.sh
+set -e
+set -x
+
+requires_filter cache
+requires_nbdsh_uri
+
+sock=$(mktemp -u /tmp/nbdkit-test-sock.XXXXXX)
+files="cache-block-size.img $sock cache-block-size.pid"
+rm -f $files
+cleanup_fn rm -f $files
+
+# Create an empty base image.
+truncate -s 128K cache-block-size.img
+
+# Run nbdkit with the caching filter.
+start_nbdkit -P cache-block-size.pid -U $sock --filter=cache \
+ file cache-block-size.img cache-min-block-size=64K
+
+nbdsh --connect "nbd+unix://?socket=$sock" \
+ -c '
+# Write some pattern data to the overlay and check it reads back OK.
+buf = b"abcd" * 16384
+h.pwrite(buf, 32768)
+zero = h.pread(32768, 0)
+assert zero == bytearray(32768)
+buf2 = h.pread(65536, 32768)
+assert buf == buf2
+
+# Flushing should write through to the underlying file.
+h.flush()
+
+with open("cache-block-size.img", "rb") as file:
+ zero = file.read(32768)
+ assert zero == bytearray(32768)
+ buf2 = file.read(65536)
+ assert buf == buf2
+'
--
2.31.1


@ -0,0 +1,138 @@
From 70e0df6462c34c4946b64e172d163b58121cf424 Mon Sep 17 00:00:00 2001
From: "Richard W.M. Jones" <rjones@redhat.com>
Date: Mon, 26 Jul 2021 17:39:23 +0100
Subject: [PATCH] cache, cow: Use a 64K block size by default

Based on the results presented in the previous commit, use a 64K block
size by default in both the cache and cow filters. For the cache
filter you could go back to a 4K block size if you wanted by using the
cache-min-block-size=4K parameter. For cow it is compiled in so
cannot be adjusted.

(cherry picked from commit c1905b0a28677d961babdb16d6f30ae61042c825)
---
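A minimal sketch of how the block size affects the memory used by the
block-state bitmap, assuming 2 bits of state per block (an assumption
consistent with the 64 MB per 1 TB figure quoted for 4K blocks in
filters/cow/blk.h). The function bitmap_bytes is hypothetical, and
the real bitmap code may round differently.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes of bitmap needed to track every block of an image, given the
 * block size and the number of state bits kept per block.
 */
uint64_t
bitmap_bytes (uint64_t image_size, uint64_t blksize, unsigned bits_per_block)
{
  uint64_t nrblocks;

  assert (blksize > 0);
  nrblocks = (image_size + blksize - 1) / blksize;  /* round up */
  return (nrblocks * bits_per_block + 7) / 8;       /* bits to bytes */
}
```

Under that assumption a 1 TB image needs 64 MB of bitmap with 4K
blocks and 4 MB with 64K blocks.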
filters/cache/cache.c | 2 +-
filters/cache/nbdkit-cache-filter.pod | 4 ++--
filters/cow/blk.h | 2 +-
tests/test-cache-block-size.sh | 2 +-
tests/test-cow-extents1.sh | 33 +++++++++++++++------------
5 files changed, 23 insertions(+), 20 deletions(-)
diff --git a/filters/cache/cache.c b/filters/cache/cache.c
index 48a20c3b..f7b01039 100644
--- a/filters/cache/cache.c
+++ b/filters/cache/cache.c
@@ -73,7 +73,7 @@
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
unsigned blksize; /* actual block size (picked by blk.c) */
-unsigned min_block_size = 4096;
+unsigned min_block_size = 65536;
enum cache_mode cache_mode = CACHE_MODE_WRITEBACK;
int64_t max_size = -1;
unsigned hi_thresh = 95, lo_thresh = 80;
diff --git a/filters/cache/nbdkit-cache-filter.pod b/filters/cache/nbdkit-cache-filter.pod
index 9511e91b..99707373 100644
--- a/filters/cache/nbdkit-cache-filter.pod
+++ b/filters/cache/nbdkit-cache-filter.pod
@@ -65,8 +65,8 @@ or can cheaply reconstruct.
Set the minimum block size used by the cache. This must be a power of
2 and E<ge> 4096.
-The default is 4096, or the block size of the filesystem which
-contains the temporary file storing the cache (whichever is larger).
+The default is 64K, or the block size of the filesystem which contains
+the temporary file storing the cache (whichever is larger).
=item B<cache-max-size=>SIZE
diff --git a/filters/cow/blk.h b/filters/cow/blk.h
index b066c602..1bc85283 100644
--- a/filters/cow/blk.h
+++ b/filters/cow/blk.h
@@ -36,7 +36,7 @@
/* Size of a block in the overlay. A 4K block size means that we need
* 64 MB of memory to store the bitmap for a 1 TB underlying image.
*/
-#define BLKSIZE 4096
+#define BLKSIZE 65536
/* Initialize the overlay and bitmap. */
extern int blk_init (void);
diff --git a/tests/test-cache-block-size.sh b/tests/test-cache-block-size.sh
index a2a27407..d20cc940 100755
--- a/tests/test-cache-block-size.sh
+++ b/tests/test-cache-block-size.sh
@@ -47,7 +47,7 @@ truncate -s 128K cache-block-size.img
# Run nbdkit with the caching filter.
start_nbdkit -P cache-block-size.pid -U $sock --filter=cache \
- file cache-block-size.img cache-min-block-size=64K
+ file cache-block-size.img cache-min-block-size=4K
nbdsh --connect "nbd+unix://?socket=$sock" \
-c '
diff --git a/tests/test-cow-extents1.sh b/tests/test-cow-extents1.sh
index 8e0e0383..ebfd83f6 100755
--- a/tests/test-cow-extents1.sh
+++ b/tests/test-cow-extents1.sh
@@ -65,7 +65,7 @@ cleanup_fn rm -f $files
# Create a base file which is half allocated, half sparse.
dd if=/dev/urandom of=$base count=128 bs=1K
-truncate -s 256K $base
+truncate -s 4M $base
lastmod="$(stat -c "%y" $base)"
# Run nbdkit with a COW overlay.
@@ -76,30 +76,33 @@ uri="nbd+unix:///?socket=$sock"
nbdinfo --map "$uri" > $out
cat $out
if [ "$(tr -s ' ' < $out | cut -d' ' -f 1-4)" != " 0 131072 0
- 131072 131072 3" ]; then
+ 131072 4063232 3" ]; then
echo "$0: unexpected initial file map"
exit 1
fi
# Punch some holes.
nbdsh -u "$uri" \
- -c 'h.trim(4096, 4096)' \
- -c 'h.trim(4098, 16383)' \
- -c 'h.pwrite(b"1"*4096, 65536)' \
- -c 'h.trim(8192, 131072)' \
- -c 'h.pwrite(b"2"*8192, 196608)'
+ -c 'bs = 65536' \
+ -c 'h.trim(bs, bs)' \
+ -c 'h.trim(bs+2, 4*bs-1)' \
+ -c 'h.pwrite(b"1"*bs, 16*bs)' \
+ -c 'h.trim(2*bs, 32*bs)' \
+ -c 'h.pwrite(b"2"*(2*bs), 48*bs)'
# The extents map should be fully allocated.
nbdinfo --map "$uri" > $out
cat $out
-if [ "$(tr -s ' ' < $out | cut -d' ' -f 1-4)" != " 0 4096 0
- 4096 4096 3
- 8192 8192 0
- 16384 4096 3
- 20480 110592 0
- 131072 65536 3
- 196608 8192 0
- 204800 57344 3" ]; then
+if [ "$(tr -s ' ' < $out | cut -d' ' -f 1-4)" != " 0 65536 0
+ 65536 131072 3
+ 196608 65536 0
+ 262144 65536 3
+ 327680 65536 0
+ 393216 655360 3
+ 1048576 65536 0
+ 1114112 2031616 3
+ 3145728 131072 0
+ 3276800 917504 3" ]; then
echo "$0: unexpected trimmed file map"
exit 1
fi
--
2.31.1
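The blk.h comment kept as context in this patch quotes 64 MB of bitmap for a 1 TB image at a 4K block size. The patch does not update that comment, so the arithmetic is redone here for the new 64K default. This is a minimal sketch, assuming 2 bits of state per block (the figure implied by the 64 MB number); `bitmap_bytes` and `ceil_div` are hypothetical helpers, not nbdkit functions.

```python
def ceil_div(a: int, b: int) -> int:
    """Integer division rounding up."""
    return -(-a // b)


def bitmap_bytes(image_size: int, block_size: int, bits_per_block: int = 2) -> int:
    """Bytes needed for a bitmap tracking image_size bytes in block_size units.

    bits_per_block=2 is an assumption inferred from the blk.h comment.
    """
    blocks = ceil_div(image_size, block_size)
    return ceil_div(blocks * bits_per_block, 8)


TIB = 1 << 40
MIB = 1 << 20

size_4k = bitmap_bytes(TIB, 4096)
size_64k = bitmap_bytes(TIB, 65536)

print("4K blocks: ", size_4k // MIB, "MiB")
print("64K blocks:", size_64k // MIB, "MiB")
```

Under that assumption the bitmap for a 1 TiB image shrinks by a factor of 16 when the block size goes from 4K to 64K.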

@ -0,0 +1,50 @@
From 9cf300962b9f453972deaf744c202327c42970db Mon Sep 17 00:00:00 2001
From: "Richard W.M. Jones" <rjones@redhat.com>
Date: Tue, 27 Jul 2021 21:16:30 +0100
Subject: [PATCH] cache: Refactor printing state into new function
This minor refactoring just makes the cache and cow filters' blk.c a
little bit more similar.
(cherry picked from commit bdb86ea14c00a950f2a2d34071ac1e0799d29132)
---
filters/cache/blk.c | 16 ++++++++++++----
1 file changed, 12 insertions(+), 4 deletions(-)
diff --git a/filters/cache/blk.c b/filters/cache/blk.c
index 6276985f..e50a7f24 100644
--- a/filters/cache/blk.c
+++ b/filters/cache/blk.c
@@ -94,6 +94,17 @@ enum bm_entry {
BLOCK_DIRTY = 3,
};
+static const char *
+state_to_string (enum bm_entry state)
+{
+ switch (state) {
+ case BLOCK_NOT_CACHED: return "not cached";
+ case BLOCK_CLEAN: return "clean";
+ case BLOCK_DIRTY: return "dirty";
+ default: abort ();
+ }
+}
+
/* Extra debugging (-D cache.verbose=1). */
NBDKIT_DLL_PUBLIC int cache_debug_verbose = 0;
@@ -312,10 +323,7 @@ blk_cache (nbdkit_next *next,
nbdkit_debug ("cache: blk_cache block %" PRIu64
" (offset %" PRIu64 ") is %s",
blknum, (uint64_t) offset,
- state == BLOCK_NOT_CACHED ? "not cached" :
- state == BLOCK_CLEAN ? "clean" :
- state == BLOCK_DIRTY ? "dirty" :
- "unknown");
+ state_to_string (state));
if (state == BLOCK_NOT_CACHED) {
/* Read underlying plugin, copy to cache regardless of cache-on-read. */
--
2.31.1

@ -0,0 +1,147 @@
From a203296a125ce6a28d1d73d248f0899754c3677c Mon Sep 17 00:00:00 2001
From: "Richard W.M. Jones" <rjones@redhat.com>
Date: Thu, 29 Jul 2021 20:16:43 +0100
Subject: [PATCH] tests: cache: Test cache-on-read option really caches
By making use of the delay filter to add a penalty for hitting the
plugin we can check whether or not the cache-on-read option is
working.
(cherry picked from commit 3ae7aa533bb9322ab6dc6deecb687ded76634ab4)
---
tests/Makefile.am | 2 +
tests/test-cache-on-read-caches.sh | 87 ++++++++++++++++++++++++++++++
tests/test-cache-on-read.sh | 5 --
3 files changed, 89 insertions(+), 5 deletions(-)
create mode 100755 tests/test-cache-on-read-caches.sh
diff --git a/tests/Makefile.am b/tests/Makefile.am
index a038eabc..51ca913a 100644
--- a/tests/Makefile.am
+++ b/tests/Makefile.am
@@ -1373,6 +1373,7 @@ TESTS += \
test-cache.sh \
test-cache-block-size.sh \
test-cache-on-read.sh \
+ test-cache-on-read-caches.sh \
test-cache-max-size.sh \
test-cache-unaligned.sh \
$(NULL)
@@ -1380,6 +1381,7 @@ EXTRA_DIST += \
test-cache.sh \
test-cache-block-size.sh \
test-cache-on-read.sh \
+ test-cache-on-read-caches.sh \
test-cache-max-size.sh \
test-cache-unaligned.sh \
$(NULL)
diff --git a/tests/test-cache-on-read-caches.sh b/tests/test-cache-on-read-caches.sh
new file mode 100755
index 00000000..80b34159
--- /dev/null
+++ b/tests/test-cache-on-read-caches.sh
@@ -0,0 +1,87 @@
+#!/usr/bin/env bash
+# nbdkit
+# Copyright (C) 2018-2021 Red Hat Inc.
+#
+# Redistribution and use in source and binary forms, with or without
+# modification, are permitted provided that the following conditions are
+# met:
+#
+# * Redistributions of source code must retain the above copyright
+# notice, this list of conditions and the following disclaimer.
+#
+# * Redistributions in binary form must reproduce the above copyright
+# notice, this list of conditions and the following disclaimer in the
+# documentation and/or other materials provided with the distribution.
+#
+# * Neither the name of Red Hat nor the names of its contributors may be
+# used to endorse or promote products derived from this software without
+# specific prior written permission.
+#
+# THIS SOFTWARE IS PROVIDED BY RED HAT AND CONTRIBUTORS ''AS IS'' AND
+# ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO,
+# THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A
+# PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL RED HAT OR
+# CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF
+# USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
+# ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
+# OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT
+# OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
+# SUCH DAMAGE.
+
+source ./functions.sh
+set -e
+set -x
+
+requires_filter cache
+requires_filter delay
+requires_nbdsh_uri
+
+sock=$(mktemp -u /tmp/nbdkit-test-sock.XXXXXX)
+files="$sock cache-on-read-caches.pid"
+rm -f $files
+cleanup_fn rm -f $files
+
+# Run nbdkit with the cache filter, cache-on-read and a read delay.
+start_nbdkit -P cache-on-read-caches.pid -U $sock \
+ --filter=cache --filter=delay \
+ memory 64K cache-on-read=true rdelay=10
+
+nbdsh --connect "nbd+unix://?socket=$sock" \
+ -c '
+from time import time
+
+# First read should suffer a penalty. Because we are reading
+# a single 64K block (same size as the cache block), we should
+# only suffer one penalty of approx. 10 seconds.
+st = time()
+zb = h.pread(65536, 0)
+et = time()
+el = et-st
+print("elapsed time: %g" % el)
+assert et-st >= 10
+assert zb == bytearray(65536)
+
+# Second read should not suffer a penalty.
+st = time()
+zb = h.pread(65536, 0)
+et = time()
+el = et-st
+print("elapsed time: %g" % el)
+assert el < 10
+assert zb == bytearray(65536)
+
+# Write something.
+buf = b"abcd" * 16384
+h.pwrite(buf, 0)
+
+# Reading back should be quick since it is stored in the overlay.
+st = time()
+buf2 = h.pread(65536, 0)
+et = time()
+el = et-st
+print("elapsed time: %g" % el)
+assert el < 10
+assert buf == buf2
+'
diff --git a/tests/test-cache-on-read.sh b/tests/test-cache-on-read.sh
index f8584dcd..85ca83d4 100755
--- a/tests/test-cache-on-read.sh
+++ b/tests/test-cache-on-read.sh
@@ -56,9 +56,4 @@ zero = h.pread(32768, 0)
assert zero == bytearray(32768)
buf2 = h.pread(65536, 32768)
assert buf == buf2
-
-# XXX Suggestion to improve this test: Use the delay filter below the
-# cache filter, and time reads to prove that the second read is faster
-# because it is not going through the delay filter and plugin.
-# XXX second h.pread here ...
'
--
2.31.1

@ -0,0 +1,457 @@
From 330a2b99378c5bb6c57ab8ffb8069d21e64d5312 Mon Sep 17 00:00:00 2001
From: "Richard W.M. Jones" <rjones@redhat.com>
Date: Tue, 27 Jul 2021 23:01:52 +0100
Subject: [PATCH] cow: Implement cow-on-read
This is very similar to the nbdkit-cache-filter cache-on-read flag.
(cherry picked from commit bd93b3f27246f917de48a6cc2525d9c424c07976)
---
filters/cow/blk.c | 21 ++++++--
filters/cow/blk.h | 10 ++--
filters/cow/cow.c | 56 ++++++++++++++++----
filters/cow/nbdkit-cow-filter.pod | 17 ++++++
tests/Makefile.am | 4 ++
tests/test-cow-on-read-caches.sh | 87 +++++++++++++++++++++++++++++++
tests/test-cow-on-read.sh | 59 +++++++++++++++++++++
7 files changed, 236 insertions(+), 18 deletions(-)
create mode 100755 tests/test-cow-on-read-caches.sh
create mode 100755 tests/test-cow-on-read.sh
diff --git a/filters/cow/blk.c b/filters/cow/blk.c
index cebd9454..9d42b5fc 100644
--- a/filters/cow/blk.c
+++ b/filters/cow/blk.c
@@ -230,7 +230,7 @@ blk_status (uint64_t blknum, bool *present, bool *trimmed)
int
blk_read_multiple (nbdkit_next *next,
uint64_t blknum, uint64_t nrblocks,
- uint8_t *block, int *err)
+ uint8_t *block, bool cow_on_read, int *err)
{
off_t offset = blknum * BLKSIZE;
enum bm_entry state;
@@ -280,6 +280,19 @@ blk_read_multiple (nbdkit_next *next,
* zeroing the tail.
*/
memset (block + n, 0, tail);
+
+ /* If cow-on-read is true then copy the blocks to the cache and
+ * set them as allocated.
+ */
+ if (cow_on_read) {
+ if (full_pwrite (fd, block, BLKSIZE * runblocks, offset) == -1) {
+ *err = errno;
+ nbdkit_error ("pwrite: %m");
+ return -1;
+ }
+ for (b = 0; b < runblocks; ++b)
+ bitmap_set_blk (&bm, blknum+b, BLOCK_ALLOCATED);
+ }
}
else if (state == BLOCK_ALLOCATED) { /* Read overlay. */
if (full_pread (fd, block, BLKSIZE * runblocks, offset) == -1) {
@@ -301,14 +314,14 @@ blk_read_multiple (nbdkit_next *next,
blknum + runblocks,
nrblocks - runblocks,
block + BLKSIZE * runblocks,
- err);
+ cow_on_read, err);
}
int
blk_read (nbdkit_next *next,
- uint64_t blknum, uint8_t *block, int *err)
+ uint64_t blknum, uint8_t *block, bool cow_on_read, int *err)
{
- return blk_read_multiple (next, blknum, 1, block, err);
+ return blk_read_multiple (next, blknum, 1, block, cow_on_read, err);
}
int
diff --git a/filters/cow/blk.h b/filters/cow/blk.h
index 1bc85283..b7e6f092 100644
--- a/filters/cow/blk.h
+++ b/filters/cow/blk.h
@@ -52,14 +52,16 @@ extern void blk_status (uint64_t blknum, bool *present, bool *trimmed);
/* Read a single block from the overlay or plugin. */
extern int blk_read (nbdkit_next *next,
- uint64_t blknum, uint8_t *block, int *err)
- __attribute__((__nonnull__ (1, 3, 4)));
+ uint64_t blknum, uint8_t *block,
+ bool cow_on_read, int *err)
+ __attribute__((__nonnull__ (1, 3, 5)));
/* Read multiple blocks from the overlay or plugin. */
extern int blk_read_multiple (nbdkit_next *next,
uint64_t blknum, uint64_t nrblocks,
- uint8_t *block, int *err)
- __attribute__((__nonnull__ (1, 4, 5)));
+ uint8_t *block,
+ bool cow_on_read, int *err)
+ __attribute__((__nonnull__ (1, 4, 6)));
/* Cache mode for blocks not already in overlay */
enum cache_mode {
diff --git a/filters/cow/cow.c b/filters/cow/cow.c
index f74c0a34..74fcd61c 100644
--- a/filters/cow/cow.c
+++ b/filters/cow/cow.c
@@ -38,6 +38,7 @@
#include <stdbool.h>
#include <inttypes.h>
#include <string.h>
+#include <unistd.h>
#include <errno.h>
#include <pthread.h>
@@ -59,6 +60,15 @@ static pthread_mutex_t rmw_lock = PTHREAD_MUTEX_INITIALIZER;
bool cow_on_cache;
+/* Cache on read ("cow-on-read") mode. */
+extern enum cor_mode {
+ COR_OFF,
+ COR_ON,
+ COR_PATH,
+} cor_mode;
+enum cor_mode cor_mode = COR_OFF;
+const char *cor_path;
+
static void
cow_load (void)
{
@@ -85,13 +95,39 @@ cow_config (nbdkit_next_config *next, nbdkit_backend *nxdata,
cow_on_cache = r;
return 0;
}
+ else if (strcmp (key, "cow-on-read") == 0) {
+ if (value[0] == '/') {
+ cor_path = value;
+ cor_mode = COR_PATH;
+ }
+ else {
+ int r = nbdkit_parse_bool (value);
+ if (r == -1)
+ return -1;
+ cor_mode = r ? COR_ON : COR_OFF;
+ }
+ return 0;
+ }
else {
return next (nxdata, key, value);
}
}
#define cow_config_help \
- "cow-on-cache=<BOOL> Set to true to treat client cache requests as writes.\n"
+ "cow-on-cache=<BOOL> Copy cache (prefetch) requests to the overlay.\n" \
+ "cow-on-read=<BOOL>|/PATH Copy read requests to the overlay."
+
+/* Decide if cow-on-read is currently on or off. */
+bool
+cow_on_read (void)
+{
+ switch (cor_mode) {
+ case COR_ON: return true;
+ case COR_OFF: return false;
+ case COR_PATH: return access (cor_path, F_OK) == 0;
+ default: abort ();
+ }
+}
static void *
cow_open (nbdkit_next_open *next, nbdkit_context *nxdata,
@@ -230,7 +266,7 @@ cow_pread (nbdkit_next *next,
uint64_t n = MIN (BLKSIZE - blkoffs, count);
assert (block);
- r = blk_read (next, blknum, block, err);
+ r = blk_read (next, blknum, block, cow_on_read (), err);
if (r == -1)
return -1;
@@ -245,7 +281,7 @@ cow_pread (nbdkit_next *next,
/* Aligned body */
nrblocks = count / BLKSIZE;
if (nrblocks > 0) {
- r = blk_read_multiple (next, blknum, nrblocks, buf, err);
+ r = blk_read_multiple (next, blknum, nrblocks, buf, cow_on_read (), err);
if (r == -1)
return -1;
@@ -258,7 +294,7 @@ cow_pread (nbdkit_next *next,
/* Unaligned tail */
if (count) {
assert (block);
- r = blk_read (next, blknum, block, err);
+ r = blk_read (next, blknum, block, cow_on_read (), err);
if (r == -1)
return -1;
@@ -299,7 +335,7 @@ cow_pwrite (nbdkit_next *next,
*/
assert (block);
ACQUIRE_LOCK_FOR_CURRENT_SCOPE (&rmw_lock);
- r = blk_read (next, blknum, block, err);
+ r = blk_read (next, blknum, block, cow_on_read (), err);
if (r != -1) {
memcpy (&block[blkoffs], buf, n);
r = blk_write (blknum, block, err);
@@ -329,7 +365,7 @@ cow_pwrite (nbdkit_next *next,
if (count) {
assert (block);
ACQUIRE_LOCK_FOR_CURRENT_SCOPE (&rmw_lock);
- r = blk_read (next, blknum, block, err);
+ r = blk_read (next, blknum, block, cow_on_read (), err);
if (r != -1) {
memcpy (block, buf, count);
r = blk_write (blknum, block, err);
@@ -379,7 +415,7 @@ cow_zero (nbdkit_next *next,
* Hold the rmw_lock over the whole operation.
*/
ACQUIRE_LOCK_FOR_CURRENT_SCOPE (&rmw_lock);
- r = blk_read (next, blknum, block, err);
+ r = blk_read (next, blknum, block, cow_on_read (), err);
if (r != -1) {
memset (&block[blkoffs], 0, n);
r = blk_write (blknum, block, err);
@@ -411,7 +447,7 @@ cow_zero (nbdkit_next *next,
/* Unaligned tail */
if (count) {
ACQUIRE_LOCK_FOR_CURRENT_SCOPE (&rmw_lock);
- r = blk_read (next, blknum, block, err);
+ r = blk_read (next, blknum, block, cow_on_read (), err);
if (r != -1) {
memset (&block[count], 0, BLKSIZE - count);
r = blk_write (blknum, block, err);
@@ -455,7 +491,7 @@ cow_trim (nbdkit_next *next,
* Hold the lock over the whole operation.
*/
ACQUIRE_LOCK_FOR_CURRENT_SCOPE (&rmw_lock);
- r = blk_read (next, blknum, block, err);
+ r = blk_read (next, blknum, block, cow_on_read (), err);
if (r != -1) {
memset (&block[blkoffs], 0, n);
r = blk_write (blknum, block, err);
@@ -482,7 +518,7 @@ cow_trim (nbdkit_next *next,
/* Unaligned tail */
if (count) {
ACQUIRE_LOCK_FOR_CURRENT_SCOPE (&rmw_lock);
- r = blk_read (next, blknum, block, err);
+ r = blk_read (next, blknum, block, cow_on_read (), err);
if (r != -1) {
memset (&block[count], 0, BLKSIZE - count);
r = blk_write (blknum, block, err);
diff --git a/filters/cow/nbdkit-cow-filter.pod b/filters/cow/nbdkit-cow-filter.pod
index 2a693ebe..6366d8a8 100644
--- a/filters/cow/nbdkit-cow-filter.pod
+++ b/filters/cow/nbdkit-cow-filter.pod
@@ -62,6 +62,23 @@ the data from the plugin into the overlay.
Do not save data from cache (prefetch) requests in the overlay. This
leaves the overlay as small as possible. This is the default.
+=item B<cow-on-read=true>
+
+When the client issues a read request, copy the data into the overlay
+so that the same data can be served more quickly later.
+
+=item B<cow-on-read=false>
+
+Do not save data from read requests in the overlay. This leaves the
+overlay as small as possible. This is the default.
+
+=item B<cow-on-read=/PATH>
+
+When F</PATH> (which must be an absolute path) exists, this behaves
+like C<cow-on-read=true>, and when it does not exist like
+C<cow-on-read=false>. This allows you to control the C<cow-on-read>
+behaviour while nbdkit is running.
+
=back
=head1 EXAMPLES
diff --git a/tests/Makefile.am b/tests/Makefile.am
index 51ca913a..edc8d66d 100644
--- a/tests/Makefile.am
+++ b/tests/Makefile.am
@@ -1407,6 +1407,8 @@ TESTS += \
test-cow-extents1.sh \
test-cow-extents2.sh \
test-cow-extents-large.sh \
+ test-cow-on-read.sh \
+ test-cow-on-read-caches.sh \
test-cow-unaligned.sh \
$(NULL)
endif
@@ -1417,6 +1419,8 @@ EXTRA_DIST += \
test-cow-extents2.sh \
test-cow-extents-large.sh \
test-cow-null.sh \
+ test-cow-on-read.sh \
+ test-cow-on-read-caches.sh \
test-cow-unaligned.sh \
$(NULL)
diff --git a/tests/test-cow-on-read-caches.sh b/tests/test-cow-on-read-caches.sh
new file mode 100755
index 00000000..c5b60198
--- /dev/null
+++ b/tests/test-cow-on-read-caches.sh
@@ -0,0 +1,87 @@
+#!/usr/bin/env bash
+# nbdkit
+# Copyright (C) 2018-2021 Red Hat Inc.
+#
+# Redistribution and use in source and binary forms, with or without
+# modification, are permitted provided that the following conditions are
+# met:
+#
+# * Redistributions of source code must retain the above copyright
+# notice, this list of conditions and the following disclaimer.
+#
+# * Redistributions in binary form must reproduce the above copyright
+# notice, this list of conditions and the following disclaimer in the
+# documentation and/or other materials provided with the distribution.
+#
+# * Neither the name of Red Hat nor the names of its contributors may be
+# used to endorse or promote products derived from this software without
+# specific prior written permission.
+#
+# THIS SOFTWARE IS PROVIDED BY RED HAT AND CONTRIBUTORS ''AS IS'' AND
+# ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO,
+# THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A
+# PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL RED HAT OR
+# CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF
+# USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
+# ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
+# OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT
+# OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
+# SUCH DAMAGE.
+
+source ./functions.sh
+set -e
+set -x
+
+requires_filter cow
+requires_filter delay
+requires_nbdsh_uri
+
+sock=$(mktemp -u /tmp/nbdkit-test-sock.XXXXXX)
+files="$sock cow-on-read-caches.pid"
+rm -f $files
+cleanup_fn rm -f $files
+
+# Run nbdkit with the cow filter, cow-on-read and a read delay.
+start_nbdkit -P cow-on-read-caches.pid -U $sock \
+ --filter=cow --filter=delay \
+ memory 64K cow-on-read=true rdelay=10
+
+nbdsh --connect "nbd+unix://?socket=$sock" \
+ -c '
+from time import time
+
+# First read should suffer a penalty. Because we are reading
+# a single 64K block (same size as the COW block), we should
+# only suffer one penalty of approx. 10 seconds.
+st = time()
+zb = h.pread(65536, 0)
+et = time()
+el = et-st
+print("elapsed time: %g" % el)
+assert et-st >= 10
+assert zb == bytearray(65536)
+
+# Second read should not suffer a penalty.
+st = time()
+zb = h.pread(65536, 0)
+et = time()
+el = et-st
+print("elapsed time: %g" % el)
+assert el < 10
+assert zb == bytearray(65536)
+
+# Write something.
+buf = b"abcd" * 16384
+h.pwrite(buf, 0)
+
+# Reading back should be quick since it is stored in the overlay.
+st = time()
+buf2 = h.pread(65536, 0)
+et = time()
+el = et-st
+print("elapsed time: %g" % el)
+assert el < 10
+assert buf == buf2
+'
diff --git a/tests/test-cow-on-read.sh b/tests/test-cow-on-read.sh
new file mode 100755
index 00000000..4f58b33b
--- /dev/null
+++ b/tests/test-cow-on-read.sh
@@ -0,0 +1,59 @@
+#!/usr/bin/env bash
+# nbdkit
+# Copyright (C) 2018-2021 Red Hat Inc.
+#
+# Redistribution and use in source and binary forms, with or without
+# modification, are permitted provided that the following conditions are
+# met:
+#
+# * Redistributions of source code must retain the above copyright
+# notice, this list of conditions and the following disclaimer.
+#
+# * Redistributions in binary form must reproduce the above copyright
+# notice, this list of conditions and the following disclaimer in the
+# documentation and/or other materials provided with the distribution.
+#
+# * Neither the name of Red Hat nor the names of its contributors may be
+# used to endorse or promote products derived from this software without
+# specific prior written permission.
+#
+# THIS SOFTWARE IS PROVIDED BY RED HAT AND CONTRIBUTORS ''AS IS'' AND
+# ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO,
+# THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A
+# PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL RED HAT OR
+# CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF
+# USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
+# ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
+# OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT
+# OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
+# SUCH DAMAGE.
+
+source ./functions.sh
+set -e
+set -x
+
+requires_filter cow
+requires_nbdsh_uri
+
+sock=$(mktemp -u /tmp/nbdkit-test-sock.XXXXXX)
+files="$sock cow-on-read.pid"
+rm -f $files
+cleanup_fn rm -f $files
+
+# Run nbdkit with the cow filter and cow-on-read.
+start_nbdkit -P cow-on-read.pid -U $sock \
+ --filter=cow \
+ memory 128K cow-on-read=true
+
+nbdsh --connect "nbd+unix://?socket=$sock" \
+ -c '
+# Write some pattern data to the overlay and check it reads back OK.
+buf = b"abcd" * 16384
+h.pwrite(buf, 32768)
+zero = h.pread(32768, 0)
+assert zero == bytearray(32768)
+buf2 = h.pread(65536, 32768)
+assert buf == buf2
+'
--
2.31.1

@ -0,0 +1,172 @@
From f1fa60f1388bf177ebd83625cc13a164936a187c Mon Sep 17 00:00:00 2001
From: "Richard W.M. Jones" <rjones@redhat.com>
Date: Fri, 30 Jul 2021 10:19:57 +0100
Subject: [PATCH] delay: Add delay-open and delay-close
Useful for simulating VDDK which has a very slow connection.
(cherry picked from commit de8dcd3a34a38b088a0f9a6f8ca754702ad1f598)
---
filters/delay/delay.c | 60 ++++++++++++++++++++++++++-
filters/delay/nbdkit-delay-filter.pod | 27 ++++++++++--
2 files changed, 82 insertions(+), 5 deletions(-)
diff --git a/filters/delay/delay.c b/filters/delay/delay.c
index 7e7fe195..5bd21321 100644
--- a/filters/delay/delay.c
+++ b/filters/delay/delay.c
@@ -49,6 +49,8 @@ static int delay_trim_ms = 0; /* trim delay (milliseconds) */
static int delay_extents_ms = 0;/* extents delay (milliseconds) */
static int delay_cache_ms = 0; /* cache delay (milliseconds) */
static int delay_fast_zero = 1; /* whether delaying zero includes fast zero */
+static int delay_open_ms = 0; /* open delay (milliseconds) */
+static int delay_close_ms = 0; /* close delay (milliseconds) */
static int
parse_delay (const char *key, const char *value)
@@ -128,6 +130,18 @@ cache_delay (int *err)
return delay (delay_cache_ms, err);
}
+static int
+open_delay (int *err)
+{
+ return delay (delay_open_ms, err);
+}
+
+static int
+close_delay (int *err)
+{
+ return delay (delay_close_ms, err);
+}
+
/* Called for each key=value passed on the command line. */
static int
delay_config (nbdkit_next_config *next, nbdkit_backend *nxdata,
@@ -191,6 +205,18 @@ delay_config (nbdkit_next_config *next, nbdkit_backend *nxdata,
return -1;
return 0;
}
+ else if (strcmp (key, "delay-open") == 0) {
+ delay_open_ms = parse_delay (key, value);
+ if (delay_open_ms == -1)
+ return -1;
+ return 0;
+ }
+ else if (strcmp (key, "delay-close") == 0) {
+ delay_close_ms = parse_delay (key, value);
+ if (delay_close_ms == -1)
+ return -1;
+ return 0;
+ }
else
return next (nxdata, key, value);
}
@@ -204,7 +230,9 @@ delay_config (nbdkit_next_config *next, nbdkit_backend *nxdata,
"delay-extents=<NN>[ms] Extents delay in seconds/milliseconds.\n" \
"delay-cache=<NN>[ms] Cache delay in seconds/milliseconds.\n" \
"wdelay=<NN>[ms] Write, zero and trim delay in secs/msecs.\n" \
- "delay-fast-zero=<BOOL> Delay fast zero requests (default true).\n"
+ "delay-fast-zero=<BOOL> Delay fast zero requests (default true).\n" \
+ "delay-open=<NN>[ms] Open delay in seconds/milliseconds.\n" \
+ "delay-close=<NN>[ms] Close delay in seconds/milliseconds."
/* Override the plugin's .can_fast_zero if needed */
static int
@@ -217,6 +245,34 @@ delay_can_fast_zero (nbdkit_next *next,
return next->can_fast_zero (next);
}
+/* Open connection. */
+static void *
+delay_open (nbdkit_next_open *next, nbdkit_context *nxdata,
+ int readonly, const char *exportname, int is_tls)
+{
+ int err;
+
+ if (open_delay (&err) == -1) {
+ errno = err;
+ nbdkit_error ("delay: %m");
+ return NULL;
+ }
+
+ if (next (nxdata, readonly, exportname) == -1)
+ return NULL;
+
+ return NBDKIT_HANDLE_NOT_NEEDED;
+}
+
+/* Close connection. */
+static void
+delay_close (void *handle)
+{
+ int err;
+
+ close_delay (&err);
+}
+
/* Read data. */
static int
delay_pread (nbdkit_next *next,
@@ -294,6 +350,8 @@ static struct nbdkit_filter filter = {
.config = delay_config,
.config_help = delay_config_help,
.can_fast_zero = delay_can_fast_zero,
+ .open = delay_open,
+ .close = delay_close,
.pread = delay_pread,
.pwrite = delay_pwrite,
.zero = delay_zero,
diff --git a/filters/delay/nbdkit-delay-filter.pod b/filters/delay/nbdkit-delay-filter.pod
index d6961a9e..11ae544b 100644
--- a/filters/delay/nbdkit-delay-filter.pod
+++ b/filters/delay/nbdkit-delay-filter.pod
@@ -9,10 +9,15 @@ nbdkit-delay-filter - nbdkit delay filter
nbdkit --filter=delay plugin rdelay=NNms wdelay=NNms [plugin-args...]
nbdkit --filter=delay plugin [plugin-args ...]
- delay-read=(SECS|NNms) delay-write=(SECS|NNms)
- delay-zero=(SECS|NNms) delay-trim=(SECS|NNms)
- delay-extents=(SECS|NNms) delay-cache=(SECS|NNms)
+ delay-read=(SECS|NNms)
+ delay-write=(SECS|NNms)
+ delay-zero=(SECS|NNms)
+ delay-trim=(SECS|NNms)
+ delay-extents=(SECS|NNms)
+ delay-cache=(SECS|NNms)
delay-fast-zero=BOOL
+ delay-open=(SECS|NNms)
+ delay-close=(SECS|NNms)
=head1 DESCRIPTION
@@ -108,6 +113,20 @@ delay as any other zero request; but setting this parameter to false
instantly fails a fast zero response without waiting for or consulting
the plugin.
+=item B<delay-open=>SECS
+
+=item B<delay-open=>NNB<ms>
+
+=item B<delay-close=>SECS
+
+=item B<delay-close=>NNB<ms>
+
+(nbdkit E<ge> 1.28)
+
+Delay open and close operations by C<SECS> seconds or C<NN>
+milliseconds. Open corresponds to client connection. Close may not
+be visible to clients if they abruptly disconnect.
+
=back
=head1 FILES
@@ -140,4 +159,4 @@ Richard W.M. Jones
=head1 COPYRIGHT
-Copyright (C) 2018 Red Hat Inc.
+Copyright (C) 2018-2021 Red Hat Inc.
--
2.31.1

@ -6,7 +6,7 @@ set -e
# directory. Use it like this:
# ./copy-patches.sh
-rhel_version=8.3
+rhel_version=9.0
# Check we're in the right directory.
if [ ! -f nbdkit.spec ]; then

@ -51,7 +51,7 @@ ExclusiveArch: x86_64
Name: nbdkit
Version: 1.26.2
-Release: 1%{?dist}.1
+Release: 2%{?dist}
Summary: NBD server
License: BSD
@ -72,14 +72,31 @@ Source2: libguestfs.keyring
# Maintainer script which helps with handling patches.
Source3: copy-patches.sh
-# Patches in upstream stable-1.26 branch.
-Patch0001: 0001-ocaml-Call-caml_shutdown-when-unloading-the-plugin.patch
-Patch0002: 0002-ocaml-Fix-valgrinding-by-only-ignoring-caml_stat_all.patch
-Patch0003: 0003-ocaml-tests-Actually-call-.get_ready-method-in-test-.patch
-Patch0004: 0004-ocaml-Rearrange-the-callbacks.patch
-Patch0005: 0005-ocaml-Fix-comment-on-plugin-.pread-field.patch
-Patch0006: 0006-docs-Correct-selinux-label-example.patch
-Patch0007: 0007-cow-Fix-assert-failure-in-cow_extents.patch
+# Patches come from the upstream repository:
+# https://gitlab.com/nbdkit/nbdkit/-/commits/rhel-9.0/
+
+# Patches.
+Patch0001: 0001-ocaml-Call-caml_shutdown-when-unloading-the-plugin.patch
+Patch0002: 0002-ocaml-Fix-valgrinding-by-only-ignoring-caml_stat_all.patch
+Patch0003: 0003-ocaml-tests-Actually-call-.get_ready-method-in-test-.patch
+Patch0004: 0004-ocaml-Rearrange-the-callbacks.patch
+Patch0005: 0005-ocaml-Fix-comment-on-plugin-.pread-field.patch
+Patch0006: 0006-docs-Correct-selinux-label-example.patch
+Patch0007: 0007-cow-Fix-assert-failure-in-cow_extents.patch
+Patch0008: 0008-cache-Fix-misleading-LRU-diagram-and-comment.patch
+Patch0009: 0009-docs-Improve-documentation-of-.can_cache-and-.cache-.patch
+Patch0010: 0010-cow-Improve-documentation-of-cow-on-cache-option.patch
+Patch0011: 0011-tests-cache-Simplify-test-cache-on-read.sh.patch
+Patch0012: 0012-cache-Reduce-verbosity-of-debugging.patch
+Patch0013: 0013-cache-cow-Add-blk_read_multiple-function.patch
+Patch0014: 0014-cache-cow-Use-full-pread-pwrite-operations.patch
+Patch0015: 0015-cache-Implement-cache-on-read-PATH.patch
+Patch0016: 0016-cache-Add-cache-min-block-size-parameter.patch
+Patch0017: 0017-cache-cow-Use-a-64K-block-size-by-default.patch
+Patch0018: 0018-cache-Refactor-printing-state-into-new-function.patch
+Patch0019: 0019-tests-cache-Test-cache-on-read-option-really-caches.patch
+Patch0020: 0020-cow-Implement-cow-on-read.patch
+Patch0021: 0021-delay-Add-delay-open-and-delay-close.patch
BuildRequires: make
%if 0%{patches_touch_autotools}
@ -1250,8 +1267,15 @@ export LIBGUESTFS_TRACE=1
%changelog
-* Mon Jul 26 2021 Richard W.M. Jones <rjones@redhat.com> - 1.26.2-1.1
-- Add patches from upstream stable-1.26 branch, fixing a virt-v2v crash
+* Fri Jul 30 2021 Richard W.M. Jones <rjones@redhat.com> - 1.26.2-2
+- More efficient cache and cow filters.
+- Add nbdkit-cow-filter cow-on-read option.
+- Add nbdkit-cache-filter cache-on-read=/PATH.
+- Add nbdkit-cache-filter cache-min-block-size option.
+- Add nbdkit-delay-filter delay-open and delay-close options.
+- Reduce verbosity of debugging from virt-v2v.
+- Miscellaneous bugfixes
+ resolves: rhbz#1950632
* Mon Jul 05 2021 Richard W.M. Jones <rjones@redhat.com> - 1.26.2-1
- New upstream stable version 1.26.2.