ceph/0036-18.2.1.release.patch

987 lines
43 KiB
Diff

diff -ur ceph-18.2.1~/debian/changelog ceph-18.2.1/debian/changelog
--- ceph-18.2.1~/debian/changelog 2023-11-14 14:36:19.000000000 -0500
+++ ceph-18.2.1/debian/changelog 2023-12-11 16:55:38.000000000 -0500
@@ -2,7 +2,7 @@
* New upstream release
- -- Ceph Release Team <ceph-maintainers@ceph.io> Tue, 14 Nov 2023 19:36:16 +0000
+ -- Ceph Release Team <ceph-maintainers@ceph.io> Mon, 11 Dec 2023 21:55:36 +0000
ceph (18.2.0-1) stable; urgency=medium
diff -ur ceph-18.2.1~/doc/architecture.rst ceph-18.2.1/doc/architecture.rst
--- ceph-18.2.1~/doc/architecture.rst 2023-11-14 14:36:19.000000000 -0500
+++ ceph-18.2.1/doc/architecture.rst 2023-12-11 16:55:38.000000000 -0500
@@ -30,6 +30,8 @@
- :term:`Ceph Manager`
- :term:`Ceph Metadata Server`
+.. _arch_monitor:
+
Ceph Monitors maintain the master copy of the cluster map, which they provide
to Ceph clients. Provisioning multiple monitors within the Ceph cluster ensures
availability in the event that one of the monitor daemons or its host fails.
diff -ur ceph-18.2.1~/doc/glossary.rst ceph-18.2.1/doc/glossary.rst
--- ceph-18.2.1~/doc/glossary.rst 2023-11-14 14:36:19.000000000 -0500
+++ ceph-18.2.1/doc/glossary.rst 2023-12-11 16:55:38.000000000 -0500
@@ -271,7 +271,7 @@
The Ceph manager software, which collects all the state from
the whole cluster in one place.
- MON
+ :ref:`MON<arch_monitor>`
The Ceph monitor software.
Node
@@ -337,6 +337,12 @@
Firefly (v. 0.80). See :ref:`Primary Affinity
<rados_ops_primary_affinity>`.
+ Quorum
+ Quorum is the state that exists when a majority of the
+ :ref:`Monitors<arch_monitor>` in the cluster are ``up``. A
+ minimum of three :ref:`Monitors<arch_monitor>` must exist in
+ the cluster in order for Quorum to be possible.
+
RADOS
**R**\eliable **A**\utonomic **D**\istributed **O**\bject
**S**\tore. RADOS is the object store that provides a scalable
diff -ur ceph-18.2.1~/doc/rados/troubleshooting/troubleshooting-mon.rst ceph-18.2.1/doc/rados/troubleshooting/troubleshooting-mon.rst
--- ceph-18.2.1~/doc/rados/troubleshooting/troubleshooting-mon.rst 2023-11-14 14:36:19.000000000 -0500
+++ ceph-18.2.1/doc/rados/troubleshooting/troubleshooting-mon.rst 2023-12-11 16:55:38.000000000 -0500
@@ -17,59 +17,66 @@
Initial Troubleshooting
=======================
-#. **Make sure that the monitors are running.**
+The first steps in the process of troubleshooting Ceph Monitors involve making
+sure that the Monitors are running and that they are able to communicate with
+the network and on the network. Follow the steps in this section to rule out
+the simplest causes of Monitor malfunction.
+
+#. **Make sure that the Monitors are running.**
+
+ Make sure that the Monitor (*mon*) daemon processes (``ceph-mon``) are
+ running. It might be the case that the mons have not be restarted after an
+ upgrade. Checking for this simple oversight can save hours of painstaking
+ troubleshooting.
+
+ It is also important to make sure that the manager daemons (``ceph-mgr``)
+ are running. Remember that typical cluster configurations provide one
+ Manager (``ceph-mgr``) for each Monitor (``ceph-mon``).
- First, make sure that the monitor (*mon*) daemon processes (``ceph-mon``)
- are running. Sometimes Ceph admins either forget to start the mons or
- forget to restart the mons after an upgrade. Checking for this simple
- oversight can save hours of painstaking troubleshooting. It is also
- important to make sure that the manager daemons (``ceph-mgr``) are running.
- Remember that typical cluster configurations provide one ``ceph-mgr`` for
- each ``ceph-mon``.
+ .. note:: In releases prior to v1.12.5, Rook will not run more than two
+ managers.
- .. note:: Rook will not run more than two managers.
+#. **Make sure that you can reach the Monitor nodes.**
-#. **Make sure that you can reach the monitor nodes.**
-
- In certain rare cases, there may be ``iptables`` rules that block access to
- monitor nodes or TCP ports. These rules might be left over from earlier
+ In certain rare cases, ``iptables`` rules might be blocking access to
+ Monitor nodes or TCP ports. These rules might be left over from earlier
stress testing or rule development. To check for the presence of such
- rules, SSH into the server and then try to connect to the monitor's ports
- (``tcp/3300`` and ``tcp/6789``) using ``telnet``, ``nc``, or a similar
- tool.
-
-#. **Make sure that the ``ceph status`` command runs and receives a reply from the cluster.**
-
- If the ``ceph status`` command does receive a reply from the cluster, then
- the cluster is up and running. The monitors will answer to a ``status``
- request only if there is a formed quorum. Confirm that one or more ``mgr``
- daemons are reported as running. Under ideal conditions, all ``mgr``
- daemons will be reported as running.
-
+ rules, SSH into each Monitor node and use ``telnet`` or ``nc`` or a similar
+ tool to attempt to connect to each of the other Monitor nodes on ports
+ ``tcp/3300`` and ``tcp/6789``.
+
+#. **Make sure that the "ceph status" command runs and receives a reply from the cluster.**
+
+ If the ``ceph status`` command receives a reply from the cluster, then the
+ cluster is up and running. Monitors answer to a ``status`` request only if
+ there is a formed quorum. Confirm that one or more ``mgr`` daemons are
+ reported as running. In a cluster with no deficiencies, ``ceph status``
+ will report that all ``mgr`` daemons are running.
If the ``ceph status`` command does not receive a reply from the cluster,
- then there are probably not enough monitors ``up`` to form a quorum. The
- ``ceph -s`` command with no further options specified connects to an
- arbitrarily selected monitor. In certain cases, however, it might be
- helpful to connect to a specific monitor (or to several specific monitors
+ then there are probably not enough Monitors ``up`` to form a quorum. If the
+ ``ceph -s`` command is run with no further options specified, it connects
+ to an arbitrarily selected Monitor. In certain cases, however, it might be
+ helpful to connect to a specific Monitor (or to several specific Monitors
in sequence) by adding the ``-m`` flag to the command: for example, ``ceph
status -m mymon1``.
#. **None of this worked. What now?**
If the above solutions have not resolved your problems, you might find it
- helpful to examine each individual monitor in turn. Whether or not a quorum
- has been formed, it is possible to contact each monitor individually and
+ helpful to examine each individual Monitor in turn. Even if no quorum has
+ been formed, it is possible to contact each Monitor individually and
request its status by using the ``ceph tell mon.ID mon_status`` command
- (here ``ID`` is the monitor's identifier).
+ (here ``ID`` is the Monitor's identifier).
- Run the ``ceph tell mon.ID mon_status`` command for each monitor in the
+ Run the ``ceph tell mon.ID mon_status`` command for each Monitor in the
cluster. For more on this command's output, see :ref:`Understanding
mon_status
<rados_troubleshoting_troubleshooting_mon_understanding_mon_status>`.
- There is also an alternative method: SSH into each monitor node and query
- the daemon's admin socket. See :ref:`Using the Monitor's Admin
+ There is also an alternative method for contacting each individual Monitor:
+ SSH into each Monitor node and query the daemon's admin socket. See
+ :ref:`Using the Monitor's Admin
Socket<rados_troubleshoting_troubleshooting_mon_using_admin_socket>`.
.. _rados_troubleshoting_troubleshooting_mon_using_admin_socket:
diff -ur ceph-18.2.1~/qa/tasks/cephfs/kernel_mount.py ceph-18.2.1/qa/tasks/cephfs/kernel_mount.py
--- ceph-18.2.1~/qa/tasks/cephfs/kernel_mount.py 2023-11-14 14:36:19.000000000 -0500
+++ ceph-18.2.1/qa/tasks/cephfs/kernel_mount.py 2023-12-11 16:55:38.000000000 -0500
@@ -68,7 +68,10 @@
self.enable_dynamic_debug()
self.ctx[f'kmount_count.{self.client_remote.hostname}'] = kmount_count + 1
- self.gather_mount_info()
+ try:
+ self.gather_mount_info()
+ except:
+ log.warn('failed to fetch mount info - tests depending on mount addr/inst may fail!')
def gather_mount_info(self):
self.id = self._get_global_id()
diff -ur ceph-18.2.1~/qa/tasks/cephfs/mount.py ceph-18.2.1/qa/tasks/cephfs/mount.py
--- ceph-18.2.1~/qa/tasks/cephfs/mount.py 2023-11-14 14:36:19.000000000 -0500
+++ ceph-18.2.1/qa/tasks/cephfs/mount.py 2023-12-11 16:55:38.000000000 -0500
@@ -186,6 +186,12 @@
sudo=True).decode())
def is_blocked(self):
+ if not self.addr:
+ # can't infer if our addr is blocklisted - let the caller try to
+ # umount without lazy/force. If the client was blocklisted, then
+ # the umount would be stuck and the test would fail on timeout.
+ # happens only with Ubuntu 20.04 (missing kclient patches :/).
+ return False
self.fs = Filesystem(self.ctx, name=self.cephfs_name)
try:
diff -ur ceph-18.2.1~/src/ceph-volume/ceph_volume/devices/raw/list.py ceph-18.2.1/src/ceph-volume/ceph_volume/devices/raw/list.py
--- ceph-18.2.1~/src/ceph-volume/ceph_volume/devices/raw/list.py 2023-11-14 14:36:19.000000000 -0500
+++ ceph-18.2.1/src/ceph-volume/ceph_volume/devices/raw/list.py 2023-12-11 16:55:38.000000000 -0500
@@ -5,7 +5,7 @@
from textwrap import dedent
from ceph_volume import decorators, process
from ceph_volume.util import disk
-
+from typing import Any, Dict, List
logger = logging.getLogger(__name__)
@@ -66,50 +66,57 @@
def __init__(self, argv):
self.argv = argv
+ def is_atari_partitions(self, _lsblk: Dict[str, Any]) -> bool:
+ dev = _lsblk['NAME']
+ if _lsblk.get('PKNAME'):
+ parent = _lsblk['PKNAME']
+ try:
+ if disk.has_bluestore_label(parent):
+ logger.warning(('ignoring child device {} whose parent {} is a BlueStore OSD.'.format(dev, parent),
+ 'device is likely a phantom Atari partition. device info: {}'.format(_lsblk)))
+ return True
+ except OSError as e:
+ logger.error(('ignoring child device {} to avoid reporting invalid BlueStore data from phantom Atari partitions.'.format(dev),
+ 'failed to determine if parent device {} is BlueStore. err: {}'.format(parent, e)))
+ return True
+ return False
+
+ def exclude_atari_partitions(self, _lsblk_all: Dict[str, Any]) -> List[Dict[str, Any]]:
+ return [_lsblk for _lsblk in _lsblk_all if not self.is_atari_partitions(_lsblk)]
+
def generate(self, devs=None):
logger.debug('Listing block devices via lsblk...')
- info_devices = disk.lsblk_all(abspath=True)
+ info_devices = []
if not devs or not any(devs):
# If no devs are given initially, we want to list ALL devices including children and
# parents. Parent disks with child partitions may be the appropriate device to return if
# the parent disk has a bluestore header, but children may be the most appropriate
# devices to return if the parent disk does not have a bluestore header.
+ info_devices = disk.lsblk_all(abspath=True)
devs = [device['NAME'] for device in info_devices if device.get('NAME',)]
+ else:
+ for dev in devs:
+ info_devices.append(disk.lsblk(dev, abspath=True))
+
+ # Linux kernels built with CONFIG_ATARI_PARTITION enabled can falsely interpret
+ # bluestore's on-disk format as an Atari partition table. These false Atari partitions
+ # can be interpreted as real OSDs if a bluestore OSD was previously created on the false
+ # partition. See https://tracker.ceph.com/issues/52060 for more info. If a device has a
+ # parent, it is a child. If the parent is a valid bluestore OSD, the child will only
+ # exist if it is a phantom Atari partition, and the child should be ignored. If the
+ # parent isn't bluestore, then the child could be a valid bluestore OSD. If we fail to
+ # determine whether a parent is bluestore, we should err on the side of not reporting
+ # the child so as not to give a false negative.
+ info_devices = self.exclude_atari_partitions(info_devices)
result = {}
logger.debug('inspecting devices: {}'.format(devs))
- for dev in devs:
- # Linux kernels built with CONFIG_ATARI_PARTITION enabled can falsely interpret
- # bluestore's on-disk format as an Atari partition table. These false Atari partitions
- # can be interpreted as real OSDs if a bluestore OSD was previously created on the false
- # partition. See https://tracker.ceph.com/issues/52060 for more info. If a device has a
- # parent, it is a child. If the parent is a valid bluestore OSD, the child will only
- # exist if it is a phantom Atari partition, and the child should be ignored. If the
- # parent isn't bluestore, then the child could be a valid bluestore OSD. If we fail to
- # determine whether a parent is bluestore, we should err on the side of not reporting
- # the child so as not to give a false negative.
- matched_info_devices = [info for info in info_devices if info['NAME'] == dev]
- if not matched_info_devices:
- logger.warning('device {} does not exist'.format(dev))
- continue
- info_device = matched_info_devices[0]
- if 'PKNAME' in info_device and info_device['PKNAME'] != "":
- parent = info_device['PKNAME']
- try:
- if disk.has_bluestore_label(parent):
- logger.warning(('ignoring child device {} whose parent {} is a BlueStore OSD.'.format(dev, parent),
- 'device is likely a phantom Atari partition. device info: {}'.format(info_device)))
- continue
- except OSError as e:
- logger.error(('ignoring child device {} to avoid reporting invalid BlueStore data from phantom Atari partitions.'.format(dev),
- 'failed to determine if parent device {} is BlueStore. err: {}'.format(parent, e)))
- continue
-
- bs_info = _get_bluestore_info(dev)
+ for info_device in info_devices:
+ bs_info = _get_bluestore_info(info_device['NAME'])
if bs_info is None:
# None is also returned in the rare event that there is an issue reading info from
# a BlueStore disk, so be sure to log our assumption that it isn't bluestore
- logger.info('device {} does not have BlueStore information'.format(dev))
+ logger.info('device {} does not have BlueStore information'.format(info_device['NAME']))
continue
uuid = bs_info['osd_uuid']
if uuid not in result:
diff -ur ceph-18.2.1~/src/ceph-volume/ceph_volume/tests/util/test_disk.py ceph-18.2.1/src/ceph-volume/ceph_volume/tests/util/test_disk.py
--- ceph-18.2.1~/src/ceph-volume/ceph_volume/tests/util/test_disk.py 2023-11-14 14:36:19.000000000 -0500
+++ ceph-18.2.1/src/ceph-volume/ceph_volume/tests/util/test_disk.py 2023-12-11 16:55:38.000000000 -0500
@@ -1,7 +1,37 @@
import os
import pytest
from ceph_volume.util import disk
-from mock.mock import patch
+from mock.mock import patch, MagicMock
+
+
+class TestFunctions:
+ @patch('ceph_volume.util.disk.os.path.exists', MagicMock(return_value=False))
+ def test_is_device_path_does_not_exist(self):
+ assert not disk.is_device('/dev/foo')
+
+ @patch('ceph_volume.util.disk.os.path.exists', MagicMock(return_value=True))
+ def test_is_device_dev_doesnt_startswith_dev(self):
+ assert not disk.is_device('/foo')
+
+ @patch('ceph_volume.util.disk.allow_loop_devices', MagicMock(return_value=False))
+ @patch('ceph_volume.util.disk.os.path.exists', MagicMock(return_value=True))
+ def test_is_device_loop_not_allowed(self):
+ assert not disk.is_device('/dev/loop123')
+
+ @patch('ceph_volume.util.disk.lsblk', MagicMock(return_value={'NAME': 'foo', 'TYPE': 'disk'}))
+ @patch('ceph_volume.util.disk.os.path.exists', MagicMock(return_value=True))
+ def test_is_device_type_disk(self):
+ assert disk.is_device('/dev/foo')
+
+ @patch('ceph_volume.util.disk.lsblk', MagicMock(return_value={'NAME': 'foo', 'TYPE': 'mpath'}))
+ @patch('ceph_volume.util.disk.os.path.exists', MagicMock(return_value=True))
+ def test_is_device_type_mpath(self):
+ assert disk.is_device('/dev/foo')
+
+ @patch('ceph_volume.util.disk.lsblk', MagicMock(return_value={'NAME': 'foo1', 'TYPE': 'part'}))
+ @patch('ceph_volume.util.disk.os.path.exists', MagicMock(return_value=True))
+ def test_is_device_type_part(self):
+ assert not disk.is_device('/dev/foo1')
class TestLsblkParser(object):
diff -ur ceph-18.2.1~/src/ceph-volume/ceph_volume/util/disk.py ceph-18.2.1/src/ceph-volume/ceph_volume/util/disk.py
--- ceph-18.2.1~/src/ceph-volume/ceph_volume/util/disk.py 2023-11-14 14:36:19.000000000 -0500
+++ ceph-18.2.1/src/ceph-volume/ceph_volume/util/disk.py 2023-12-11 16:55:38.000000000 -0500
@@ -359,6 +359,10 @@
if not allow_loop_devices():
return False
+ TYPE = lsblk(dev).get('TYPE')
+ if TYPE:
+ return TYPE in ['disk', 'mpath']
+
# fallback to stat
return _stat_is_device(os.lstat(dev).st_mode)
diff -ur ceph-18.2.1~/src/.git_version ceph-18.2.1/src/.git_version
--- ceph-18.2.1~/src/.git_version 2023-11-14 14:37:51.000000000 -0500
+++ ceph-18.2.1/src/.git_version 2023-12-11 16:57:17.000000000 -0500
@@ -1,2 +1,2 @@
-e3fce6809130d78ac0058fc87e537ecd926cd213
+7fe91d5d5842e04be3b4f514d6dd990c54b29c76
18.2.1
diff -ur ceph-18.2.1~/src/messages/MClientRequest.h ceph-18.2.1/src/messages/MClientRequest.h
--- ceph-18.2.1~/src/messages/MClientRequest.h 2023-11-14 14:36:19.000000000 -0500
+++ ceph-18.2.1/src/messages/MClientRequest.h 2023-12-11 16:55:38.000000000 -0500
@@ -234,6 +234,12 @@
copy_from_legacy_head(&head, &old_mds_head);
head.version = 0;
+ head.ext_num_retry = head.num_retry;
+ head.ext_num_fwd = head.num_fwd;
+
+ head.owner_uid = head.caller_uid;
+ head.owner_gid = head.caller_gid;
+
/* Can't set the btime from legacy struct */
if (head.op == CEPH_MDS_OP_SETATTR) {
int localmask = head.args.setattr.mask;
diff -ur ceph-18.2.1~/src/os/bluestore/AvlAllocator.cc ceph-18.2.1/src/os/bluestore/AvlAllocator.cc
--- ceph-18.2.1~/src/os/bluestore/AvlAllocator.cc 2023-11-14 14:36:19.000000000 -0500
+++ ceph-18.2.1/src/os/bluestore/AvlAllocator.cc 2023-12-11 16:55:38.000000000 -0500
@@ -39,7 +39,7 @@
uint64_t search_bytes = 0;
auto rs_start = range_tree.lower_bound(range_t{*cursor, size}, compare);
for (auto rs = rs_start; rs != range_tree.end(); ++rs) {
- uint64_t offset = p2roundup(rs->start, align);
+ uint64_t offset = rs->start;
*cursor = offset + size;
if (offset + size <= rs->end) {
return offset;
@@ -59,7 +59,7 @@
}
// If we reached end, start from beginning till cursor.
for (auto rs = range_tree.begin(); rs != rs_start; ++rs) {
- uint64_t offset = p2roundup(rs->start, align);
+ uint64_t offset = rs->start;
*cursor = offset + size;
if (offset + size <= rs->end) {
return offset;
@@ -82,7 +82,7 @@
const auto compare = range_size_tree.key_comp();
auto rs_start = range_size_tree.lower_bound(range_t{0, size}, compare);
for (auto rs = rs_start; rs != range_size_tree.end(); ++rs) {
- uint64_t offset = p2roundup(rs->start, align);
+ uint64_t offset = rs->start;
if (offset + size <= rs->end) {
return offset;
}
diff -ur ceph-18.2.1~/src/os/bluestore/BlueFS.cc ceph-18.2.1/src/os/bluestore/BlueFS.cc
--- ceph-18.2.1~/src/os/bluestore/BlueFS.cc 2023-11-14 14:36:19.000000000 -0500
+++ ceph-18.2.1/src/os/bluestore/BlueFS.cc 2023-12-11 16:55:38.000000000 -0500
@@ -658,16 +658,24 @@
}
logger->set(l_bluefs_wal_alloc_unit, wal_alloc_size);
+
+ uint64_t shared_alloc_size = cct->_conf->bluefs_shared_alloc_size;
+ if (shared_alloc && shared_alloc->a) {
+ uint64_t unit = shared_alloc->a->get_block_size();
+ shared_alloc_size = std::max(
+ unit,
+ shared_alloc_size);
+ ceph_assert(0 == p2phase(shared_alloc_size, unit));
+ }
if (bdev[BDEV_SLOW]) {
alloc_size[BDEV_DB] = cct->_conf->bluefs_alloc_size;
- alloc_size[BDEV_SLOW] = cct->_conf->bluefs_shared_alloc_size;
- logger->set(l_bluefs_db_alloc_unit, cct->_conf->bluefs_alloc_size);
- logger->set(l_bluefs_main_alloc_unit, cct->_conf->bluefs_shared_alloc_size);
+ alloc_size[BDEV_SLOW] = shared_alloc_size;
} else {
- alloc_size[BDEV_DB] = cct->_conf->bluefs_shared_alloc_size;
- logger->set(l_bluefs_main_alloc_unit, 0);
- logger->set(l_bluefs_db_alloc_unit, cct->_conf->bluefs_shared_alloc_size);
+ alloc_size[BDEV_DB] = shared_alloc_size;
+ alloc_size[BDEV_SLOW] = 0;
}
+ logger->set(l_bluefs_db_alloc_unit, alloc_size[BDEV_DB]);
+ logger->set(l_bluefs_main_alloc_unit, alloc_size[BDEV_SLOW]);
// new wal and db devices are never shared
if (bdev[BDEV_NEWWAL]) {
alloc_size[BDEV_NEWWAL] = cct->_conf->bluefs_alloc_size;
@@ -681,13 +689,13 @@
continue;
}
ceph_assert(bdev[id]->get_size());
- ceph_assert(alloc_size[id]);
if (is_shared_alloc(id)) {
dout(1) << __func__ << " shared, id " << id << std::hex
<< ", capacity 0x" << bdev[id]->get_size()
<< ", block size 0x" << alloc_size[id]
<< std::dec << dendl;
} else {
+ ceph_assert(alloc_size[id]);
std::string name = "bluefs-";
const char* devnames[] = { "wal","db","slow" };
if (id <= BDEV_SLOW)
diff -ur ceph-18.2.1~/src/os/bluestore/BtreeAllocator.cc ceph-18.2.1/src/os/bluestore/BtreeAllocator.cc
--- ceph-18.2.1~/src/os/bluestore/BtreeAllocator.cc 2023-11-14 14:36:19.000000000 -0500
+++ ceph-18.2.1/src/os/bluestore/BtreeAllocator.cc 2023-12-11 16:55:38.000000000 -0500
@@ -25,7 +25,7 @@
{
auto rs_start = range_tree.lower_bound(*cursor);
for (auto rs = rs_start; rs != range_tree.end(); ++rs) {
- uint64_t offset = p2roundup(rs->first, align);
+ uint64_t offset = rs->first;
if (offset + size <= rs->second) {
*cursor = offset + size;
return offset;
@@ -37,7 +37,7 @@
}
// If we reached end, start from beginning till cursor.
for (auto rs = range_tree.begin(); rs != rs_start; ++rs) {
- uint64_t offset = p2roundup(rs->first, align);
+ uint64_t offset = rs->first;
if (offset + size <= rs->second) {
*cursor = offset + size;
return offset;
@@ -53,7 +53,7 @@
// the needs
auto rs_start = range_size_tree.lower_bound(range_value_t{0,size});
for (auto rs = rs_start; rs != range_size_tree.end(); ++rs) {
- uint64_t offset = p2roundup(rs->start, align);
+ uint64_t offset = rs->start;
if (offset + size <= rs->start + rs->size) {
return offset;
}
diff -ur ceph-18.2.1~/src/os/bluestore/fastbmap_allocator_impl.cc ceph-18.2.1/src/os/bluestore/fastbmap_allocator_impl.cc
--- ceph-18.2.1~/src/os/bluestore/fastbmap_allocator_impl.cc 2023-11-14 14:36:19.000000000 -0500
+++ ceph-18.2.1/src/os/bluestore/fastbmap_allocator_impl.cc 2023-12-11 16:55:38.000000000 -0500
@@ -17,19 +17,9 @@
inline interval_t _align2units(uint64_t offset, uint64_t len, uint64_t min_length)
{
- interval_t res;
- if (len >= min_length) {
- res.offset = p2roundup(offset, min_length);
- auto delta_off = res.offset - offset;
- if (len > delta_off) {
- res.length = len - delta_off;
- res.length = p2align<uint64_t>(res.length, min_length);
- if (res.length) {
- return res;
- }
- }
- }
- return interval_t();
+ return len >= min_length ?
+ interval_t(offset, p2align<uint64_t>(len, min_length)) :
+ interval_t();
}
interval_t AllocatorLevel01Loose::_get_longest_from_l0(uint64_t pos0,
diff -ur ceph-18.2.1~/src/os/bluestore/StupidAllocator.cc ceph-18.2.1/src/os/bluestore/StupidAllocator.cc
--- ceph-18.2.1~/src/os/bluestore/StupidAllocator.cc 2023-11-14 14:36:19.000000000 -0500
+++ ceph-18.2.1/src/os/bluestore/StupidAllocator.cc 2023-12-11 16:55:38.000000000 -0500
@@ -52,20 +52,6 @@
}
}
-/// return the effective length of the extent if we align to alloc_unit
-uint64_t StupidAllocator::_aligned_len(
- StupidAllocator::interval_set_t::iterator p,
- uint64_t alloc_unit)
-{
- uint64_t skew = p.get_start() % alloc_unit;
- if (skew)
- skew = alloc_unit - skew;
- if (skew > p.get_len())
- return 0;
- else
- return p.get_len() - skew;
-}
-
int64_t StupidAllocator::allocate_int(
uint64_t want_size, uint64_t alloc_unit, int64_t hint,
uint64_t *offset, uint32_t *length)
@@ -89,7 +75,7 @@
for (bin = orig_bin; bin < (int)free.size(); ++bin) {
p = free[bin].lower_bound(hint);
while (p != free[bin].end()) {
- if (_aligned_len(p, alloc_unit) >= want_size) {
+ if (p.get_len() >= want_size) {
goto found;
}
++p;
@@ -102,7 +88,7 @@
p = free[bin].begin();
auto end = hint ? free[bin].lower_bound(hint) : free[bin].end();
while (p != end) {
- if (_aligned_len(p, alloc_unit) >= want_size) {
+ if (p.get_len() >= want_size) {
goto found;
}
++p;
@@ -114,7 +100,7 @@
for (bin = orig_bin; bin >= 0; --bin) {
p = free[bin].lower_bound(hint);
while (p != free[bin].end()) {
- if (_aligned_len(p, alloc_unit) >= alloc_unit) {
+ if (p.get_len() >= alloc_unit) {
goto found;
}
++p;
@@ -127,7 +113,7 @@
p = free[bin].begin();
auto end = hint ? free[bin].lower_bound(hint) : free[bin].end();
while (p != end) {
- if (_aligned_len(p, alloc_unit) >= alloc_unit) {
+ if (p.get_len() >= alloc_unit) {
goto found;
}
++p;
@@ -137,11 +123,9 @@
return -ENOSPC;
found:
- uint64_t skew = p.get_start() % alloc_unit;
- if (skew)
- skew = alloc_unit - skew;
- *offset = p.get_start() + skew;
- *length = std::min(std::max(alloc_unit, want_size), p2align((p.get_len() - skew), alloc_unit));
+ *offset = p.get_start();
+ *length = std::min(std::max(alloc_unit, want_size), p2align(p.get_len(), alloc_unit));
+
if (cct->_conf->bluestore_debug_small_allocations) {
uint64_t max =
alloc_unit * (rand() % cct->_conf->bluestore_debug_small_allocations);
@@ -158,7 +142,7 @@
free[bin].erase(*offset, *length);
uint64_t off, len;
- if (*offset && free[bin].contains(*offset - skew - 1, &off, &len)) {
+ if (*offset && free[bin].contains(*offset - 1, &off, &len)) {
int newbin = _choose_bin(len);
if (newbin != bin) {
ldout(cct, 30) << __func__ << " demoting 0x" << std::hex << off << "~" << len
diff -ur ceph-18.2.1~/src/os/bluestore/StupidAllocator.h ceph-18.2.1/src/os/bluestore/StupidAllocator.h
--- ceph-18.2.1~/src/os/bluestore/StupidAllocator.h 2023-11-14 14:36:19.000000000 -0500
+++ ceph-18.2.1/src/os/bluestore/StupidAllocator.h 2023-12-11 16:55:38.000000000 -0500
@@ -31,10 +31,6 @@
unsigned _choose_bin(uint64_t len);
void _insert_free(uint64_t offset, uint64_t len);
- uint64_t _aligned_len(
- interval_set_t::iterator p,
- uint64_t alloc_unit);
-
public:
StupidAllocator(CephContext* cct,
int64_t size,
diff -ur ceph-18.2.1~/src/test/objectstore/Allocator_test.cc ceph-18.2.1/src/test/objectstore/Allocator_test.cc
--- ceph-18.2.1~/src/test/objectstore/Allocator_test.cc 2023-11-14 14:36:19.000000000 -0500
+++ ceph-18.2.1/src/test/objectstore/Allocator_test.cc 2023-12-11 16:55:38.000000000 -0500
@@ -516,8 +516,7 @@
PExtentVector extents;
auto need = 0x3f980000;
auto got = alloc->allocate(need, 0x10000, 0, (int64_t)0, &extents);
- EXPECT_GT(got, 0);
- EXPECT_EQ(got, 0x630000);
+ EXPECT_GE(got, 0x630000);
}
TEST_P(AllocTest, test_alloc_50656_best_fit)
diff -ur ceph-18.2.1~/src/test/objectstore/fastbmap_allocator_test.cc ceph-18.2.1/src/test/objectstore/fastbmap_allocator_test.cc
--- ceph-18.2.1~/src/test/objectstore/fastbmap_allocator_test.cc 2023-11-14 14:36:19.000000000 -0500
+++ ceph-18.2.1/src/test/objectstore/fastbmap_allocator_test.cc 2023-12-11 16:55:38.000000000 -0500
@@ -625,6 +625,8 @@
ASSERT_EQ(bins_overall[cbits(num_chunks / 2) - 1], 1u);
{
+ // Original free space disposition (start chunk, count):
+ // <NC/2, NC/2>
size_t to_release = 2 * _1m + 0x1000;
// release 2M + 4K at the beginning
interval_vector_t r;
@@ -637,6 +639,8 @@
ASSERT_EQ(bins_overall[cbits(num_chunks / 2) - 1], 1u);
}
{
+ // Original free space disposition (start chunk, count):
+ // <0, 513>, <NC / 2, NC / 2>
// allocate 4K within the deallocated range
uint64_t allocated4 = 0;
interval_vector_t a4;
@@ -652,79 +656,91 @@
ASSERT_EQ(bins_overall[cbits(num_chunks / 2) - 1], 1u);
}
{
- // allocate 1M - should go to the second 1M chunk
+ // Original free space disposition (start chunk, count):
+ // <1, 512>, <NC / 2, NC / 2>
+ // allocate 1M - should go to offset 4096
uint64_t allocated4 = 0;
interval_vector_t a4;
al2.allocate_l2(_1m, _1m, &allocated4, &a4);
ASSERT_EQ(a4.size(), 1u);
ASSERT_EQ(allocated4, _1m);
- ASSERT_EQ(a4[0].offset, _1m);
+ ASSERT_EQ(a4[0].offset, 4096);
ASSERT_EQ(a4[0].length, _1m);
bins_overall.clear();
al2.collect_stats(bins_overall);
- ASSERT_EQ(bins_overall.size(), 3u);
- ASSERT_EQ(bins_overall[0], 1u);
- ASSERT_EQ(bins_overall[cbits((_1m - 0x1000) / 0x1000) - 1], 1u);
+ ASSERT_EQ(bins_overall.size(), 2u);
+ ASSERT_EQ(bins_overall[cbits(_1m / 0x1000) - 1], 1u);
ASSERT_EQ(bins_overall[cbits(num_chunks / 2) - 1], 1u);
}
{
+ // Original free space disposition (start chunk, count):
+ // <257, 256>, <NC / 2, NC / 2>
// and allocate yet another 8K within the deallocated range
uint64_t allocated4 = 0;
interval_vector_t a4;
al2.allocate_l2(0x2000, 0x1000, &allocated4, &a4);
ASSERT_EQ(a4.size(), 1u);
ASSERT_EQ(allocated4, 0x2000u);
- ASSERT_EQ(a4[0].offset, 0x1000u);
+ ASSERT_EQ(a4[0].offset, _1m + 0x1000u);
ASSERT_EQ(a4[0].length, 0x2000u);
bins_overall.clear();
al2.collect_stats(bins_overall);
- ASSERT_EQ(bins_overall[0], 1u);
- ASSERT_EQ(bins_overall[cbits((_1m - 0x3000) / 0x1000) - 1], 1u);
+ ASSERT_EQ(bins_overall.size(), 2u);
+ ASSERT_EQ(bins_overall[cbits((_1m - 0x2000) / 0x1000) - 1], 1u);
ASSERT_EQ(bins_overall[cbits(num_chunks / 2) - 1], 1u);
}
{
- // release just allocated 1M
+ // Original free space disposition (start chunk, count):
+ // <259, 254>, <NC / 2, NC / 2>
+ // release 4K~1M
interval_vector_t r;
- r.emplace_back(_1m, _1m);
+ r.emplace_back(0x1000, _1m);
al2.free_l2(r);
bins_overall.clear();
al2.collect_stats(bins_overall);
- ASSERT_EQ(bins_overall.size(), 2u);
- ASSERT_EQ(bins_overall[cbits((2 * _1m - 0x3000) / 0x1000) - 1], 1u);
+ ASSERT_EQ(bins_overall.size(), 3u);
+ //ASSERT_EQ(bins_overall[cbits((2 * _1m - 0x3000) / 0x1000) - 1], 1u);
+ ASSERT_EQ(bins_overall[cbits(_1m / 0x1000) - 1], 1u);
+ ASSERT_EQ(bins_overall[cbits((_1m - 0x2000) / 0x1000) - 1], 1u);
ASSERT_EQ(bins_overall[cbits(num_chunks / 2) - 1], 1u);
}
{
- // allocate 3M - should go to the second 1M chunk and @capacity/2
+ // Original free space disposition (start chunk, count):
+ // <1, 257>, <259, 254>, <NC / 2, NC / 2>
+ // allocate 3M - should go to the first 1M chunk and @capacity/2
uint64_t allocated4 = 0;
interval_vector_t a4;
al2.allocate_l2(3 * _1m, _1m, &allocated4, &a4);
ASSERT_EQ(a4.size(), 2u);
ASSERT_EQ(allocated4, 3 * _1m);
- ASSERT_EQ(a4[0].offset, _1m);
+ ASSERT_EQ(a4[0].offset, 0x1000);
ASSERT_EQ(a4[0].length, _1m);
ASSERT_EQ(a4[1].offset, capacity / 2);
ASSERT_EQ(a4[1].length, 2 * _1m);
bins_overall.clear();
al2.collect_stats(bins_overall);
- ASSERT_EQ(bins_overall.size(), 3u);
- ASSERT_EQ(bins_overall[0], 1u);
- ASSERT_EQ(bins_overall[cbits((_1m - 0x3000) / 0x1000) - 1], 1u);
+ ASSERT_EQ(bins_overall.size(), 2u);
+ ASSERT_EQ(bins_overall[cbits((_1m - 0x2000) / 0x1000) - 1], 1u);
ASSERT_EQ(bins_overall[cbits((num_chunks - 512) / 2) - 1], 1u);
}
{
- // release allocated 1M in the second meg chunk except
+ // Original free space disposition (start chunk, count):
+ // <259, 254>, <NC / 2 - 512, NC / 2 - 512>
+ // release allocated 1M in the first meg chunk except
// the first 4K chunk
interval_vector_t r;
- r.emplace_back(_1m + 0x1000, _1m);
+ r.emplace_back(0x1000, _1m);
al2.free_l2(r);
bins_overall.clear();
al2.collect_stats(bins_overall);
ASSERT_EQ(bins_overall.size(), 3u);
ASSERT_EQ(bins_overall[cbits(_1m / 0x1000) - 1], 1u);
- ASSERT_EQ(bins_overall[cbits((_1m - 0x3000) / 0x1000) - 1], 1u);
+ ASSERT_EQ(bins_overall[cbits((_1m - 0x2000) / 0x1000) - 1], 1u);
ASSERT_EQ(bins_overall[cbits((num_chunks - 512) / 2) - 1], 1u);
}
{
+ // Original free space disposition (start chunk, count):
+ // <1, 256>, <259, 254>, <NC / 2 - 512, NC / 2 - 512>
// release 2M @(capacity / 2)
interval_vector_t r;
r.emplace_back(capacity / 2, 2 * _1m);
@@ -733,10 +749,12 @@
al2.collect_stats(bins_overall);
ASSERT_EQ(bins_overall.size(), 3u);
ASSERT_EQ(bins_overall[cbits(_1m / 0x1000) - 1], 1u);
- ASSERT_EQ(bins_overall[cbits((_1m - 0x3000) / 0x1000) - 1], 1u);
+ ASSERT_EQ(bins_overall[cbits((_1m - 0x2000) / 0x1000) - 1], 1u);
ASSERT_EQ(bins_overall[cbits((num_chunks) / 2) - 1], 1u);
}
{
+ // Original free space disposition (start chunk, count):
+ // <1, 256>, <259, 254>, <NC / 2, NC / 2>
// allocate 4x512K - should go to the second halves of
// the first and second 1M chunks and @(capacity / 2)
uint64_t allocated4 = 0;
@@ -744,51 +762,54 @@
al2.allocate_l2(2 * _1m, _1m / 2, &allocated4, &a4);
ASSERT_EQ(a4.size(), 3u);
ASSERT_EQ(allocated4, 2 * _1m);
- ASSERT_EQ(a4[0].offset, _1m / 2);
+ ASSERT_EQ(a4[1].offset, 0x1000);
+ ASSERT_EQ(a4[1].length, _1m);
+ ASSERT_EQ(a4[0].offset, _1m + 0x3000);
ASSERT_EQ(a4[0].length, _1m / 2);
- ASSERT_EQ(a4[1].offset, _1m + _1m / 2);
- ASSERT_EQ(a4[1].length, _1m / 2);
ASSERT_EQ(a4[2].offset, capacity / 2);
- ASSERT_EQ(a4[2].length, _1m);
+ ASSERT_EQ(a4[2].length, _1m / 2);
bins_overall.clear();
al2.collect_stats(bins_overall);
- ASSERT_EQ(bins_overall.size(), 3u);
- ASSERT_EQ(bins_overall[0], 1u);
- // below we have 512K - 4K & 512K - 12K chunks which both fit into
- // the same bin = 6
- ASSERT_EQ(bins_overall[6], 2u);
+ ASSERT_EQ(bins_overall.size(), 2u);
+ ASSERT_EQ(bins_overall[cbits((_1m - 0x2000 - 0x80000) / 0x1000) - 1], 1u);
ASSERT_EQ(bins_overall[cbits((num_chunks - 256) / 2) - 1], 1u);
}
{
- // cleanup first 2M except except the last 4K chunk
+ // Original free space disposition (start chunk, count):
+ // <387, 126>, <NC / 2 + 128, NC / 2 - 128>
+ // cleanup first 1536K except the last 4K chunk
interval_vector_t r;
- r.emplace_back(0, 2 * _1m - 0x1000);
+ r.emplace_back(0, _1m + _1m / 2 - 0x1000);
al2.free_l2(r);
bins_overall.clear();
al2.collect_stats(bins_overall);
ASSERT_EQ(bins_overall.size(), 3u);
- ASSERT_EQ(bins_overall[0], 1u);
- ASSERT_EQ(bins_overall[cbits((_2m - 0x1000) / 0x1000) - 1], 1u);
+ ASSERT_EQ(bins_overall[cbits((_1m + _1m / 2 - 0x1000) / 0x1000) - 1], 1u);
+ ASSERT_EQ(bins_overall[cbits((_1m - 0x2000 - 0x80000) / 0x1000) - 1], 1u);
ASSERT_EQ(bins_overall[cbits((num_chunks - 256) / 2) - 1], 1u);
}
{
- // release 2M @(capacity / 2)
+ // Original free space disposition (start chunk, count):
+ // <0, 383> <387, 126>, <NC / 2 + 128, NC / 2 - 128>
+ // release 512K @(capacity / 2)
interval_vector_t r;
- r.emplace_back(capacity / 2, 2 * _1m);
+ r.emplace_back(capacity / 2, _1m / 2);
al2.free_l2(r);
bins_overall.clear();
al2.collect_stats(bins_overall);
ASSERT_EQ(bins_overall.size(), 3u);
- ASSERT_EQ(bins_overall[0], 1u);
- ASSERT_EQ(bins_overall[cbits((_2m - 0x1000) / 0x1000) - 1], 1u);
+ ASSERT_EQ(bins_overall[cbits((_1m + _1m / 2 - 0x1000) / 0x1000) - 1], 1u);
+ ASSERT_EQ(bins_overall[cbits((_1m - 0x2000 - 0x80000) / 0x1000) - 1], 1u);
ASSERT_EQ(bins_overall[cbits(num_chunks / 2) - 1], 1u);
}
{
- // allocate 132M using 4M granularity should go to (capacity / 2)
+ // Original free space disposition (start chunk, count):
+ // <0, 383> <387, 126>, <NC / 2, NC / 2>
+ // allocate 132M (=33792*4096) = using 4M granularity should go to (capacity / 2)
uint64_t allocated4 = 0;
interval_vector_t a4;
al2.allocate_l2(132 * _1m, 4 * _1m , &allocated4, &a4);
@@ -799,24 +820,40 @@
bins_overall.clear();
al2.collect_stats(bins_overall);
ASSERT_EQ(bins_overall.size(), 3u);
+ ASSERT_EQ(bins_overall[cbits((_1m + _1m / 2 - 0x1000) / 0x1000) - 1], 1u);
+ ASSERT_EQ(bins_overall[cbits((_1m - 0x2000 - 0x80000) / 0x1000) - 1], 1u);
+ ASSERT_EQ(bins_overall[cbits(num_chunks / 2 - 33792) - 1], 1u);
}
{
- // cleanup left 4K chunk in the first 2M
+ // Original free space disposition (start chunk, count):
+ // <0, 383> <387, 126>, <NC / 2 + 33792, NC / 2 - 33792>
+ // cleanup remaining 4*4K chunks in the first 2M
interval_vector_t r;
- r.emplace_back(2 * _1m - 0x1000, 0x1000);
+ r.emplace_back(383 * 4096, 4 * 0x1000);
al2.free_l2(r);
bins_overall.clear();
al2.collect_stats(bins_overall);
ASSERT_EQ(bins_overall.size(), 2u);
+ ASSERT_EQ(bins_overall[cbits((2 * _1m + 0x1000) / 0x1000) - 1], 1u);
+ ASSERT_EQ(bins_overall[cbits(num_chunks / 2 - 33792) - 1], 1u);
}
{
+ // Original free space disposition (start chunk, count):
+ // <0, 513>, <NC / 2 + 33792, NC / 2 - 33792>
// release 132M @(capacity / 2)
interval_vector_t r;
r.emplace_back(capacity / 2, 132 * _1m);
al2.free_l2(r);
+ bins_overall.clear();
+ al2.collect_stats(bins_overall);
+ ASSERT_EQ(bins_overall.size(), 2u);
+ ASSERT_EQ(bins_overall[cbits((2 * _1m + 0x1000) / 0x1000) - 1], 1u);
+ ASSERT_EQ(bins_overall[cbits(num_chunks / 2) - 1], 1u);
}
{
+ // Original free space disposition (start chunk, count):
+ // <0, 513>, <NC / 2, NC / 2>
// allocate 132M using 2M granularity should go to the first chunk and to
// (capacity / 2)
uint64_t allocated4 = 0;
@@ -827,14 +864,31 @@
ASSERT_EQ(a4[0].length, 2 * _1m);
ASSERT_EQ(a4[1].offset, capacity / 2);
ASSERT_EQ(a4[1].length, 130 * _1m);
+
+ bins_overall.clear();
+ al2.collect_stats(bins_overall);
+
+ ASSERT_EQ(bins_overall.size(), 2u);
+ ASSERT_EQ(bins_overall[cbits(0)], 1u);
+ ASSERT_EQ(bins_overall[cbits(num_chunks / 2 - 33792) - 1], 1u);
}
{
+ // Original free space disposition (start chunk, count):
+ // <512, 1>, <NC / 2 + 33792, NC / 2 - 33792>
// release 130M @(capacity / 2)
interval_vector_t r;
r.emplace_back(capacity / 2, 132 * _1m);
al2.free_l2(r);
+ bins_overall.clear();
+ al2.collect_stats(bins_overall);
+
+ ASSERT_EQ(bins_overall.size(), 2u);
+ ASSERT_EQ(bins_overall[cbits(0)], 1u);
+ ASSERT_EQ(bins_overall[cbits(num_chunks / 2) - 1], 1u);
}
{
+ // Original free space disposition (start chunk, count):
+ // <512,1>, <NC / 2, NC / 2>
// release 4K~16K
// release 28K~32K
// release 68K~24K
@@ -843,21 +897,46 @@
r.emplace_back(0x7000, 0x8000);
r.emplace_back(0x11000, 0x6000);
al2.free_l2(r);
+
+ bins_overall.clear();
+ al2.collect_stats(bins_overall);
+
+ ASSERT_EQ(bins_overall.size(), 4u);
+ ASSERT_EQ(bins_overall[cbits(0)], 1u);
+ ASSERT_EQ(bins_overall[cbits(0x4000 / 0x1000) - 1], 2u); // accounts both 0x4000 & 0x6000
+ ASSERT_EQ(bins_overall[cbits(0x8000 / 0x1000) - 1], 1u);
+ ASSERT_EQ(bins_overall[cbits(num_chunks / 2) - 1], 1u);
}
{
- // allocate 32K using 16K granularity - should bypass the first
- // unaligned extent, use the second free extent partially given
- // the 16K alignment and then fallback to capacity / 2
+ // Original free space disposition (start chunk, count):
+ // <1, 4>, <7, 8>, <17, 6> <512,1>, <NC / 2, NC / 2>
+ // allocate 80K using 16K granularity
uint64_t allocated4 = 0;
interval_vector_t a4;
- al2.allocate_l2(0x8000, 0x4000, &allocated4, &a4);
- ASSERT_EQ(a4.size(), 2u);
- ASSERT_EQ(a4[0].offset, 0x8000u);
- ASSERT_EQ(a4[0].length, 0x4000u);
- ASSERT_EQ(a4[1].offset, capacity / 2);
+ al2.allocate_l2(0x14000, 0x4000, &allocated4, &a4);
+
+ ASSERT_EQ(a4.size(), 4);
+ ASSERT_EQ(a4[1].offset, 0x1000u);
ASSERT_EQ(a4[1].length, 0x4000u);
- }
+ ASSERT_EQ(a4[0].offset, 0x7000u);
+ ASSERT_EQ(a4[0].length, 0x8000u);
+ ASSERT_EQ(a4[2].offset, 0x11000u);
+ ASSERT_EQ(a4[2].length, 0x4000u);
+ ASSERT_EQ(a4[3].offset, capacity / 2);
+ ASSERT_EQ(a4[3].length, 0x4000u);
+
+ bins_overall.clear();
+ al2.collect_stats(bins_overall);
+ ASSERT_EQ(bins_overall.size(), 3u);
+ ASSERT_EQ(bins_overall[cbits(0)], 1u);
+ ASSERT_EQ(bins_overall[cbits(0x2000 / 0x1000) - 1], 1u);
+ ASSERT_EQ(bins_overall[cbits(num_chunks / 2 - 1) - 1], 1u);
+ }
+ {
+ // Original free space disposition (start chunk, count):
+ // <21, 2> <512,1>, <NC / 2 + 1, NC / 2 - 1>
+ }
}
std::cout << "Done L2 cont aligned" << std::endl;
}
@@ -913,7 +992,7 @@
al2.allocate_l2(0x3e000000, _1m, &allocated4, &a4);
ASSERT_EQ(a4.size(), 2u);
ASSERT_EQ(allocated4, 0x3e000000u);
- ASSERT_EQ(a4[0].offset, 0x5fed00000u);
+ ASSERT_EQ(a4[0].offset, 0x5fec30000u);
ASSERT_EQ(a4[0].length, 0x1300000u);
ASSERT_EQ(a4[1].offset, 0x628000000u);
ASSERT_EQ(a4[1].length, 0x3cd00000u);
diff -ur ceph-18.2.1~/src/test/objectstore/store_test.cc ceph-18.2.1/src/test/objectstore/store_test.cc
--- ceph-18.2.1~/src/test/objectstore/store_test.cc 2023-11-14 14:36:19.000000000 -0500
+++ ceph-18.2.1/src/test/objectstore/store_test.cc 2023-12-11 16:55:38.000000000 -0500
@@ -9524,9 +9524,9 @@
string key;
_key_encode_u64(1, &key);
bluestore_shared_blob_t sb(1);
- sb.ref_map.get(0x2000, block_size);
- sb.ref_map.get(0x4000, block_size);
- sb.ref_map.get(0x4000, block_size);
+ sb.ref_map.get(0x822000, block_size);
+ sb.ref_map.get(0x824000, block_size);
+ sb.ref_map.get(0x824000, block_size);
bufferlist bl;
encode(sb, bl);
bstore->inject_broken_shared_blob_key(key, bl);