Revert OL changes

eabdullin 2025-03-19 14:54:35 +03:00
parent affd2ddd5a
commit 54fab10085
3 changed files with 1 addition and 251 deletions

@@ -1,118 +0,0 @@
From 4c23dc37600e4ca96855d7d818f171cf84f57034 Mon Sep 17 00:00:00 2001
From: Shminderjit Singh <shminderjit.singh@oracle.com>
Date: Fri, 13 Sep 2024 06:56:04 +0000
Subject: [PATCH] mdadm: Fix socket connection failure when mdmon runs in
foreground mode.
While creating an IMSM RAID, mdadm will wait for the mdmon main process
to finish if mdmon runs in forking mode. This is because with
"Type=forking" in the mdmon service unit file, "systemctl start service"
will block until the main process of mdmon exits. At that moment, mdmon
has already created the socket, so the subsequent socket connect from
mdadm will succeed.
However, when mdmon runs in foreground mode (without "Type=forking" in
the service unit file), "systemctl start service" will return once the
mdmon process starts. This causes mdadm and mdmon to run in parallel,
which may lead to a socket connection failure since mdmon has not yet
initialized the socket when mdadm tries to connect. If the next
instruction/command is to access this device and try to write to it, a
permission error will occur since mdmon has not yet set the array to RW
mode.
(cherry picked from commit 3cbe13403ec0c78374343dcd889609aefe791f9b)
Signed-off-by: Shminderjit Singh <shminderjit.singh@oracle.com>
---
Create.c | 6 ++++--
mdadm.h | 1 +
util.c | 45 +++++++++++++++++++++++++++++++++++++++++++++
3 files changed, 50 insertions(+), 2 deletions(-)
diff --git a/Create.c b/Create.c
index a280c7bc7655..35c3496cf7df 100644
--- a/Create.c
+++ b/Create.c
@@ -1311,9 +1311,11 @@ int Create(struct supertype *st, struct mddev_ident *ident, int subdevs,
if (c->verbose >= 0)
pr_info("array %s started.\n", chosen_name);
if (st->ss->external && st->container_devnm[0]) {
- if (need_mdmon)
+ if (need_mdmon) {
start_mdmon(st->container_devnm);
-
+ if (wait_for_mdmon_control_socket(st->container_devnm) != MDADM_STATUS_SUCCESS)
+ goto abort;
+ }
ping_monitor(st->container_devnm);
close(container_fd);
}
diff --git a/mdadm.h b/mdadm.h
index 4b0b1d1bd18a..e041263d3d2b 100644
--- a/mdadm.h
+++ b/mdadm.h
@@ -1751,6 +1751,7 @@ extern int is_subarray_active(char *subarray, char *devname);
extern int open_subarray(char *dev, char *subarray, struct supertype *st, int quiet);
extern struct superswitch *version_to_superswitch(char *vers);
+extern mdadm_status_t wait_for_mdmon_control_socket(const char *container_devnm);
extern int mdmon_running(const char *devnm);
extern int mdmon_pid(const char *devnm);
extern mdadm_status_t wait_for_mdmon(const char *devnm);
diff --git a/util.c b/util.c
index 4d2329a9988b..bd8be4b851cf 100644
--- a/util.c
+++ b/util.c
@@ -1954,6 +1954,51 @@ int mdmon_running(const char *devnm)
return 0;
}
+/*
+ * wait_for_mdmon_control_socket() - Waits for mdmon control socket
+ * to be created within specified time.
+ * @container_devnm: Device for which mdmon control socket should start.
+ *
+ * In foreground mode, when mdadm is trying to connect to control
+ * socket it is possible that the mdmon has not created it yet.
+ * Give some time to mdmon to create socket. Timeout set to 2 sec.
+ *
+ * Return: MDADM_STATUS_SUCCESS if connect succeed, otherwise return
+ * error code.
+ */
+mdadm_status_t wait_for_mdmon_control_socket(const char *container_devnm)
+{
+ enum mdadm_status status = MDADM_STATUS_SUCCESS;
+ int sfd, rv, retry_count = 0;
+ struct sockaddr_un addr;
+ char path[PATH_MAX];
+
+ snprintf(path, PATH_MAX, "%s/%s.sock", MDMON_DIR, container_devnm);
+ sfd = socket(PF_LOCAL, SOCK_STREAM, 0);
+ if (!is_fd_valid(sfd))
+ return MDADM_STATUS_ERROR;
+
+ addr.sun_family = PF_LOCAL;
+ strncpy(addr.sun_path, path, sizeof(addr.sun_path) - 1);
+ addr.sun_path[sizeof(addr.sun_path) - 1] = '\0';
+
+ for (retry_count = 0; retry_count < 10; retry_count++) {
+ rv = connect(sfd, (struct sockaddr*)&addr, sizeof(addr));
+ if (rv < 0) {
+ sleep_for(0, MSEC_TO_NSEC(200), true);
+ continue;
+ }
+ break;
+ }
+
+ if (rv < 0) {
+ pr_err("Failed to connect to control socket.\n");
+ status = MDADM_STATUS_ERROR;
+ }
+ close(sfd);
+ return status;
+}
+
/*
* wait_for_mdmon() - Waits for mdmon within specified time.
* @devnm: Device for which mdmon should start.
--
2.39.3

@@ -1,124 +0,0 @@
From 371604603dd6d5da0465df968039737287cd7ab4 Mon Sep 17 00:00:00 2001
From: Junxiao Bi <junxiao.bi@oracle.com>
Date: Thu, 27 Feb 2025 17:52:05 +0000
Subject: [PATCH] mdmon: imsm: fix metadata corruption when managing new array
When the manager thread detects a new array, it invokes manage_new().
For an imsm array, this further invokes imsm_open_new(). Since
commit bbab0940fa75 ("imsm: write bad block log on metadata sync"),
the bad block log is preallocated when the array is opened, which
requires growing the mpb buffer.
For that, imsm_open_new() invokes imsm_update_metadata_locally(),
which first uses imsm_prepare_update() to allocate a larger mpb buffer
and store it at "mpb->next_buf", then invokes imsm_process_update()
to copy the content from the current mpb buffer "mpb->buf" to
"mpb->next_buf", free the current mpb buffer, and make the new buffer current.
There is a small race window: when the monitor thread is syncing
metadata, it gets the current buffer pointer in
imsm_sync_metadata()->write_super_imsm(), but before it flushes the
buffer to disk, the manager thread performs the buffer switch above and
frees the current buffer. The monitor thread then runs into a
use-after-free and can cause on-disk metadata corruption.
If the system keeps running, a later metadata update can fix the
corruption, because after the buffer switch the new buffer contains good
metadata. But if a panic or power cycle happens while the on-disk
metadata is corrupted, the system fails to boot if the array is used as
root; otherwise the array cannot be assembled after boot.
This issue will not happen for an imsm array with only one member
array, because the member array has not been opened yet, so the monitor
thread does no metadata updates.
It can happen for an imsm array with at least two member arrays, in the
following two scenarios:
1. Restarting the mdmon process with at least two member arrays.
This happens during system boot, or when the user restarts mdmon after
an mdadm upgrade.
2. Adding a new member array to an existing imsm array with at least
one member array.
To fix this, defer the buffer-switch operation to the monitor thread.
Orabug: 37635990
Fixes: bbab0940fa75 ("imsm: write bad block log on metadata sync")
Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
---
managemon.c | 10 ++++++++--
super-intel.c | 14 +++++++++++---
2 files changed, 19 insertions(+), 5 deletions(-)
diff --git a/managemon.c b/managemon.c
index 358459e79435..f5e42662dffd 100644
--- a/managemon.c
+++ b/managemon.c
@@ -660,11 +660,12 @@ static void manage_new(struct mdstat_ent *mdstat,
* the monitor.
*/
+ struct metadata_update *update = NULL;
struct active_array *new = NULL;
struct mdinfo *mdi = NULL, *di;
- int i, inst;
- int failed = 0;
char buf[SYSFS_MAX_BUF_SIZE];
+ int failed = 0;
+ int i, inst;
/* check if array is ready to be monitored */
if (!mdstat->active || !mdstat->level)
@@ -763,9 +764,14 @@ static void manage_new(struct mdstat_ent *mdstat,
/* if everything checks out tell the metadata handler we want to
* manage this instance
*/
+ container->update_tail = &update;
if (!aa_ready(new) || container->ss->open_new(container, new, inst) < 0) {
+ container->update_tail = NULL;
goto error;
} else {
+ if (update)
+ queue_metadata_update(update);
+ container->update_tail = NULL;
replace_array(container, victim, new);
if (failed) {
new->check_degraded = 1;
diff --git a/super-intel.c b/super-intel.c
index 2b8b6fda976c..2363b50a174c 100644
--- a/super-intel.c
+++ b/super-intel.c
@@ -8396,12 +8396,15 @@ static int imsm_count_failed(struct intel_super *super, struct imsm_dev *dev,
return failed;
}
+static int imsm_prepare_update(struct supertype *st,
+ struct metadata_update *update);
static int imsm_open_new(struct supertype *c, struct active_array *a,
int inst)
{
struct intel_super *super = c->sb;
struct imsm_super *mpb = super->anchor;
- struct imsm_update_prealloc_bb_mem u;
+ struct imsm_update_prealloc_bb_mem *u;
+ struct metadata_update mu;
if (inst >= mpb->num_raid_devs) {
pr_err("subarry index %d, out of range\n", inst);
@@ -8411,8 +8414,13 @@ static int imsm_open_new(struct supertype *c, struct active_array *a,
dprintf("imsm: open_new %d\n", inst);
a->info.container_member = inst;
- u.type = update_prealloc_badblocks_mem;
- imsm_update_metadata_locally(c, &u, sizeof(u));
+ u = xmalloc(sizeof(*u));
+ u->type = update_prealloc_badblocks_mem;
+ mu.len = sizeof(*u);
+ mu.buf = (char *)u;
+ imsm_prepare_update(c, &mu);
+ if (c->update_tail)
+ append_metadata_update(c, u, sizeof(*u));
return 0;
}
--
2.39.3

@@ -2,7 +2,7 @@ Name: mdadm
Version: 4.3
# extraversion is used to define rhel internal version
%define extraversion 4
-Release: %{extraversion}.0.1%{?dist}
+Release: %{extraversion}%{?dist}
Summary: The mdadm program controls Linux md devices (software RAID arrays)
URL: http://www.kernel.org/pub/linux/utils/raid/mdadm/
License: GPLv2+
@@ -94,10 +94,6 @@ Patch068: 0071-mdadm-Increase-number-limit-in-md-device-name-to-102.patch
Patch200: mdadm-udev.patch
Patch201: mdadm-2.5.2-static.patch
-#Oracle Patch
-Patch1001: 1001-mdadm-Fix-socket-connection-failure-when-mdmon-runs-.patch
-Patch1003: 1003-mdmon-imsm-fix-metadata-corruption-when-managing-new.patch
BuildRequires: make
BuildRequires: systemd-rpm-macros binutils-devel gcc systemd-devel
Requires: libreport-filesystem
@@ -170,10 +166,6 @@ install -m644 %{SOURCE5} %{buildroot}/etc/libreport/events.d
/usr/share/mdadm/mdcheck
%changelog
-* Tue Mar 18 2025 Akshata Konala <akshata.konala@oracle.com> - 4.3-4.0.1
-- mdmon: imsm: fix metadata corruption when managing new array. [Orabug: 37635990]
-- Fix socket connection failure when mdmon runs in foreground mode. [Orabug: 36077756]
* Mon Dec 16 2024 Xiao Ni <xni@redhat.com> 4.3-4
- Increase number limit in md device name to 1024
- Resolves RHEL-71365