Fix bug where IMSM arrays stay inactive in case of reboot during reshape

Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
Jes Sorensen 2012-04-30 14:29:26 +02:00
parent 71165988bf
commit 4ac0f8fa3e
17 changed files with 892 additions and 5 deletions

@@ -0,0 +1,30 @@
From 111e9fdaa8a5084bd329819a0906a685b2271c0d Mon Sep 17 00:00:00 2001
From: Adam Kwolek <adam.kwolek@intel.com>
Date: Tue, 7 Feb 2012 15:03:19 +0100
Subject: [PATCH 03/12] FIX: Array is not run when expansion disks are added
When the last disk added to an array is one that was added by expansion,
assemble_container_content() does not even try to run the array.
Signed-off-by: Adam Kwolek <adam.kwolek@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
---
Assemble.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)
diff --git a/Assemble.c b/Assemble.c
index ad4eb9c..13adfc3 100644
--- a/Assemble.c
+++ b/Assemble.c
@@ -1557,7 +1557,7 @@ int assemble_container_content(struct supertype *st, int mdfd,
working++;
} else if (errno == EEXIST)
preexist++;
- if (working == 0)
+ if (working + expansion == 0)
return 1;/* Nothing new, don't try to start */
map_update(&map, fd2devnum(mdfd),
--
1.7.4.4
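
The changed condition can be illustrated with a minimal sketch. The helper name should_try_start() is hypothetical and not part of mdadm; only the counters working and expansion come from the patch, and their meaning is assumed from the commit message.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper (not an mdadm function): mirrors the check in
 * assemble_container_content(). 'working' is assumed to count newly added
 * regular members, 'expansion' newly added expansion disks. */
bool should_try_start(int working, int expansion)
{
    assert(working >= 0 && expansion >= 0);
    /* The old check was (working == 0), so an addition that consisted only
     * of expansion disks was treated as "nothing new". */
    return working + expansion != 0;
}
```

With this check, an array whose last added disk is an expansion disk (working == 0, expansion == 1) is no longer skipped.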

@@ -0,0 +1,63 @@
From d7592845d4a9b885af7121f8ff6ba4f77610cd32 Mon Sep 17 00:00:00 2001
From: Adam Kwolek <adam.kwolek@intel.com>
Date: Thu, 16 Feb 2012 14:16:04 +0100
Subject: [PATCH 15/15] FIX: Changes in '0' case for reshape position
verification
Reading a sysfs entry that is 0 bytes long should cause an error:
the reshape position cannot be empty.
The absence of a reshape position should be ignored. It is possible
that a raid0 reshape is about to be continued and the takeover has not
happened yet. In that case the metadata (changed by mdmon) says the array
should be reshaped, but md knows nothing about it at this moment. The reshape
continuation in reshape_array() will change the level to raid4, and the
reshape position will then appear in sysfs.
Signed-off-by: Adam Kwolek <adam.kwolek@intel.com>
---
Grow.c | 12 ++++++++++--
1 files changed, 10 insertions(+), 2 deletions(-)
diff --git a/Grow.c b/Grow.c
index 53a7cad..239b50d 100644
--- a/Grow.c
+++ b/Grow.c
@@ -1876,9 +1876,12 @@ static int verify_reshape_position(struct mdinfo *info, int level)
{
int ret_val = 0;
char buf[40];
+ int rv;
/* read sync_max, failure can mean raid0 array */
- if (sysfs_get_str(info, NULL, "sync_max", buf, 40) > 0) {
+ rv = sysfs_get_str(info, NULL, "sync_max", buf, 40);
+
+ if (rv > 0) {
char *ep;
unsigned long long position = strtoull(buf, &ep, 0);
@@ -1906,6 +1909,11 @@ static int verify_reshape_position(struct mdinfo *info, int level)
ret_val = 1;
}
}
+ } else if (rv == 0) {
+ /* for valid sysfs entry, 0-length content
+ * should be indicated as error
+ */
+ ret_val = -1;
}
return ret_val;
@@ -3975,7 +3983,7 @@ int Grow_continue_command(char *devname, int fd,
* correct position
*/
if (verify_reshape_position(content,
- map_name(pers, mdstat->level)) <= 0) {
+ map_name(pers, mdstat->level)) < 0) {
ret_val = 1;
goto Grow_continue_command_exit;
}
--
1.7.4.4
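
The three outcomes of the sysfs read can be sketched as follows. This is an illustration under assumptions, not the mdadm code: classify_sync_max() is a hypothetical stand-in that takes the read length and buffer as arguments instead of calling sysfs_get_str().

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for the sysfs_get_str() branch of
 * verify_reshape_position(). 'rv' is the read length (negative when the
 * entry is absent) and 'buf' is the text that was read.
 * Returns  1 and stores the parsed value when the entry is numeric,
 *          0 when the entry is absent or not numeric (such as "max"),
 *         -1 when the entry exists but is empty. */
int classify_sync_max(int rv, const char *buf, unsigned long long *pos)
{
    char *ep;
    unsigned long long value;

    assert(pos != NULL);
    if (rv < 0)
        return 0;   /* no entry, e.g. raid0 before takeover */
    if (rv == 0)
        return -1;  /* a valid entry must not be empty */
    assert(buf != NULL);
    value = strtoull(buf, &ep, 0);
    if (ep == buf || (*ep != 0 && *ep != '\n' && *ep != ' '))
        return 0;   /* same parse test as in the patch */
    *pos = value;
    return 1;
}
```

After the patch, Grow_continue_command() treats only a negative result as fatal, so the "absent" case no longer aborts the continuation.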

@@ -0,0 +1,50 @@
From 1ca90aa6484a6f5d4fdd6122ad1d2015209bd8e0 Mon Sep 17 00:00:00 2001
From: Adam Kwolek <adam.kwolek@intel.com>
Date: Thu, 9 Feb 2012 12:38:15 +1100
Subject: [PATCH 12/12] FIX: Do not try to (continue) reshape using inactive
array
When one of the arrays is inactive, do not try to continue the reshape
on that array. Just skip it.
Signed-off-by: Adam Kwolek <adam.kwolek@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
---
Grow.c | 14 ++++++++++++++
1 files changed, 14 insertions(+), 0 deletions(-)
diff --git a/Grow.c b/Grow.c
index 61adefa..53a7cad 100644
--- a/Grow.c
+++ b/Grow.c
@@ -2626,6 +2626,13 @@ int reshape_container(char *container, char *devname,
devname2devnum(container));
if (!mdstat)
continue;
+ if (mdstat->active == 0) {
+ fprintf(stderr, Name ": Skipping inactive "
+ "array md%i.\n", mdstat->devnum);
+ free_mdstat(mdstat);
+ mdstat = NULL;
+ continue;
+ }
break;
}
if (!content)
@@ -3922,6 +3929,13 @@ int Grow_continue_command(char *devname, int fd,
mdstat = mdstat_by_subdev(array, container_dev);
if (!mdstat)
continue;
+ if (mdstat->active == 0) {
+ fprintf(stderr, Name ": Skipping inactive "
+ "array md%i.\n", mdstat->devnum);
+ free_mdstat(mdstat);
+ mdstat = NULL;
+ continue;
+ }
break;
}
if (!content) {
--
1.7.4.4

@@ -0,0 +1,35 @@
From 5d1c7cdaca575d8a32a7a82517d88e2099f6a213 Mon Sep 17 00:00:00 2001
From: Adam Kwolek <adam.kwolek@intel.com>
Date: Thu, 12 Jan 2012 08:12:39 +0100
Subject: [PATCH] FIX: External metadata sometimes is not updated
External metadata is sometimes not updated.
This can be observed during Capacity Expansion of 2 raid0 arrays.
The new array size is not set, because the metadata is not updated and, at the
end of the reshape, mdadm does not read the new array size from the metadata.
This happens when mdmon finishes its work (due to the takeover to raid0)
before all metadata updates are processed.
Make sure that all updates are flushed to disk before executing the takeover.
Signed-off-by: Adam Kwolek <adam.kwolek@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
---
Grow.c | 1 +
1 files changed, 1 insertions(+), 0 deletions(-)
diff --git a/Grow.c b/Grow.c
index b2c1360..89f563c 100644
--- a/Grow.c
+++ b/Grow.c
@@ -2396,6 +2396,7 @@ started:
/* Re-load the metadata as much could have changed */
int cfd = open_dev(st->container_dev);
if (cfd >= 0) {
+ ping_manager(container);
ping_monitor(container);
st->ss->free_super(st);
st->ss->load_container(st, cfd, container);
--
1.7.4.4

@@ -0,0 +1,38 @@
From 92d49ecfaabcd015cf9957a0863996eaa5755747 Mon Sep 17 00:00:00 2001
From: Adam Kwolek <adam.kwolek@intel.com>
Date: Tue, 7 Feb 2012 15:03:03 +0100
Subject: [PATCH 01/12] FIX: NULL pointer to strdup() can be passed
When strchr() returns NULL and the result is assigned to subarray,
a NULL pointer can be passed to strdup() and a core dump
is generated.
subarray is checked for NULL later, so it is assumed that it can
be NULL at this point.
Signed-off-by: Adam Kwolek <adam.kwolek@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
---
util.c | 5 +++--
1 files changed, 3 insertions(+), 2 deletions(-)
diff --git a/util.c b/util.c
index e5f7a20..7abbff7 100644
--- a/util.c
+++ b/util.c
@@ -966,9 +966,10 @@ struct supertype *super_by_fd(int fd, char **subarrayp)
char *dev = verstr+1;
subarray = strchr(dev, '/');
- if (subarray)
+ if (subarray) {
*subarray++ = '\0';
- subarray = strdup(subarray);
+ subarray = strdup(subarray);
+ }
container = devname2devnum(dev);
if (sra)
sysfs_free(sra);
--
1.7.4.4
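
The guarded pattern from this patch can be shown in isolation. The function split_subarray() below is a hypothetical helper written for illustration; in mdadm the logic lives inline in super_by_fd().

```c
#define _POSIX_C_SOURCE 200809L  /* for strdup() */
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not an mdadm function). Splits "dev/subarray" in
 * place: 'dev' is cut at the first '/', and a heap copy of the part after
 * it is returned. Returns NULL when there is no '/', so strdup() is never
 * called with a NULL pointer. The caller frees the result. */
char *split_subarray(char *dev)
{
    char *subarray;

    assert(dev != NULL);
    subarray = strchr(dev, '/');
    if (subarray) {
        *subarray++ = '\0';
        subarray = strdup(subarray);
    }
    return subarray;
}
```

Keeping both statements inside the braces is the whole fix: before the patch, only the terminator write was conditional and strdup() ran unconditionally.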

@@ -0,0 +1,39 @@
From 3c20f9899bc95b35f5b9544c6741b4fccd616326 Mon Sep 17 00:00:00 2001
From: Adam Kwolek <adam.kwolek@intel.com>
Date: Thu, 12 Jan 2012 08:12:47 +0100
Subject: [PATCH] FIX: mdmon check in reshape_container() can cause a problem
When a raid0 reshape is executed, mdmon can disappear due to the raid level
takeover operation. If this happens before the mdmon check, mdadm treats
it as an error condition, which is not correct in this case.
Remove the mdmon check from the reshape_container() function.
The error condition is still caught by the reshape_array() reentry test
for the same array (line 2577).
Signed-off-by: Adam Kwolek <adam.kwolek@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
---
Grow.c | 6 ------
1 files changed, 0 insertions(+), 6 deletions(-)
diff --git a/Grow.c b/Grow.c
index 89f563c..c1bc1ca 100644
--- a/Grow.c
+++ b/Grow.c
@@ -2608,12 +2608,6 @@ int reshape_container(char *container, char *devname,
restart = 0;
if (rv)
break;
- rv = !mdmon_running(devname2devnum(container));
- if (rv) {
- printf(Name ": Mdmon is not found. "
- "Cannot continue container reshape.\n");
- break;
- }
}
if (!rv)
unfreeze(st);
--
1.7.4.4

@@ -0,0 +1,45 @@
From e1dd332a09c66ef0df68229cc633b8f2521e5db4 Mon Sep 17 00:00:00 2001
From: Adam Kwolek <adam.kwolek@intel.com>
Date: Thu, 9 Feb 2012 12:37:40 +1100
Subject: [PATCH 11/12] FIX: restart reshape when reshape process is stopped
just between 2 reshapes
When a reshape is restarted from '0', the very beginning of the array,
it is possible that, for external metadata, the reshape and array
configuration has not happened yet.
Check whether md has the same opinion and the reshape is restarted
from 0. If so, this is a regular reshape start after the reshape
was only switched to the next array in metadata.
Signed-off-by: Adam Kwolek <adam.kwolek@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
---
Grow.c | 12 ++++++++++++
1 files changed, 12 insertions(+), 0 deletions(-)
diff --git a/Grow.c b/Grow.c
index 6b1380a..61adefa 100644
--- a/Grow.c
+++ b/Grow.c
@@ -1980,6 +1980,18 @@ static int reshape_array(char *container, int fd, char *devname,
goto release;
}
+ if (st->ss->external && restart && (info->reshape_progress == 0)) {
+ /* When reshape is restarted from '0', very begin of array
+ * it is possible that for external metadata reshape and array
+ * configuration doesn't happen.
+ * Check if md has the same opinion, and reshape is restarted
+ * from 0. If so, this is regular reshape start after reshape
+ * switch in metadata to next array only.
+ */
+ if ((verify_reshape_position(info, reshape.level) >= 0) &&
+ (info->reshape_progress == 0))
+ restart = 0;
+ }
if (restart) {
/* reshape already started. just skip to monitoring the reshape */
if (reshape.backup_blocks == 0)
--
1.7.4.4

@@ -0,0 +1,139 @@
From f93346ef078fde20e46849901efa16dd1b05ec33 Mon Sep 17 00:00:00 2001
From: Adam Kwolek <adam.kwolek@intel.com>
Date: Thu, 9 Feb 2012 12:36:41 +1100
Subject: [PATCH 08/12] FIX: use md position to reshape restart
When a reshape is interrupted, the metadata may not be saved properly.
As a result, the reshape can be farther along in md than the metadata states.
On reshape restart, use the md position as the start position if it is farther
than the position specified in metadata. Treat the opposite situation as an error.
Signed-off-by: Adam Kwolek <adam.kwolek@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
---
Grow.c | 86 ++++++++++++++++++++++++++++++++++++++++++++-------------------
1 files changed, 60 insertions(+), 26 deletions(-)
diff --git a/Grow.c b/Grow.c
index 70bdee1..6b1380a 100644
--- a/Grow.c
+++ b/Grow.c
@@ -1862,6 +1862,55 @@ release:
return rv;
}
+/* verify_reshape_position()
+ * Function checks if reshape position in metadata is not farther
+ * than position in md.
+ * Return value:
+ * 0 : not valid sysfs entry
+ * it can be caused by not started reshape, it should be started
+ * by reshape array or raid0 array is before takeover
+ * -1 : error, reshape position is obviously wrong
+ * 1 : success, reshape progress correct or updated
+*/
+static int verify_reshape_position(struct mdinfo *info, int level)
+{
+ int ret_val = 0;
+ char buf[40];
+
+ /* read sync_max, failure can mean raid0 array */
+ if (sysfs_get_str(info, NULL, "sync_max", buf, 40) > 0) {
+ char *ep;
+ unsigned long long position = strtoull(buf, &ep, 0);
+
+ dprintf(Name": Read sync_max sysfs entry is: %s\n", buf);
+ if (!(ep == buf || (*ep != 0 && *ep != '\n' && *ep != ' '))) {
+ position *= get_data_disks(level,
+ info->new_layout,
+ info->array.raid_disks);
+ if (info->reshape_progress < position) {
+ dprintf("Corrected reshape progress (%llu) to "
+ "md position (%llu)\n",
+ info->reshape_progress, position);
+ info->reshape_progress = position;
+ ret_val = 1;
+ } else if (info->reshape_progress > position) {
+ fprintf(stderr, Name ": Fatal error: array "
+ "reshape was not properly frozen "
+ "(expected reshape position is %llu, "
+ "but reshape progress is %llu.\n",
+ position, info->reshape_progress);
+ ret_val = -1;
+ } else {
+ dprintf("Reshape position in md and metadata "
+ "are the same;");
+ ret_val = 1;
+ }
+ }
+ }
+
+ return ret_val;
+}
+
static int reshape_array(char *container, int fd, char *devname,
struct supertype *st, struct mdinfo *info,
int force, struct mddev_dev *devlist,
@@ -2251,9 +2300,16 @@ started:
sra->new_chunk = info->new_chunk;
- if (restart)
+ if (restart) {
+ /* for external metadata checkpoint saved by mdmon can be lost
+ * or missed /due to e.g. crash/. Check if md is not during
+ * restart farther than metadata points to.
+ * If so, this means metadata information is obsolete.
+ */
+ if (st->ss->external)
+ verify_reshape_position(info, reshape.level);
sra->reshape_progress = info->reshape_progress;
- else {
+ } else {
sra->reshape_progress = 0;
if (reshape.after.data_disks < reshape.before.data_disks)
/* start from the end of the new array */
@@ -3765,8 +3821,6 @@ int Grow_continue_command(char *devname, int fd,
char buf[40];
int cfd = -1;
int fd2 = -1;
- char *ep;
- unsigned long long position;
dprintf("Grow continue from command line called for %s\n",
devname);
@@ -3894,28 +3948,8 @@ int Grow_continue_command(char *devname, int fd,
/* verify that array under reshape is started from
* correct position
*/
- ret_val = sysfs_get_str(content, NULL, "sync_max", buf, 40);
- if (ret_val <= 0) {
- fprintf(stderr, Name
- ": cannot open verify reshape progress for %s (%i)\n",
- content->sys_name, ret_val);
- ret_val = 1;
- goto Grow_continue_command_exit;
- }
- dprintf(Name ": Read sync_max sysfs entry is: %s\n", buf);
- position = strtoull(buf, &ep, 0);
- if (ep == buf || (*ep != 0 && *ep != '\n' && *ep != ' ')) {
- fprintf(stderr, Name ": Fatal error: array reshape was"
- " not properly frozen\n");
- ret_val = 1;
- goto Grow_continue_command_exit;
- }
- position *= get_data_disks(map_name(pers, mdstat->level),
- content->new_layout,
- content->array.raid_disks);
- if (position != content->reshape_progress) {
- fprintf(stderr, Name ": Fatal error: array reshape was"
- " not properly frozen.\n");
+ if (verify_reshape_position(content,
+ map_name(pers, mdstat->level)) <= 0) {
ret_val = 1;
goto Grow_continue_command_exit;
}
--
1.7.4.4

@@ -0,0 +1,46 @@
From 50927b1323a4cfcbf3729ff552c496695d6199eb Mon Sep 17 00:00:00 2001
From: Adam Kwolek <adam.kwolek@intel.com>
Date: Tue, 7 Feb 2012 15:03:35 +0100
Subject: [PATCH 05/12] Fix: Sometimes mdmon throws core dump during reshape
The problem was found while reshaping 2 volumes (raid0 and raid5) in a container.
Sometimes mdmon dumps core due to a NULL pointer dereference.
The problem occurs in this scenario:
- managemon: is about to activate a spare (degraded raid4 volume == raid0 under takeover)
- managemon: detects the level change and signals monitor (manage_member() calls replace_array())
- monitor: detects the raid4/5->raid0 transition and sets a->container to NULL
to indicate array deactivation
- managemon: continues its work and tries to activate the spare (a->check_degraded is set).
A NULL pointer is passed to the metadata handler activate_spare()
and a core dump is generated.
To resolve this, managemon (after the monitor kick) checks the a->container
pointer again to learn whether the current array is about to be deactivated.
Signed-off-by: Adam Kwolek <adam.kwolek@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
---
managemon.c | 6 ++++++
1 files changed, 6 insertions(+), 0 deletions(-)
diff --git a/managemon.c b/managemon.c
index cde0d8b..6c21ecb 100644
--- a/managemon.c
+++ b/managemon.c
@@ -486,6 +486,12 @@ static void manage_member(struct mdstat_ent *mdstat,
}
}
+ /* we are after monitor kick,
+ * so container field can be cleared - check it again
+ */
+ if (a->container == NULL)
+ return;
+
/* We don't check the array while any update is pending, as it
* might container a change (such as a spare assignment) which
* could affect our decisions.
--
1.7.4.4

@@ -0,0 +1,96 @@
From 78340e26a54db960de238b511f5cdc74aebe4453 Mon Sep 17 00:00:00 2001
From: Adam Kwolek <adam.kwolek@intel.com>
Date: Tue, 7 Feb 2012 15:03:43 +0100
Subject: [PATCH 06/12] Flush mdmon before next reshape step during container
operation
When the takeover operation is used for grow purposes, mdadm has to be sure
that mdmon has processed all updates and, if necessary, has been closed
by the takeover to raid0 operation. If mdmon is late, the next array in the
container is processed and, due to a race condition, mdmon closes itself
instead of monitoring the next reshape operation.
Signed-off-by: Adam Kwolek <adam.kwolek@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
---
Grow.c | 12 ++++++++++--
msg.c | 10 ++++++++++
msg.h | 1 +
3 files changed, 21 insertions(+), 2 deletions(-)
diff --git a/Grow.c b/Grow.c
index 36a1de7..70bdee1 100644
--- a/Grow.c
+++ b/Grow.c
@@ -2003,6 +2003,9 @@ static int reshape_array(char *container, int fd, char *devname,
if (reshape.level > 0 && st->ss->external) {
/* make sure mdmon is aware of the new level */
+ if (mdmon_running(st->container_dev))
+ flush_mdmon(container);
+
if (!mdmon_running(st->container_dev))
start_mdmon(st->container_dev);
ping_monitor(container);
@@ -2396,8 +2399,7 @@ started:
/* Re-load the metadata as much could have changed */
int cfd = open_dev(st->container_dev);
if (cfd >= 0) {
- ping_manager(container);
- ping_monitor(container);
+ flush_mdmon(container);
st->ss->free_super(st);
st->ss->load_container(st, cfd, container);
close(cfd);
@@ -2594,6 +2596,9 @@ int reshape_container(char *container, char *devname,
sysfs_init(content, fd, mdstat->devnum);
+ if (mdmon_running(devname2devnum(container)))
+ flush_mdmon(container);
+
rv = reshape_array(container, fd, adev, st,
content, force, NULL,
backup_file, quiet, 1, restart,
@@ -2608,6 +2613,9 @@ int reshape_container(char *container, char *devname,
restart = 0;
if (rv)
break;
+
+ if (mdmon_running(devname2devnum(container)))
+ flush_mdmon(container);
}
if (!rv)
unfreeze(st);
diff --git a/msg.c b/msg.c
index dc780b3..44aad1f 100644
--- a/msg.c
+++ b/msg.c
@@ -487,3 +487,13 @@ int ping_manager(char *devname)
close(sfd);
return err;
}
+
+/* using takeover operation for grow purposes, mdadm has to be sure
+ * that mdmon processes all updates, and if necessary it will be closed
+ * at takeover to raid0 operation
+ */
+void flush_mdmon(char *container)
+{
+ ping_manager(container);
+ ping_monitor(container);
+}
diff --git a/msg.h b/msg.h
index c6d037d..eefa649 100644
--- a/msg.h
+++ b/msg.h
@@ -34,5 +34,6 @@ extern int block_monitor(char *container, const int freeze);
extern void unblock_monitor(char *container, const int unfreeze);
extern int fping_monitor(int sock);
extern int ping_manager(char *devname);
+extern void flush_mdmon(char *container);
#define MSG_MAX_LEN (4*1024*1024)
--
1.7.4.4

@@ -0,0 +1,32 @@
From e1742195ff3dba97929f81af6b7633481a23397a Mon Sep 17 00:00:00 2001
From: Adam Kwolek <adam.kwolek@intel.com>
Date: Tue, 7 Feb 2012 15:03:51 +0100
Subject: [PATCH 07/12] imsm: FIX: Chunk size migration problem
When a chunk size migration occurs (e.g. 128k->4k), the first checkpoint cannot
be set in md because the step is too small. Correct the migration record
initialization to allow the whole copy area to be used and to increase the
migration checkpoint step.
Signed-off-by: Adam Kwolek <adam.kwolek@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
---
super-intel.c | 3 ++-
1 files changed, 2 insertions(+), 1 deletions(-)
diff --git a/super-intel.c b/super-intel.c
index 19a2c84..f5762d8 100644
--- a/super-intel.c
+++ b/super-intel.c
@@ -8913,7 +8913,8 @@ void init_migr_record_imsm(struct supertype *st, struct imsm_dev *dev,
migr_rec->dest_depth_per_unit = GEN_MIGR_AREA_SIZE /
max(map_dest->blocks_per_strip, map_src->blocks_per_strip);
- migr_rec->dest_depth_per_unit *= map_dest->blocks_per_strip;
+ migr_rec->dest_depth_per_unit *=
+ max(map_dest->blocks_per_strip, map_src->blocks_per_strip);
new_data_disks = imsm_num_data_members(dev, MAP_0);
migr_rec->blocks_per_unit =
__cpu_to_le32(migr_rec->dest_depth_per_unit * new_data_disks);
--
1.7.4.4
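
A worked example of the arithmetic, for the 128k->4k case named in the commit message. The value of GEN_MIGR_AREA_SIZE and the 512-byte block unit are assumptions made for illustration; the real definitions are in super-intel.c. The function names depth_old() and depth_new() are hypothetical.

```c
#include <assert.h>

/* Assumed value for illustration only; see super-intel.c for the real one. */
#define GEN_MIGR_AREA_SIZE 2048u  /* copy area size, assumed 512-byte blocks */

static unsigned int max_u(unsigned int a, unsigned int b)
{
    return a > b ? a : b;
}

/* dest_depth_per_unit before the patch: scaled by the destination strip. */
unsigned int depth_old(unsigned int src_strip, unsigned int dest_strip)
{
    assert(src_strip > 0 && dest_strip > 0);
    return GEN_MIGR_AREA_SIZE / max_u(dest_strip, src_strip) * dest_strip;
}

/* dest_depth_per_unit after the patch: scaled by the larger strip. */
unsigned int depth_new(unsigned int src_strip, unsigned int dest_strip)
{
    assert(src_strip > 0 && dest_strip > 0);
    return GEN_MIGR_AREA_SIZE / max_u(dest_strip, src_strip) *
           max_u(dest_strip, src_strip);
}
```

Under these assumptions a 128k strip is 256 blocks and a 4k strip is 8 blocks, so the old formula yields 2048 / 256 * 8 = 64 blocks per unit while the new one yields 2048 / 256 * 256 = 2048, the whole copy area.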

@@ -0,0 +1,88 @@
From 51d83f5d119f9b727cc715b22b1625332bd0130b Mon Sep 17 00:00:00 2001
From: Adam Kwolek <adam.kwolek@intel.com>
Date: Thu, 9 Feb 2012 12:37:04 +1100
Subject: [PATCH 10/12] imsm: FIX: Clear migration record when migration
switches to next volume.
When OLCE is in progress, checkpoint steps get bigger due to the space added
during the process.
When mdadm fails after saving "max" to sync_max, mdmon will monitor the process
and switch the reshape to the next array. At this moment the information in
the metadata and in the migration record is inconsistent.
To avoid this, let mdmon clear the migration record when the reshape switches
to the next array (an exception to the rule that the migration record is
maintained by mdadm).
Signed-off-by: Adam Kwolek <adam.kwolek@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
---
super-intel.c | 19 ++++++++++++++++---
1 files changed, 16 insertions(+), 3 deletions(-)
diff --git a/super-intel.c b/super-intel.c
index 5f451f3..958edb5 100644
--- a/super-intel.c
+++ b/super-intel.c
@@ -353,6 +353,9 @@ struct intel_super {
void *migr_rec_buf; /* buffer for I/O operations */
struct migr_record *migr_rec; /* migration record */
};
+ int clean_migration_record_by_mdmon; /* when reshape is switched to next
+ array, it indicates that mdmon is allowed to clean migration
+ record */
size_t len; /* size of the 'buf' allocation */
void *next_buf; /* for realloc'ing buf from the manager */
size_t next_len;
@@ -3465,6 +3468,7 @@ static int load_imsm_mpb(int fd, struct intel_super *super, char *devname)
free(super->buf);
return 2;
}
+ super->clean_migration_record_by_mdmon = 0;
if (!sectors) {
check_sum = __gen_imsm_checksum(super->anchor);
@@ -5029,6 +5033,10 @@ static int write_super_imsm(struct supertype *st, int doclose)
sum = __gen_imsm_checksum(mpb);
mpb->check_sum = __cpu_to_le32(sum);
+ if (super->clean_migration_record_by_mdmon) {
+ clear_migration_record = 1;
+ super->clean_migration_record_by_mdmon = 0;
+ }
if (clear_migration_record)
memset(super->migr_rec_buf, 0, MIGR_REC_BUF_SIZE);
@@ -5036,9 +5044,6 @@ static int write_super_imsm(struct supertype *st, int doclose)
for (d = super->disks; d ; d = d->next) {
if (d->index < 0 || is_failed(&d->disk))
continue;
- if (store_imsm_mpb(d->fd, mpb))
- fprintf(stderr, "%s: failed for device %d:%d (fd: %d)%s\n",
- __func__, d->major, d->minor, d->fd, strerror(errno));
if (clear_migration_record) {
unsigned long long dsize;
@@ -5050,6 +5055,13 @@ static int write_super_imsm(struct supertype *st, int doclose)
perror("Write migr_rec failed");
}
}
+
+ if (store_imsm_mpb(d->fd, mpb))
+ fprintf(stderr,
+ "%s: failed for device %d:%d (fd: %d)%s\n",
+ __func__, d->major, d->minor,
+ d->fd, strerror(errno));
+
if (doclose) {
close(d->fd);
d->fd = -1;
@@ -6928,6 +6940,7 @@ static void imsm_progress_container_reshape(struct intel_super *super)
map2->num_members = prev_num_members;
imsm_set_array_size(dev);
+ super->clean_migration_record_by_mdmon = 1;
super->updates_pending++;
}
}
--
1.7.4.4

@@ -0,0 +1,44 @@
From d2bde6d3aa9468ddf0965f09907a666b92186e42 Mon Sep 17 00:00:00 2001
From: Adam Kwolek <adam.kwolek@intel.com>
Date: Tue, 7 Feb 2012 15:03:11 +0100
Subject: [PATCH 02/12] imsm: FIX: No new missing disks are allowed during
general migration
When general migration is in progress during incremental assembly,
starting a degraded array means that no more disks (even present ones)
can be added later, as the array is already started.
Require all previously present disks for assembly during general migration.
Signed-off-by: Adam Kwolek <adam.kwolek@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
---
super-intel.c | 12 +++++++++++-
1 files changed, 11 insertions(+), 1 deletions(-)
diff --git a/super-intel.c b/super-intel.c
index eba11d6..17034bb 100644
--- a/super-intel.c
+++ b/super-intel.c
@@ -2683,7 +2683,17 @@ static void getinfo_super_imsm(struct supertype *st, struct mdinfo *info, char *
enough = 0;
else /* we're normal, or already degraded */
enough = 1;
-
+ if (is_gen_migration(dev) && missing) {
+ /* during general migration we need all disks
+ * that process is running on.
+ * No new missing disk is allowed.
+ */
+ max_enough = -1;
+ enough = -1;
+ /* no more checks necessary
+ */
+ break;
+ }
/* in the missing/failed disk case check to see
* if at least one array is runnable
*/
--
1.7.4.4

@@ -0,0 +1,29 @@
From bf5cf7c705f292a070746c83f9dd00d7662f458d Mon Sep 17 00:00:00 2001
From: Adam Kwolek <adam.kwolek@intel.com>
Date: Tue, 7 Feb 2012 15:03:27 +0100
Subject: [PATCH 04/12] imsm: FIX: imsm_get_allowed_degradation() doesn't
count degradation for raid1
Add the missing raid1 case to the function.
Signed-off-by: Adam Kwolek <adam.kwolek@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
---
super-intel.c | 1 +
1 files changed, 1 insertions(+), 0 deletions(-)
diff --git a/super-intel.c b/super-intel.c
index 17034bb..19a2c84 100644
--- a/super-intel.c
+++ b/super-intel.c
@@ -8771,6 +8771,7 @@ static int imsm_get_allowed_degradation(int level, int raid_disks,
struct imsm_dev *dev)
{
switch (level) {
+ case 1:
case 10:{
int ret_val = 0;
struct imsm_map *map;
--
1.7.4.4

@@ -0,0 +1,43 @@
From 6a75c8ca79b4cf89a5d1ac24b484b75e8a7e9fb4 Mon Sep 17 00:00:00 2001
From: Adam Kwolek <adam.kwolek@intel.com>
Date: Thu, 9 Feb 2012 12:36:42 +1100
Subject: [PATCH 09/12] imsm: FIX: use md position to reshape restart
When a reshape is interrupted, the metadata may not be saved properly.
As a result, the reshape can be farther along in md than the metadata states.
On restart, save a checkpoint to store the current position (probably farther
along), which can be read from md.
Signed-off-by: Adam Kwolek <adam.kwolek@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
---
super-intel.c | 12 ++++++++++++
1 files changed, 12 insertions(+), 0 deletions(-)
diff --git a/super-intel.c b/super-intel.c
index f5762d8..5f451f3 100644
--- a/super-intel.c
+++ b/super-intel.c
@@ -10067,6 +10067,18 @@ static int imsm_manage_reshape(
"are present in copy area.\n");
goto abort;
}
+ /* Save checkpoint to update migration record for current
+ * reshape position (in md). It can be farther than current
+ * reshape position in metadata.
+ */
+ if (save_checkpoint_imsm(st, sra, UNIT_SRC_NORMAL) == 1) {
+ /* ignore error == 2, this can mean end of reshape here
+ */
+ dprintf("imsm: Cannot write checkpoint to "
+ "migration record (UNIT_SRC_NORMAL, "
+ "initial save)\n");
+ goto abort;
+ }
}
/* size for data */
--
1.7.4.4

@@ -0,0 +1,31 @@
From 30602f533f151c24fe1345f495f02c30f98895f1 Mon Sep 17 00:00:00 2001
From: Labun, Marcin <Marcin.Labun@intel.com>
Date: Mon, 30 Jan 2012 12:00:43 +1100
Subject: [PATCH] imsm: display fd in error trace when when store_imsm_mpb
failes
Signed-off-by: Marcin Labun <marcin.labun@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
---
super-intel.c | 5 +++--
1 files changed, 3 insertions(+), 2 deletions(-)
diff --git a/super-intel.c b/super-intel.c
index 7db5177..8d67a14 100644
--- a/super-intel.c
+++ b/super-intel.c
@@ -5027,8 +5027,9 @@ static int write_super_imsm(struct supertype *st, int doclose)
if (d->index < 0 || is_failed(&d->disk))
continue;
if (store_imsm_mpb(d->fd, mpb))
- fprintf(stderr, "%s: failed for device %d:%d %s\n",
- __func__, d->major, d->minor, strerror(errno));
+ fprintf(stderr, "%s: failed for device %d:%d (fd: %d)%s\n",
+ __func__, d->major, d->minor, d->fd, strerror(errno));
+
if (clear_migration_record) {
unsigned long long dsize;
--
1.7.4.4

@@ -1,7 +1,7 @@
Summary: The mdadm program controls Linux md devices (software RAID arrays)
Name: mdadm
Version: 3.2.3
-Release: 7%{?dist}
+Release: 8%{?dist}
Source: http://www.kernel.org/pub/linux/utils/raid/mdadm/mdadm-%{version}.tar.bz2
Source1: mdmonitor.init
Source2: raid-check
@@ -22,8 +22,24 @@ Patch8: mdadm-3.2.3-super1-make-aread-awrite-always-use-an-aligned-buffe.pa
Patch9: mdadm-3.2.3-avoid-double-free-upon-old-buggy-kernel-sysfs_read-f.patch
Patch10: mdadm-3.2.3-Print-error-message-if-failing-to-write-super-for-1..patch
Patch11: mdadm-3.2.3-Incremental-fix-adding-devices-with-incremental.patch
-Patch19: mdadm-3.2.3-udev.patch
-Patch20: mdadm-2.5.2-static.patch
+Patch12: mdadm-3.2.3-FIX-External-metadata-sometimes-is-not-updated.patch
+Patch13: mdadm-3.2.3-FIX-mdmon-check-in-reshape_container-can-cause-a-pro.patch
+Patch14: mdadm-3.2.3-imsm-display-fd-in-error-trace-when-when-store_imsm_.patch
+Patch15: mdadm-3.2.3-FIX-NULL-pointer-to-strdup-can-be-passed.patch
+Patch16: mdadm-3.2.3-imsm-FIX-No-new-missing-disks-are-allowed-during-gen.patch
+Patch17: mdadm-3.2.3-FIX-Array-is-not-run-when-expansion-disks-are-added.patch
+Patch18: mdadm-3.2.3-imsm-FIX-imsm_get_allowed_degradation-doesn-t-count-.patch
+Patch19: mdadm-3.2.3-Fix-Sometimes-mdmon-throws-core-dump-during-reshape.patch
+Patch20: mdadm-3.2.3-Flush-mdmon-before-next-reshape-step-during-containe.patch
+Patch21: mdadm-3.2.3-imsm-FIX-Chunk-size-migration-problem.patch
+Patch22: mdadm-3.2.3-FIX-use-md-position-to-reshape-restart.patch
+Patch23: mdadm-3.2.3-imsm-FIX-use-md-position-to-reshape-restart.patch
+Patch24: mdadm-3.2.3-imsm-FIX-Clear-migration-record-when-migration-switc.patch
+Patch25: mdadm-3.2.3-FIX-restart-reshape-when-reshape-process-is-stopped-.patch
+Patch26: mdadm-3.2.3-FIX-Do-not-try-to-continue-reshape-using-inactive-ar.patch
+Patch27: mdadm-3.2.3-FIX-Changes-in-0-case-for-reshape-position-verificat.patch
+Patch98: mdadm-3.2.3-udev.patch
+Patch99: mdadm-2.5.2-static.patch
URL: http://www.kernel.org/pub/linux/utils/raid/mdadm/
License: GPLv2+
Group: System Environment/Base
@@ -67,8 +83,26 @@ is not used as the system init process.
%patch9 -p1 -b .double
%patch10 -p1 -b .print
%patch11 -p1 -b .incremental
-%patch19 -p1 -b .udev
-%patch20 -p1 -b .static
+%patch12 -p1 -b .update
+%patch13 -p1 -b .mdmon
+%patch14 -p1 -b .display
+%patch15 -p1 -b .strdup
+%patch16 -p1 -b .missing
+%patch17 -p1 -b .exp
+%patch18 -p1 -b .allowed
+%patch19 -p1 -b .core
+%patch20 -p1 -b .flush
+%patch21 -p1 -b .chunk
+%patch22 -p1 -b .position
+%patch23 -p1 -b .reshape
+%patch24 -p1 -b .record
+%patch25 -p1 -b .restart
+%patch26 -p1 -b .nocontinue
+%patch27 -p1 -b .0case
+# Fedora customization patches
+%patch98 -p1 -b .udev
+%patch99 -p1 -b .static
%build
make %{?_smp_mflags} CXFLAGS="$RPM_OPT_FLAGS" SYSCONFDIR="%{_sysconfdir}" mdadm mdmon
@@ -141,6 +175,11 @@ fi
%{_initrddir}/*
%changelog
+* Mon Apr 30 2012 Jes Sorensen <Jes.Sorensen@redhat.com> - 3.2.3-8
+- Fix bug where IMSM arrays stay inactive in case a reboot is
+- performed during the reshape process.
+- Resolves: bz817522 (f17) bz817535 (f16) bz817537 (f15)
* Wed Mar 28 2012 Jes Sorensen <Jes.Sorensen@redhat.com> - 3.2.3-7
- Fix issue when re-adding drive to a raid1 array with bitmap
- Resolves: bz807743 (f17) bz769323 (f16) bz791159 (f15)