#!/bin/sh
#
# The main kdump routine in the capture kernel. Bash may not be the
# default shell there, so any code added must be POSIX compliant.

. /lib/dracut-lib.sh
. /lib/kdump-logger.sh
. /lib/kdump-lib-initramfs.sh

# Initiate the kdump logger.
if ! dlog_init; then
    echo "failed to initiate the kdump logger."
    exit 1
fi

KDUMP_PATH="/var/crash"
KDUMP_LOG_FILE="/run/initramfs/kexec-dmesg.log"
KDUMP_TEST_ID=""
|
|
|
|
KDUMP_TEST_STATUS=""
|
2021-11-03 11:05:42 +00:00
|
|
|
CORE_COLLECTOR=""
|
|
|
|
DEFAULT_CORE_COLLECTOR="makedumpfile -l --message-level 7 -d 31"
|
|
|
|
DMESG_COLLECTOR="/sbin/vmcore-dmesg"
|
|
|
|
FAILURE_ACTION="systemctl reboot -f"
|
2021-11-03 11:10:54 +00:00
|
|
|
DATEDIR=$(date +%Y-%m-%d-%T)
|
2021-11-03 11:05:42 +00:00
|
|
|
HOST_IP='127.0.0.1'
|
|
|
|
DUMP_INSTRUCTION=""
|
|
|
|
SSH_KEY_LOCATION="/root/.ssh/kdump_id_rsa"
|
|
|
|
DD_BLKSIZE=512
|
|
|
|
FINAL_ACTION="systemctl reboot -f"
|
|
|
|
KDUMP_PRE=""
|
|
|
|
KDUMP_POST=""
|
|
|
|
NEWROOT="/sysroot"
|
|
|
|
OPALCORE="/sys/firmware/opal/mpipl/core"
|
2021-11-03 11:10:54 +00:00
|
|
|
KDUMP_CONF_PARSED="/tmp/kdump.conf.$$"
|
2021-11-03 11:05:42 +00:00
|
|
|
|
2021-11-03 11:10:12 +00:00
|
|
|
# POSIX doesn't have pipefail, only apply when using bash
|
|
|
|
# shellcheck disable=SC3040
|
|
|
|
[ -n "$BASH" ] && set -o pipefail

DUMP_RETVAL=0

kdump_read_conf > "$KDUMP_CONF_PARSED"

get_kdump_confs()
{
    while read -r config_opt config_val; do
        # Remove inline comments after the end of a directive.
        case "$config_opt" in
        path)
            KDUMP_PATH="$config_val"
            ;;
        core_collector)
            [ -n "$config_val" ] && CORE_COLLECTOR="$config_val"
            ;;
        sshkey)
            if [ -f "$config_val" ]; then
                SSH_KEY_LOCATION=$config_val
            fi
            ;;
        kdump_pre)
            KDUMP_PRE="$config_val"
            ;;
        kdump_post)
            KDUMP_POST="$config_val"
            ;;
        fence_kdump_args)
            FENCE_KDUMP_ARGS="$config_val"
            ;;
        fence_kdump_nodes)
            FENCE_KDUMP_NODES="$config_val"
            ;;
        failure_action | default)
            case $config_val in
            shell)
                FAILURE_ACTION="kdump_emergency_shell"
                ;;
            reboot)
                FAILURE_ACTION="systemctl reboot -f && exit"
                ;;
            halt)
                FAILURE_ACTION="halt && exit"
                ;;
            poweroff)
                FAILURE_ACTION="systemctl poweroff -f && exit"
                ;;
            dump_to_rootfs)
                FAILURE_ACTION="dump_to_rootfs"
                ;;
            esac
            ;;
        final_action)
            case $config_val in
            reboot)
                FINAL_ACTION="systemctl reboot -f"
                ;;
            halt)
                FINAL_ACTION="halt"
                ;;
            poweroff)
                FINAL_ACTION="systemctl poweroff -f"
                ;;
            esac
            ;;
        esac
    done < "$KDUMP_CONF_PARSED"

    if [ -z "$CORE_COLLECTOR" ]; then
        CORE_COLLECTOR="$DEFAULT_CORE_COLLECTOR"
        if is_ssh_dump_target || is_raw_dump_target; then
            CORE_COLLECTOR="$CORE_COLLECTOR -F"
        fi
    fi
}
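
# An illustrative kdump.conf excerpt covered by the parser above (the
# values are examples, not mandates):
#   path /var/crash
#   core_collector makedumpfile -l --message-level 7 -d 31
#   failure_action dump_to_rootfs
#   final_action reboot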

# Store the kexec kernel log in a file.
save_log()
{
    dmesg -T > "$KDUMP_LOG_FILE"

    if command -v journalctl > /dev/null; then
        journalctl -ab >> "$KDUMP_LOG_FILE"
    fi
    chmod 600 "$KDUMP_LOG_FILE"
}

# $1: dump path, must be a mount point
dump_fs()
{
    ddebug "dump_fs _mp=$1"

    if ! is_mounted "$1"; then
        dinfo "dump path '$1' is not mounted, trying to mount..."
        if ! mount --target "$1"; then
            derror "failed to dump to '$1', it's not a mount point!"
            return 1
        fi
    fi

    # Remove -F in the makedumpfile case; we don't want a flat-format dump here.
    case $CORE_COLLECTOR in
    *makedumpfile*)
        CORE_COLLECTOR=$(echo "$CORE_COLLECTOR" | sed -e "s/-F//g")
        ;;
    esac
if [ -z "$KDUMP_TEST_ID" ]; then
|
|
|
|
_dump_fs_path=$(echo "$1/$KDUMP_PATH/$HOST_IP-$DATEDIR/" | tr -s /)
|
|
|
|
else
|
|
|
|
_dump_fs_path=$(echo "$1/$KDUMP_PATH/" | tr -s /)
|
|
|
|
fi
|
|
|
|
|
2021-11-03 11:11:53 +00:00
|
|
|
dinfo "saving to $_dump_fs_path"
|
|
|
|
|
|
|
|
# Only remount to read-write mode if the dump target is mounted read-only.
|
|
|
|
_dump_mnt_op=$(get_mount_info OPTIONS target "$1" -f)
|
|
|
|
case $_dump_mnt_op in
|
|
|
|
ro*)
|
|
|
|
dinfo "Remounting the dump target in rw mode."
|
|
|
|
mount -o remount,rw "$1" || return 1
|
|
|
|
;;
|
|
|
|
esac
|
|
|
|
|
|
|
|
mkdir -p "$_dump_fs_path" || return 1
|
|
|
|
|
|
|
|
save_vmcore_dmesg_fs ${DMESG_COLLECTOR} "$_dump_fs_path"
|
|
|
|
save_opalcore_fs "$_dump_fs_path"
|
|
|
|
|
|
|
|
dinfo "saving vmcore"
|
|
|
|
$CORE_COLLECTOR /proc/vmcore "$_dump_fs_path/vmcore-incomplete"
|
|
|
|
_dump_exitcode=$?
|
|
|
|
if [ $_dump_exitcode -eq 0 ]; then
|
2022-11-09 07:54:14 +00:00
|
|
|
sync -f "$_dump_fs_path/vmcore-incomplete"
|
|
|
|
_sync_exitcode=$?
|
|
|
|
if [ $_sync_exitcode -eq 0 ]; then
|
|
|
|
mv "$_dump_fs_path/vmcore-incomplete" "$_dump_fs_path/vmcore"
|
|
|
|
dinfo "saving vmcore complete"
|
|
|
|
else
|
|
|
|
derror "sync vmcore failed, exitcode:$_sync_exitcode"
|
|
|
|
return 1
|
|
|
|
fi
|
2021-11-03 11:11:53 +00:00
|
|
|
else
|
|
|
|
derror "saving vmcore failed, exitcode:$_dump_exitcode"
|
|
|
|
fi
|
|
|
|
|
|
|
|
dinfo "saving the $KDUMP_LOG_FILE to $_dump_fs_path/"
|
|
|
|
save_log
|
|
|
|
mv "$KDUMP_LOG_FILE" "$_dump_fs_path/"
|
|
|
|
if [ $_dump_exitcode -ne 0 ]; then
|
|
|
|
return 1
|
|
|
|
fi
|
|
|
|
|
|
|
|
# improper kernel cmdline can cause the failure of echo, we can ignore this kind of failure
|
|
|
|
return 0
|
2021-11-03 11:05:42 +00:00
|
|
|
}
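
# On success the dump target ends up holding, e.g. (paths illustrative,
# derived from $KDUMP_PATH, $HOST_IP and $DATEDIR above):
#   /var/crash/127.0.0.1-2024-01-01-12:00:00/vmcore
#   /var/crash/127.0.0.1-2024-01-01-12:00:00/vmcore-dmesg.txt
#   /var/crash/127.0.0.1-2024-01-01-12:00:00/kexec-dmesg.log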

# $1: dmesg collector
# $2: dump path
save_vmcore_dmesg_fs()
{
    dinfo "saving vmcore-dmesg.txt to $2"
    if $1 /proc/vmcore > "$2/vmcore-dmesg-incomplete.txt"; then
        mv "$2/vmcore-dmesg-incomplete.txt" "$2/vmcore-dmesg.txt"
        chmod 600 "$2/vmcore-dmesg.txt"

        # Make sure the file is on disk. There have been instances where
        # saving the vmcore failed later and the system rebooted without a
        # sync, so no vmcore-dmesg.txt was available.
        sync
        dinfo "saving vmcore-dmesg.txt complete"
    else
        if [ -f "$2/vmcore-dmesg-incomplete.txt" ]; then
            chmod 600 "$2/vmcore-dmesg-incomplete.txt"
        fi
        derror "saving vmcore-dmesg.txt failed"
    fi
}

# $1: dump path
save_opalcore_fs()
{
    if [ ! -f $OPALCORE ]; then
        # Check if we are on an old kernel that uses a different path.
        if [ -f /sys/firmware/opal/core ]; then
            OPALCORE="/sys/firmware/opal/core"
        else
            return 0
        fi
    fi

    dinfo "saving opalcore:$OPALCORE to $1/opalcore"
    if ! cp $OPALCORE "$1/opalcore"; then
        derror "saving opalcore failed"
        return 1
    fi

    sync
    dinfo "saving opalcore complete"
    return 0
}
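
# $OPALCORE only exists on ppc64 OPAL (PowerNV) systems, where firmware
# exposes its own crash image via sysfs; on anything else the early
# return above makes this a no-op.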

dump_to_rootfs()
{
    if [ "$(systemctl status dracut-initqueue | sed -n "s/^\s*Active: \(\S*\)\s.*$/\1/p")" = "inactive" ]; then
        dinfo "Trying to bring up initqueue for rootfs mount"
        systemctl start dracut-initqueue
    fi

    dinfo "Clean up dead systemd services"
    systemctl cancel
    dinfo "Waiting for rootfs mount, will timeout after 90 seconds"
    systemctl start --no-block sysroot.mount

    _loop=0
    while [ $_loop -lt 90 ] && ! is_mounted /sysroot; do
        sleep 1
        _loop=$((_loop + 1))
    done

    if ! is_mounted /sysroot; then
        derror "Failed to mount rootfs"
        return
    fi

    ddebug "NEWROOT=$NEWROOT"
    dump_fs $NEWROOT
}

kdump_emergency_shell()
{
    ddebug "Switching to kdump emergency shell..."

    [ -f /etc/profile ] && . /etc/profile
    export PS1='kdump:${PWD}# '

    . /lib/dracut-lib.sh
    if [ -f /dracut-state.sh ]; then
        . /dracut-state.sh 2> /dev/null
    fi

    source_conf /etc/conf.d

    type plymouth > /dev/null 2>&1 && plymouth quit

    source_hook "emergency"
    while read -r _tty rest; do
        (
            echo
            echo
            echo 'Entering kdump emergency mode.'
            echo 'Type "journalctl" to view system logs.'
            echo 'Type "rdsosreport" to generate a sosreport; you can then'
            echo 'save it elsewhere and attach it to a bug report.'
            echo
            echo
        ) > "/dev/$_tty"
    done < /proc/consoles
    sh -i -l
    /bin/rm -f -- /.console_lock
}

do_failure_action()
{
    dinfo "Executing failure action $FAILURE_ACTION"
    eval $FAILURE_ACTION
}

do_final_action()
{
    dinfo "Executing final action $FINAL_ACTION"
    eval $FINAL_ACTION
}

do_dump()
{
    eval $DUMP_INSTRUCTION
    _ret=$?

    if [ $_ret -ne 0 ]; then
        derror "saving vmcore failed"
    fi

    return $_ret
}

do_kdump_pre()
{
    if [ -n "$KDUMP_PRE" ]; then
        "$KDUMP_PRE"
        _ret=$?
        if [ $_ret -ne 0 ]; then
            derror "$KDUMP_PRE exited with $_ret status"
            return $_ret
        fi
    fi

    # If any of these scripts fails, just raise a warning and continue.
    if [ -d /etc/kdump/pre.d ]; then
        for file in /etc/kdump/pre.d/*; do
            "$file"
            _ret=$?
            if [ $_ret -ne 0 ]; then
                derror "$file exited with $_ret status"
            fi
        done
    fi
    return 0
}

do_kdump_post()
{
    if [ -d /etc/kdump/post.d ]; then
        for file in /etc/kdump/post.d/*; do
            "$file" "$1"
            _ret=$?
            if [ $_ret -ne 0 ]; then
                derror "$file exited with $_ret status"
            fi
        done
    fi

    if [ -n "$KDUMP_POST" ]; then
        "$KDUMP_POST" "$1"
        _ret=$?
        if [ $_ret -ne 0 ]; then
            derror "$KDUMP_POST exited with $_ret status"
        fi
    fi
}
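
# A minimal, hypothetical /etc/kdump/post.d hook; $1 carries the dump
# exit status (0 on success):
#   #!/bin/sh
#   logger "kdump: vmcore saving finished with status $1"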

# $1: block target, e.g. /dev/sda
dump_raw()
{
    [ -b "$1" ] || return 1

    dinfo "saving to raw disk $1"

    if ! echo "$CORE_COLLECTOR" | grep -q makedumpfile; then
        _src_size=$(stat --format %s /proc/vmcore)
        _src_size_mb=$((_src_size / 1048576))
        /kdumpscripts/monitor_dd_progress $_src_size_mb &
    fi

    dinfo "saving vmcore"
    $CORE_COLLECTOR /proc/vmcore | dd of="$1" bs=$DD_BLKSIZE >> /tmp/dd_progress_file 2>&1 || return 1
    sync

    dinfo "saving vmcore complete"
    return 0
}
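
# Selected in kdump.conf with, e.g. (device name illustrative):
#   raw /dev/sdb1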

# $1: ssh key file
# $2: ssh address in <user>@<host> format
dump_ssh()
{
    _ret=0
    _ssh_opt="-i $1 -o BatchMode=yes -o StrictHostKeyChecking=yes"
if [ -z "$KDUMP_TEST_ID" ]; then
|
|
|
|
_ssh_dir="$KDUMP_PATH/$HOST_IP-$DATEDIR"
|
|
|
|
else
|
|
|
|
_ssh_dir="$KDUMP_PATH"
|
|
|
|
fi
|
|
|
|
|
2021-11-03 11:11:53 +00:00
|
|
|
if is_ipv6_address "$2"; then
|
|
|
|
_scp_address=${2%@*}@"[${2#*@}]"
|
|
|
|
else
|
|
|
|
_scp_address=$2
|
|
|
|
fi
|
|
|
|
|
|
|
|
dinfo "saving to $2:$_ssh_dir"
|
|
|
|
|
|
|
|
cat /var/lib/random-seed > /dev/urandom
|
|
|
|
ssh -q $_ssh_opt "$2" mkdir -p "$_ssh_dir" || return 1
|
|
|
|
|
|
|
|
save_vmcore_dmesg_ssh "$DMESG_COLLECTOR" "$_ssh_dir" "$_ssh_opt" "$2"
|
|
|
|
dinfo "saving vmcore"
|
|
|
|
|
|
|
|
save_opalcore_ssh "$_ssh_dir" "$_ssh_opt" "$2" "$_scp_address"
|
|
|
|
|
|
|
|
if [ "${CORE_COLLECTOR%%[[:blank:]]*}" = "scp" ]; then
|
|
|
|
scp -q $_ssh_opt /proc/vmcore "$_scp_address:$_ssh_dir/vmcore-incomplete"
|
|
|
|
_ret=$?
|
|
|
|
_vmcore="vmcore"
|
|
|
|
else
|
|
|
|
$CORE_COLLECTOR /proc/vmcore | ssh $_ssh_opt "$2" "umask 0077 && dd bs=512 of='$_ssh_dir/vmcore-incomplete'"
|
|
|
|
_ret=$?
|
|
|
|
_vmcore="vmcore.flat"
|
|
|
|
fi
|
|
|
|
|
|
|
|
if [ $_ret -eq 0 ]; then
|
|
|
|
ssh $_ssh_opt "$2" "mv '$_ssh_dir/vmcore-incomplete' '$_ssh_dir/$_vmcore'"
|
|
|
|
_ret=$?
|
|
|
|
if [ $_ret -ne 0 ]; then
|
|
|
|
derror "moving vmcore failed, exitcode:$_ret"
|
|
|
|
else
|
|
|
|
dinfo "saving vmcore complete"
|
|
|
|
fi
|
|
|
|
else
|
|
|
|
derror "saving vmcore failed, exitcode:$_ret"
|
|
|
|
fi
|
|
|
|
|
|
|
|
dinfo "saving the $KDUMP_LOG_FILE to $2:$_ssh_dir/"
|
|
|
|
save_log
|
|
|
|
if ! scp -q $_ssh_opt $KDUMP_LOG_FILE "$_scp_address:$_ssh_dir/"; then
|
|
|
|
derror "saving log file failed, _exitcode:$_ret"
|
|
|
|
fi
|
|
|
|
|
|
|
|
return $_ret
|
2020-10-15 12:45:57 +00:00
|
|
|
}
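
# Selected in kdump.conf with, e.g. (host and key illustrative):
#   ssh kdump@192.168.0.20
#   sshkey /root/.ssh/kdump_id_rsa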

# $1: dump path
# $2: ssh opts
# $3: ssh address in <user>@<host> format
# $4: scp address; like the ssh address, but with IPv6 addresses bracketed
save_opalcore_ssh()
{
    if [ ! -f $OPALCORE ]; then
        # Check if we are on an old kernel that uses a different path
        if [ -f /sys/firmware/opal/core ]; then
            OPALCORE="/sys/firmware/opal/core"
        else
            return 0
        fi
    fi

    dinfo "saving opalcore:$OPALCORE to $3:$1"

    if ! scp $2 $OPALCORE "$4:$1/opalcore-incomplete"; then
        derror "saving opalcore failed"
        return 1
    fi

    ssh $2 "$3" mv "$1/opalcore-incomplete" "$1/opalcore"
    dinfo "saving opalcore complete"
    return 0
}

# $1: dmesg collector
# $2: dump path
# $3: ssh opts
# $4: ssh address in <user>@<host> format
save_vmcore_dmesg_ssh()
{
    dinfo "saving vmcore-dmesg.txt to $4:$2"
    if $1 /proc/vmcore | ssh $3 "$4" "umask 0077 && dd of='$2/vmcore-dmesg-incomplete.txt'"; then
        ssh -q $3 "$4" mv "$2/vmcore-dmesg-incomplete.txt" "$2/vmcore-dmesg.txt"
        dinfo "saving vmcore-dmesg.txt complete"
    else
        derror "saving vmcore-dmesg.txt failed"
    fi
}

wait_online_network()
{
    # In some cases the network may still not be ready, because nm-online is
    # called with "-s", which waits for NetworkManager startup to complete
    # rather than for actual connectivity. Wait up to 10 more minutes for the
    # network to be truly ready in these cases.
    _loop=0
    while [ $_loop -lt 600 ]; do
        sleep 1
        _loop=$((_loop + 1))
        if _route=$(kdump_get_ip_route "$1" 2> /dev/null); then
            printf "%s" "$_route"
            return
        else
            dwarn "Waiting for network to be ready (${_loop}s / 10min)"
        fi
    done

    derror "Oops. The network still isn't ready after waiting 10 minutes."
    exit 1
}

get_host_ip()
{
    if ! is_nfs_dump_target && ! is_ssh_dump_target; then
        return 0
    fi

    _kdump_remote_ip=$(getarg kdump_remote_ip=)

    if [ -z "$_kdump_remote_ip" ]; then
        derror "failed to get remote IP address!"
        return 1
    fi

    if ! _route=$(wait_online_network "$_kdump_remote_ip"); then
        return 1
    fi

    _netdev=$(kdump_get_ip_route_field "$_route" "dev")

    if ! _kdumpip=$(ip addr show dev "$_netdev" | grep '[ ]*inet'); then
        derror "Failed to get IP of $_netdev"
        return 1
    fi

    _kdumpip=$(echo "$_kdumpip" | head -n 1 | awk '{print $2}')
    _kdumpip="${_kdumpip%%/*}"
    HOST_IP=$_kdumpip
}
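
# e.g. an "inet 192.168.0.10/24 brd ..." line from ip(8) ends up as
# HOST_IP=192.168.0.10 (address illustrative).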

read_kdump_confs()
{
    if [ ! -f "$KDUMP_CONFIG_FILE" ]; then
        derror "$KDUMP_CONFIG_FILE not found"
        return
    fi

    get_kdump_confs

    # Rescan to pick up the dump instruction for the configured dump target.
    while read -r config_opt config_val; do
        # Remove inline comments after the end of a directive.
        case "$config_opt" in
        dracut_args)
            config_val=$(get_dracut_args_target "$config_val")
            if [ -n "$config_val" ]; then
                config_val=$(get_mntpoint_from_target "$config_val")
                DUMP_INSTRUCTION="dump_fs $config_val"
            fi
            ;;
        ext[234] | xfs | btrfs | minix | nfs | virtiofs)
            config_val=$(get_mntpoint_from_target "$config_val")
            DUMP_INSTRUCTION="dump_fs $config_val"
            ;;
        raw)
            DUMP_INSTRUCTION="dump_raw $config_val"
            ;;
        ssh)
            DUMP_INSTRUCTION="dump_ssh $SSH_KEY_LOCATION $config_val"
            ;;
        esac
    done < "$KDUMP_CONF_PARSED"
}
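
# The resulting DUMP_INSTRUCTION takes one of these shapes (arguments
# illustrative):
#   dump_fs /sysroot
#   dump_raw /dev/sdb1
#   dump_ssh /root/.ssh/kdump_id_rsa kdump@192.168.0.20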

fence_kdump_notify()
{
    if [ -n "$FENCE_KDUMP_NODES" ]; then
        # shellcheck disable=SC2086
        $FENCE_KDUMP_SEND $FENCE_KDUMP_ARGS $FENCE_KDUMP_NODES &
    fi
}

kdump_test_set_status() {
    _status="$1"

    [ -n "$KDUMP_TEST_STATUS" ] || return

    case "$_status" in
    success | fail) ;;
    *)
        derror "Unknown test status $_status"
        return 1
        ;;
    esac

    if is_ssh_dump_target; then
        _ssh_opts="-i $SSH_KEY_LOCATION -o BatchMode=yes -o StrictHostKeyChecking=yes"
        _ssh_host=$(echo "$DUMP_INSTRUCTION" | awk '{print $3}')

        ssh -q $_ssh_opts "$_ssh_host" "mkdir -p ${KDUMP_TEST_STATUS%/*}" \
            || return 1
        ssh -q $_ssh_opts "$_ssh_host" "echo $_status kdump_test_id=$KDUMP_TEST_ID > $KDUMP_TEST_STATUS" \
            || return 1
    else
        _target=$(echo "$DUMP_INSTRUCTION" | awk '{print $2}')

        mkdir -p "$_target/$KDUMP_PATH" || return 1
        echo "$_status kdump_test_id=$KDUMP_TEST_ID" > "$_target/$KDUMP_TEST_STATUS"
        sync -f "$_target/$KDUMP_TEST_STATUS"
    fi
}
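
# The status file then holds a single line, e.g.:
#   success kdump_test_id=1729823462-938751820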

kdump_test_init() {
    is_raw_dump_target && return

    KDUMP_TEST_ID=$(getarg kdump_test_id=)
    [ -z "$KDUMP_TEST_ID" ] && return

    KDUMP_PATH="$KDUMP_PATH/kdump-test-$KDUMP_TEST_ID"
    KDUMP_TEST_STATUS="$KDUMP_PATH/vmcore-creation.status"

    kdump_test_set_status 'fail'
}
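
# "fail" is recorded up front; it is flipped to "success" at the bottom
# of this script only once the dump has actually completed.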

if [ "$1" = "--error-handler" ]; then
    get_kdump_confs
    do_failure_action
    do_final_action

    exit $?
fi

# Continue here only if we have to save a dump.
if [ -f /etc/fadump.initramfs ] && [ ! -f /proc/device-tree/rtas/ibm,kernel-dump ] && [ ! -f /proc/device-tree/ibm,opal/dump/mpipl-boot ]; then
    exit 0
fi

read_kdump_confs
fence_kdump_notify

if ! get_host_ip; then
    derror "get_host_ip exited with non-zero status!"
    exit 1
fi

if [ -z "$DUMP_INSTRUCTION" ]; then
    DUMP_INSTRUCTION="dump_fs $NEWROOT"
fi

kdump_test_init

if ! do_kdump_pre; then
    derror "kdump_pre script exited with non-zero status!"
    do_final_action

    # do_final_action reboots (or halts/powers off) the machine via systemd;
    # stop this shell script here.
    exit 1
fi

make_trace_mem "kdump saving vmcore" '1:shortmem' '2+:mem' '3+:slab'
do_dump
DUMP_RETVAL=$?

if ! do_kdump_post $DUMP_RETVAL; then
    derror "kdump_post script exited with non-zero status!"
fi

if [ $DUMP_RETVAL -ne 0 ]; then
    exit 1
fi
kdump_test_set_status "success"
|
2020-10-15 12:45:57 +00:00
|
|
|
do_final_action
|