Introduce kdump error handling service
Currently, when a failure occurs in the 2nd kernel, the kdump script might not be called at all, so the default action is never executed. The result is a hang.

This happens because we disable the emergency shell and rely on kdump.sh being invoked through the dracut-pre-pivot hook. But we might never reach the dracut-pre-pivot hook, because certain systemd targets cannot be reached when one of their dependencies fails. In those cases the error handling code does not run and the system hangs. For example:

sysroot-var-crash.mount --> initrd-root-fs.target --> initrd.target \
--> dracut-pre-pivot.service --> kdump.sh

If the /sysroot/var/crash mount fails, initrd-root-fs.target will not be reached. Then initrd.target will not be reached, dracut-pre-pivot.service will not run, and finally kdump.sh will not run.

To solve this problem, we need to separate the error handling code from the dracut-pre-pivot hook, so that whenever a failure shows up, the separated code can be called by the emergency service.

By default systemd provides an emergency service which drops us into a shell upon every critical failure. It is very convenient for us to re-use the systemd emergency framework, because we don't have to touch the other parts of systemd. We can use our own script instead of the default one.

This new scheme overrides the emergency shell and replaces it with the kdump error handling code, which does the error handling as needed. We no longer rely on the dracut-pre-pivot hook always running. Instead, whenever an error is serious enough that the emergency shell would have been started, the kdump error handler runs.

dracut-emergency is also replaced by the kdump error handler, and it is enabled again all the way down. So every failure in the 2nd kernel (from both systemd and dracut) can be captured and will trigger the kdump error handler.

dracut-initqueue is a special case: it calls "systemctl start emergency" directly, not via "OnFailure=emergency". In case of failure, emergency is started, but not in isolation mode, which means dracut-initqueue is still running. On the other hand, emergency will call dracut-initqueue again when the default action is dump_to_rootfs. systemd then blocks on the second dracut-initqueue, waiting for the first instance to exit, which leaves us hung. It looks like the following:

dracut-initqueue (running)
--> call dracut-emergency:
    --> dracut-emergency (running)
        --> kdump-error-handler.sh (running)
            --> call dracut-initqueue:
                --> blocking and waiting for the original instance to exit.

To fix this, I'd like to introduce a wrapper emergency service. This emergency service replaces both the systemd and the dracut emergency service, and it does nothing but isolate to the real kdump error handler service:

dracut-initqueue (running)
--> call dracut-emergency:
    --> dracut-emergency isolates to kdump-error-handler.service
        --> dracut-emergency and dracut-initqueue will both be stopped
            and kdump-error-handler.service will run kdump-error-handler.sh.

In a normal failure case, this still works:

foo.service fails
--> trigger emergency.service
    --> emergency.service isolates to kdump-error-handler.service
        --> kdump-error-handler.service will run kdump-error-handler.sh

Signed-off-by: WANG Chao <chaowang@redhat.com>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
Acked-by: Dave Young <dyoung@redhat.com>
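The dependency chain in the example from the commit message can be illustrated with a toy shell model. This is only a sketch of the reasoning, not how systemd resolves dependencies; the names `CHAIN` and `units_not_reached` are hypothetical and do not appear in the patch.

```shell
#!/bin/sh
# Toy model: each unit in CHAIN can only be reached after the one before it.
# The unit names are taken from the example in the commit message.
CHAIN="sysroot-var-crash.mount initrd-root-fs.target initrd.target dracut-pre-pivot.service kdump.sh"

# units_not_reached <failed-unit>
# Print every unit that comes after the failed unit in CHAIN, i.e. the
# units that are never reached once <failed-unit> fails.
units_not_reached() {
    failed=$1
    seen=no
    for unit in $CHAIN; do
        if [ "$seen" = yes ]; then
            printf '%s\n' "$unit"
        elif [ "$unit" = "$failed" ]; then
            seen=yes
        fi
    done
}

# A failed /sysroot/var/crash mount leaves everything downstream unreached,
# including kdump.sh itself.
units_not_reached sysroot-var-crash.mount
```

Since kdump.sh sits at the very end of the chain, any failure upstream of it skips the error handling entirely, which is why the handler is moved behind the emergency service instead.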
parent de95c74a76
commit 002337c671
dracut-kdump-emergency.service (normal file, 27 lines added)
@@ -0,0 +1,27 @@
+# This file is part of systemd.
+#
+# systemd is free software; you can redistribute it and/or modify it
+# under the terms of the GNU Lesser General Public License as published by
+# the Free Software Foundation; either version 2.1 of the License, or
+# (at your option) any later version.
+
+# This service will be placed in kdump initramfs and replace both the systemd
+# emergency service and dracut emergency shell. IOW, any emergency will
+# kick this service, which in turn isolates to the kdump error handler.
+
+[Unit]
+Description=Kdump Emergency
+DefaultDependencies=no
+
+[Service]
+ExecStart=/usr/bin/systemctl --no-block isolate kdump-error-handler.service
+Type=oneshot
+StandardInput=tty-force
+StandardOutput=inherit
+StandardError=inherit
+KillMode=process
+IgnoreSIGPIPE=no
+
+# Bash ignores SIGTERM, so we send SIGHUP instead, to ensure that bash
+# terminates cleanly.
+KillSignal=SIGHUP
dracut-kdump-error-handler.service (normal file, 34 lines added)
@@ -0,0 +1,34 @@
+# This file is part of systemd.
+#
+# systemd is free software; you can redistribute it and/or modify it
+# under the terms of the GNU Lesser General Public License as published by
+# the Free Software Foundation; either version 2.1 of the License, or
+# (at your option) any later version.
+
+# This service will run the real kdump error handler code. Executing the
+# default action configured in kdump.conf
+
+[Unit]
+Description=Kdump Error Handler
+DefaultDependencies=no
+After=systemd-vconsole-setup.service
+Wants=systemd-vconsole-setup.service
+AllowIsolate=yes
+
+[Service]
+Environment=HOME=/
+Environment=DRACUT_SYSTEMD=1
+Environment=NEWROOT=/sysroot
+WorkingDirectory=/
+ExecStart=/bin/kdump-error-handler.sh
+ExecStopPost=-/usr/bin/systemctl --fail --no-block default
+Type=oneshot
+StandardInput=tty-force
+StandardOutput=inherit
+StandardError=inherit
+KillMode=process
+IgnoreSIGPIPE=no
+
+# Bash ignores SIGTERM, so we send SIGHUP instead, to ensure that bash
+# terminates cleanly.
+KillSignal=SIGHUP
dracut-kdump-error-handler.sh (executable file, 10 lines added)
@@ -0,0 +1,10 @@
+#!/bin/sh
+
+. /lib/kdump-lib-initramfs.sh
+
+set -o pipefail
+export PATH=$PATH:$KDUMP_SCRIPT_DIR
+
+get_kdump_confs
+do_default_action
+do_final_action
@@ -9,10 +9,6 @@ exec &> /dev/console
 . /lib/dracut-lib.sh
 . /lib/kdump-lib-initramfs.sh
 
-if [ -f "$initdir/lib/dracut/no-emergency-shell" ]; then
-    rm -f -- $initdir/lib/dracut/no-emergency-shell
-fi
-
 set -o pipefail
 DUMP_RETVAL=0
 
@@ -563,7 +563,6 @@ kdump_install_random_seed() {
 
 install() {
     kdump_install_conf
-    >"$initdir/lib/dracut/no-emergency-shell"
 
     if is_ssh_dump_target; then
         kdump_install_random_seed
@@ -581,6 +580,12 @@ install() {
     inst_hook pre-pivot 9999 "$moddir/kdump.sh"
     inst "/lib/kdump/kdump-lib.sh" "/lib/kdump-lib.sh"
     inst "/lib/kdump/kdump-lib-initramfs.sh" "/lib/kdump-lib-initramfs.sh"
+    inst "$moddir/kdump-error-handler.sh" "/usr/bin/kdump-error-handler.sh"
+    inst "$moddir/kdump-error-handler.service" "$systemdsystemunitdir/kdump-error-handler.service"
+    # Replace existing emergency service
+    cp "$moddir/kdump-emergency.service" "$initdir/$systemdsystemunitdir/emergency.service"
+    # Also redirect dracut-emergency to kdump error handler
+    ln_r "$systemdsystemunitdir/emergency.service" "$systemdsystemunitdir/dracut-emergency.service"
 
     # Check for all the devices and if any device is iscsi, bring up iscsi
     # target. Ideally all this should be pushed into dracut iscsi module
@@ -1,6 +1,5 @@
 # These variables and functions are useful in 2nd kernel
 
-. /lib/dracut-lib.sh
 . /lib/kdump-lib.sh
 
 KDUMP_PATH="/var/crash"
@@ -23,6 +22,7 @@ NEWROOT="/sysroot"
 get_kdump_confs()
 {
     local config_opt config_val
+    local user_specified_cc
 
     while read config_opt config_val;
     do
@@ -34,6 +34,7 @@ get_kdump_confs()
             ;;
             core_collector)
                 [ -n "$config_val" ] && CORE_COLLECTOR="$config_val"
+                user_specified_cc=yes
             ;;
             sshkey)
                 if [ -f "$config_val" ]; then
@@ -55,7 +56,7 @@ get_kdump_confs()
             default)
                 case $config_val in
                     shell)
-                        DEFAULT_ACTION="_emergency_shell kdump"
+                        DEFAULT_ACTION="kdump_emergency_shell"
                     ;;
                     reboot)
                         DEFAULT_ACTION="do_umount; reboot -f"
@@ -67,12 +68,19 @@ get_kdump_confs()
                         DEFAULT_ACTION="do_umount; poweroff -f"
                     ;;
                     dump_to_rootfs)
-                        DEFAULT_ACTION="dump_fs $NEWROOT"
+                        DEFAULT_ACTION="dump_to_rootfs"
                     ;;
                 esac
             ;;
         esac
     done < $KDUMP_CONF
+
+    if is_ssh_dump_target || is_raw_dump_target; then
+        if [ -z "$user_specified_cc" ]; then
+            CORE_COLLECTOR="$CORE_COLLECTOR -F"
+        fi
+    fi
+
 }
 
 # dump_fs <mount point| device>
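The effect of the `user_specified_cc` flag in the get_kdump_confs() hunks can be sketched in isolation. The following is a simplified stand-alone model, not the code from the patch: `parse_conf`, its second argument, and the placeholder default collector `makedumpfile -l` are assumptions made for illustration, and only the `core_collector` and `default` options are handled.

```shell
#!/bin/sh
# parse_conf <conf-file> <flat-target: yes|no>   (hypothetical helper)
# Prints the resulting core collector on line 1 and the default action on
# line 2. <flat-target> stands in for the patch's
# "is_ssh_dump_target || is_raw_dump_target" check.
parse_conf() {
    conf=$1
    flat_target=$2
    core_collector="makedumpfile -l"   # assumed placeholder default
    default_action="reboot"            # assumed placeholder default
    user_specified_cc=""

    while read -r config_opt config_val; do
        case "$config_opt" in
            core_collector)
                [ -n "$config_val" ] && core_collector=$config_val
                # Remember that the user supplied a collector line.
                user_specified_cc=yes
                ;;
            default)
                case "$config_val" in
                    shell) default_action="kdump_emergency_shell" ;;
                    dump_to_rootfs) default_action="dump_to_rootfs" ;;
                esac
                ;;
        esac
    done < "$conf"

    # Only the built-in collector gets -F appended; a collector chosen by
    # the user is left untouched.
    if [ "$flat_target" = yes ] && [ -z "$user_specified_cc" ]; then
        core_collector="$core_collector -F"
    fi

    printf '%s\n%s\n' "$core_collector" "$default_action"
}
```

Calling `parse_conf /etc/kdump.conf yes` on a file without a `core_collector` line yields the placeholder default with `-F` appended, while a file that names its own collector is passed through unchanged.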
@@ -127,6 +135,24 @@ save_vmcore_dmesg_fs() {
     fi
 }
 
+dump_to_rootfs()
+{
+
+    echo "Kdump: trying to bring up rootfs device"
+    systemctl start dracut-initqueue
+    echo "Kdump: waiting for rootfs mount, will timeout after 90 seconds"
+    systemctl start sysroot.mount
+
+    dump_fs $NEWROOT
+}
+
+kdump_emergency_shell()
+{
+    echo "PS1=\"kdump:\\\${PWD}# \"" >/etc/profile
+    /bin/dracut-emergency
+    rm -f /etc/profile
+}
+
 do_umount()
 {
     umount -Rf $NEWROOT
@@ -134,7 +160,7 @@ do_umount()
 
 do_default_action()
 {
-    wait_for_loginit
+    echo "Kdump: Executing default action $DEFAULT_ACTION"
     eval $DEFAULT_ACTION
 }
 
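do_default_action() runs DEFAULT_ACTION through `eval` because the value can be either a single function name (such as `dump_to_rootfs`) or a short command list (such as `do_umount; reboot -f`). A minimal sketch of that dispatch follows; the stub functions and the name `run_default_action` are hypothetical stand-ins so that nothing is actually unmounted or rebooted.

```shell
#!/bin/sh
# Stubs standing in for the real actions; they only report what would run.
do_umount()      { echo "stub: umount"; }
fake_reboot()    { echo "stub: reboot"; }
dump_to_rootfs() { echo "stub: dump to rootfs"; }

# run_default_action <action-string>
# Mirrors the shape of do_default_action(): log the action, then eval it so
# that both "name" and "cmd1; cmd2" forms work.
run_default_action() {
    echo "Kdump: Executing default action $1"
    eval "$1"
}

run_default_action "dump_to_rootfs"
run_default_action "do_umount; fake_reboot"
```

Replacing the inline `dump_fs $NEWROOT` string with a named function keeps the configuration table readable and lets the function perform extra steps, such as starting dracut-initqueue, before dumping.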
@@ -36,6 +36,9 @@ Source24: kdump-lib-initramfs.sh
 Source100: dracut-kdump.sh
 Source101: dracut-module-setup.sh
 Source102: dracut-monitor_dd_progress
+Source103: dracut-kdump-error-handler.sh
+Source104: dracut-kdump-emergency.service
+Source105: dracut-kdump-error-handler.service
 
 Requires(post): systemd-units
 Requires(preun): systemd-units
@@ -210,7 +213,9 @@ mkdir -p -m755 $RPM_BUILD_ROOT/etc/kdump-adv-conf/kdump_dracut_modules/99kdumpba
 cp %{SOURCE100} $RPM_BUILD_ROOT/etc/kdump-adv-conf/kdump_dracut_modules/99kdumpbase/%{remove_dracut_prefix %{SOURCE100}}
 cp %{SOURCE101} $RPM_BUILD_ROOT/etc/kdump-adv-conf/kdump_dracut_modules/99kdumpbase/%{remove_dracut_prefix %{SOURCE101}}
 cp %{SOURCE102} $RPM_BUILD_ROOT/etc/kdump-adv-conf/kdump_dracut_modules/99kdumpbase/%{remove_dracut_prefix %{SOURCE102}}
-
+cp %{SOURCE103} $RPM_BUILD_ROOT/etc/kdump-adv-conf/kdump_dracut_modules/99kdumpbase/%{remove_dracut_prefix %{SOURCE103}}
+cp %{SOURCE104} $RPM_BUILD_ROOT/etc/kdump-adv-conf/kdump_dracut_modules/99kdumpbase/%{remove_dracut_prefix %{SOURCE104}}
+cp %{SOURCE105} $RPM_BUILD_ROOT/etc/kdump-adv-conf/kdump_dracut_modules/99kdumpbase/%{remove_dracut_prefix %{SOURCE105}}
 chmod 755 $RPM_BUILD_ROOT/etc/kdump-adv-conf/kdump_dracut_modules/99kdumpbase/%{remove_dracut_prefix %{SOURCE100}}
 chmod 755 $RPM_BUILD_ROOT/etc/kdump-adv-conf/kdump_dracut_modules/99kdumpbase/%{remove_dracut_prefix %{SOURCE101}}
 