Also known as CVE-2021-20269. The kdump dmesg log files(kexec-dmesg.log,
vmcore-dmesg.txt) are generated by shell redirection, which take the
default umask value, making the files readable for group and others.
This patch chmod these files, making them only accessible to owner.
Signed-off-by: Tao Liu <ltao@redhat.com>
Acked-by: Kairui Song <kasong@redhat.com>
Currently, kdump will fail to save vmcore when using the scp and ipv6.
The reason is that the scp requires IPv6 addresses to be enclosed in
square brackets, but ssh doesn’t require this.
Let's enclose the ipv6 address in square brackets for scp dump.
Signed-off-by: Lianbo Jiang <lijiang@redhat.com>
Acked-by: Pingfan Liu <piliu@redhat.com>
Sourcing logger file in kdump-lib.sh will leak kdump helper to dracut,
because module-setup.sh will source kdump-lib.sh. This will make kdump's
function override dracut's ones, and lead to unexpected behaviours.
So include kdump-logger.sh individually and only source it where it really
needed. for module-setup.sh, simply use dracut's logger helper is good
enough so just source kdump-logger.sh in kdump only scripts.
Signed-off-by: Kairui Song <kasong@redhat.com>
Acked-by: Lianbo Jiang <lijiang@redhat.com>
Simplify the code and fix mount point detection. The code logic is now
much simpler: if $1 is not a mount point, call "mount --target $1" again
to try mount it. "mount --target" cmd itself can handle all the /etc/fstab
parsing job, so drop the buggy and complex bash code.
Signed-off-by: Kairui Song <kasong@redhat.com>
Acked-by: Pingfan Liu <piliu@redhat.com>
systemctl is-failed will not work after dracut isolated to the emergency
target, so this judgement is invalid. And the restart is basically
harmless, so just revert this commit.
This reverts commit ad6a93b00d.
Signed-off-by: Kairui Song <kasong@redhat.com>
Acked-by: Pingfan Liu <piliu@redhat.com>
Currently, if saving vmcore failed, the final failure information won't
be saved to the kexec-dmesg.log, because the action of saving the log
occurs before the final log is printed, it has no chance to save the
log(marked it with the '^^^' below) to the log file(kexec-dmesg.log).
For example:
[1] console log:
[ 3.589967] kdump[453]: saving vmcore-dmesg.txt to /sysroot//var/crash/127.0.0.1-2020-11-26-14:19:17/
[ 3.627261] kdump[458]: saving vmcore-dmesg.txt complete
[ 3.633923] kdump[460]: saving vmcore
[ 3.661020] kdump[465]: saving vmcore failed
^^^^^^^^^^^^^^^^^^^^
[2] kexec-dmesg.log:
Nov 26 14:19:17 kvm-06-guest25.hv2.lab.eng.bos.redhat.com kdump[453]: saving vmcore-dmesg.txt to /sysroot//var/crash/127.0.0.1-2020-11-26-14:19:17/
Nov 26 14:19:17 kvm-06-guest25.hv2.lab.eng.bos.redhat.com kdump[458]: saving vmcore-dmesg.txt complete
Nov 26 14:19:17 kvm-06-guest25.hv2.lab.eng.bos.redhat.com kdump[460]: saving vmcore
Let's improve it in order to avoid the loss of important information.
Signed-off-by: Lianbo Jiang <lijiang@redhat.com>
Acked-by: Kairui Song <kasong@redhat.com>
If dracut-initqueue failed in kdump kernel and failure action
is set to dump_to_rootfs, there is no point try again to start the
initqueue. It will also slow down the dump process, and the initqueue
will most like still not work if first attemp failed.
So just try to start sysroot.mount, if it failed, there is no luck.
Signed-off-by: Kairui Song <kasong@redhat.com>
Acked-by: Pingfan Liu <piliu@redhat.com>
Remove the --real when calling findmnt.
The option is only useful in capture kernel, to avoid
`findmnt` returning the pseudo 'rootfs' for non mounted path.
example, when /kdumproot/mnt/ is not mounted:
kdump:/# findmnt --target /kdumproot/mnt
TARGET SOURCE FSTYPE OPTIONS
/ rootfs rootfs rw,size=61368k,nr_inodes=15342
kdump:/# findmnt --target /kdumproot/mnt
<return 1 and empty output>
But this function will make findmnt also return empty value for bind
mount. So remove it and add an extra if statement for second kernel.
Signed-off-by: Kairui Song <kasong@redhat.com>
Acked-by: Pingfan Liu <piliu@redhat.com>
Let's add the rd.kdumploglvl option to control log level in the second
kernel, which can make us avoid rebuilding the kdump initramfs after we
change the log level in /etc/sysconfig/kdump.
Signed-off-by: Lianbo Jiang <lijiang@redhat.com>
Acked-by: Kairui Song <kasong@redhat.com>
Currently, the makedumpfile option '--message-level' is set to 1 when
dumping the vmcore, it only displays the progress indicator message,
but there are no common message and error message, it is important to
report some additional messages, especially for the error message,
which is very useful for the debugging.
In view of this, let's change the message level to 7 by default.
Signed-off-by: Lianbo Jiang <lijiang@redhat.com>
Acked-by: Kairui Song <kasong@redhat.com>
Let's use the logger in the second kernel and collect the kernel ring
buffer(dmesg) of the second kernel.
Signed-off-by: Lianbo Jiang <lijiang@redhat.com>
Acked-by: Kairui Song <kasong@redhat.com>
This reverts commit cee618593c.
Upstream dracut have provided a parameter for adding mandantory network
requirement by appending "rd.neednet" parameter, so we should use that
instead.
Signed-off-by: Kairui Song <kasong@redhat.com>
Acked-by: Dave Young <dyoung@redhat.com>
Use get_mount_info so that fstab is used as a failback when look for
mount info.
Signed-off-by: Kairui Song <kasong@redhat.com>
Acked-by: Pingfan Liu <piliu@redhat.com>
Use is_mounted helper instaed of calling findmnt directly or checking if
"mount" value is empty.
If findmnt looks for fstab as well, some non mounted entry will also
return value. Required to support non-mounted target.
Signed-off-by: Kairui Song <kasong@redhat.com>
Acked-by: Pingfan Liu <piliu@redhat.com>
In second kernel, kdump always prints redundant '/':
kdump: saving to /sysroot//var/crash/127.0.0.1-2020-03-12-21:32:54/
Just trim it.
Signed-off-by: Kairui Song <kasong@redhat.com>
Acked-by: Dave Young <dyoung@redhat.com>
Since commit 6dee286467 ("Don't mount the dump target unless needed"),
dump_fs function unmounts the dump target just after saving vmcore.
This broke the condition that it's mounted when executing "kdump_post",
which had been stable since RHEL5, and a certain tool which uses the
kdump_post hook to save information of 2nd kernel to the dump target
started to fail.
As unmounting it is done by systemd-shutdown before reboot without
the umount command as below, so let's don't unmount it in dump_fs.
systemd-shutdown[1]: Unmounting file systems.
[547]: Remounting '/sysroot' read-only in with options '(null)'.
EXT4-fs (dm-0): re-mounted. Opts: (null)
[548]: Unmounting '/sysroot'.
Signed-off-by: Kazuhito Hagio <k-hagio-ab@nec.com>
Acked-by: Kairui Song <kasong@redhat.com>
With FADump support added on POWERNV paltform, enable the scripts to
capture /proc/vmcore. Also, if CONFIG_OPAL_CORE is enabled, OPAL core
is preserved and exported on POWERNV platform. So, offload OPAL core,
if it is available.
Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
Acked-by: Kairui Song <kasong@redhat.com>
The dracut initqueue may quit immediately and won't trigger any hook if
there is no "finished" hook still pending (finished hook will be deleted
once it return 0).
This issue start to appear with latest dracut, latest dracut use
network-manager to configure the network,
network-manager module only install "settled" hook, and we didn't
install any other hook. So NFS/SSH dump will fail. iSCSI dump works
because dracut iscsi module will install a "finished" hook to detect if
the iscsi target is up.
So for NFS/SSH we keep initqueue running until the host successfully get
a valid IP address, which means the network is ready.
Signed-off-by: Kairui Song <kasong@redhat.com>
Acked-by: Pingfan Liu <piliu@redhat.com>
If failure_action is shutdown/reboot/halt, final_action is pointless as
the system will be already stopping. And if final_action is different
from failure_action, it will trigger a systemd race problem and cause
unexpected behavior to occur.
So let the error handler stop and exit after performing failure_action
successfully if failure_action is one of shutdown/reboot/halt.
This way, final_action will not be executed.
Signed-off-by: Kairui Song <kasong@redhat.com>
Acked-by: Pingfan Liu <piliu@redhat.com>
For fadump, this helps to reduce the risk of boot failure, and
may also help speed up the boot by a bit.
For normal kdump, this will delay the dump target mounting, and no
longer depend on systemd to do the mounting job.
And currently there is a failure that caused by some mount handling
bug with kernel and systemd that is failing the system booting:
[FAILED] Failed to mount /kdumproot/home.
See 'systemctl status kdumproot-home.mount' for details.
[DEPEND] Dependency failed for Local File Systems.
[ OK ] Reached target Remote File Systems (Pre).
[ OK ] Reached target Remote File Systems.
Starting udev Coldplug all Devices...
Starting Create Volatile Files and Directories...
Starting Kdump Emergency...
This patch can bypass it. The fix of root cause is still WIP, but this
patch itself is a nice to have optimization so it's reasonable to do so.
Signed-off-by: Kairui Song <kasong@redhat.com>
Acked-by: Pingfan Liu <piliu@redhat.com>
When reading kdump configs, a single parsing should be enough and this
saves a lot of duplicated striping call which speed up the total load
speed.
Speed up about 2 second when building and 0.1 second for reload in my
tests.
Signed-off-by: Kairui Song <kasong@redhat.com>
Acked-by: Dave Young <dyoung@redhat.com>
If a crash occurs repeatedly after enabling kdump, the system goes
into a crash loop and the dump target may get filled up by vmcores.
This is likely especially with early kdump.
This patch introduces 'final_action' option to kdump.conf, in order
for users to be able to power off the system even after capturing
a vmcore successfully.
Signed-off-by: Kazuhito Hagio <k-hagio@ab.jp.nec.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Lianbo Jiang <lijiang@redhat.com>
Cc: Bhupesh Sharma <bhsharma@redhat.com>
Acked-by: Bhupesh Sharma <bhsharma@redhat.com>
Acked-by: Dave Young <dyoung@redhat.com>
Signed-off-by: Kairui Song <kasong@redhat.com>
In preparation for adding 'final_action' option, since it's confusing
to have the 'final_action' and 'default' options at the same time,
this patch introduces 'failure_action' as an alias of the 'default'
option to /etc/kdump.conf, and makes 'default' obsolete to be removed
in the future.
Also, the "default action" term is renamed to "failure action".
Signed-off-by: Kazuhito Hagio <k-hagio@ab.jp.nec.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Lianbo Jiang <lijiang@redhat.com>
Cc: Bhupesh Sharma <bhsharma@redhat.com>
Acked-by: Bhupesh Sharma <bhsharma@redhat.com>
Acked-by: Dave Young <dyoung@redhat.com>
Signed-off-by: Kairui Song <kasong@redhat.com>
If default action is poweroff, we can observe that the machine is
rebooted, instead of poweroff. That is due to the following two race
processes:
systemctl poweroff
systemctl reboot -f
which is launched by kdump-error-handle.sh.
Unfortunately, although both of them are executed in systemd block
mode, but due to poweroff will tear down some internal things in
systemd, there is no guarantee for the block mode. As we can see
the msg "Failed to execute operation: Connection reset by peer",
which is thrown by "systemctl reboot -f".
poweroff and reboot share most of code, if one fails, then the other
should also fails, so it is meaningless to use reboot as the backup of
poweroff. Using "systemctl poweroff -f", the sdbus will teared down
immediately, which prevent the following "systemctl reboot -f" from
executing. Meanwhile, as man systemctl says:
-f, --force
When used with enable, overwrite any existing conflicting symlinks.
When used with halt, poweroff, reboot or kexec, execute the selected
operation without shutting down all units. However, all processes will
be killed forcibly and all file systems are unmounted or remounted read-only.
Hence, replacing the 'poweroff' with 'systemctl poweroff -f'
Signed-off-by: Pingfan Liu <piliu@redhat.com>
Acked-by: Kairui Song <kasong@redhat.com>
Currently the script does not check if the dump target is read-only and would
always mount to read-write mode. This caused an issue with nfs mount as the
fstab options would be reconsidered while remounting to read-write mode.
The remount would fail with the below error as all options cannot be changed
runtime.
mount.nfs: mount(2): Invalid argument
mount.nfs: an incorrect mount option was specified
Which in result would not save the vmcore on the dump target.
This patch addresses this issue by checking the dump target status for read-only.
If yes, remount to read-write mode without reconsidering the fstab options.
Signed-off-by: Kenneth D'souza <kdsouza@redhat.com>
Acked-by: Kairui Song <kasong@redhat.com>
The kdump-capture.service will fail, if the following conds are meet up.
-1. boot up a VM with the following cmd:
qemu-kvm -name 'avocado-vt-vm1' -sandbox off -machine pc -nodefaults -vga cirrus \
-drive id=drive_image1,if=none,snapshot=off,aio=native,cache=none,format=qcow2,file=$guest_img \
-device virtio-blk-pci,id=image1,drive=drive_image1,bootindex=0,bus=pci.0,addr=04 \
-device virtio-net-pci,mac=9a:4d:4e:4f:50:51,id=id3DveCw,vectors=4,netdev=idgW5YRp,bus=pci.0,addr=05 \
-netdev tap,id=idgW5YRp \
-m 2048 \
-smp 4,maxcpus=4,cores=2,threads=1,sockets=2 \
-cpu 'SandyBridge',+kvm_pv_unhalt \
-vnc :0 \
-rtc base=utc,clock=host,driftfix=slew \
-boot order=cdn,once=c,menu=off,strict=off \
-enable-kvm \
-monitor stdio \
-qmp tcp:localhost:4444,server,nowait
-2. in kernel cmdline with the following options: console=tty0 console=ttyS0,
Because the "-nodefaults" option in qemu cmd excludes the emulation of serial port, the ttyS0 will
have no real backend device. We can observe such issue in 1st kernel by:
echo teststring > /dev/console or
echo teststring > /dev/ttyS0,
It gets the error "-bash: echo: write error: Input/output error".
Such conds cause small issue in 1st kernel, but it is a big problem for kdump-capture and emergency
service.
This patch aims to work aroundthe issue in kdump-capture service:
dump_fs() return value will affect the following code in dracut-kdump.sh
DUMP_RETVAL=$? <---
do_kdump_post $DUMP_RETVAL
if [ $? -ne 0 ]; then
echo "kdump: kdump_post script exited with non-zero status!"
fi
Although kdump-capture saves the vmcore successfully, but it exit 1 and
fall on emergency service.
Signed-off-by: Pingfan Liu <piliu@redhat.com>
Reviewed-by: Xunlei Pang <xlpang@redhat.com>
Acked-by: Dave Young <dyoung@redhat.com>
In latest rawhide kdump kernel reboot hangs because systemd reports a
conflict when kdump calls reboot during booting. Need further investigation
about the new systemd behavior.
Here is the error message copied from kdump session:
[snip]
kdump: saving vmcore complete
Failed to start reboot.target: Transaction contains conflicting jobs 'stop' and 'start' for shutdown.target. Probably contradicting requirement dependencies configured.
Failed to talk to init daemon.
[FAILED] Failed to start Kdump Vmcore Save Service.
[snip]
We previouly use "reboot -f" but later we changed to reboot because we want
systemd to take care of the shutdown path, mainly for umount filesystems.
Change back to "reboot -f" works but we still need umount by ourselves.
During my tests with "reboot -f" I get below dirty ext2 filesystem:
[root@localhost ~]# fsck /dev/vdb
fsck from util-linux 2.27
e2fsck 1.42.13 (17-May-2015)
/dev/vdb was not cleanly unmounted, check forced.
Actually "reboot -f" equals to "systemctl reboot -f -f"
systemctl manpage says "-f" and "-f -f" means different behavior:
When use -f with reboot, will execute reboot without shutting down all units.
However all processes will be killed forcibly and all file systems are
unmounted or remounted read-only. If -f is specified twice, will reboot
immediately without terminating any processes or unmounting any file systems.
Thus change to use "systemctl reboot -f" for our reboot actions. It can fix
the problem and at the same time it can ensure filesystems are umounted before
rebooting.
OTOH, a systemd changes cause the breakage, it may be a system service new
design, Later I can dig into systemd changes see which commit cause the
breakage.
Signed-off-by: Dave Young <dyoung@redhat.com
Signed-off-by: Dangyi Liu <dliu@redhat.com>
Acked-by: Baoquan He <bhe@redhat.com>
This reverts commit f4c45236bf.
Since that commit will change the behaviour of kdump_post. That is not
good.
Signed-off-by: Baoquan He <bhe@redhat.com>
Now we use the pattern:
<machine/ipaddr>-YYYY.MM.DD-HH:MM:SS
while rhel6 uses the following:
<machine/ipaddr>-YYYY-MM-DD-HH:MM:SS
This change may break someone's script and we should change it back to
keep consistent between releases.
Signed-off-by: WANG Chao <chaowang@redhat.com>
Acked-by: Baoquan He <bhe@redhat.com>
Acked-by: Minfei Huang <mhuang@redhat.com>
User complains that kdump_post script doesn't execute after mount
failed. This happened since mount failure will trigger
kdump-error-handler.service, and then start kdump-error-handler.sh.
However in kdump-error-handler.sh it doesn't execute kdump_post.
Hence add it in this patch.
Surely the function do_kdump_post need be moved into kdump-lib-initramfs.sh
to be a common function.
v1->v2:
Add a return value to do_kdump_post when invoked in kdump_error-handler.sh.
And call do_kdump_post earlier than do_default_action, otherwise
it may not execute if reboot/poweroff/halt.
Signed-off-by: Baoquan He <bhe@redhat.com>
Acked-by: Dave Young <dyoung@redhat.com>
Acked-by: Meifei Huang <mhuang@redhat.com>
In ssh or raw dump case, if user do not specify "core_collector" in
kdump.conf, kdump will fail. Because global DEFAULT_CORE_COLLECTOR
variable isn't applied to CORE_COLLECTOR. Now fix it and clean up the
duplicate code in kdump.sh.
Signed-off-by: WANG Chao <chaowang@redhat.com>
Acked-by: Dave Young <dyoung@redhat.com>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
Since we've use systemd to control the shutdown path, there's not need
for us to unmount the filesystem, systemd will do that for us just like
it does in a normal boot.
Signed-off-by: WANG Chao <chaowang@redhat.com>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
It's more safe to use systemd (init) to control the shutdown path for us
in either reboot or power off or halt action.
Signed-off-by: WANG Chao <chaowang@redhat.com>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
Now upon failure kdump script might not be called at all and it might
not be able to execute default action. It results in a hang.
Because we disable emergency shell and rely on kdump.sh being invoked
through dracut-pre-pivot hook. But it might happen that we never call
into dracut-pre-pivot hook because certain systemd targets could not
reach due to failure in their dependencies. In those cases error
handling code does not run and system hangs. For example:
sysroot-var-crash.mount --> initrd-root-fs.target --> initrd.target \
--> dracut-pre-pivot.service --> kdump.sh
If /sysroot/var/crash mount fails, initrd-root-fs.target will not be
reached. And then initrd.target will not be reached,
dracut-pre-pivot.service wouldn't run. Finally kdump.sh wouldn't run.
To solve this problem, we need to separate the error handling code from
dracut-pre-pivot hook, and every time when a failure shows up, the
separated code can be called by the emergency service.
By default systemd provides an emergency service which will drop us into
shell every time upon a critical failure. It's very convenient for us to
re-use the framework of systemd emergency, because we don't have to
touch the other parts of systemd. We can use our own script instead of
the default one.
This new scheme will overwrite emergency shell and replace with kdump
error handling code. And this code will do the error handling as needed.
Now, we will not rely on dracut-pre-pivot hook running always. Instead
whenever error happens and it is serious enough that emergency shell
needed to run, now kdump error handler will run.
dracut-emergency is also replaced by kdump error handler and it's
enabled again all the way down. So all the failure (including systemd
and dracut) in 2nd kernel could be captured, and trigger kdump error
handler.
dracut-initqueue is a special case, which calls "systemctl start
emergency" directly, not via "OnFailure=emergency". In case of failure,
emergency is started, but not in a isolation mode, which means
dracut-initqueue is still running. On the other hand, emergency will
call dracut-initqueue again when default action is dump_to_rootfs.
systemd would block on the last dracut-initqueue, waiting for the first
instance to exit, which leaves us hang. It looks like the following:
dracut-initqueue (running)
--> call dracut-emergency:
--> dracut-emergency (running)
--> kdump-error-handler.sh (running)
--> call dracut-initqueue:
--> blocking and waiting for the original instance to exit.
To fix this, I'd like to introduce a wrapper emergency service. This
emegency service will replace both the systemd and dracut emergency. And
this service does nothing but to isolate to real kdump error handler
service:
dracut-initqueue (running)
--> call dracut-emergency:
--> dracut-emergency isolate to kdump-error-handler.service
--> dracut-emergency and dracut-initqueue will both be stopped
and kdump-error-handler.service will run kdump-error-handler.sh.
In a normal failure case, this still works:
foo.service fails
--> trigger emergency.service
--> emergency.service isolates to kdump-error-handler.service
--> kdump-error-handler.service will run kdump-error-handler.sh
Signed-off-by: WANG Chao <chaowang@redhat.com>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
Acked-by: Dave Young <dyoung@redhat.com>
This patch does the following change in 2nd kernel:
- dump target is mounted under /sysroot
With this change, we don't need to track what we've mounted in 2nd
kernel. We can just umount recursively every mount in /sysroot by
command:
umount -R /sysroot
It's very convenient to do so, because it's hard to track what we've
mounted when we're in error handling path (later patches). So mount
everything under /sysroot is reasonable and practical for us.
Also clean up a bit along with this patch.
Signed-off-by: WANG Chao <chaowang@redhat.com>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
Acked-by: Dave Young <dyoung@redhat.com>
Extract functions from kdump.sh, and construct kdump-lib-initramfs.sh as
kdump common functions/varaibles library.
kdump-lib-initramfs.sh will include kdump-lib.sh, because it will use
the functions from there. IOW, kdump-lib-initramfs.sh will be a superset
of kdump-lib.sh
So after this cleanup:
- scripts running in 1st kernel only have to include kdump-lib.sh
- scripts running in 2nd kernel only have to include kdump-lib-initramfs.sh
Signed-off-by: WANG Chao <chaowang@redhat.com>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
Acked-by: Dave Young <dyoung@redhat.com>