Upstream: fedora
Resolves: RHEL-32060
Conflict: Yes, there are several conflicts. 1) Upstream has moved
dracut-kdump.sh into kdump-utils/dracut/99kdumpbase/kdump.sh,
so the target files have changed. 2) Several patchsets ([1] [2])
have not been backported to rhel9, so some formatting conflicts
were encountered. No functional change was made while
backporting the patch.
[1]: https://github.com/rhkdump/kdump-utils/pull/18/commits
[2]: https://github.com/rhkdump/kdump-utils/pull/33/commits
commit 88525ebf5e43cc86aea66dc75ec83db58233883b
Author: Tao Liu <ltao@redhat.com>
Date: Thu Sep 5 15:49:07 2024 +1200
Introduce vmcore creation notification to kdump
Motivation
==========
People may forget to recheck that kdump still works, with the result
that no vmcore is generated after a real system crash. This is
unexpected for kdump.
It is highly recommended that people recheck kdump after any system
modification, such as:
a. after kernel patching or whole yum update, as it might break something
on which kdump is dependent, maybe due to introduction of any new bug etc.
b. after any change at hardware level, maybe storage, networking,
firmware upgrading etc.
c. after deploying any new application, e.g. one that involves 3rd party
modules.
Though these are beyond the scope of kdump, a simple vmcore creation
status notification is good to have for now.
Design
======
When (re)started, kdump currently checks whether any related
files/fs/drivers have been modified before determining whether the initrd
should be rebuilt. A rebuild indicates such a modification, and kdump needs
to be rechecked. A rebuild therefore clears the vmcore creation status
recorded in $VMCORE_CREATION_STATUS.
The vmcore creation check happens at "kdumpctl (re)start/status" and
reports the creation success/fail status to users. A "success" status indicates
that a vmcore has previously been generated successfully in the current
env, so a vmcore is more likely to be generated when a real crash
happens. A "fail" status indicates that no vmcore has been generated so
far, or that a vmcore creation has failed in the current env. Users
should check the 2nd kernel log or kexec-dmesg.log for the failure reason.
$VMCORE_CREATION_STATUS is used for recording the vmcore creation status of
the current env. The format will be like:
success 1718682002
This means that a vmcore was generated successfully at this
timestamp for the current env.
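As a rough illustration of how such a status record could be consumed, the sketch below reads a one-line "status timestamp" file and prints a notice. The function name report_vmcore_status, the wording of the messages, and the assumption that a failure is recorded as "fail <timestamp>" are all hypothetical and not taken from the patch.

```shell
# Hypothetical helper: report the vmcore creation status stored in a file
# whose single line looks like "success 1718682002".
# Assumption: a failure is stored in the same shape, as "fail <timestamp>".
report_vmcore_status()
{
    __status_file=$1

    # No record yet, e.g. because an initrd rebuild cleared it.
    if [ ! -s "$__status_file" ]; then
        echo "Notice: No vmcore creation test performed!"
        return 0
    fi

    # Split the line into the status word and the epoch timestamp.
    read -r __status __timestamp < "$__status_file"

    case "$__status" in
    success)
        echo "Notice: Last successful vmcore creation at epoch $__timestamp"
        ;;
    fail)
        echo "Warning: Last vmcore creation failed at epoch $__timestamp"
        ;;
    *)
        echo "Warning: Unrecognized vmcore creation status: $__status"
        ;;
    esac
}
```

The timestamp is printed as a raw epoch value here to keep the sketch portable; converting it to a human-readable date is left out on purpose.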
Usage
=====
[root@localhost ~]# kdumpctl restart
kdump: kexec: unloaded kdump kernel
kdump: Stopping kdump: [OK]
kdump: kexec: loaded kdump kernel
kdump: Starting kdump: [OK]
kdump: Notice: No vmcore creation test performed!
[root@localhost ~]# kdumpctl test
[root@localhost ~]# kdumpctl status
kdump: Kdump is operational
kdump: Notice: Last successful vmcore creation on Tue Jun 18 16:39:10 CST 2024
[root@localhost ~]# kdumpctl restart
kdump: kexec: unloaded kdump kernel
kdump: Stopping kdump: [OK]
kdump: kexec: loaded kdump kernel
kdump: Starting kdump: [OK]
kdump: Notice: Last successful vmcore creation on Tue Jun 18 16:39:10 CST 2024
The notification for kdumpctl (re)start/status can be disabled by
setting VMCORE_CREATION_NOTIFICATION in /etc/sysconfig/kdump
Signed-off-by: Tao Liu <ltao@redhat.com>
Signed-off-by: Tao Liu <ltao@redhat.com>
Resolves: https://issues.redhat.com/browse/RHEL-56832
Upstream Status: RHEL-only
This reverts commit 099aead590.
Currently get_mntpoint_from_target incorrectly returns an empty result for
targets that contain a square bracket '[', e.g.
- eng.redhat.com:/srv/[nfs]
- [2620:52:0:a1:217:38ff:fe01:131]:/srv/[nfs]
- /dev/mapper/rhel[disk]
get_mntpoint_from_target is also used in several places. To avoid
RHEL-56832 and other possible regressions, revert the bad commit.
Suggested-by: Dave Young <dyoung@redhat.com>
Signed-off-by: Coiby Xu <coxu@redhat.com>
Resolves: RHEL-35885
commit 9252d6b1b492016aa11a73340f286822e6d545f2
Author: Colin Walters <walters@verbum.org>
Date: Fri Jul 19 11:44:09 2024 -0400
lib: Ensure we don't find bind mounts for device target
There's a comment here that `--source` somehow avoids bind
mounts, but that appears not to be the case in my
testing. I think we just happened to be lucky before
now with the `--first` picking the value we wanted.
Instead of using `--first` and hoping for the best,
parse the mounts and skip ones which are bind mounts
explicitly.
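A minimal sketch of the "parse and skip" idea follows. It assumes bind mounts can be recognized by the `[/subpath]` suffix that findmnt appends to the SOURCE column; the function name first_non_bind_mountpoint and the two-column input format are hypothetical, and this is not necessarily how the patch implements it.

```shell
# Hypothetical filter: read "SOURCE TARGET" lines on stdin and print the
# TARGET of the first mount whose SOURCE is not a bind mount.
first_non_bind_mountpoint()
{
    while read -r __source __target; do
        case "$__source" in
        *\[*)
            # Assumed marker of a bind mount, e.g. /dev/vda1[/subdir].
            continue
            ;;
        esac
        echo "$__target"
        return 0
    done
    # Every candidate looked like a bind mount (or there was no input).
    return 1
}
```

It might be fed with something like `findmnt -k -n -r -o SOURCE,TARGET --source /dev/vda1 | first_non_bind_mountpoint`. Note that this heuristic treats any source containing '[' as a bind mount, so a device or export whose own name contains '[' would be skipped as well.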
Signed-off-by: Colin Walters <walters@verbum.org>
Signed-off-by: Lichen Liu <lichliu@redhat.com>
Related: bz2151504
Upstream: Fedora
Conflict: None
commit 12d9eff9dc
Author: Coiby Xu <coxu@redhat.com>
Date: Tue Mar 28 16:33:34 2023 +0800
Show how much time kdump has waited for the network to be ready
Relates: https://bugzilla.redhat.com/show_bug.cgi?id=2151504
Currently, when the network isn't ready, kdump would repeatedly print
the same info,
[ 29.537230] kdump[671]: Bad kdump network destination: 192.123.1.21
[ 30.559418] kdump[679]: Bad kdump network destination: 192.123.1.21
[ 31.580189] kdump[687]: Bad kdump network destination: 192.123.1.21
This is not user-friendly and users may think kdump has got stuck. So
also show how much time kdump has waited for the network to be ready,
[ 29.546258] kdump[673]: Waiting for network to be ready (50s / 10min)
...
[ 32.608967] kdump[697]: Waiting for network to be ready (56s / 10min)
Note kdump_get_ip_route no longer prints an error message and it's up to
the caller to determine the log level and print relevant messages. And
kdump_collect_netif_usage aborts when kdump_get_ip_route fails.
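For illustration only, a progress line in the format shown above could be produced by a small helper like the one below. The name print_network_wait_progress and its two arguments (elapsed seconds and total timeout in seconds) are assumptions, not names from the patch.

```shell
# Hypothetical helper: print how long kdump has waited out of the total
# timeout. Both arguments are in seconds; the timeout is shown in minutes.
print_network_wait_progress()
{
    __elapsed=$1
    __timeout=$2
    echo "Waiting for network to be ready (${__elapsed}s / $((${__timeout} / 60))min)"
}
```

Called as `print_network_wait_progress 50 600` inside the retry loop, it replaces the repeated "Bad kdump network destination" line with a message that visibly advances.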
Reported-by: Martin Pitt <mpitt@redhat.com>
Signed-off-by: Coiby Xu <coxu@redhat.com>
Reviewed-by: Philipp Rudo <prudo@redhat.com>
Signed-off-by: Coiby Xu <coxu@redhat.com>
Resolves: bz2076416
Upstream: Fedora
Conflict: None
commit 568623e69a
Author: Coiby Xu <coxu@redhat.com>
Date: Thu Sep 23 14:25:01 2021 +0800
Address the cases where a NIC has a different name in kdump kernel
A NIC may get a different name in the kdump kernel than in the 1st kernel
in cases like,
- kernel assigned network interface names are not persistent e.g. [1]
- there is a udev rule to rename the NIC in the 1st kernel but the
kdump initrd may not have that rule e.g. [2]
If NM tries to match a NIC with a connection profile based on NIC name
i.e. connection.interface-name, it will fail in the above cases. A simple
solution is to ask NM to match a connection profile by MAC address.
Note we don't need to do this for user-created NICs like vlan, bridge and
bond.
A remaining issue is passing the name of a NIC via the kdumpnic dracut
command line parameter, which requires passing ifname=<interface>:<MAC> to
have a fixed NIC name. But we can simply drop this requirement. kdumpnic
is needed because kdump needs to get the IP by NIC name and use the IP
to create a dumping folder named "{IP}-{DATE}". We can simply pass the
IP to the kdump kernel directly via a new dracut command line parameter
kdumpip instead. In addition to simplifying the code, this approach
brings three other benefits,
- make use of whatever network is available to transfer the vmcore. As long
as we have a working network, we don't care which NIC is active.
- if the IP obtained in the kdump kernel is different from the one in the
1st kernel, "{IP}-{DATE}" would better tell where the dumped vmcore
comes from.
- without passing ifname=<interface>:<MAC> to the kdump initrd, the
issue of two interfaces having the same MAC address for
Azure Hyper-V NIC SR-IOV [3] is resolved automatically.
[1] https://bugzilla.redhat.com/show_bug.cgi?id=1121778
[2] https://bugzilla.redhat.com/show_bug.cgi?id=810107
[3] https://bugzilla.redhat.com/show_bug.cgi?id=1962421
Signed-off-by: Coiby Xu <coxu@redhat.com>
Reviewed-by: Thomas Haller <thaller@redhat.com>
Reviewed-by: Philipp Rudo <prudo@redhat.com>
Signed-off-by: Coiby Xu <coxu@redhat.com>
resolves: bz2083475
upstream: fedora
conflict: none
commit 10ca970940
Author: Tao Liu <ltao@redhat.com>
Date: Sat Oct 8 15:41:40 2022 +0800
lvm.conf should be checked for modification if lvm2 thinp is enabled
lvm2 relies on /etc/lvm/lvm.conf to determine its behaviour. The
important configs such as thin_pool_autoextend_threshold and
thin_pool_autoextend_percent will be used during kdump in 2nd
kernel. So if the file is modified, the initramfs should be
rebuilt to include the latest version.
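The modification check can be pictured as a timestamp comparison between the watched files and the initramfs image. The sketch below is an assumption about the general approach, not the patch's code; the function name needs_initrd_rebuild and its argument order are hypothetical.

```shell
# Hypothetical check: succeed when any of the watched files was modified
# after the kdump initramfs image was built.
# Usage: needs_initrd_rebuild <initrd> <file>...
needs_initrd_rebuild()
{
    __initrd=$1
    shift
    for __file in "$@"; do
        # find -newer matches when the file's mtime is later than the initrd's;
        # -prune keeps find from descending if a directory is passed.
        if [ -e "$__file" ] && [ -n "$(find "$__file" -prune -newer "$__initrd")" ]; then
            echo "$__file is newer than $__initrd"
            return 0
        fi
    done
    return 1
}
```

With lvm2 thinp in use, /etc/lvm/lvm.conf would simply be one more entry in the list of watched files passed to such a check.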
Signed-off-by: Tao Liu <ltao@redhat.com>
Reviewed-by: Philipp Rudo <prudo@redhat.com>
Signed-off-by: Tao Liu <ltao@redhat.com>
resolves: bz2083475
upstream: fedora
conflict: none
commit 0a5b71d123
Author: Tao Liu <ltao@redhat.com>
Date: Sat Oct 8 15:41:39 2022 +0800
Add lvm2 thin provision dump target checker
We need to check if a directory or a device is an lvm2 thinp target.
First, we use get_block_dump_target() to convert the dump path into a
block device, then we check if the device is an lvm2 thinp target with
the lvs command.
is_lvm2_thinp_device is now located in kdump-lib-initramfs.sh, for it
will be used in 2nd kernel. is_lvm2_thinp_dump_target is located in
kdump-lib.sh, for it is only used in 1st kernel, and it has dependencies
which exist in kdump-lib.sh.
Signed-off-by: Tao Liu <ltao@redhat.com>
Reviewed-by: Philipp Rudo <prudo@redhat.com>
Signed-off-by: Tao Liu <ltao@redhat.com>
upstream: fedora
resolves: bz2085347
conflict: yes, small conflict due to patch
"kdumpctl: drop DUMP_TARGET variable" not
backported to rhel9.
commit c743881ae6
Author: Tao Liu <ltao@redhat.com>
Date: Fri Sep 23 18:13:11 2022 +0800
virtiofs support for kexec-tools
This patch adds virtiofs support to kexec-tools by introducing a new option
for /etc/kdump.conf:
virtiofs myfs
Where myfs is a variable tag name specified in qemu cmdline
"-device vhost-user-fs-pci,tag=myfs".
The patch covers the following cases:
1) Dumping VM's vmcore to a virtiofs shared directory;
2) When the VM's rootfs is a virtiofs shared directory and dumping the
VM's vmcore to its subdirectory, such as /var/crash;
3) The combination of case 1 & 2: The VM's rootfs is a virtiofs shared
directory and dumping the VM's vmcore to another virtiofs shared
directory.
Case 2 & 3 need dracut >= 057, otherwise the VM cannot boot from a virtiofs
shared rootfs. But this is not an issue of kexec-tools.
Reviewed-by: Philipp Rudo <prudo@redhat.com>
Signed-off-by: Tao Liu <ltao@redhat.com>
Signed-off-by: Tao Liu <ltao@redhat.com>
Resolves: bz2031736
Upstream: Fedora
Conflict: None
commit 3cd561fcbcb3ba4f285e746d81e1e6dae17447c3 (HEAD)
Author: Pingfan Liu <piliu@redhat.com>
Date: Tue Jan 18 10:42:00 2022 +0800
move variable FENCE_KDUMP_SEND from kdump-lib.sh to kdump-lib-initramfs.sh
Since kdump-lib-initramfs.sh is included by kdump-lib.sh, and
FENCE_KDUMP_SEND is used by both 1st and 2nd kernel, move
FENCE_KDUMP_SEND from kdump-lib.sh to kdump-lib-initramfs.sh.
Signed-off-by: Pingfan Liu <piliu@redhat.com>
Acked-by: Tao Liu <ltao@redhat.com>
Signed-off-by: Pingfan Liu <piliu@redhat.com>
upstream: fedora
resolves: bz2003832
conflict: none
commit ee337c6f49
Author: Kairui Song <kasong@redhat.com>
Date: Mon Sep 13 03:38:14 2021 +0800
Add header comment for POSIX compliant scripts
To make things cleaner and more human readable, add a short comment for
the POSIX scripts.
Signed-off-by: Kairui Song <kasong@redhat.com>
Acked-by: Philipp Rudo <prudo@redhat.com>
Signed-off-by: Tao Liu <ltao@redhat.com>
upstream: fedora
resolves: bz2003832
conflict: none
commit 5debf397fe
Author: Kairui Song <kasong@redhat.com>
Date: Tue Sep 14 03:04:08 2021 +0800
kdump-lib-initramfs.sh: make it POSIX compatible
POSIX doesn't support the `local` keyword, so add a double underscore and
a prefix to variable names, and reduce variable usage, to avoid any variable
name conflict.
Also reformat the code with `shfmt -s -w kdump-lib-initramfs.sh`.
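To illustrate the naming convention, here is a small sketch in plain POSIX sh. The functions get_option_value and caller_example and the "__gov_" prefix are made up for this example and do not appear in kdump-lib-initramfs.sh.

```shell
# POSIX sh has no `local`, so every variable a function sets is global.
# A double underscore plus a per-function prefix keeps names from clashing.
get_option_value()
{
    # "__gov_" is an arbitrary, illustrative prefix for this function.
    __gov_line=$1
    # Strip the option name and the space that follows it.
    echo "${__gov_line#* }"
}

caller_example()
{
    line="outer value"
    # The callee writes __gov_line, not `line`, so `line` survives the call.
    get_option_value "path /var/crash"
    echo "$line"
}
```

Had get_option_value used a plain `line` variable instead, the caller's value would have been overwritten by the call.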
Signed-off-by: Kairui Song <kasong@redhat.com>
Acked-by: Philipp Rudo <prudo@redhat.com>
Signed-off-by: Tao Liu <ltao@redhat.com>
upstream: fedora
resolves: bz2003832
conflict: none
commit a1205effaa
Author: Kairui Song <kasong@redhat.com>
Date: Thu Aug 5 00:59:29 2021 +0800
kdump-lib-initramfs.sh: move dump related functions to kdump.sh
These dump related functions are only used by dracut-kdump.sh.
Signed-off-by: Kairui Song <kasong@redhat.com>
Acked-by: Philipp Rudo <prudo@redhat.com>
Signed-off-by: Tao Liu <ltao@redhat.com>
upstream: fedora
resolves: bz2003832
conflict: none
commit a5faa052d4
Author: Kairui Song <kasong@redhat.com>
Date: Tue Sep 14 03:25:46 2021 +0800
kdump-lib-initramfs.sh: prepare to be a POSIX compatible lib
Move all functions needed in the second kernel from kdump-lib.sh
to kdump-lib-initramfs.sh, and update shebang headers.
Now, kdump-lib-initramfs.sh is an independent lib script that no longer
depends on kdump-lib.sh, and kdump-lib.sh is no longer needed for
the second kernel.
In later commits, functions in kdump-lib-initramfs.sh will be reworked
to be POSIX compatible, while kdump-lib.sh will contain bash-only functions.
POSIX shell has very limited features, e.g. the `local` keyword doesn't
exist in POSIX but we rely on it heavily. So kdump-lib.sh will
use bash syntax and contain the most complex helpers and code.
kdump-lib-initramfs.sh will contain the minimum set of helpers,
and be shared by both the first and second kernel.
Signed-off-by: Kairui Song <kasong@redhat.com>
Acked-by: Philipp Rudo <prudo@redhat.com>
Signed-off-by: Tao Liu <ltao@redhat.com>
upstream: fedora
resolves: bz2003832
conflict: none
commit a0282ab22c
Author: Kairui Song <kasong@redhat.com>
Date: Tue Aug 3 19:49:51 2021 +0800
kdump-lib.sh: add a config format and read helper
Add a helper `kdump_read_conf` to replace read_strip_comments.
`kdump_read_conf` does a few more things:
- remove trailing spaces.
- format the content, remove duplicated spaces between name and value.
- read from KDUMP_CONFIG_FILE (/etc/kdump.conf) directly, avoid pasting
"/etc/kdump.conf" path everywhere in the code.
- check if config file exists, just in case.
Also unify the environmental variable, now KDUMP_CONFIG_FILE stands for
the default config location.
This helps avoid some shell pitfalls about spaces when reading config.
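The normalization steps listed above can be sketched with a single sed invocation. This is an illustrative stand-in, not the actual `kdump_read_conf`: the function name read_conf_sketch is hypothetical, and it takes the config path as an argument instead of reading KDUMP_CONFIG_FILE.

```shell
# Hypothetical stand-in for a config reader: print "name value" lines from
# the given file with comments, blank lines, and surplus spaces removed.
read_conf_sketch()
{
    __conf_file=$1

    # Bail out quietly if the config file does not exist.
    [ -f "$__conf_file" ] || return 1

    # In order: drop comments, drop trailing blanks, drop leading blanks,
    # then print non-empty lines with one space between name and value.
    sed -n \
        -e 's/#.*//' \
        -e 's/[[:space:]]*$//' \
        -e 's/^[[:space:]]*//' \
        -e 's/^\([^[:space:]]\{1,\}\)[[:space:]]*\(.*\)$/\1 \2/p' \
        "$__conf_file"
}
```

Callers can then split each output line on the first space without worrying about tabs, repeated spaces, or trailing comments. One limitation of this sketch is that a '#' inside a value is treated as the start of a comment.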
Signed-off-by: Kairui Song <kasong@redhat.com>
Acked-by: Philipp Rudo <prudo@redhat.com>
Signed-off-by: Tao Liu <ltao@redhat.com>
Previous commit da6b280 ('Cleanup dead systemd services before start sysroot.mount')
is not enough to fix bz1972463. Coiby found a new issue that had not been
seen before; this patch fixes it.
Resolves: bz1972463
Conflict: None
Upstream: Fedora
commit 660cf4ac03
Author: Kairui Song <kasong@redhat.com>
Date: Tue Jul 20 13:41:08 2021 +0800
Make `dump_to_rootfs` wait for 90s for real
When `failure_action` is set to `dump_to_rootfs`, the message:
"Waiting for rootfs mount, will timeout after 90 seconds"
is actually wrong. Kdump will simply call `systemctl start sysroot.mount`,
but the timeout value of sysroot.mount depends on the unit service and
dracut parameters. And by default, dracut will set
JobRunningTimeoutSec=0 and JobTimeoutSec=0 for the device units,
which means it will wait forever. (see wait_for_dev function in dracut)
For some devices, this can be fixed by setting rd.timeout=90. But when
initqueue is enabled during the initramfs build, dracut will force the
timeout for host devices to `0` (see 99base/module-setup.sh).
Depending on dracut / systemd can make things unpredictable and break as
parameters or code change. To make things easy to understand and
maintain, just call `systemctl` with `--no-block` params, and implement
a standalone wait loop. Now `dump_to_rootfs` will actually wait for
90s then timeout.
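A standalone wait loop of this kind might look roughly like the sketch below. The function name wait_with_timeout, the one-second polling interval, and the use of `mountpoint -q` as the readiness check are assumptions made for illustration.

```shell
# Hypothetical wait loop: poll a readiness check once per second and give
# up after the given number of seconds.
# Usage: wait_with_timeout <seconds> <check command> [args...]
wait_with_timeout()
{
    __timeout=$1
    shift
    __waited=0
    # "$@" is the readiness check; the loop ends as soon as it succeeds.
    until "$@"; do
        if [ "$__waited" -ge "$__timeout" ]; then
            return 1
        fi
        sleep 1
        __waited=$((__waited + 1))
    done
    return 0
}

# Possible use (not executed here):
#   systemctl start --no-block sysroot.mount
#   wait_with_timeout 90 mountpoint -q /sysroot
```

Because the timeout lives in the script itself, it no longer depends on JobTimeoutSec or rd.timeout settings chosen by dracut or systemd.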
Signed-off-by: Kairui Song <kasong@redhat.com>
Acked-by: Coiby Xu <coxu@redhat.com>
Signed-off-by: Kairui Song <kasong@redhat.com>
Resolves: bz1972463
Conflict: None
Upstream: Fedora
commit 2603ba7187 (origin/rawhide, rawhide)
Author: Kairui Song <kasong@redhat.com>
Date: Fri Jul 2 03:27:05 2021 +0800
Cleanup dead systemd services before start sysroot.mount
When kdump fails due to an initqueue timeout, sysroot.mount and other
services could be stuck in `start` but `dead` status:
Example output of systemctl:
dev-disk-by\x2duuid-530830d1\x2df2c7\x2d4c9a\x2d9a82\x2d148609097521.device loaded inactive dead start
<... snip ...>
squash-root.mount loaded active mounted /squash/root
squash.mount loaded active mounted /squash
sysroot.mount loaded inactive dead start /sysroot
<... snip ...>
dracut-cmdline.service loaded active exited dracut cmdline hook
dracut-initqueue.service loaded activating start start dracut initqueue hook
dracut-mount.service loaded inactive dead start dracut mount hook
At this point calling `systemctl start sysroot.mount` will just hang as
systemd will just wait for the services that are stuck in `start`
status. So call `systemctl cancel` here to cancel all pending jobs and
have a clean start for mounting sysroot.
Signed-off-by: Kairui Song <kasong@redhat.com>
Acked-by: Coiby Xu <coxu@redhat.com>
Signed-off-by: Kairui Song <kasong@redhat.com>
Resolves: bz1901024
Upstream: Fedora
Conflict: None
commit 41980f30d9
Author: Kairui Song <kasong@redhat.com>
Date: Mon Apr 26 17:09:57 2021 +0800
Use a customized emergency shell
Use a modified and minimized version of emergency shell.
The differences between the kdump shell and the dracut emergency shell are:
- Kdump shell won't generate a rdsosreport automatically
- Customized prompts
- Never ask for the root password
- Won't tangle with dracut's emergency_action. If emergency_action is
set, dracut emergency shell will perform dracut's emergency_action
instead of kdump final_action on exit.
- If rd.shell=no is set, kdump shell will still work, dracut emergency
shell won't, even if kdump failure_action is set to shell.
Signed-off-by: Kairui Song <kasong@redhat.com>
Acked-by: Coiby Xu <coxu@redhat.com>
Signed-off-by: Kairui Song <kasong@redhat.com>
Resolves: bz1901024
Upstream: Fedora
Conflict: None
commit 108258139a
Author: Kairui Song <kasong@redhat.com>
Date: Mon Apr 26 17:09:55 2021 +0800
Don't try to restart dracut-initqueue if it's already there
kdump's dump_to_rootfs will try to start initqueue unconditionally.
dump_to_rootfs will run after systemd isolates to the emergency
target, so this is currently acceptable.
But there is a problem when initqueue starts the emergency action
because of initqueue timeout. dump_to_rootfs will start initqueue and
lead to timeout again.
So the following patch will remove the previous isolation wrapper, and
detect the service status here. Previous isolation makes the detection
impossible. Now this detection will be valid and helpful to prevent
double timeout or hang.
Signed-off-by: Kairui Song <kasong@redhat.com>
Acked-by: Coiby Xu <coxu@redhat.com>
Signed-off-by: Kairui Song <kasong@redhat.com>
Resolves: bz1952652
Upstream: fedora
Conflict: none
commit d0e9c51e0d
Author: Hari Bathini <hbathini@linux.ibm.com>
Date: Thu Apr 22 18:21:59 2021 +0530
fadump: fix dump capture failure to root disk
If the dump target is the root disk, kdump scripts add an entry in
/etc/fstab for root disk with /sysroot as the mount point. The root
disk, passed through root=<> kernel commandline parameter, is mounted
at /sysroot in read-only mode before switching from initial ramdisk.
So, in fadump mode, a remount of /sysroot to read-write mode is needed
to capture dump successfully, because /sysroot is already mounted as
read-only based on root=<> boot parameter.
Commit e8ef4db8ff ("Fix dump_fs mount point detection and fallback
mount") removed initialization of $_op variable, the variable holding
the options the dump target was mounted with, leading to the below
error as remount was skipped:
kdump[586]: saving to /sysroot/var/crash/127.0.0.1-2021-04-22-07:22:08/
kdump.sh[587]: mkdir: cannot create directory '/sysroot/var/crash/127.0.0.1-2021-04-22-07:22:08/': Read-only file system
kdump[589]: saving vmcore failed
Restore $_op variable initialization in dump_fs() function to fix this.
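As a hedged illustration of why the mount options matter, the sketch below decides from an option string whether a read-write remount would be needed. The function name mounted_read_only is hypothetical, and the real dump_fs() logic may test $_op differently.

```shell
# Hypothetical check: succeed when a comma-separated mount option string
# (e.g. "ro,relatime") contains the read-only flag.
mounted_read_only()
{
    # Wrap in commas so "ro" only matches as a whole option, not as part
    # of something like "errors=remount-ro".
    case ",$1," in
    *,ro,*) return 0 ;;
    *) return 1 ;;
    esac
}

# Possible use (not executed here), with $_op holding the mount options:
#   mounted_read_only "$_op" && mount -o remount,rw /sysroot
```

If the option string is left empty, as happened after the initialization was removed, a check like this can never trigger and the remount is skipped.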
Fixes: e8ef4db8ff ("Fix dump_fs mount point detection and fallback mount")
Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
Acked-by: Kairui Song <kasong@redhat.com>
Signed-off-by: Tao Liu <ltao@redhat.com>
Resolves: bz1955453
Upstream: fedora
Conflict: none
commit ca05b754af
Author: Tao Liu <ltao@redhat.com>
Date: Mon May 10 22:10:26 2021 +0800
Fix incorrect file permissions of vmcore-dmesg-incomplete.txt
vmcore-dmesg-incomplete.txt is generated by shell redirection,
which takes the default umask value. When the dmesg collector exits
with a non-zero status, the file will exist and anyone can have access to
it.
This patch fixes the issue by running chmod on the file, making it accessible
only to its owner.
Signed-off-by: Tao Liu <ltao@redhat.com>
Acked-by: Kairui Song <kasong@redhat.com>
Signed-off-by: Tao Liu <ltao@redhat.com>
Resolves: rhbz#1938165
Upstream: fedora
Conflict: none
commit 91c802ff52
Author: Tao Liu <ltao@redhat.com>
Date: Thu Mar 18 16:52:46 2021 +0800
Fix incorrect permissions on kdump dmesg file
Also known as CVE-2021-20269. The kdump dmesg log files (kexec-dmesg.log,
vmcore-dmesg.txt) are generated by shell redirection, which takes the
default umask value, making the files readable for group and others.
This patch runs chmod on these files, making them accessible only to the owner.
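The effect can be reproduced with a few lines of shell. The helper name save_log_private below is hypothetical and only illustrates the redirect-then-chmod pattern described above.

```shell
# Hypothetical illustration: a file created by redirection gets its mode
# from the umask, so tighten it explicitly afterwards.
save_log_private()
{
    __log_file=$1
    # Redirection creates the file with mode 0666 masked by the umask,
    # e.g. 0644 under the common umask 022.
    cat > "$__log_file"
    # Leave the file readable and writable by its owner only.
    chmod 600 "$__log_file"
}
```

For example, `dmesg | save_log_private /tmp/example-dmesg.txt` leaves a file with mode 0600. There is still a short window between creation and chmod; setting `umask 077` before the redirection would be an alternative that closes it.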
Signed-off-by: Tao Liu <ltao@redhat.com>
Acked-by: Kairui Song <kasong@redhat.com>
Signed-off-by: Tao Liu <ltao@redhat.com>