Upstream: RHEL-only
Resolves: https://issues.redhat.com/browse/RHEL-47130
kdump adds the prefix "kdump-" to the name of a network interface when the
name matches eth.*. However, the regex is incorrect and also matches names
like baremetal*. As a consequence, kdump failed with the following messages:
[ 305.950223] kdump.sh[790]: Device "kdump-baremetal" does not exist.
[ 305.992018] kdump[795]: wrong kdumpnic: kdump-baremetal
[ 306.029386] kdump[797]: get_host_ip exited with non-zero status!
[ 306.038085] systemd[1]: kdump-capture.service: Main process exited, code=exited, status=1/FAILURE
[ 306.050082] systemd[1]: kdump-capture.service: Failed with result 'exit-code'.
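A minimal sketch of an anchored check, assuming the intent is to rename only kernel-native names of the form eth<N>. The helper name is_native_ethx is hypothetical and this is not necessarily how the actual fix is written. An unanchored pattern can match in the middle of a name; anchoring at both ends avoids that.

```shell
# Hypothetical helper: succeed only for kernel-native names such as eth0 or eth12.
# The pattern is anchored at both ends, so names that merely contain "et" or
# "eth" somewhere (e.g. baremetal, veth0) are rejected.
is_native_ethx() {
    printf '%s\n' "$1" | grep -Eq '^eth[0-9]+$'
}
```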
Fixes: ba7660f ("dracut-module-setup: NIC renamed with prefix "kdump-" for native ethX")
Reported-by: Dima Shtranvasser <dshtranv@redhat.com>
Suggested-by: Dima Shtranvasser <dshtranv@redhat.com>
Signed-off-by: Coiby Xu <coxu@redhat.com>
Resolves: https://issues.redhat.com/browse/RHEL-38358
Upstream: kdump-utils
commit 247c7a5f39b305f9a83bad2d936d00237165b7e0
Author: Mamoru Nishibe (Fujitsu) <nishibe.mamoru@fujitsu.com>
Date: Wed Apr 24 08:11:12 2024 +0000
mkdumprd: Fix makedumpfile parameter check.
If "core_collector" in /etc/kdump.conf is set to just "makedumpfile",
i.e. makedumpfile is run without any options, "makedumpfile --check-params"
fails and kdump terminates abnormally.
# grep ^core_collector /etc/kdump.conf
core_collector makedumpfile
# /usr/bin/kdumpctl start
:
Commandline parameter is invalid.
Try `makedumpfile --help' for more information.
kdump: makedumpfile parameter check failed.
kdump: mkdumprd: failed to make kdump initrd
kdump: Starting kdump: [FAILED]
On the other hand, "makedumpfile --check-params" works fine without any options.
# makedumpfile --check-params vmcore dumpfile
# echo $?
0
In addition, before verify_core_collector() was implemented, the kdump
initial RAM disk was successfully created with only "core_collector makedumpfile",
so I consider this a regression.
This is due to a parameter extraction error in verify_core_collector().
Fix it to correctly extract only the options.
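A minimal sketch of option extraction, assuming the core_collector value is a single line whose first word is the binary. The function name get_collector_opts is hypothetical. It also illustrates a common pitfall: the POSIX expansion ${cmd#* } returns the string unchanged when it contains no space, so a bare "makedumpfile" would be handed back as its own option.

```shell
# Hypothetical helper: print only the options of a core_collector command,
# i.e. everything after the first word, or an empty line if there are none.
get_collector_opts() {
    case "$1" in
        *" "*)
            # Strip the binary name and the first space.
            printf '%s\n' "${1#* }" ;;
        *)
            # No space at all: "${1#* }" would return "$1" unchanged, so the
            # binary name itself would be mistaken for an option.
            printf '\n' ;;
    esac
}
```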
Fixes: a1c28126 ("mkdumprd: Use makedumpfile --check-params option")
Signed-off-by: Mamoru Nishibe <nishibe.mamoru@fujitsu.com>
Reviewed-by: Coiby Xu <coxu@redhat.com>
Signed-off-by: Pingfan Liu <piliu@redhat.com>
Resolves: https://issues.redhat.com/browse/RHEL-25490
Upstream: Fedora
Conflict: None
commit 468336700d
Author: Lichen Liu <lichliu@redhat.com>
Date: Mon Jan 22 15:59:09 2024 +0800
dracut-module-setup: Skip initrd-cleanup and initrd-parse-etc in kdump
When using multipath devices as the target for kdump, if user_friendly_names
is also specified, devices default to names like "mpath*", e.g., mpatha.
In dracut, we obtain a persistent device name via get_persistent_dev. However,
dracut currently considers /dev/mapper/mpath* names problematic, so an
alternative name is used instead, here /dev/disk/by-uuid/<FS_UUID>.
During the kdump boot process, /dev/disk/by-uuid/<FS_UUID> will exist as
soon as one of the path devices exists, but it won't be usable by systemd,
since multipathd will claim that device as a path device. Then multipathd will
get stopped before it can create the multipath device.
Without user_friendly_names, /dev/mapper/<WWID> is considered a persistent
device name, avoiding the issue.
The exit of multipathd is due to two dependencies in the current dracut module
90multipath/multipathd.service, "Before=initrd-cleanup.service" and
"Conflicts=initrd-cleanup.service".
As per man 5 systemd.unit, if A.service has "Conflicts=B.service", starting
B.service will stop A.service.
This is useful during normal boot. However, we will never switch-root after
capturing vmcore in kdump.
We need to ensure that multipathd is not killed due to such a dependency.
Without modifying multipathd.service, we add ConditionPathExists=!/proc/vmcore
to skip initrd-cleanup.service in kdump. This approach is beneficial as
it avoids the potential termination of other services that conflict with
initrd-cleanup.service. Also skip initrd-parse-etc.service as it will try to
start initrd-cleanup.service. Both of these services are used for switching
root, so they can be safely skipped in kdump.
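A minimal sketch of the approach, assuming the condition is added through systemd drop-in files. The unit names and the condition come from the text above; the helper name, the drop-in file name kdump-skip.conf and the target directory argument are hypothetical, and the real module may install the override differently.

```shell
# Hypothetical helper: write drop-ins that skip the switch-root units whenever
# /proc/vmcore exists, i.e. when running in the kdump kernel.
write_kdump_skip_dropins() {
    _unitdir="$1"   # e.g. a systemd unit directory inside the initrd (assumption)
    for _unit in initrd-cleanup.service initrd-parse-etc.service; do
        mkdir -p "$_unitdir/$_unit.d" || return 1
        # "!" negates the condition: the unit only runs if /proc/vmcore is absent.
        printf '[Unit]\nConditionPathExists=!/proc/vmcore\n' \
            > "$_unitdir/$_unit.d/kdump-skip.conf" || return 1
    done
}
```

In the first kernel /proc/vmcore does not exist, so both units run as usual; in the kdump kernel they are skipped.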
Suggested-by: Benjamin Marzinski <bmarzins@redhat.com>
Suggested-by: Dave Young <dyoung@redhat.com>
Signed-off-by: Lichen Liu <lichliu@redhat.com>
Reviewed-by: Philipp Rudo <prudo@redhat.com>
Signed-off-by: Lichen Liu <lichliu@redhat.com>
Resolves: https://issues.redhat.com/browse/RHEL-23831
Upstream: Fedora
Conflict: None
commit bc101086e2
Author: Coiby Xu <coxu@redhat.com>
Date: Mon Dec 12 18:37:25 2022 +0800
dracut-module-setup.sh: also install the driver of physical NIC for
Hyper-V VM with accelerated networking
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=2151842
Currently, vmcore dumping to remote fs fails on Azure Hyper-V VM with
accelerated networking because it uses a physical NIC for accelerated
networking [1]. In this case, the driver for this physical NIC should be
installed as well.
[1] https://learn.microsoft.com/en-us/azure/virtual-network/accelerated-networking-overview
Fixes: a65dde2d ("Reduce kdump memory consumption by only installing needed NIC drivers")
Reported-by: Xiaoqiang Xiong <xxiong@redhat.com>
Signed-off-by: Coiby Xu <coxu@redhat.com>
Reviewed-by: Philipp Rudo <prudo@redhat.com>
Signed-off-by: Coiby Xu <coxu@redhat.com>
Resolves: https://issues.redhat.com/browse/RHEL-10485
Upstream: Fedora
Conflict: Missing upstream patch d4e8772("kdumpctl: make do_estimate more
robust")
commit 741861164e
Author: Lichen Liu <lichliu@redhat.com>
Date: Mon Oct 30 14:51:59 2023 +0800
kdumpctl: Only returns immediately after an error occurs in check_*_modified
Currently is_system_modified returns immediately when check_*_modified
returns a non-zero value, and the remaining checks are not executed.
For example, if an fs-related error exists and someone changes
kdump.conf, check_files_modified will return 1 and is_system_modified will
return 1 immediately. This causes kdumpctl to skip check_fs/drivers_modified,
so kdump.service will rebuild the initrd and start successfully. However, any
error should prevent kdump.service from starting.
This patch makes is_system_modified keep running the check_*_modified
functions until an error occurs or all checks have finished.
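A minimal sketch of the new control flow, assuming each check_*_modified returns 0 when nothing changed, 1 when a rebuild is needed and any other value on error. These return values and the loop form are assumptions for illustration, not the kdumpctl code.

```shell
# Sketch: run every check; remember "modified" (1) but stop only on an error.
is_system_modified() {
    _ret=0
    for _check in check_files_modified check_fs_modified check_drivers_modified; do
        _rc=0
        "$_check" || _rc=$?
        case "$_rc" in
            0) ;;                 # unchanged, keep going
            1) _ret=1 ;;          # modified: note it and keep checking
            *) return "$_rc" ;;   # error: stop immediately
        esac
    done
    return "$_ret"
}
```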
Signed-off-by: Lichen Liu <lichliu@redhat.com>
Acked-by: Tao Liu <ltao@redhat.com>
Signed-off-by: Lichen Liu <lichliu@redhat.com>
Resolves: https://issues.redhat.com/browse/RHEL-14002
Upstream: Fedora
Conflict: there are changes in format.
commit 4fa17b2ee4
Author: Nayna Jain <nayna@linux.ibm.com>
Date: Tue Oct 3 23:41:47 2023 -0400
powerpc: update kdumpctl to load kernel signing key for fadump
On secure boot enabled systems with static keys, kexec with kexec_file_load(-s)
fails with "Permission denied" when fadump is enabled.
Similar to kdump, load the kernel signing key for fadump as well.
Reported-by: Sachin P Bappalige <sachinpb@linux.vnet.ibm.com>
Signed-off-by: Nayna Jain <nayna@linux.ibm.com>
Signed-off-by: Coiby Xu <coxu@redhat.com>
Resolves: https://issues.redhat.com/browse/RHEL-14002
Upstream: Fedora
Conflict: There are changes in format
commit fe6eb30e67
Author: Nayna Jain <nayna@linux.ibm.com>
Date: Tue Oct 3 23:41:46 2023 -0400
powerpc: update kdumpctl to remove deletion of kernel signing key once loaded
The kernel signing key is deleted once kdump is loaded. This causes confusion
in debugging since the key is no longer visible. Unless someone knows how the
kdumpctl script works, it is difficult to find out how kdump could be
loaded when there is no key on the .ima keyring.
Stop deleting the kernel signing key once it is loaded. To prevent the same
key from being loaded multiple times when the kdump service is disabled and
re-enabled, also update the key description field.
Suggested-by: Mimi Zohar <zohar@linux.ibm.com>
Signed-off-by: Nayna Jain <nayna@linux.ibm.com>
Reviewed-by: Philipp Rudo <prudo@redhat.com>
Signed-off-by: Coiby Xu <coxu@redhat.com>
Resolves: bz2235389
Upstream: Fedora Rawhide
Conflict: small change.
commit 4b7b7736ee
Author: Sourabh Jain <sourabhjain@linux.ibm.com>
Date: Wed Aug 2 20:36:48 2023 +0530
Introduce a function to get reserved memory size
The size of the reserved memory in the functions show_reserved_mem,
check_crash_mem_reserved, and do_estimate is fetched from the sysfs
node `/sys/kernel/kexec_crash_size`. However, in the case of fadump,
the reserved area size is instead present in
`/sys/kernel/fadump/mem_reserved`.
For example:
$ kdumpctl showmem
kdump: Dump mode is fadump
kdump: Reserved 0MB memory for crash kernel
The above command showed 0MB of reserved memory, which is incorrect; the
actual reservation was 2048MB.
To resolve this issue, a new helper function is introduced to fetch the
reserved memory size based on the dump mode. For "fadump" mode,
it looks in `/sys/kernel/fadump/mem_reserved`; otherwise, it uses
`/sys/kernel/kexec_crash_size`. All functions that previously
fetched the reserved memory directly from the `/sys/kernel/kexec_crash_size`
sysfs node are updated to use this new function.
With the fix in place, the `kdumpctl showmem` command now displays the
correct reserved memory size:
$ kdumpctl showmem
kdump: Dump mode is fadump
kdump: Reserved 2048MB memory for crash kernel
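A minimal sketch of such a dump-mode-aware helper, assuming both sysfs files hold a size in bytes. The two paths come from the text above; the function name and the optional root-prefix argument (there only so the sketch can be tried on a fake tree) are hypothetical.

```shell
# Hypothetical helper: print the crash kernel reservation in bytes for the
# given dump mode ("fadump" or anything else for kdump).
get_reserved_mem_size() {
    _mode="$1"
    _root="${2:-}"   # optional prefix for testing; empty means the real /sys
    if [ "$_mode" = "fadump" ]; then
        cat "$_root/sys/kernel/fadump/mem_reserved"
    else
        cat "$_root/sys/kernel/kexec_crash_size"
    fi
}
```

A caller such as showmem could then divide by 1024 twice to print megabytes, which gives 2048MB for a 2147483648-byte reservation.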
Signed-off-by: Sourabh Jain <sourabhjain@linux.ibm.com>
Reported-by: Sachin P Bappalige <sachinpb@linux.vnet.ibm.com>
Reviewed-by: Coiby Xu <coxu@redhat.com>
Signed-off-by: Lichen Liu <lichliu@redhat.com>
Resolves: bz2185794
Upstream: Fedora
Conflicts: None
commit e42a823dae
Author: Coiby Xu <coxu@redhat.com>
Date: Thu Jun 1 16:05:05 2023 +0800
mkdumprd: Use the correct syntax to redirect the stderr to null
A space was added by mistake ("2 > /dev/null" instead of "2>/dev/null"), so
the shell passes "2" as an extra argument, which fips-mode-setup refuses:
# fips-mode-setup --is-enabled 2 > /dev/null
# echo $?
2
# fips-mode-setup --is-enabled 2
Check, enable, or disable the system FIPS mode.
usage: /usr/bin/fips-mode-setup --enable|--disable [--no-bootcfg]
usage: /usr/bin/fips-mode-setup --check
usage: /usr/bin/fips-mode-setup --is-enabled
So in this case mkdumprd can never detect whether FIPS is enabled. Fix this
mistake.
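A small demonstration of the difference, using a hypothetical stand-in strict_cmd that, like fips-mode-setup in the example above, rejects extra arguments with status 2:

```shell
# Hypothetical stand-in that accepts exactly one argument.
strict_cmd() {
    if [ "$#" -ne 1 ]; then
        echo "usage: strict_cmd --is-enabled" >&2
        return 2
    fi
    return 0
}

# Buggy form: the shell passes "2" as a second argument and "> /dev/null"
# redirects stdout only, so strict_cmd sees two arguments and fails.
buggy_status=0
strict_cmd --is-enabled 2 > /dev/null || buggy_status=$?

# Fixed form: "2>" (no space) redirects file descriptor 2, i.e. stderr.
fixed_status=0
strict_cmd --is-enabled 2>/dev/null || fixed_status=$?
```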
Fixes: 443a43e0 ("mkdumprd: call dracut with --add-device to install the drivers needed by /boot partition automatically for FIPS")
Signed-off-by: Coiby Xu <coxu@redhat.com>
Reviewed-by: Tao Liu <ltao@redhat.com>
Signed-off-by: Coiby Xu <coxu@redhat.com>
Resolves: bz2185794
Upstream: Fedora
Conflicts: small change
commit 443a43e075
Author: Coiby Xu <coxu@redhat.com>
Date: Wed May 24 12:01:45 2023 +0800
mkdumprd: call dracut with --add-device to install the drivers needed by /boot partition automatically for FIPS
Currently, kdump doesn't work on many FIPS-enabled systems including
Azure, ESXi, Hyper-V, POWER, etc. When FIPS is enabled, the system needs to
access /boot/.vmlinuz-xxx.hmac to verify the integrity of the kernel.
However, on those systems, /boot fails to be mounted due to a lack of
fs and block device drivers, and the system halts after failing to
verify the integrity of the kernel. For example, on Hyper-V, sd_mod, sg,
scsi_transport_fc, hv_storvsc and hv_vmbus need to be installed in order
for /boot to be mounted.
mkdumprd calls dracut with --no-hostonly-default-device. Following
the documentation (man dracut),
--no-hostonly-default-device
Do not generate implicit host devices like root, swap, fstab, etc.
Use "--mount" or "--add-device" to explicitly add devices as needed
this patch uses "--add-device" to explicitly add the device of /boot.
Note there is already an attempt to fix it in dracut's 01fips module,
i.e. commit 83651776 ("fips: ensure fs module for /boot is
installed"). Unfortunately, it only installs the file system driver, e.g.
xfs, and not the block device drivers.
Reviewed-by: Philipp Rudo <prudo@redhat.com>
Signed-off-by: Coiby Xu <coxu@redhat.com>
Signed-off-by: Coiby Xu <coxu@redhat.com>
Resolves: bz2229287
Upstream: RHEL-ONLY
Conflict: None
There is a use case where a separate NIC is used to handle DNS queries.
In this case this NIC should be added to the allowlist as well.
Fixes: e67e4bd ("Reduce kdump memory consumption by only installing needed NIC drivers")
Signed-off-by: Coiby Xu <coxu@redhat.com>
Resolves: bz2219378
Upstream: RHEL-ONLY
Conflict: None
The commit 68d02c2a caused a regression in which mounting sysroot fails in
the kdump kernel. Since that commit only fixed a print issue, revert it.
This issue is related to a systemd bug which has been fixed by this PR:
https://github.com/systemd/systemd/pull/23893. Until these patches are
backported to RHEL-8, we should keep the nofail and x-systemd.before
options.
Fixes: 68d02c2a
(Revert "Append both nofail and x-systemd.before to kdump mount target")
Signed-off-by: Lichen Liu <lichliu@redhat.com>
Resolves: bz1958587
Upstream: Fedora
Conflict: None
commit 3b22cce1cb
Author: Coiby Xu <coxu@redhat.com>
Date: Wed Dec 14 10:12:17 2022 +0800
dracut-module-setup.sh: skip installing driver for the loopback
interface
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=2151500
Currently, kdump initrd fails to be built when dumping vmcore to
localhost via ssh or nfs,
kdumpctl[3331]: Cannot get driver information: Operation not supported
kdumpctl[1991]: dracut: Failed to get the driver of lo
dracut[2020]: Failed to get the driver of lo
kdumpctl[1775]: kdump: mkdumprd: failed to make kdump initrd
kdumpctl[1775]: kdump: Starting kdump: [FAILED]
systemd[1]: kdump.service: Main process exited, code=exited, status=1/FAILURE
systemd[1]: kdump.service: Failed with result 'exit-code'.
systemd[1]: Failed to start Crash recovery kernel arming.
systemd[1]: kdump.service: Consumed 1.710s CPU time.
This is because the loopback interface is used for transferring vmcore and
ethtool can't get the driver of the loopback interface. In fact, once
CONFIG_NET is enabled, the loopback device is enabled and there is no driver
for the loopback device. So skip installing driver for the loopback device.
The loopback interface is implemented in linux/drivers/net/loopback.c
and always has the name "lo". So we can safely tell if a network
interface is the loopback interface by its name.
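A minimal sketch of the resulting logic, assuming the driver is looked up with "ethtool -i", whose output starts with a "driver: <name>" line. The function names are hypothetical and the real dracut-module-setup.sh code may be structured differently.

```shell
# Hypothetical lookup: print the driver name reported by "ethtool -i".
get_nic_driver() {
    ethtool -i "$1" 2>/dev/null | sed -n 's/^driver: //p'
}

# Print the driver of every given NIC, skipping the loopback interface.
collect_nic_drivers() {
    for _nic in "$@"; do
        # "lo" is always the loopback device and has no driver to install.
        if [ "$_nic" = "lo" ]; then
            continue
        fi
        _drv=$(get_nic_driver "$_nic")
        if [ -z "$_drv" ]; then
            echo "Failed to get the driver of $_nic" >&2
            return 1
        fi
        printf '%s\n' "$_drv"
    done
}
```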
Fixes: a65dde2d ("Reduce kdump memory consumption by only installing needed NIC drivers")
Reported-by: Martin Pitt <mpitt@redhat.com>
Reported-by: Rich Megginson <rmeggins@redhat.com>
Reviewed-by: Lichen Liu <lichliu@redhat.com>
Reviewed-by: Philipp Rudo <prudo@redhat.com>
Signed-off-by: Coiby Xu <coxu@redhat.com>
Signed-off-by: Coiby Xu <coxu@redhat.com>
Resolves: bz1958587
Upstream: Fedora
Conflict: 1. RHEL8's mkdumprd has different dracut_args from upstream's.
2. RHEL8's kdump_install_net is different from upstream's and
we should install needed NIC drivers in the end.
commit a65dde2d10
Author: Coiby Xu <coxu@redhat.com>
Date: Thu May 19 11:39:25 2022 +0800
Reduce kdump memory consumption by only installing needed NIC drivers
Even after having asked NM to stop managing an unneeded NIC, a NIC driver
may still waste memory. For example, mlx5_core uses a substantial amount
of memory during driver initialization,
======== Report format module_summary: ========
Module mlx5_core using 350.2MB (89650 pages), peak allocation 367.4MB (94056 pages)
Module squashfs using 13.1MB (3360 pages), peak allocation 13.1MB (3360 pages)
Module overlay using 2.1MB (550 pages), peak allocation 2.2MB (555 pages)
Module dns_resolver using 0.9MB (219 pages), peak allocation 5.2MB (1338 pages)
Module mlxfw using 0.7MB (172 pages), peak allocation 5.3MB (1349 pages)
======== Report format module_summary END ========
======== Report format module_top: ========
Top stack usage of module mlx5_core:
(null) Pages: 89650 (peak: 94056)
ret_from_fork (0xffffda088b4165f8) Pages: 60007 (peak: 60007)
kthread (0xffffda088b4bd7e4) Pages: 60007 (peak: 60007)
worker_thread (0xffffda088b4b48d0) Pages: 60007 (peak: 60007)
process_one_work (0xffffda088b4b3f40) Pages: 60007 (peak: 60007)
work_for_cpu_fn (0xffffda088b4aef00) Pages: 53906 (peak: 53906)
local_pci_probe (0xffffda088b9e1e44) Pages: 53906 (peak: 53906)
probe_one mlx5_core (0xffffda084f899cc8) Pages: 53518 (peak: 53518)
mlx5_init_one mlx5_core (0xffffda084f8994ac) Pages: 49756 (peak: 49756)
mlx5_function_setup.constprop.0 mlx5_core (0xffffda084f899100) Pages: 44434 (peak: 44434)
mlx5_satisfy_startup_pages mlx5_core (0xffffda084f8a4f24) Pages: 44434 (peak: 44434)
mlx5_function_setup.constprop.0 mlx5_core (0xffffda084f899078) Pages: 5285 (peak: 5285)
mlx5_cmd_init mlx5_core (0xffffda084f89e414) Pages: 4818 (peak: 4818)
mlx5_alloc_cmd_msg mlx5_core (0xffffda084f89aaa0) Pages: 4403 (peak: 4403)
This memory consumption is completely unnecessary when kdump doesn't need
this NIC. Only install needed NIC drivers to prevent this kind of waste.
Note
1. this patch depends on [1] to ask dracut to not install NIC drivers.
2. "ethtool -i" somehow fails to get the vlan driver
3. team.ko doesn't depend on the team mode drivers so we need to install
the team mode drivers manually.
[1] https://github.com/dracutdevs/dracut/pull/1789
Signed-off-by: Coiby Xu <coxu@redhat.com>
Reviewed-by: Thomas Haller <thaller@redhat.com>
Reviewed-by: Philipp Rudo <prudo@redhat.com>
Signed-off-by: Coiby Xu <coxu@redhat.com>
Resolves: bz2164929
Upstream: Fedora
Conflict: Missing upstream commit
94a7b4("Always drop nofail or nobootwait options")
commit 0843c70672
Author: Kairui Song <kasong@redhat.com>
Date: Wed Jan 13 17:12:18 2021 +0800
Revert "Append both nofail and x-systemd.before to kdump mount target"
That commit tried to work around a kernel VFS bug. Now that
the VFS issue should have been fixed in all recent releases,
remove this workaround.
This reverts commit 539bff4083.
Signed-off-by: Kairui Song <kasong@redhat.com>
Acked-by: Pingfan Liu <piliu@redhat.com>
Signed-off-by: Lichen Liu <lichliu@redhat.com>
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1964822
Upstream: RHEL-only
Currently, vmcore dumping to remote fs gives a warning "eth0: Failed to
rename network interface 3 from 'eth0' to 'kdump-eth0': File exists" on
Azure Hyper-V VM with accelerated networking because it uses a physical
NIC for accelerated networking [1] and the backing physical NIC has the
same MAC address as the virtual NIC. In the kdump initrd, a udev rule
tries to rename the NIC with the given MAC address and, as expected, fails
since two NICs have the same MAC address. This udev rule is
created automatically when specifying the dracut cmdline
"ifname=<interface>:<MAC>". For the case of Azure Hyper-V VM with
accelerated networking, only the virtual network interface needs to be
renamed. So create a udev rule manually.
[1] https://learn.microsoft.com/en-us/azure/virtual-network/accelerated-networking-overview
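A minimal sketch of such a rule, assuming the synthetic Hyper-V NIC can be told apart from the accelerated-networking NIC by its driver, hv_netvsc. The helper name and the exact match keys are assumptions, not the rule the patch installs.

```shell
# Hypothetical helper: print a udev rule that renames only the NIC driven by
# hv_netvsc with the given MAC; the physical NIC sharing that MAC uses a
# different driver, so it does not match.
hv_rename_rule() {
    _mac="$1"
    _newname="$2"
    printf 'SUBSYSTEM=="net", ACTION=="add", DRIVERS=="hv_netvsc", ATTR{address}=="%s", NAME="%s"\n' \
        "$_mac" "$_newname"
}
```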
Signed-off-by: Coiby Xu <coxu@redhat.com>
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1964822
Upstream: RHEL-only
Currently, vmcore dumping to remote fs gives a warning "eth0: Failed to
rename network interface 3 from 'eth0' to 'kdump-eth0': File exists" on
Azure Hyper-V VM with accelerated networking because it uses a physical
NIC for accelerated networking [1] and the backing physical NIC has the
same MAC as the virtual NIC. There is no need to rename a Hyper-V
interface in this case, and renaming it leads to the aforementioned warning.
[1] https://learn.microsoft.com/en-us/azure/virtual-network/accelerated-networking-overview
Signed-off-by: Coiby Xu <coxu@redhat.com>
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2184284
Resolves: rhbz#2184284
Upstream: makedumpfile
commit 58553ad03187f0cf208d6c4a0dc026c6338e5edd
Author: Daisuke Hatayama (Fujitsu) <d.hatayama@fujitsu.com>
Date: Wed Mar 29 12:44:10 2023 +0000
[PATCH] sadump: fix failure of reading memory when 5-level paging is enabled
makedumpfile fails as follows for memory dumps collected by sadump
when 5-level paging is enabled on the corresponding systems:
# makedumpfile -l -d 31 -x ./vmlinux ./dump.sadump dump.sadump-ld31
__vtop4_x86_64: Can't get a valid pgd.
...snip...
__vtop4_x86_64: Can't get a valid pgd.
calc_kaslr_offset: failed to calculate kaslr_offset and phys_base; default to 0
__vtop4_x86_64: Can't get a valid pgd.
readmem: Can't convert a virtual address(ffffffff82fce960) to physical address.
readmem: type_addr: 0, addr:ffffffff82fce960, size:1024
cpu_online_mask_init: Can't read cpu_online_mask memory.
makedumpfile Failed.
This is because 5-level paging support has not yet been implemented for
sadump; 5-level paging support was added by commit
30a3214a7193e94c551c0cebda5918a72a35c589 (PATCH 4/4 arch/x86_64: Add
5-level paging support), but that commit focused on the core part only.
Having said that, most of the work was already done in that
commit. What needs to be newly added for sadump is just how to check
if 5-level paging is enabled for a given memory dump.
For that purpose, let's refer to CR4.LA57, bit 12 of CR4, representing
whether 5-level paging is enabled or not. We can do this because
memory dumps collected by sadump have SMRAM as note information and
they include CR4 together with the other control registers.
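The check itself is a single bit test. A minimal sketch in shell arithmetic, assuming the CR4 value has already been read from the SMRAM note (makedumpfile does this in C; the function name here is hypothetical):

```shell
# Hypothetical helper: succeed if CR4.LA57 (bit 12, mask 0x1000) is set.
# $1 is the CR4 value, e.g. in hex as 0x...
cr4_la57_enabled() {
    [ $(( ($1 >> 12) & 1 )) -eq 1 ]
}
```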
Signed-off-by: HATAYAMA Daisuke <d.hatayama@fujitsu.com>
Signed-off-by: Pingfan Liu <piliu@redhat.com>
Resolves: bz2149846
Upstream: Fedora
Conflict: Move to kdump.sysconfig.s390 due to missing
677da8a ("sysconfig: use a simple generator script to maintain")
Author: Philipp Rudo <prudo@redhat.com>
Date: Tue Mar 7 14:45:35 2023 +0100
sysconfig: add zfcp.allow_lun_scan to KDUMP_COMMANDLINE_REMOVE on s390
Probing unnecessary I/O devices wastes memory and in extreme cases can
cause the crashkernel to run OOM. That's why the s390-tools maintain
their own module, 95zdev-kdump [1], that disables auto LUN scanning and
only configures zfcp devices that can be used as dump target. So remove
zfcp.allow_lun_scan from the kernel command line to avoid accidentally
overwriting the default set by the module.
[1] https://github.com/ibm-s390-linux/s390-tools/blob/master/zdev/dracut/95zdev-kdump/module-setup.sh
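A minimal sketch of what removing a parameter from the crashkernel command line amounts to, assuming parameters are whitespace-separated, contain no quoted values and no glob characters. The function name is hypothetical; kdumpctl's own handling of KDUMP_COMMANDLINE_REMOVE may differ.

```shell
# Hypothetical helper: print $1 (a kernel command line) with every parameter
# named in the remaining arguments removed, whether given as "name" or
# "name=value".
remove_cmdline_params() {
    _cmdline="$1"
    shift
    _out=""
    for _arg in $_cmdline; do   # unquoted on purpose: split on whitespace
        _keep=1
        for _rm in "$@"; do
            case "$_arg" in
                "$_rm" | "$_rm"=*) _keep=0 ;;
            esac
        done
        if [ "$_keep" -eq 1 ]; then
            _out="${_out:+$_out }$_arg"
        fi
    done
    printf '%s\n' "$_out"
}
```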
Signed-off-by: Philipp Rudo <prudo@redhat.com>
Reviewed-by: Coiby Xu <coxu@redhat.com>
Signed-off-by: Philipp Rudo <prudo@redhat.com>
Acked-by: Pingfan Liu <piliu@redhat.com>