kexec-tools

Author	SHA1	Message	Date
Coiby Xu	568623e69a	Address the cases where a NIC has a different name in kdump kernel A NIC may get a different name in the kdump kernel from the 1st kernel in cases like, - kernel assigned network interface names are not persistent e.g. [1] - there is a udev rule to rename the NIC in the 1st kernel but the kdump initrd may not have that rule e.g. [2] If NM tries to match a NIC with a connection profile based on NIC name i.e. connection.interface-name, it will fail in the above cases. A simple solution is to ask NM to match a connection profile by MAC address. Note we don't need to do this for user-created NICs like vlan, bridge and bond. A remaining issue is passing the name of a NIC via the kdumpnic dracut command line parameter which requires passing ifname=<interface>:<MAC> to have a fixed NIC name. But we can simply drop this requirement. kdumpnic is needed because kdump needs to get the IP by NIC name and use the IP to create a dumping folder named "{IP}-{DATE}". We can simply pass the IP to the kdump kernel directly via a new dracut command line parameter kdumpip instead. In addition to the benefit of simplifying the code, there are three other benefits brought by this approach, - make use of whatever network to transfer the vmcore. Because as long as we have the network, we don't care which NIC is active. - if the IP obtained in the kdump kernel is different from the one in the 1st kernel, "{IP}-{DATE}" would better tell where the dumped vmcore comes from. - without passing ifname=<interface>:<MAC> to kdump initrd, the issue of there being two interfaces with the same MAC address for Azure Hyper-V NIC SR-IOV [3] is resolved automatically. [1] https://bugzilla.redhat.com/show_bug.cgi?id=1121778 [2] https://bugzilla.redhat.com/show_bug.cgi?id=810107 [3] https://bugzilla.redhat.com/show_bug.cgi?id=1962421 Signed-off-by: Coiby Xu <coxu@redhat.com> Reviewed-by: Thomas Haller <thaller@redhat.com> Reviewed-by: Philipp Rudo <prudo@redhat.com>	2022-11-23 06:39:27 +08:00
Coiby Xu	a65dde2d10	Reduce kdump memory consumption by only installing needed NIC drivers Even after having asked NM to stop managing an unneeded NIC, a NIC driver may still waste memory. For example, mlx5_core uses a substantial amount of memory during driver initialization, ======== Report format module_summary: ======== Module mlx5_core using 350.2MB (89650 pages), peak allocation 367.4MB (94056 pages) Module squashfs using 13.1MB (3360 pages), peak allocation 13.1MB (3360 pages) Module overlay using 2.1MB (550 pages), peak allocation 2.2MB (555 pages) Module dns_resolver using 0.9MB (219 pages), peak allocation 5.2MB (1338 pages) Module mlxfw using 0.7MB (172 pages), peak allocation 5.3MB (1349 pages) ======== Report format module_summary END ======== ======== Report format module_top: ======== Top stack usage of module mlx5_core: (null) Pages: 89650 (peak: 94056) ret_from_fork (0xffffda088b4165f8) Pages: 60007 (peak: 60007) kthread (0xffffda088b4bd7e4) Pages: 60007 (peak: 60007) worker_thread (0xffffda088b4b48d0) Pages: 60007 (peak: 60007) process_one_work (0xffffda088b4b3f40) Pages: 60007 (peak: 60007) work_for_cpu_fn (0xffffda088b4aef00) Pages: 53906 (peak: 53906) local_pci_probe (0xffffda088b9e1e44) Pages: 53906 (peak: 53906) probe_one mlx5_core (0xffffda084f899cc8) Pages: 53518 (peak: 53518) mlx5_init_one mlx5_core (0xffffda084f8994ac) Pages: 49756 (peak: 49756) mlx5_function_setup.constprop.0 mlx5_core (0xffffda084f899100) Pages: 44434 (peak: 44434) mlx5_satisfy_startup_pages mlx5_core (0xffffda084f8a4f24) Pages: 44434 (peak: 44434) mlx5_function_setup.constprop.0 mlx5_core (0xffffda084f899078) Pages: 5285 (peak: 5285) mlx5_cmd_init mlx5_core (0xffffda084f89e414) Pages: 4818 (peak: 4818) mlx5_alloc_cmd_msg mlx5_core (0xffffda084f89aaa0) Pages: 4403 (peak: 4403) This memory consumption is completely unnecessary when kdump doesn't need this NIC. Only install needed NIC drivers to prevent this kind of waste. Note 1. this patch depends on [1] to ask dracut to not install NIC drivers. 2. "ethtool -i" somehow fails to get the vlan driver 3. team.ko doesn't depend on the team mode drivers so we need to install the team mode drivers manually. [1] https://github.com/dracutdevs/dracut/pull/1789 Signed-off-by: Coiby Xu <coxu@redhat.com> Reviewed-by: Thomas Haller <thaller@redhat.com> Reviewed-by: Philipp Rudo <prudo@redhat.com>	2022-11-23 06:39:27 +08:00
Coiby Xu	586fe410aa	Reduce kdump memory consumption by not letting NetworkManager manage unneeded network interfaces By default, NetworkManager will manage all the network interfaces and try to set interface IFF_UP to get carrier state. Regardless of whether the network interface is connected to a cable or not, the NIC driver will allocate memory resources for e.g. ring buffers when setting IFF_UP. This could be a waste of memory. For example it's found i40e consumes ~15GB on a power machine. On this machine, i40e manages four interfaces but only one interface is valid. This patch uses "managed=false" to tell NetworkManager to not manage network interfaces that are not needed by kdump by putting 10-kdump-netif_allowlist.conf in the initramfs. Signed-off-by: Coiby Xu <coxu@redhat.com> Reviewed-by: Thomas Haller <thaller@redhat.com> Reviewed-by: Philipp Rudo <prudo@redhat.com>	2022-11-23 06:39:27 +08:00
Coiby Xu	63c3805c48	Set up kdump network by directly copying NM connection profile to initrd This patch sets up the kdump network by directly copying NM connection profile(s) for different network setups including bond, bridge, vlan, and team. For vlan network, rename phydev to parent_netif to improve code readability. With the new approach, the related code to build up dracut cmdline parameters such as rd.route, ip, etc. can be cleaned up. And there is no need to set up dns when copying .nmconnection directly to initrd either. Note the bootdev dracut command line parameter is only used by dracut's 35network-legacy and network-manager doesn't use it, remove related code as well. Note 1. kdump_setup_vlan/bond/... are no longer called in subshells in order to modify global variables like unique_netifs 2. The original kdump_install_net is renamed to better reflect its current function Signed-off-by: Coiby Xu <coxu@redhat.com> Reviewed-by: Thomas Haller <thaller@redhat.com> Reviewed-by: Philipp Rudo <prudo@redhat.com>	2022-11-23 06:39:27 +08:00
Coiby Xu	62355ebe5a	Stop dracut 35network-manager from running nm-initrd-generator kexec-tools depends on dracut's 35network-manager module which will call nm-initrd-generator. We don't want nm-initrd-generator to generate connection profiles since we will copy them from 1st kernel to kdump kernel initramfs. NetworkManager >= 1.35.2 won't generate connection profiles if there's a connection dir with rd.neednet. For Fedora/RHEL, this connection dir is /etc/NetworkManager/system-connections. For the details, please refer to the NetworkManager commit 79885656d3 ("initrd: don't add a connection if there's a connection dir with rd.neednet") [1]. Before the release of NetworkManager >= 1.35.2, we need to mask /usr/libexec/nm-initrd-generator. [1] https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/merge_requests/1010 Signed-off-by: Coiby Xu <coxu@redhat.com> Reviewed-by: Thomas Haller <thaller@redhat.com> Reviewed-by: Philipp Rudo <prudo@redhat.com>	2022-11-23 06:39:27 +08:00
Coiby Xu	6b586a9036	Apply the timeout configuration of nm-initrd-generator nm-wait-online-initrd.service installed by dracut's 35network-manager module calls nm-online with "-s" which means it returns immediately when NetworkManager logs "startup complete" after certain timeouts are reached. "startup complete" doesn't necessarily mean network connectivity has been established. nm-initrd-generator has a set of timeouts such that, in most cases when they are applied, "startup complete" means network connectivity has been established. So apply it when setting up kdump network. Suggested-by: Thomas Haller <thaller@redhat.com> Signed-off-by: Coiby Xu <coxu@redhat.com> Reviewed-by: Thomas Haller <thaller@redhat.com> Reviewed-by: Philipp Rudo <prudo@redhat.com>	2022-11-23 06:39:27 +08:00
Coiby Xu	9dfcacf72d	Determine whether IPv4 or IPv6 is needed According to `man nm-online`, "By default, connections have the ipv4.may-fail and ipv6.may-fail properties set to yes; this means that NetworkManager waits for one of the two address families to complete configuration before considering the connection activated. If you need a specific address family configured before network-online.target is reached, set the corresponding may-fail property to no." If a NIC has an IPv4 or IPv6 address, set the corresponding may-fail property to no. Otherwise, dumping vmcore over IPv6 could fail because only IPv4 network is ready or vice versa. Also disable IPv6 if only IPv4 is used and vice versa. Signed-off-by: Coiby Xu <coxu@redhat.com> Reviewed-by: Thomas Haller <thaller@redhat.com> Reviewed-by: Philipp Rudo <prudo@redhat.com>	2022-11-23 06:39:27 +08:00
Coiby Xu	d25b1ee31c	Add functions to copy NetworkManager connection profiles to the initramfs Each network interface is managed by a NM connection. Given a list of network interface names, copy the NetworkManager (NM) connection profiles i.e. .nmconnection files to the kdump initramfs. Before copying a connection file, clone it to automatically convert a legacy ifcfg-*[1] file to a .nmconnection file and for the convenience of editing the connection profile. [1] https://fedoraproject.org/wiki/Changes/NetworkManager_keyfile_instead_of_ifcfg_rh Signed-off-by: Coiby Xu <coxu@redhat.com> Reviewed-by: Thomas Haller <thaller@redhat.com> Reviewed-by: Philipp Rudo <prudo@redhat.com>	2022-11-23 06:39:27 +08:00
Coiby Xu	b7e58619d1	Fix error for vlan over team network interface `6f9235887f` ("module-setup.sh: enable vlan on team interface") skips establishing teaming network by mistake. Although it could use one of slave netifs to establish connection to transfer vmcore to remote fs, it breaks the implicit assumption of creating an identical network topology to the 1st kernel. Fixes: `6f92358` ("module-setup.sh: enable vlan on team interface") Signed-off-by: Coiby Xu <coxu@redhat.com> Reviewed-by: Thomas Haller <thaller@redhat.com> Reviewed-by: Philipp Rudo <prudo@redhat.com>	2022-11-23 06:39:27 +08:00
Coiby Xu	a3da46d6c4	Skip reset_crashkernel_after_update during package install Currently, kexec-tools tries to reset crashkernel when using anaconda to install the system. But grubby isn't ready and complains that, 10:34:17,014 INF packaging: Configuring (running scriptlet for): kexec-tools-2.0.23-9.el9.x86_64 1646034766 53ff7158f8808774f4e3c3c87e504aa7a6d677b537754dac86c87925c8f0a397 10:34:17,205 INF dnf.rpm: grep: /boot/grub2/grubenv: No such file or directory grep: /boot/grub2/grubenv: No such file or directory grep: /boot/grub2/grubenv: No such file or directory kexec-tools is supposed to update the kernel crashkernel parameter after package upgrade. Unfortunately, the posttrans RPM scriptlet doesn't distinguish between package install and upgrade. This patch skips reset_crashkernel_after_update during package install, similarly to `e218128e` ("Only try to reset crashkernel for osbuild during package install"). Reported-by: Jan Stodola <jstodola@redhat.com> Signed-off-by: Coiby Xu <coxu@redhat.com>	2022-11-18 17:22:39 +08:00
Tao Liu	3ae8cf8876	Don't check fs modified when dump target is lvm2 thinp When the dump target is lvm2 thinp, if we didn't mount the dump target first, get_fs_type_from_target will get empty output: Before mount: $ get_fs_type_from_target /dev/vg00/thinlv After mount: $ mount /dev/vg00/thinlv /mnt $ get_fs_type_from_target /dev/vg00/thinlv ext4 As a result, kdumpctl start will fail with: $ kdumpctl start kdump: Dump target is invalid kdump: Starting kdump: [FAILED] This patch fixes the issue by bypassing check_fs_modified when the dump target is lvm2 thinp. Signed-off-by: Tao Liu <ltao@redhat.com> Reviewed-by: Coiby Xu <prudo@redhat.com>	2022-11-11 10:29:02 +08:00
Coiby Xu	cea74a7b3e	tests: use .nmconnection to set up test network F36 has dropped support for ifcfg and as a result the current network tests fail. Use .nmconnection to set up the test network instead. Signed-off-by: Coiby Xu <coxu@redhat.com> Reviewed-by: Philipp Rudo <prudo@redhat.com>	2022-11-09 14:07:29 +08:00
Hari Bathini	f33c99e347	fadump: preserve file modification time to help with hardlinking With commit `fa9201b2` ("fadump: isolate fadump initramfs image within the default one"), initramfs image gets to hold two images, one for production kernel boot purpose and the other for capture kernel boot. Most files are common among the two images. Retain file modification time to replace duplicate files with hardlinks and save space. Also, avoid unnecessarily compressing fadump image that is decompressed immediately anyway. Signed-off-by: Hari Bathini <hbathini@linux.ibm.com> Reviewed-by: Philipp Rudo <prudo@redhat.com>	2022-11-09 14:07:29 +08:00
Hari Bathini	55b0dd03b3	fadump: do not use squash to reduce image size With commit `fa9201b2` ("fadump: isolate fadump initramfs image within the default one"), initramfs image gets to hold two squash images, one for production kernel boot purpose and the other for capture kernel boot. Having separate images improved reliability for both production kernel and capture kernel boot scenarios, but the size of initramfs image became considerably larger. Instead of having squash images, compressing $initdir without using squash images reduced the size of initramfs image for fadump case by around 30%. So, avoid using squash for fadump case. Signed-off-by: Hari Bathini <hbathini@linux.ibm.com> Reviewed-by: Philipp Rudo <prudo@redhat.com>	2022-11-09 14:07:29 +08:00
Tao Liu	6d2c22bb81	selftest: Add lvm2 thin provision for kdump test Signed-off-by: Tao Liu <ltao@redhat.com> Reviewed-by: Philipp Rudo <prudo@redhat.com>	2022-11-01 12:20:34 +08:00
Tao Liu	68978f9241	selftest: Only iterate the .sh files for test execution Previously, all files within $TESTCASEDIR/$test_case are regarded as shell script files for testing. However there might be config files under the directory. So let's only iterate the .sh files. Signed-off-by: Tao Liu <ltao@redhat.com> Reviewed-by: Philipp Rudo <prudo@redhat.com>	2022-11-01 12:20:34 +08:00
Tao Liu	f11721077a	Add dependency of dracut lvmthinpool-monitor module The 80lvmthinpool-monitor module is needed to monitor and autoextend the size of the thin pool in the 2nd kernel. The module was integrated in dracut version 057. If the lvmthinpool-monitor module is not found, we will only print a warning, because we don't want to block the kdump process when the thin pool capacity is sufficient and no monitor-and-autoextend is actually needed. Signed-off-by: Tao Liu <ltao@redhat.com> Reviewed-by: Philipp Rudo <prudo@redhat.com>	2022-11-01 12:20:34 +08:00
Tao Liu	10ca970940	lvm.conf should be check modified if lvm2 thinp enabled lvm2 relies on /etc/lvm/lvm.conf to determine its behaviour. The important configs such as thin_pool_autoextend_threshold and thin_pool_autoextend_percent will be used during kdump in the 2nd kernel. So if the file is modified, the initramfs should be rebuilt to include the latest version. Signed-off-by: Tao Liu <ltao@redhat.com> Reviewed-by: Philipp Rudo <prudo@redhat.com>	2022-11-01 12:20:34 +08:00
Tao Liu	0a5b71d123	Add lvm2 thin provision dump target checker We need to check if a directory or a device is lvm2 thinp target. First, we use get_block_dump_target() to convert dump path into block device, then we check if the device is lvm2 thinp target by cmd lvs. is_lvm2_thinp_device is now located in kdump-lib-initramfs.sh, for it will be used in 2nd kernel. is_lvm2_thinp_dump_target is located in kdump-lib.sh, for it is only used in 1st kernel, and it has dependencies which exist in kdump-lib.sh. Signed-off-by: Tao Liu <ltao@redhat.com> Reviewed-by: Philipp Rudo <prudo@redhat.com>	2022-11-01 12:20:34 +08:00
Coiby Xu	995ee24903	Release 2.0.25-2 Signed-off-by: Coiby Xu <coxu@redhat.com>	2022-10-27 16:00:11 +08:00
Coiby Xu	fdad7d9869	Skip reading /etc/default/grub for s390x Currently, updating kexec-tools on s390x gives the warning sed: can't read /etc/default/grub: No such file or directory This happens because s390x doesn't use GRUB and /etc/default/grub doesn't exist. We need to skip both reading and writing to /etc/default/grub. Reported-by: Jie Li <jieli@redhat.com> Reviewed-by: Philipp Rudo <prudo@redhat.com> Signed-off-by: Coiby Xu <coxu@redhat.com>	2022-10-27 14:42:27 +08:00
Coiby Xu	6ce4b85bb3	Include the memory overhead cost of cryptsetup when estimating the memory requirement for LUKS-encrypted target Currently, "kdumpctl estimate" neglects the memory overhead cost of cryptsetup itself. Unfortunately, there is no golden formula to calculate the overhead cost [1]. So estimate the overhead cost as 50M for aarch64 and 20M for other architectures based on the following empirical data, \| Overhead (M) \| OS \| arch \| \| ------------ \| ----------------------------------------- \| ------- \| \| 14.1 \| RHEL-9.2.0-20220829.d.1 \| ppc64le \| \| 14 \| Fedora-37-20220830.n.0 Everything ppc64le \| ppc64le \| \| 17 \| Fedora 36 \| ppc64le \| \| 8.8 \| Fedora 35 \| s390x \| \| 10.1 \| Fedora-Rawhide-20220829.n.0, fc38 \| s390x \| \| 42 \| Fedora-Rawhide-20220829.n.0, fc38 \| aarch64 \| \| 40 \| F35 \| aarch64 \| \| 42 \| F36 \| aarch64 \| \| 42 \| Fedora-Rawhide-20220901.n.0 \| aarch64 \| \| 10 \| F35 \| x86_64 \| \| 10 \| Fedora-Rawhide-20220901.n.0 \| x86_64 \| \| 11 \| Fedora-Rawhide-20220901.n.0 \| x86_64 \| [1] https://lore.kernel.org/cryptsetup/20220616044339.376qlipk5h2omhx2@Rk/T/#u Fixes: `e9e6a2c` ("kdumpctl: Add kdumpctl estimate") Signed-off-by: Coiby Xu <coxu@redhat.com> Reviewed-by: Philipp Rudo <prudo@redhat.com>	2022-10-26 15:38:21 +08:00
Coiby Xu	50a8461fc7	Choosing the most memory-consuming key slot when estimating the memory requirement for LUKS-encrypted target When there are multiple key slots, "kdumpctl estimate" uses the least memory-consuming key slot. For example, when there are two key slots created with --pbkdf-memory=1048576 (1G) and --pbkdf-memory=524288 (512M), "kdumpctl estimate" thinks the extra memory requirement is only 512M. This will of course lead to OOM if the user uses the more memory-consuming key slot. Fix it by sorting in reverse order. Fixes: `e9e6a2c` ("kdumpctl: Add kdumpctl estimate") Signed-off-by: Coiby Xu <coxu@redhat.com> Reviewed-by: Lichen Liu <lichliu@redhat.com> Signed-off-by: Coiby Xu <coxu@redhat.com>	2022-10-26 15:34:08 +08:00
Coiby Xu	15122b3f98	Fix grep warnings "grep: warning: stray \ before -" The latest grep (3.8) warns about unneeded backslashes when building the kdump initrd [1], kdump: Rebuilding /boot/initramfs-6.0.0-0.rc5.a335366bad13.40.test.fc38.aarch64kdump.img grep: warning: stray \ before - grep: warning: stray \ before - grep: warning: stray \ before - grep: warning: stray \ before - grep: warning: stray \ before - Some warnings can be avoided by using "sed -n" to remove grep and the others can use the -- argument. [1] https://s3.us-east-1.amazonaws.com/arr-cki-prod-datawarehouse-public/datawarehouse-public/2022/09/17/redhat:643020269/build_aarch64_redhat:643020269_aarch64/tests/4/results_0001/job.01/recipes/12617739/tasks/5/logs/taskout.log Reported-by: Baoquan He <bhe@redhat.com> Signed-off-by: Coiby Xu <coxu@redhat.com> Suggested-by: Philipp Rudo <prudo@redhat.com> Reviewed-by: Philipp Rudo <prudo@redhat.com>	2022-10-26 14:16:04 +08:00
Coiby Xu	e218128e28	Only try to reset crashkernel for osbuild during package install Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=2060319 Currently, kexec-tools tries to reset crashkernel when using anaconda to install the system. But grubby isn't ready and complains that, 10:33:17,631 INF packaging: Configuring (running scriptlet for): kernel-core-5.14.0-70.el9.x86_64 1645746534 03dcd32db234b72440ee6764d59b32347c5f0cd98ac3fb55beb47214a76f33b4 10:34:16,696 INF dnf.rpm: grep: /boot/grub2/grubenv: No such file or directory grep: /boot/grub2/grubenv: No such file or directory We only need to try resetting crashkernel for osbuild. Skip it for other cases. To tell if it's package install instead of package upgrade, make use of %pre to write a file /tmp/kexec-tools-install when "$1 == 1" [1]. [1] https://docs.fedoraproject.org/en-US/packaging-guidelines/Scriptlets/#_syntax Reported-by: Jan Stodola <jstodola@redhat.com> Signed-off-by: Coiby Xu <coxu@redhat.com> Reviewed-by: Lichen Liu <lichenliu@redhat.com>	2022-10-20 13:54:10 +08:00
Coiby Xu	a7ead187a4	Prefix reset-crashkernel-{for-installed_kernel,after-update} with underscore Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=2048690 To indicate they are for internal use only, underscore them. Reported-by: rcheerla@redhat.com Signed-off-by: Coiby Xu <coxu@redhat.com> Reviewed-by: Lichen Liu <lichenliu@redhat.com>	2022-10-20 13:54:10 +08:00
Tao Liu	fc1c79ffd2	Separate dracut and dracut-squash compressor for zstd Previously kexec-tools would pass "--compress zstd" to dracut. It made dracut decide whether to: a) call mksquashfs to make a zstd format squash-root.img, b) call cmd zstd to make an initramfs. Since dracut (>= 057) has decoupled the compressor for dracut and dracut-squash, in this patch we pass the compressors separately. Note: The is_squash_available && !dracut_has_option --squash-compressor && !is_zstd_command_available case is left unprocessed on purpose. Actually, the situation when we want to call zstd compression is: 1) If squash function OK, we want dracut to invoke mksquashfs to make a zstd format squash-root.img within initramfs. 2) If squash function is not OK, and cmd zstd is present, we want dracut to invoke cmd zstd to make a zstd format initramfs. The is_zstd_command_available check can handle case 2 completely. However, the is_squash_available check cannot handle case 1 completely. It only checks if the kernel supports squashfs, it doesn't check whether the squash module has been added by dracut when making initramfs. In fact, in kexec-tools we are unable to do the check, as there are multiple ways to forbid dracut to load a module, such as "dracut -o module" and "omit_dracutmodules in dracut.conf". When the squash dracut module is omitted, the is_squash_available check will still pass, so "--compress zstd" will be appended to dracut cmdline, and it will call cmd zstd to do the compression. However cmd zstd may not exist, so it fails. The previous "--compress zstd" is ambiguous; after the introduction of "--squash-compressor", "--squash-compressor" only takes effect for mksquashfs and "--compress" only takes effect for the specific cmd. So for the is_squash_available && !dracut_has_option --squash-compressor && !is_zstd_command_available case, we just leave it to be handled the default way. Reviewed-by: Philipp Rudo <prudo@redhat.com> Signed-off-by: Tao Liu <ltao@redhat.com>	2022-10-20 12:26:37 +08:00
Tao Liu	bea6143178	Fix the sync issue for dump_fs Previously the sync for dump_fs was problematic: it always returns success according to man 2 sync. So it cannot detect the error when the dump target is full and not all of the vmcore data has been written back to the disk, which will leave the vmcore incomplete and report a misleading log message, "saving vmcore complete". In this patch, we will use "sync -f vmcore" instead, which will return an error if syncfs on the dump target fails. In this way, vmcore sync related failures, such as a failed autoextend of an lvm2 thinpool, can be detected and handled properly. Signed-off-by: Tao Liu <ltao@redhat.com> Reviewed-by: Coiby Xu <coxu@redhat.com>	2022-10-08 15:46:11 +08:00
Tao Liu	c743881ae6	virtiofs support for kexec-tools This patch adds virtiofs support for kexec-tools by introducing a new option for /etc/kdump.conf: virtiofs myfs Where myfs is a variable tag name specified in qemu cmdline "-device vhost-user-fs-pci,tag=myfs". The patch covers the following cases: 1) Dumping VM's vmcore to a virtiofs shared directory; 2) When the VM's rootfs is a virtiofs shared directory and dumping the VM's vmcore to its subdirectory, such as /var/crash; 3) The combination of case 1 & 2: The VM's rootfs is a virtiofs shared directory and dumping the VM's vmcore to another virtiofs shared directory. Case 2 & 3 need dracut >= 057, otherwise the VM cannot boot from a virtiofs shared rootfs. But that is not an issue of kexec-tools. Reviewed-by: Philipp Rudo <prudo@redhat.com> Signed-off-by: Tao Liu <ltao@redhat.com>	2022-09-29 12:22:49 +08:00
Hari Bathini	d905d49c08	fadump: avoid non-debug kernel use for fadump case Since commit `c5bdd2d8f1` ("kdump-lib: use non-debug kernels first"), non-debug kernel is preferred, over the debug variant, as dump capture kernel to reduce memory consumption. This works alright for kdump as the capture kernel is loaded using kexec. In case of fadump, regular boot loader is used to load the capture kernel. So, the default kernel needs to be used as capture kernel as well. But with commit `c5bdd2d8f1`, initrd of a different kernel is made dump capture capable, breaking fadump's ability to capture dump properly. Fix this by sticking with the debug variant in case of fadump. Fixes: `c5bdd2d8f1` ("kdump-lib: use non-debug kernels first") Signed-off-by: Hari Bathini <hbathini@linux.ibm.com> Acked-by: Lichen Liu <lichliu@redhat.com> Acked-by: Coiby Xu <coxu@redhat.com>	2022-09-23 14:23:38 +08:00
Lichen Liu	4d52b7d548	mkdumprd: Improve error messages on non-existing NFS target directories When kdump is configured with an NFS location, and the remote directory does not exist, kdump.service fails with a confusing error message. kdumpctl[2172]: kdump: Dump path "/tmp/mkdumprd.ftWhOF/target/dumps" does not exist in dump target "10.111.113.2:/srv/kdump" We just need to print the remote directory "dumps" in such a case, because "/tmp/mkdumprd.ftWhOF/target" is the local temporary mount point. Signed-off-by: Lichen Liu <lichliu@redhat.com> Reviewed-by: Coiby Xu <coxu@redhat.com>	2022-09-21 13:30:14 +08:00
Lichen Liu	4edcd9a400	kdumpctl: make the kdump.log root-readable-only Decrease the risk of leaking information that could potentially be used to exploit the crash further (think location of keys). Signed-off-by: Lichen Liu <lichliu@redhat.com> Acked-by: Coiby Xu <coxu@redhat.com>	2022-09-06 20:21:31 +08:00
Kairui Song	677da8a59b	sysconfig: use a simple generator script to maintain These kdump.sysconfig.* files are almost identical, differing slightly in several parameters, so just use a simple script to generate them upon packaging. This should make them easier to maintain: updating a comment or param for a certain arch can be done in one place. There are only some comment or empty option differences with the generated version because some arch's sysconfig is not up to date; this actually fixes the issue. I used the following script to check these differences: # for arch in aarch64 i386 ppc64 ppc64le s390x x86_64; do ./gen-kdump-sysconfig.sh $arch > kdump.sysconfig.$arch.new git checkout HEAD^ kdump.sysconfig.$arch &>/dev/null echo "$arch:" diff kdump.sysconfig.$arch kdump.sysconfig.$arch.new; echo "" done; git reset; Signed-off-by: Kairui Song <kasong@tencent.com> Reviewed-by: Philipp Rudo <prudo@redhat.com>	2022-08-16 14:35:35 +08:00
Coiby Xu	aa84244346	Release 2.0.25-1 Signed-off-by: Coiby Xu <coxu@redhat.com>	2022-08-03 20:14:23 +08:00
Coiby Xu	2d5df7a512	tests: specify the Fedora version when running fedpkg sources So fedpkg will fetch the sources that match the given Fedora version. Signed-off-by: Coiby Xu <coxu@redhat.com> Reviewed-by: Philipp Rudo <prudo@redhat.com>	2022-08-03 20:14:12 +08:00
Coiby Xu	f91711ba8e	tests: specify the backing format for the backing file when using qemu-img create New version of qemu-img requires specifying the backing format for the backing file otherwise it will abort. Signed-off-by: Coiby Xu <coxu@redhat.com> Reviewed-by: Philipp Rudo <prudo@redhat.com>	2022-08-03 20:14:12 +08:00
Coiby Xu	d347ad591f	tests: correctly mount the root and also the boot partitions for Fedora 35, 36 and rawhide Cloud Base Image Fedora 33 and 34 Cloud Base Images have only one partition with the following directory structure, . ├── bin -> usr/bin ├── boot ├── dev ├── etc ├── home ├── root By comparison, Fedora 35, 36 and 37 Cloud Base Images have multiple partitions. The root partition which is the last partition has the following directory, . ├── home └── root ├── bin -> usr/bin ├── boot ├── dev ├── etc ├── home ├── root and the 2nd partition is the boot partition. This patch addresses the above changes by mounting {LAST_PARTITION}/root to TEMP_ROOT and mounting SECOND_PARTITION to TEMP_ROOT/boot. So the test image can be built successfully. Signed-off-by: Coiby Xu <coxu@redhat.com> Reviewed-by: Philipp Rudo <prudo@redhat.com>	2022-08-03 20:14:12 +08:00
Coiby Xu	4d1e02d340	remind the users to run zipl after calling grubby on s390x s390x doesn't use GRUB. To make sure the boot entries are updated, call zipl after running grubby. Suggested-by: smitterl@redhat.com Reviewed-by: Philipp Rudo <prudo@redhat.com> Signed-off-by: Coiby Xu <coxu@redhat.com>	2022-08-03 11:09:55 +08:00
Coiby Xu	58eef4582a	remove useless --zipl when calling grubby to update kernel command line "grubby --zipl" only takes effect when setting default kernel. It's useless to add "--zipl" when updating kernel command line. Also rename _update_grub to _update_kernel_cmdline since s390x doesn't use GRUB. Reviewed-by: Philipp Rudo <prudo@redhat.com> Signed-off-by: Coiby Xu <coxu@redhat.com>	2022-08-03 11:09:45 +08:00
Coiby Xu	e8ae897595	skip updating /etc/default/grub for s390x Resolves: bz2104534 When running "kdumpctl reset-crashkernel --kernel=ALL" on s390x, sed: can't read /etc/default/grub: No such file or directory sed: can't read /etc/default/grub: No such file or directory This happens because s390x doesn't use the grub bootloader and /etc/default/grub doesn't exist. Reported-by: smitterl@redhat.com Reviewed-by: Philipp Rudo <prudo@redhat.com> Signed-off-by: Coiby Xu <coxu@redhat.com>	2022-08-03 11:09:37 +08:00
Coiby Xu	f6bcd819fc	use /run/ostree-booted to tell if scriptlet is running on OSTree system Resolves: bz2092012 According to the ostree team [1], the existence of /run/ostree-booted > is the most stable way to signal/check that a system has been > booted in ostree-style. It is also used by rpm-ostree at > compose/install time in the sandboxed environment where scriptlets run, > in order to signal that the package is being installed/composed into > an ostree commit (i.e. not directly on a live system). See > `8ddf5f40d9/src/libpriv/rpmostree-scripts.cxx (L350-L353)` > for reference. By checking the existence of /run/ostree-booted, we could skip trying to update kernel cmdline during OSTree compose time. [1] https://bugzilla.redhat.com/show_bug.cgi?id=2092012#c3 Reported-by: Luca BRUNO <lucab@redhat.com> Suggested-by: Luca BRUNO <lucab@redhat.com> Fixes: `0adb0f4` ("try to reset kernel crashkernel when kexec-tools updates the default crashkernel value") Signed-off-by: Coiby Xu <coxu@redhat.com> Reviewed-by: Philipp Rudo <prudo@redhat.com> Acked-by: Timothée Ravier <siosm@fedoraproject.org>	2022-08-03 11:07:47 +08:00
Coiby Xu	da0ca0d205	Allow to update kexec-tools using virt-customize for cloud base image Resolves: bz2089871 Currently, kexec-tools can't be updated using virt-customize because older version of kdumpctl can't acquire instance lock for the get-default-crashkernel subcommand. The reason is /var/lock is linked to /run/lock which however doesn't exist in the case of virt-customize. This patch fixes this problem by using /tmp/kdump.lock as the lock file if /run/lock doesn't exist. Note 1. The lock file is now created in /run/lock instead of /var/run/lock since Fedora has adopted adopted /run [2] since F15. 2. %pre scriptlet now always return success since package update won't be blocked [1] https://fedoraproject.org/wiki/Features/var-run-tmpfs Fixes: `0adb0f4` ("try to reset kernel crashkernel when kexec-tools updates the default crashkernel value") Reported-by: Nicolas Hicher <nhicher@redhat.com> Suggested-by: Laszlo Ersek <lersek@redhat.com> Suggested-by: Philipp Rudo <prudo@redhat.com> Signed-off-by: Coiby Xu <coxu@redhat.com> Reviewed-by: Philipp Rudo <prudo@redhat.com>	2022-08-02 18:36:34 +08:00
Pingfan Liu	d593bfa6fc	KDUMP_COMMANDLINE: remove irqpoll parameter on aws aarch64 platform Currently, kdump may experience failure on some aws aarch64 platforms. The final scenario is: [ 79.145089] printk: console [ttyS0] disabled Then the system no longer responds. And after reboot, there is no vmcore generated under /var/crash/. More detail in [1]. In short, it is caused by the irqpoll policy and some unknown acpi issue. The serial device is hot-removed as a pci device. In more detail, the irqpoll policy demands iterating over all interrupt handlers; if the interrupt line is shared, then the handler is dispatched. And the acpi handler acpi_irq() is on a shared interrupt line, so it is called. But for some unknown reason, the acpi hardware regs hold the wrong state, and the acpi driver decides that a hot-remove event happened on a pci slot, which finally removes the pci serial device. Tackle this issue by removing the irqpoll parameter on the aws aarch64 platform, until the real root cause in acpi is found and resolved. [1]: https://bugzilla.redhat.com/show_bug.cgi?id=2080468#c0 Signed-off-by: Pingfan Liu <piliu@redhat.com> Acked-by: Coiby Xu <coxu@redhat.com>	2022-07-21 19:03:37 +08:00
Coiby Xu	c735539b35	Release 2.0.24-4 Signed-off-by: Coiby Xu <coxu@redhat.com>	2022-07-21 16:42:43 +08:00
Baoquan He	1913ea9118	Checking the existence of 40-redhat.rules before modifying Resolves: bz2106645 The code of commit `163c02970e` took effect in RHEL first and was later pulled into Fedora. However, Fedora doesn't have 40-redhat.rules in the systemd-udev package. With this commit applied, a false positive warning message can always be seen as below. So fix it by checking if 40-redhat.rules exists before handling it. With this change, the false warning is gone. [root@ ~]# kdumpctl restart kdump: kexec: unloaded kdump kernel kdump: Stopping kdump: [OK] kdump: No kdump initial ramdisk found. kdump: Rebuilding /boot/initramfs-5.19.0-rc6+kdump.img sed: can't read /var/tmp/dracut.NnAV2g/initramfs/usr/lib/udev/rules.d/40-redhat.rules: No such file or directory kdump: kexec: loaded kdump kernel kdump: Starting kdump: [OK] Signed-off-by: Baoquan He <bhe@redhat.com> Acked-by: Pingfan Liu <piliu@redhat.com>	2022-07-21 15:02:32 +08:00
Lichen Liu	ed9cbec2ee	kdump-lib: Add the CoreOS kernel dir to the boot_dirlist The kernel of CoreOS is not in the standard locations, add /boot/ostree/* to the boot_dirlist to find the vmlinuz. Signed-off-by: Lichen Liu <lichliu@redhat.com> Acked-by: Coiby Xu <coxu@redhat.com>	2022-06-30 16:00:06 +08:00
Dusty Mabe	f9c32372d2	kdump-lib: attempt to fix BOOT_IMAGE detection Currently $boot_img can get bad data if running on a platform that doesn't set BOOT_IMAGE in the kernel command line. For example, currently: - s390x Fedora CoreOS machine: ``` [root@cosa-devsh ~]# sed "s/^BOOT_IMAGE=\((\S*)\)\?\(\S*\) .*/\2/" /proc/cmdline mitigations=auto,nosmt ignition.platform.id=qemu ostree=/ostree/boot.0/fedora-coreos/2a72567ac8f7ed678c3ac89408f795e6ccd4e97b41e14af5f471b6a807e858b9/0 root=UUID=2a88436a-3b6b-4706-b33a-b8270bd87cde rw rootflags=prjquota boot=UUID=f4b2eaa5-9317-4798-85cf-308c477fee4c crashkernel=600M ``` where on a platform that uses GRUB we get: - x86_64 Fedora CoreOS machine: ``` [root@cosa-devsh ~]# sed "s/^BOOT_IMAGE=\((\S*)\)\?\(\S*\) .*/\2/" /proc/cmdline /ostree/fedora-coreos-af4f6cc7b9ff486cfa647680b180e989c72c8eed03a34a42e7328e49332bd20e/vmlinuz-5.18.5-200.fc36.x86_64 ``` We should change the setting of the boot_img variable such that it will be empty if BOOT_IMAGE doesn't exist. With this change on the s390x machine: ``` [root@cosa-devsh ~]# grep -P -o '^BOOT_IMAGE=(\S+)' /proc/cmdline | sed "s/^BOOT_IMAGE=\((\S*)\)\?\(\S*\)/\2/" [root@cosa-devsh ~]# ``` This change mattered much more before the change in `c5bdd2d` which changed the following line from [[ -n $boot_img ]] to [[ "$boot_img" == *"$kdump_kernelver" ]]. Still I think this change has merit. Signed-off-by: Dusty Mabe <dusty@dustymabe.com> Acked-by: Coiby Xu <coxu@redhat.com>	2022-06-30 16:00:06 +08:00
Dusty Mabe	a1ebf0b565	kdump-lib: change how ostree based systems are detected The current recommendation is to check for /run/ostree-booted. See https://bugzilla.redhat.com/show_bug.cgi?id=2092012#c0 Signed-off-by: Dusty Mabe <dusty@dustymabe.com> Acked-by: Coiby Xu <coxu@redhat.com>	2022-06-30 16:00:06 +08:00
Dusty Mabe	980f10aa40	kdump-lib: clear up references to Atomic/CoreOS There are many variants on OSTree based systems these days so we should probably refer to the class of systems as "OSTree based systems". Also, Atomic Host is dead. Signed-off-by: Dusty Mabe <dusty@dustymabe.com> Acked-by: Coiby Xu <coxu@redhat.com>	2022-06-30 16:00:06 +08:00
Pingfan Liu	b92bc6e0a7	crashkernel: optimize arm64 reserved size if PAGE_SIZE=4k On RHEL9 and Fedora, the arm64 platform only supports 4KB page size, so the reserved memory size can be aligned to that on x86_64. Introduce a new formula for 4KB pages on arm64, which is based on the x86_64 value plus an extra 64MB. Signed-off-by: Pingfan Liu <piliu@redhat.com> Acked-by: Baoquan He <bhe@redhat.com>	2022-06-15 09:08:03 +08:00
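
The BOOT_IMAGE handling described in commit f9c32372d2 above can be illustrated with a minimal sketch. This is not the kdump-lib code: the function name extract_boot_image is hypothetical, it takes the command line as an argument instead of reading /proc/cmdline, and it uses POSIX parameter expansion in place of the commit's grep/sed pipeline. It only shows the intended behaviour, which is an empty result when BOOT_IMAGE is absent, and otherwise the image path with any GRUB device prefix stripped.

```shell
# Hypothetical helper (not from kexec-tools): print the kernel image path
# found in a kernel command line string, or nothing if BOOT_IMAGE= is absent.
extract_boot_image() {
    # Look only at the first whitespace-delimited token, mirroring the
    # ^BOOT_IMAGE= anchor used in the commit's grep pattern.
    first=${1%%[[:space:]]*}
    case $first in
        BOOT_IMAGE=*) first=${first#BOOT_IMAGE=} ;;
        *) return 0 ;;  # no BOOT_IMAGE: print nothing
    esac
    # Drop an optional leading GRUB device prefix such as "(hd0,gpt2)".
    first=${first##"("*")"}
    printf '%s\n' "$first"
}

# Example calls with made-up command lines:
extract_boot_image 'BOOT_IMAGE=(hd0,gpt2)/vmlinuz-5.18.5 root=UUID=abc rw'
extract_boot_image 'mitigations=auto,nosmt crashkernel=600M'
```

The first call prints the image path without the device prefix; the second prints nothing, which is the case the commit wants to reach on platforms such as s390x that do not set BOOT_IMAGE.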


1581 Commits