kexec-tools

Author	SHA1	Message	Date
Philipp Rudo	33b307af20	kdumpctl: cleanup 'start' The function has many blocks of the kind if ! cmd; then derror "Starting kdump: [FAILED]" return 1 fi This duplicates code and makes the function hard to read. Thus move the block to the calling function. Signed-off-by: Philipp Rudo <prudo@redhat.com> Reviewed-by: Coiby Xu <coxu@redhat.com>	2023-01-30 17:37:23 +08:00
Philipp Rudo	d55a056558	kdumpctl: move aws workaround to kdump-lib Move the workaround for aws graviton cpus from load_kdump to prepare_cmdline. This (1) makes the workaround available also for other callers of prepare_cmdline (although not needed at the moment) and (2) makes it easier to fix the problems found by the unit test included earlier as all changes to the cmdline are done at one place now. Signed-off-by: Philipp Rudo <prudo@redhat.com> Reviewed-by: Coiby Xu <coxu@redhat.com>	2023-01-30 17:37:23 +08:00
Philipp Rudo	b9fd7a4076	kdumpctl: merge check_current_{kdump,fadump}_status Both functions are almost identical. The only differences are (1) the sysfs node the status is read from and (2) the fact the fadump version doesn't verify if the file it's trying to read actually exists. Thus merge the two functions and get rid of the check_current_status wrapper. While at it rename the function to is_kernel_loaded which explains better what the function does. Finally, after moving FADUMP_REGISTER_SYS_NODE shellcheck can no longer access the definition and starts complaining about it not being quoted. Thus quote all uses of FADUMP_REGISTER_SYS_NODE. Signed-off-by: Philipp Rudo <prudo@redhat.com> Reviewed-by: Coiby Xu <coxu@redhat.com>	2023-01-30 17:37:23 +08:00
Philipp Rudo	d5faaee62b	kdumpctl: simplify check_failure_action_config With the deprecation of the 'default' option in kdump.conf check_failure_action_config needed to track which option was used (default or failure_action). This made the function quite complex. Thus make option 'default' a true alias of 'failure_action' when parsing kdump.conf and simplify check_failure_action_config. Do the same simplifications for check_final_action_config as both functions are basically identical. Signed-off-by: Philipp Rudo <prudo@redhat.com> Reviewed-by: Coiby Xu <coxu@redhat.com>	2023-01-30 17:37:23 +08:00
Coiby Xu	5951b5e268	Don't try to update crashkernel when bootloader is not installed Currently when using anaconda to install the OS, the following errors occur, INF packaging: Configuring (running scriptlet for): kernel-core-5.14.0-70.el9.x86_64 ... INF dnf.rpm: grep: /boot/grub2/grubenv: No such file or directory grep: /boot/grub2/grubenv: No such file or directory grep: /boot/grub2/grubenv: No such file or directory grep: /boot/grub2/grubenv: No such file or directory ... INF packaging: Configuring (running scriptlet for): kexec-tools-2.0.23-9.el9.x86_64 ... INF dnf.rpm: grep: /boot/grub2/grubenv: No such file or directory grep: /boot/grub2/grubenv: No such file or directory grep: /boot/grub2/grubenv: No such file or directory Or for s390, the following errors occur, INF packaging: Configuring (running scriptlet for): kernel-core-5.14.0-71.el9.s390x ... 03:37:51,232 INF dnf.rpm: grep: /etc/zipl.conf: No such file or directory grep: /etc/zipl.conf: No such file or directory grep: /etc/zipl.conf: No such file or directory INF packaging: Configuring (running scriptlet for): kexec-tools-2.0.23-9_1.el9_0.s390x ... INF dnf.rpm: grep: /etc/zipl.conf: No such file or directory This is because when anaconda installs the packages, bootloader hasn't been installed and /boot/grub2/grubenv or /etc/zipl.conf doesn't exist. So don't try to update crashkernel when bootloader isn't ready to avoid the above errors. Note this is the second attempt to fix this issue. Previously a file /tmp/kexec_tools_package_install was created to avoid running the related code thus to avoid the above errors but unfortunately that approach has two issues a) somehow osbuild doesn't delete it for RHEL b) this file could still exist if users manually remove kexec-tools. 
Fixes: `e218128` ("Only try to reset crashkernel for osbuild during package install") Reported-by: Jan Stodola <jstodola@redhat.com> Signed-off-by: Coiby Xu <coxu@redhat.com> Reviewed-by: Philipp Rudo <prudo@redhat.com>	2022-12-22 10:33:00 +08:00
Hari Bathini	a833624fe5	fadump: avoid status check while starting in fadump mode With kernel commit 607451ce0aa9b ("powerpc/fadump: register for fadump as early as possible"), 'kdumpctl start' prematurely returns with the below message: "Kdump already running: [WARNING]" instead of setting default initrd with dump capture capability as required for fadump. Skip status check in fadump mode to avoid this problem. Signed-off-by: Hari Bathini <hbathini@linux.ibm.com> Reviewed-by: Philipp Rudo <prudo@redhat.com>	2022-12-07 09:43:41 +08:00
Hari Bathini	25411da966	fadump: fix default initrd backup and restore logic In case of fadump, default initrd is rebuilt with dump capturing capability, as the same initrd is used for booting production kernel as well as capture kernel. The original initrd file is backed up with a checksum, to restore it as the default initrd when fadump is disabled. As the checksum file is not kernel version specific, switching between different kernel versions and kdump/fadump dump mode breaks the default initrd backup/restore logic. Fix this by having a kernel version specific checksum file. Also, if backing up initrd fails, retaining the checksum file isn't useful. Remove it. Signed-off-by: Hari Bathini <hbathini@linux.ibm.com> Reviewed-by: Philipp Rudo <prudo@redhat.com>	2022-12-07 09:42:29 +08:00
Lichen Liu	5eb77ee3fa	kdumpctl: Optimize _find_kernel_path_by_release regex string Currently _find_kernel_path_by_release uses grubby and grep to find the kernel path. If both the normal kernel and its debug variant exist, the grep will give more than one kernel string. ``` kernel="/boot/vmlinuz-5.14.0-139.kpq0.el9.s390x+debug" kernel="/boot/vmlinuz-5.14.0-139.kpq0.el9.s390x" ``` This will cause an error when installing debug kernel. ``` The param "/boot/vmlinuz-5.14.0-139.kpq0.el9.s390x+debug /boot/vmlinuz-5.14.0-139.kpq0.el9.s390x" is incorrect ``` Fixes: `945cbbd` ("add helper functions to get kernel path by kernel release and the path of current running kernel") Signed-off-by: Lichen Liu <lichliu@redhat.com> Reviewed-by: Philipp Rudo <prudo@redhat.com>	2022-11-25 17:27:15 +08:00
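The ambiguity described in 5eb77ee3fa can be reproduced without grubby. The following is a minimal sketch, assuming input lines in the `kernel="..."` form quoted above; the function name `find_kernel_path_sketch` is hypothetical and the suffix comparison in awk is an illustration, not the regex actually used in kdumpctl.

```bash
#!/bin/bash
# Hypothetical helper: read grubby-style 'kernel="<path>"' lines on stdin and
# print only the path that ends exactly in vmlinuz-<release>.
find_kernel_path_sketch() {
    local release=$1
    awk -v suffix="vmlinuz-${release}\"" '
        # Literal suffix comparison: "<release>+debug" does not end in
        # "<release>\"" and is therefore skipped.
        substr($0, length($0) - length(suffix) + 1) == suffix {
            sub(/^kernel="/, "")
            sub(/"$/, "")
            print
        }'
}
```

Because the comparison is literal rather than a regex, characters such as `.` and `+` in the release string need no escaping.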
Coiby Xu	a3da46d6c4	Skip reset_crashkernel_after_update during package install Currently, kexec-tools tries to reset crashkernel when using anaconda to install the system. But grubby isn't ready and complains that, 10:34:17,014 INF packaging: Configuring (running scriptlet for): kexec-tools-2.0.23-9.el9.x86_64 1646034766 53ff7158f8808774f4e3c3c87e504aa7a6d677b537754dac86c87925c8f0a397 10:34:17,205 INF dnf.rpm: grep: /boot/grub2/grubenv: No such file or directory grep: /boot/grub2/grubenv: No such file or directory grep: /boot/grub2/grubenv: No such file or directory kexec-tools is supposed to update the kernel crashkernel parameter after package upgrade. Unfortunately, the posttrans RPM scriptlet doesn't distinguish between package install and upgrade. This patch skips reset_crashkernel_after_update as similar to `e218128e` ("Only try to reset crashkernel for osbuild during package install"). Reported-by: Jan Stodola <jstodola@redhat.com> Signed-off-by: Coiby Xu <coxu@redhat.com>	2022-11-18 17:22:39 +08:00
Tao Liu	3ae8cf8876	Don't check fs modified when dump target is lvm2 thinp When the dump target is lvm2 thinp, if we didn't mount the dump target first, get_fs_type_from_target will get empty output: Before mount: $ get_fs_type_from_target /dev/vg00/thinlv After mount: $ mount /dev/vg00/thinlv /mnt $ get_fs_type_from_target /dev/vg00/thinlv ext4 As a result, kdumpctl start will fail with: $ kdumpctl start kdump: Dump target is invalid kdump: Starting kdump: [FAILED] This patch fixes the issue by bypassing check_fs_modified when the dump target is lvm2 thinp. Signed-off-by: Tao Liu <ltao@redhat.com> Reviewed-by: Coiby Xu <prudo@redhat.com>	2022-11-11 10:29:02 +08:00
Tao Liu	10ca970940	lvm.conf should be check modified if lvm2 thinp enabled lvm2 relies on /etc/lvm/lvm.conf to determine its behaviour. The important configs such as thin_pool_autoextend_threshold and thin_pool_autoextend_percent will be used during kdump in 2nd kernel. So if the file is modified, the initramfs should be rebuilt to include the latest version. Signed-off-by: Tao Liu <ltao@redhat.com> Reviewed-by: Philipp Rudo <prudo@redhat.com>	2022-11-01 12:20:34 +08:00
Coiby Xu	fdad7d9869	Skip reading /etc/default/grub for s390x Currently, updating kexec-tools on s390x gives the warning sed: can't read /etc/default/grub: No such file or directory This happens because s390x doesn't use GRUB and /etc/default/grub doesn't exist. We need to skip both reading and writing to /etc/default/grub. Reported-by: Jie Li <jieli@redhat.com> Reviewed-by: Philipp Rudo <prudo@redhat.com> Signed-off-by: Coiby Xu <coxu@redhat.com>	2022-10-27 14:42:27 +08:00
Coiby Xu	6ce4b85bb3	Include the memory overhead cost of cryptsetup when estimating the memory requirement for LUKS-encrypted target Currently, "kdumpctl estimate" neglects the memory overhead cost of cryptsetup itself. Unfortunately, there is no golden formula to calculate the overhead cost [1]. So estimate the overhead cost as 50M for aarch64 and 20M for other architectures based on the following empirical data, \| Overhead (M) \| OS \| arch \| \| ------------ \| ----------------------------------------- \| ------- \| \| 14.1 \| RHEL-9.2.0-20220829.d.1 \| ppc64le \| \| 14 \| Fedora-37-20220830.n.0 Everything ppc64le \| ppc64le \| \| 17 \| Fedora 36 \| ppc64le \| \| 8.8 \| Fedora 35 \| s390x \| \| 10.1 \| Fedora-Rawhide-20220829.n.0, fc38 \| s390x \| \| 42 \| Fedora-Rawhide-20220829.n.0, fc38 \| aarch64 \| \| 40 \| F35 \| aarch64 \| \| 42 \| F36 \| aarch64 \| \| 42 \| Fedora-Rawhide-20220901.n.0 \| aarch64 \| \| 10 \| F35 \| x86_64 \| \| 10 \| Fedora-Rawhide-20220901.n.0 \| x86_64 \| \| 11 \| Fedora-Rawhide-20220901.n.0 \| x86_64 \| [1] https://lore.kernel.org/cryptsetup/20220616044339.376qlipk5h2omhx2@Rk/T/#u Fixes: `e9e6a2c` ("kdumpctl: Add kdumpctl estimate") Signed-off-by: Coiby Xu <coxu@redhat.com> Reviewed-by: Philipp Rudo <prudo@redhat.com>	2022-10-26 15:38:21 +08:00
Coiby Xu	50a8461fc7	Choosing the most memory-consuming key slot when estimating the memory requirement for LUKS-encrypted target When there are multiple key slots, "kdumpctl estimate" uses the least memory-consuming key slot. For example, when there are two key slots created with --pbkdf-memory=1048576 (1G) and --pbkdf-memory=524288 (512M), "kdumpctl estimate" thinks the extra memory requirement is only 512M. This will of course lead to OOM if the user uses the more memory-consuming key slot. Fix it by sorting in reverse order. Fixes: `e9e6a2c` ("kdumpctl: Add kdumpctl estimate") Signed-off-by: Coiby Xu <coxu@redhat.com> Reviewed-by: Lichen Liu <lichliu@redhat.com> Signed-off-by: Coiby Xu <coxu@redhat.com>	2022-10-26 15:34:08 +08:00
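The "sort in reverse order" fix in 50a8461fc7 amounts to taking the maximum instead of the minimum. The following is a minimal sketch, assuming the per-key-slot PBKDF memory costs have already been extracted as one number per line; the helper name `max_pbkdf_memory` is hypothetical and not taken from kdumpctl.

```bash
#!/bin/bash
# Hypothetical helper: print the largest PBKDF memory cost read from stdin
# (one numeric value per line, in the same unit as --pbkdf-memory).
max_pbkdf_memory() {
    # -n: numeric comparison, -r: reverse order so the largest value is first
    sort -n -r | head -n 1
}
```

With the two key slots from the commit message (1048576 and 524288), this picks 1048576, which is the worst case the estimate has to cover.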
Coiby Xu	15122b3f98	Fix grep warnings "grep: warning: stray \ before -" Latest grep (3.8) warns about unneeded backslashes when building kdump initrd [1], kdump: Rebuilding /boot/initramfs-6.0.0-0.rc5.a335366bad13.40.test.fc38.aarch64kdump.img grep: warning: stray \ before - grep: warning: stray \ before - grep: warning: stray \ before - grep: warning: stray \ before - grep: warning: stray \ before - Some warnings can be avoided by using "sed -n" to remove grep and the others can use the -- argument. [1] https://s3.us-east-1.amazonaws.com/arr-cki-prod-datawarehouse-public/datawarehouse-public/2022/09/17/redhat:643020269/build_aarch64_redhat:643020269_aarch64/tests/4/results_0001/job.01/recipes/12617739/tasks/5/logs/taskout.log Reported-by: Baoquan He <bhe@redhat.com> Signed-off-by: Coiby Xu <coxu@redhat.com> Suggested-by: Philipp Rudo <prudo@redhat.com> Reviewed-by: Philipp Rudo <prudo@redhat.com>	2022-10-26 14:16:04 +08:00
Coiby Xu	e218128e28	Only try to reset crashkernel for osbuild during package install Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=2060319 Currently, kexec-tools tries to reset crashkernel when using anaconda to install the system. But grubby isn't ready and complains that, 10:33:17,631 INF packaging: Configuring (running scriptlet for): kernel-core-5.14.0-70.el9.x86_64 1645746534 03dcd32db234b72440ee6764d59b32347c5f0cd98ac3fb55beb47214a76f33b4 10:34:16,696 INF dnf.rpm: grep: /boot/grub2/grubenv: No such file or directory grep: /boot/grub2/grubenv: No such file or directory We only need to try resetting crashkernel for osbuild. Skip it for other cases. To tell if it's package install instead of package upgrade, make use of %pre to write a file /tmp/kexec-tools-install when "$1 == 1" [1]. [1] https://docs.fedoraproject.org/en-US/packaging-guidelines/Scriptlets/#_syntax Reported-by: Jan Stodola <jstodola@redhat.com> Signed-off-by: Coiby Xu <coxu@redhat.com> Reviewed-by: Lichen Liu <lichenliu@redhat.com>	2022-10-20 13:54:10 +08:00
Coiby Xu	a7ead187a4	Prefix reset-crashkernel-{for-installed_kernel,after-update} with underscore Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=2048690 To indicate they are for internal use only, underscore them. Reported-by: rcheerla@redhat.com Signed-off-by: Coiby Xu <coxu@redhat.com> Reviewed-by: Lichen Liu <lichenliu@redhat.com>	2022-10-20 13:54:10 +08:00
Tao Liu	c743881ae6	virtiofs support for kexec-tools This patch adds virtiofs support for kexec-tools by introducing a new option for /etc/kdump.conf: virtiofs myfs Where myfs is a variable tag name specified in qemu cmdline "-device vhost-user-fs-pci,tag=myfs". The patch covers the following cases: 1) Dumping VM's vmcore to a virtiofs shared directory; 2) When the VM's rootfs is a virtiofs shared directory and dumping the VM's vmcore to its subdirectory, such as /var/crash; 3) The combination of case 1 & 2: The VM's rootfs is a virtiofs shared directory and dumping the VM's vmcore to another virtiofs shared directory. Case 2 & 3 need dracut >= 057, otherwise VM cannot boot from virtiofs shared rootfs. But it is not the issue of kexec-tools. Reviewed-by: Philipp Rudo <prudo@redhat.com> Signed-off-by: Tao Liu <ltao@redhat.com>	2022-09-29 12:22:49 +08:00
Lichen Liu	4edcd9a400	kdumpctl: make the kdump.log root-readable-only Decrease the risk of leaking information that could potentially be used to exploit the crash further (think location of keys). Signed-off-by: Lichen Liu <lichliu@redhat.com> Acked-by: Coiby Xu <coxu@redhat.com>	2022-09-06 20:21:31 +08:00
Coiby Xu	58eef4582a	remove useless --zipl when calling grubby to update kernel command line "grubby --zipl" only takes effect when setting default kernel. It's useless to add "--zipl" when updating kernel command line. Also rename _update_grub to _update_kernel_cmdline since s390x doesn't use GRUB. Reviewed-by: Philipp Rudo <prudo@redhat.com> Signed-off-by: Coiby Xu <coxu@redhat.com>	2022-08-03 11:09:45 +08:00
Coiby Xu	e8ae897595	skip updating /etc/default/grub for s390x Resolves: bz2104534 When running "kdumpctl reset-crashkernel --kernel=ALL" on s390x, sed: can't read /etc/default/grub: No such file or directory sed: can't read /etc/default/grub: No such file or directory This happens because s390x doesn't use the grub bootloader and /etc/default/grub doesn't exist. Reported-by: smitterl@redhat.com Reviewed-by: Philipp Rudo <prudo@redhat.com> Signed-off-by: Coiby Xu <coxu@redhat.com>	2022-08-03 11:09:37 +08:00
Coiby Xu	da0ca0d205	Allow to update kexec-tools using virt-customize for cloud base image Resolves: bz2089871 Currently, kexec-tools can't be updated using virt-customize because older version of kdumpctl can't acquire instance lock for the get-default-crashkernel subcommand. The reason is /var/lock is linked to /run/lock which however doesn't exist in the case of virt-customize. This patch fixes this problem by using /tmp/kdump.lock as the lock file if /run/lock doesn't exist. Note 1. The lock file is now created in /run/lock instead of /var/run/lock since Fedora has adopted /run [1] since F15. 2. %pre scriptlet now always returns success since package update won't be blocked [1] https://fedoraproject.org/wiki/Features/var-run-tmpfs Fixes: `0adb0f4` ("try to reset kernel crashkernel when kexec-tools updates the default crashkernel value") Reported-by: Nicolas Hicher <nhicher@redhat.com> Suggested-by: Laszlo Ersek <lersek@redhat.com> Suggested-by: Philipp Rudo <prudo@redhat.com> Signed-off-by: Coiby Xu <coxu@redhat.com> Reviewed-by: Philipp Rudo <prudo@redhat.com>	2022-08-02 18:36:34 +08:00
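The lock file fallback described in da0ca0d205 can be sketched as follows. This is an illustration only: the function name `pick_lock_file` is hypothetical, and the file name `kdump` inside the lock directory is an assumption. Only the `/run/lock` directory and the `/tmp/kdump.lock` fallback come from the commit message.

```bash
#!/bin/bash
# Hypothetical helper: print the lock file path to use.
# $1 is the preferred lock directory (defaults to /run/lock).
pick_lock_file() {
    local lock_dir=${1:-/run/lock}
    if [[ -d $lock_dir ]]; then
        # Assumed file name inside the lock directory
        echo "$lock_dir/kdump"
    else
        # Fallback named in the commit message, e.g. under virt-customize
        echo "/tmp/kdump.lock"
    fi
}
```

Taking the directory as a parameter keeps the decision testable without touching the real `/run/lock`.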
Pingfan Liu	d593bfa6fc	KDUMP_COMMANDLINE: remove irqpoll parameter on aws aarch64 platform Currently, kdump may fail on some aws aarch64 platforms. The final scenario is: [ 79.145089] printk: console [ttyS0] disabled Then the system no longer responds, and after reboot no vmcore is generated under /var/crash/. More detail in [1]. In short, it is caused by the irqpoll policy and some unknown acpi issue: the serial device is hot-removed as a pci device. In more detail, the irqpoll policy iterates over all interrupt handlers and dispatches a handler if its interrupt line is shared. The acpi handler acpi_irq() is on a shared interrupt line, so it is called. But for some unknown reason, the acpi hardware regs hold a wrong state, and the acpi driver decides that a hot-remove event happened on a pci slot, which finally removes the pci serial device. Tackle this issue by removing the irqpoll parameter on the aws aarch64 platform until the real root cause in acpi is found and resolved. [1]: https://bugzilla.redhat.com/show_bug.cgi?id=2080468#c0 Signed-off-by: Pingfan Liu <piliu@redhat.com> Acked-by: Coiby Xu <coxu@redhat.com>	2022-07-21 19:03:37 +08:00
Dusty Mabe	980f10aa40	kdump-lib: clear up references to Atomic/CoreOS There are many variants on OSTree based systems these days so we should probably refer to the class of systems as "OSTree based systems". Also, Atomic Host is dead. Signed-off-by: Dusty Mabe <dusty@dustymabe.com> Acked-by: Coiby Xu <coxu@redhat.com>	2022-06-30 16:00:06 +08:00
Coiby Xu	b97310428f	unit tests: prepare for kdumpctl and kdump-lib.sh to be unit-tested Currently there are two issues with unit-testing the functions defined in kdumpctl and other shell scripts after sourcing them, - kdumpctl would call main which requires root permission and would create single instance lock (/var/lock/kdump) - kdumpctl and other shell scripts directly source files under /usr/lib/kdump/ When ShellSpec loads a script via "Include", it defines the __SOURCED__ variable. By making use of __SOURCED__, we can 1. let kdumpctl not call main when kdumpctl is "Include"d by ShellSpec 2. instruct kdumpctl and kdump-lib.sh to source the files in the repo when running ShellSpec tests Note coverage/ is added to .gitignore because ShellSpec generates code coverage results in this folder. Reviewed-by: Philipp Rudo <prudo@redhat.com> Signed-off-by: Coiby Xu <coxu@redhat.com>	2022-04-14 11:44:12 +08:00
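The `__SOURCED__` guard from b97310428f can be illustrated with a small sketch. The names `main` and `run_unless_sourced` below are placeholders for illustration; the actual guard in kdumpctl may be written differently.

```bash
#!/bin/bash
# Placeholder for the script's real entry point.
main() {
    echo "main called"
}

# Hypothetical wrapper: skip main when ShellSpec has sourced the script.
# ShellSpec defines __SOURCED__ when a script is loaded via "Include".
run_unless_sourced() {
    if [[ -n ${__SOURCED__:-} ]]; then
        return 0
    fi
    main "$@"
}
```

A script would end with `run_unless_sourced "$@"`, so running it directly still calls `main`, while a ShellSpec `Include` only loads the function definitions.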
Philipp Rudo	55b5c4e2b0	kdumpctl: simplify local_fs_dump_target Make use of the new ${OPT[]} array and simplify local_fs_dump_target to remove one more file operation. While at it rename local_fs_dump_target to is_local_target. Signed-off-by: Philipp Rudo <prudo@redhat.com> Reviewed-by: Tao Liu <ltao@redhat.com> Reviewed-by: Coiby Xu <coxu@redhat.com>	2022-04-02 16:24:32 +08:00
Philipp Rudo	ac5968218f	kdumpctl: remove kdump_get_conf_val in save_raw With the introduction of ${OPT[fstype]} this call to kdump_get_conf_val can be removed now as well. Signed-off-by: Philipp Rudo <prudo@redhat.com> Reviewed-by: Tao Liu <ltao@redhat.com> Reviewed-by: Coiby Xu <coxu@redhat.com>	2022-04-02 16:24:32 +08:00
Philipp Rudo	5118daf2ff	kdumpctl: drop DUMP_TARGET variable The variable is only used for ssh dump targets. Furthermore it is identical to the value stored in ${OPT[_target]}. Thus drop DUMP_TARGET and use ${OPT[_target]} instead. In order to be able to distinguish between the different target types introduce the internal ${OPT[_fstype]}. Signed-off-by: Philipp Rudo <prudo@redhat.com> Reviewed-by: Tao Liu <ltao@redhat.com> Reviewed-by: Coiby Xu <coxu@redhat.com>	2022-04-02 16:24:32 +08:00
Philipp Rudo	a859abe365	kdumpctl: drop SSH_KEY_LOCATION variable The variable is only used for ssh dump targets. Furthermore it is identical to the value stored in ${OPT[sshkey]}. Thus drop SSH_KEY_LOCATION and use ${OPT[sshkey]} instead. Signed-off-by: Philipp Rudo <prudo@redhat.com> Reviewed-by: Tao Liu <ltao@redhat.com> Reviewed-by: Coiby Xu <coxu@redhat.com>	2022-04-02 16:24:32 +08:00
Philipp Rudo	0460f0a768	kdumpctl: drop SAVE_PATH variable The variable is only used for ssh dump targets. Furthermore it is identical to the value stored in ${OPT[path]}. Thus drop SAVE_PATH and use ${OPT[path]} instead. Also make sure that ${OPT[path]} is always set to the default value when no entry in kdump.conf is found. Signed-off-by: Philipp Rudo <prudo@redhat.com> Reviewed-by: Tao Liu <ltao@redhat.com> Reviewed-by: Coiby Xu <coxu@redhat.com>	2022-04-02 16:24:32 +08:00
Philipp Rudo	edb1d04425	kdumpctl: reduce file operations on kdump.conf Every call to kdump_get_conf_val parses kdump.conf although the file has already been parsed in check_config. Thus store the values parsed in check_config in an array and use them later instead of re-parsing the file over and over again. While at it rename check_config to parse_config. Signed-off-by: Philipp Rudo <prudo@redhat.com> Reviewed-by: Tao Liu <ltao@redhat.com> Reviewed-by: Coiby Xu <coxu@redhat.com>	2022-04-02 16:24:32 +08:00
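The idea in edb1d04425 of parsing kdump.conf once and reusing the values can be sketched as below. This is a simplified illustration under assumptions: the function name `parse_config_sketch` is hypothetical, only "option value" lines are handled, and none of the validation done by the real parse_config is included.

```bash
#!/bin/bash
# Global associative array holding the parsed options (requires bash >= 4.2).
declare -gA OPT

# Hypothetical, simplified parser: read "option value" lines from the file
# given as $1 into OPT, skipping blank lines and comment lines.
parse_config_sketch() {
    local config_file=$1 opt val
    OPT=()
    # The "|| [[ -n $opt ]]" part also handles a last line without newline.
    while read -r opt val || [[ -n $opt ]]; do
        [[ -z $opt || $opt == \#* ]] && continue
        OPT[$opt]=$val
    done < "$config_file"
}
```

After one call, later code can read `${OPT[path]}` or `${OPT[sshkey]}` directly instead of re-parsing the file for every lookup.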
Philipp Rudo	4adf6d3cc8	kdumpctl: merge check_ssh_config into check_config check_config and check_ssh_config both parse /etc/kdump.conf and are usually used together. The difference between both is that check_ssh_config does some extra checks on the format of the provided ssh destination but ignores invalid or deprecated options in the config. Thus merge check_ssh_config into check_config. Leave the additional checks on the ssh destination in check_ssh_config but treat it like the checks done for e.g. the failure_action. This slightly changes the behavior of 'kdumpctl propagate', which now fails if kdump.conf contains an invalid value unrelated to ssh. This change in behavior isn't problematic because 'kdumpctl propagate' always needs to be followed by a 'kdumpctl start' to have a working kdump environment. For the situations where 'propagate' fails now the 'start' would have failed in the past. So the failure only moved one step ahead in the sequence. While at it drop check_ssh_target and call check_and_wait_network_ready directly. Signed-off-by: Philipp Rudo <prudo@redhat.com> Reviewed-by: Tao Liu <ltao@redhat.com> Reviewed-by: Coiby Xu <coxu@redhat.com>	2022-04-02 16:24:32 +08:00
Philipp Rudo	e3fa367840	kdumpctl: simplify propagate_ssh_key The function has multiple problems: 1) SSH_{USER,SERVER} aren't defined local 2) Weird use of cut and sed to parse the DUMP_TARGET for the user and host although check_ssh_config guarantees that it has the format <user>@<host>. 3) Unnecessary use of a variable for the return value 4) Weird behavior to first unpack the DUMP_TARGET to SSH_USER and SSH_SERVER and then putting it back together again 5) Definition of variable errmsg that is only used once but breaks grep-ability of error message. 6) Wrong order when redirecting output of ssh-keygen, see SC2069 [1] Fix them now. While at it also improve the error messages in the function. [1] https://www.shellcheck.net/wiki/SC2069 Signed-off-by: Philipp Rudo <prudo@redhat.com> Reviewed-by: Tao Liu <ltao@redhat.com> Reviewed-by: Coiby Xu <coxu@redhat.com>	2022-04-02 16:24:32 +08:00
Philipp Rudo	b802dbff9f	kdumpctl: forbid aliases from ssh config For ssh targets kdumpctl only verifies that the config value has the correct <user>@<host> format itself. For all other tests, e.g. if the destination can be reached, it relies on ssh. This allows users to provide a <host> that isn't the proper hostname but an alias defined in the ssh_config without failing the tests. If this is done dracut-module-setup.sh:kdump_get_remote_ip will fail to obtain the targets ip address. This failure is not detected and thus will not fail the initramfs creation. The resulting initramfs however doesn't have the necessary information for setting up the network and thus will fail to boot. Prevent the use of alias hostnames by verifying that the given hostname is the same one ssh would use after parsing the ssh_config. Note: Don't use getent ahosts to verify that the given host can be resolved as this requires the network to be up which cannot be guaranteed when the kdump.conf is parsed. Signed-off-by: Philipp Rudo <prudo@redhat.com> Reviewed-by: Tao Liu <ltao@redhat.com> Reviewed-by: Coiby Xu <coxu@redhat.com>	2022-04-02 16:24:32 +08:00
Philipp Rudo	247b3dd297	kdumpctl: fix comment in check_and_wait_network_ready The time out was increased to 180 seconds in `680c0d3` ("kdumpctl: distinguish the failed reason of ssh"). Update the comment to reflect that change. Signed-off-by: Philipp Rudo <prudo@redhat.com> Reviewed-by: Tao Liu <ltao@redhat.com> Reviewed-by: Coiby Xu <coxu@redhat.com>	2022-04-02 16:24:32 +08:00
Philipp Rudo	7cd3f232d5	kdump-lib-initramfs: merge definitions for default ssh key There are currently three identical definitions for the default ssh key. Combine them into one in kdump-lib-initramfs.sh. Signed-off-by: Philipp Rudo <prudo@redhat.com> Reviewed-by: Tao Liu <ltao@redhat.com> Reviewed-by: Coiby Xu <coxu@redhat.com>	2022-04-02 16:24:32 +08:00
Philipp Rudo	b49083126f	kdumpctl: remove unnecessary uses of $? Signed-off-by: Philipp Rudo <prudo@redhat.com> Reviewed-by: Tao Liu <ltao@redhat.com> Reviewed-by: Coiby Xu <coxu@redhat.com>	2022-04-02 16:24:32 +08:00
Philipp Rudo	8736aa5bb3	kdumpctl/estimate: Fix unnecessary warning do_estimate prints the warning that the reserved crashkernel is lower than the recommended one even then when both values are identical. This might cause confusion. So omit printing the warning when both values are equal. Signed-off-by: Philipp Rudo <prudo@redhat.com> Acked-by: Coiby Xu <coxu@redhat.com>	2022-04-02 11:32:49 +08:00
Lichen Liu	7141d044c8	kdumpctl: sync the $TARGET_INITRD after rebuild There is a system-wide sync call at the end of mkdumprd, move it to kdumpctl after rebuild initrd and add another one for mkfadumprd. Sync only the $TARGET_INITRD to avoid a system-wide sync taking too long on a system with high disk activity. Also update the sync in kdumpctl:restore_default_initrd which will mv the $DEFAULT_INITRD_BAK to $DEFAULT_INITRD. Signed-off-by: Lichen Liu <lichliu@redhat.com> Reviewed-by: Philipp Rudo <prudo@redhat.com>	2022-03-01 17:54:29 +08:00
Coiby Xu	6d4062a936	try to update the crashkernel in GRUB_ETC_DEFAULT after kexec-tools updates the default crashkernel value If GRUB_ETC_DEFAULT use crashkernel=auto or crashkernel=OLD_DEFAULT_CRASHKERNEL, it should be updated as well. Add a helper function to read kernel cmdline parameter from GRUB_ETC_DEFAULT. This function is used to read kernel cmdline parameter like fadump or crashkernel. Reviewed-by: Philipp Rudo <prudo@redhat.com>	2022-03-01 10:29:20 +08:00
Coiby Xu	37f4f2c1f6	address the case where there are multiple values for the same kernel arg There is the case where there are multiple entries of the same parameter on the command line, e.g. GRUB_CMDLINE_LINUX="crashkernel=110M crashkernel=220M fadump=on crashkernel=330M". In such a situation _update_kernel_cmdline_in_grub_etc_default only updates/removes the last entry which is usually not what you want as the kernel (for crashkernel) takes the last entry it can find. Thus make sure the case with multiple entries of the same parameter is handled properly by removing all occurrences of given parameter first. Note 1. sed command group and conditional control has been used to get rid of grep. 2. Fully supporting kernel cmdline as documented in Documentation/admin-guide/kernel-parameters.rst is complex and in foreseeable future a full implementation is not needed. So simply document the unsupported cases instead. Fixes: `140da74` ("rewrite reset_crashkernel to support fadump and to used by RPM scriptlet") Reported-by: Philipp Rudo <prudo@redhat.com> Suggested-by: Philipp Rudo <prudo@redhat.com> Reviewed-by: Philipp Rudo <prudo@redhat.com>	2022-03-01 10:28:53 +08:00
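Removing every occurrence of a parameter, as described in 37f4f2c1f6, can be sketched with a plain bash loop. Note that the commit itself uses sed; the loop below is a swapped-in technique for readability, the name `remove_cmdline_param` is hypothetical, and values containing spaces or quotes are not handled.

```bash
#!/bin/bash
# Hypothetical helper: print the command line $1 with every "<param>" or
# "<param>=<value>" token for parameter $2 removed.
remove_cmdline_param() {
    local cmdline=$1 param=$2 token
    local -a tokens result=()
    # Split on whitespace without triggering pathname expansion
    read -ra tokens <<< "$cmdline"
    for token in "${tokens[@]}"; do
        [[ $token == "$param" || $token == "$param="* ]] && continue
        result+=("$token")
    done
    echo "${result[*]}"
}
```

Applied to the GRUB_CMDLINE_LINUX value from the commit message with `crashkernel` as the parameter, only `fadump=on` remains, after which a single new `crashkernel=` entry can be appended.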
Coiby Xu	41b8f9528c	fix incorrect usage of _get_all_kernels_from_grubby It's found that the kernel cmdline crashkernel=auto doesn't get updated when upgrading kexec-tools. This happens because _get_all_kernels_from_grubby is called with no argument by reset_crashkernel_after_update. When retrieving all kernel paths on the system, "grubby --info ALL" should be used. Fix this error by passing "ALL" argument. Fixes: `0adb0f4` ("try to reset kernel crashkernel when kexec-tools updates the default crashkernel value") Reported-by: Jie Li <jieli@redhat.com> Signed-off-by: Coiby Xu <coxu@redhat.com> Reviewed-by: Tao Liu <ltao@redhat.com>	2022-02-14 10:34:55 +08:00
Coiby Xu	5111c01334	fix the mistake of swapping function parameters of read_proc_environ_var _is_osbuild fails because it expects the 1st and 2nd function parameter to be the environment variable and environ file path respectively. Fix it by swapping the parameters in read_proc_environ_var. Note the osbuild environ file path is defined in _OSBUILD_ENVIRON_PATH so _is_osbuild can be unit-tested by overwriting _OSBUILD_ENVIRON_PATH. Fixes: `6a3ce83` ("fix the error of parsing the container environ variable for osbuild") Signed-off-by: Coiby Xu <coxu@redhat.com> Reviewed-by: Pingfan Liu <piliu@redhat.com>	2022-02-08 10:42:40 +08:00
Coiby Xu	6a3ce83a60	fix the error of parsing the container environ variable for osbuild The environment variable entries in /proc/[pid]/environ are separated by null bytes instead of by spaces. Update the sed regex to fix this issue. Note that, 1. this patch also fixes an issue where kdumpctl would try to reset crashkernel even if osbuild has provided a custom crashkernel value. 2. kernel hook 92-crashkernel.install installed by kexec-tools is guaranteed to be run by kernel-install. kexec-tools doesn't recommend kernel so there is no guarantee kernel is installed after kexec-tools. But dnf invokes kernel-install in the posttrans scriptlet (of kernel-core) which is always run after all packages including kexec-tools and kernel in a dnf transaction. 3. To be able to do unit tests, the logic of reading environment variable has been extracted as a separate function. Fixes: `ddd428a` ("set up kernel crashkernel for osbuild in kernel hook") Signed-off-by: Coiby Xu <coxu@redhat.com> Reviewed-by: Pingfan Liu <piliu@redhat.com> Reviewed-by: Philipp Rudo <prudo@redhat.com>	2022-01-26 08:32:06 +08:00
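Reading a variable from a NUL-separated environ file, the problem fixed in 6a3ce83a60, can be sketched as below. The commit fixes a sed regex; this sketch swaps in bash's `read -d ''` instead, and the name `read_environ_var_sketch` is hypothetical. The parameter order (variable name first, file path second) follows the description in 5111c01334.

```bash
#!/bin/bash
# Hypothetical helper: print the value of variable $1 found in the
# NUL-separated environ file $2 (for example /proc/1/environ).
# Returns 1 if the variable is not present.
read_environ_var_sketch() {
    local name=$1 file=$2 entry
    # -d '' makes read stop at each NUL byte instead of at newlines
    while IFS= read -r -d '' entry; do
        if [[ $entry == "$name="* ]]; then
            echo "${entry#*=}"
            return 0
        fi
    done < "$file"
    return 1
}
```

Passing the file path as an argument, rather than hard-coding /proc/1/environ, is what makes the function unit-testable with a fixture file.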
Coiby Xu	ae0cbdf34a	fix "kdump: Invalid kdump config option auto_reset_crashkernel" error kdumpctl only accepts a specified set of options. Add auto_reset_crashkernel to this set. Fixes: `73ced7f` ("introduce the auto_reset_crashkernel option to kdump.conf") Signed-off-by: Coiby Xu <coxu@redhat.com> Acked-by: Tao Liu <ltao@redhat.com>	2022-01-07 12:20:37 +08:00
Coiby Xu	d5c31605f3	use grep -s to suppress error messages about nonexistent or unreadable files When a file doesn't exist or isn't readable, grep complains as follows, grep: /proc/cmdline: No such file or directory grep: /etc/kernel/cmdline: No such file or directory /proc/cmdline doesn't exist when installing package for an OS image and /etc/kernel/cmdline may not exist if osbuild doesn't want to set a custom kernel cmdline. Use "-s" to suppress the error messages. Fixes: `0adb0f4` ("try to reset kernel crashkernel when kexec-tools updates the default crashkernel value") Fixes: `ddd428a` ("set up kernel crashkernel for osbuild in kernel hook") Signed-off-by: Coiby Xu <coxu@redhat.com> Acked-by: Tao Liu <ltao@redhat.com>	2022-01-07 12:20:21 +08:00
Coiby Xu	ddd428a1d0	set up kernel crashkernel for osbuild in kernel hook osbuild is a tool to build OS images. It uses bwrap to install packages inside a sandbox/container. Since the kernel package recommends kexec-tools which in turn recommends grubby, the installation order would be grubby -> kexec-tools -> kernel. So we can use the kernel hook 92-crashkernel.install provided by kexec-tools to set up kernel crashkernel for the target OS image. But in osbuild's case, there is no current running kernel and running `uname -r` in the container/sandbox actually returns the host kernel release. To set up kernel crashkernel for the OS image built by osbuild, a different logic is needed. We will check if kernel hook is running inside the osbuild container then set up kernel crashkernel only if osbuild hasn't specified a custom value. osbuild exposes [1] the container=bwrap-osbuild environment variable. According to [2], the environment variable is not inherited down the process tree, so we need to check /proc/1/environ to detect this environment variable to tell if the kernel hook is running inside a bwrap-osbuild container. After that we need to know if osbuild wants to use custom crashkernel value. This is done by checking if /etc/kernel/cmdline has crashkernel set [3]. /etc/kernel/cmdline is written before packages are installed. [1] https://github.com/osbuild/osbuild/pull/926 [2] https://systemd.io/CONTAINER_INTERFACE/ [3] https://bugzilla.redhat.com/show_bug.cgi?id=2024976#c5 Reviewed-by: Pingfan Liu <piliu@redhat.com> Reviewed-by: Philipp Rudo <prudo@redhat.com> Signed-off-by: Coiby Xu <coxu@redhat.com>	2022-01-05 09:40:24 +08:00
Coiby Xu	5e8c751c39	reset kernel crashkernel for the special case where the kernel is updated right after kexec-tools When kexec-tools updates the default crashkernel value, it will try to reset the existing installed kernels including the currently running kernel. So the running kernel could have different kernel cmdline parameters from /proc/cmdline. When installing a kernel after updating kexec-tools, /usr/lib/kernel/install.d/20-grub.install would be called by kernel-install [1] which would use /proc/cmdline to set up new kernel's cmdline. To address this special case, reset the new kernel's crashkernel and fadump value to the value that would be used by running kernel after rebooting by the installation hook. One side effect of this commit is it would reset the installed kernel's crashkernel even if the currently running kernel doesn't use the default crashkernel value after rebooting. But I think this side effect is a benefit for the user. The implementation depends on kernel-install which runs the scripts in /usr/lib/kernel/install.d passing the following arguments, add KERNEL-VERSION $BOOT/MACHINE-ID/KERNEL-VERSION/ KERNEL-IMAGE [INITRD-FILE ...] A concrete example is given as follows, add 5.11.12-300.fc34.x86_64 /boot/e986846f63134c7295458cf36300ba5b/5.11.12-300.fc34.x86_64 /lib/modules/5.11.12-300.fc34.x86_64/vmlinuz kernel-install could be started by the kernel package's RPM scriptlet [2]. As mentioned in previous commit "try to reset kernel crashkernel when kexec-tools updates the default crashkernel value", kdumpctl has difficulty running in RPM scriptlet for CoreOS. But rpm-ostree ignores all kernel hooks, there is no need to disable the kernel hook for CoreOS/Atomic/Silverblue. But a collaboration between rpm-ostree and kexec-tools is needed [3] to take care of this special case. Note the crashkernel.default support is dropped. 
[1] https://www.freedesktop.org/software/systemd/man/kernel-install.html [2] https://src.fedoraproject.org/rpms/kernel/blob/rawhide/f/kernel.spec#_2680 [3] https://github.com/coreos/rpm-ostree/issues/2894 Reviewed-by: Pingfan Liu <piliu@redhat.com> Reviewed-by: Philipp Rudo <prudo@redhat.com> Signed-off-by: Coiby Xu <coxu@redhat.com>	2022-01-05 09:40:24 +08:00
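The argument convention that kernel-install uses for scripts in /usr/lib/kernel/install.d, as quoted in the commit above, can be sketched as follows. This is an illustrative skeleton only, not the contents of 92-crashkernel.install; the function name `hook_main` and the echoed message are assumptions.

```shell
#!/bin/bash
# Hypothetical skeleton of a kernel-install plugin. Arguments follow the
# convention quoted above: COMMAND KERNEL-VERSION ENTRY-DIR KERNEL-IMAGE.
hook_main() {
    local command=$1 kernel_version=$2 entry_dir=$3 kernel_image=$4
    case $command in
    add)
        # A real hook would adjust the new kernel's crashkernel here;
        # this sketch only reports what it received.
        echo "add: version=$kernel_version image=$kernel_image"
        ;;
    *)
        # Other commands (for example "remove") need no action in this sketch.
        return 0
        ;;
    esac
}
# A plugin script would typically end with: hook_main "$@"
```

With the concrete example from the commit message, the call would be `hook_main add 5.11.12-300.fc34.x86_64 /boot/e986846f63134c7295458cf36300ba5b/5.11.12-300.fc34.x86_64 /lib/modules/5.11.12-300.fc34.x86_64/vmlinuz`.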
Coiby Xu	0adb0f4a8c	try to reset kernel crashkernel when kexec-tools updates the default crashkernel value kexec-tools could update the default crashkernel value. When auto_reset_crashkernel=yes, reset kernel to new crashkernel value in the following two cases, - crashkernel=auto is found in the kernel cmdline - the kernel crashkernel was previously set by kexec-tools i.e. the kernel is using old default crashkernel value To tell if the user is using a custom value for the kernel crashkernel or not, we assume the user would never use the default crashkernel value as custom value. When kexec-tools gets updated, 1. save the default crashkernel value of the older package to /tmp/crashkernel (for POWER system, /tmp/crashkernel_fadump is saved as well). 2. If auto_reset_crashkernel=yes, iterate all installed kernels. For each kernel, compare its crashkernel value with the old default crashkernel and reset it if they match The implementation makes use of two RPM scriptlets [2], - %pre is run before a package is installed so we can use it to save old default crashkernel value - %post is run after a package is installed so we can use it to try to reset kernel crashkernel There are several problems when running kdumpctl in the RPM scripts for CoreOS/Atomic/Silverblue, for example, the lock can't be acquired by kdumpctl, "rpm-ostree kargs" can't be run, etc. So don't enable this feature for CoreOS/Atomic/Silverblue. Note latest shellcheck (0.8.0) gives false positives about the associative array as of this commit [1]. And Fedora's shellcheck is 0.7.2 and can't even correctly parse the shell code because of the associative array. [1] https://github.com/koalaman/shellcheck/issues/2399 [2] https://docs.fedoraproject.org/en-US/packaging-guidelines/Scriptlets/ Reviewed-by: Pingfan Liu <piliu@redhat.com> Reviewed-by: Philipp Rudo <prudo@redhat.com> Signed-off-by: Coiby Xu <coxu@redhat.com>	2022-01-05 09:40:24 +08:00
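The two reset conditions listed in the commit above (crashkernel=auto, or a value equal to the old default) amount to a simple comparison. The sketch below assumes a hypothetical function name `should_reset_crashkernel`; the crashkernel strings in the usage note are illustrative values, not defaults taken from the package.

```shell
#!/bin/bash
# Hypothetical decision helper: succeed if a kernel's crashkernel value ($1)
# should be reset, given the old default value ($2).
should_reset_crashkernel() {
    local current=$1 old_default=$2
    # No crashkernel parameter at all: nothing to reset in this sketch.
    [[ -n $current ]] || return 1
    # Case 1: crashkernel=auto. Case 2: value equals the old default, so it
    # is assumed to have been set by kexec-tools rather than by the user.
    [[ $current == auto || $current == "$old_default" ]]
}
```

For example, `should_reset_crashkernel auto "1G-4G:192M"` succeeds, while `should_reset_crashkernel 512M "1G-4G:192M"` fails because 512M is treated as a custom value.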
Coiby Xu	140da74a34	rewrite reset_crashkernel to support fadump and to be used by RPM scriptlet Rewrite kdumpctl reset-crashkernel KERNEL_PATH as kdumpctl reset-crashkernel [--fadump=[on|off|nocma]] [--kernel=path_to_kernel] [--reboot] This interface would reset a specific kernel to the default crashkernel value given the kernel path. And it also supports grubby's syntax so there are the following special cases, - if --kernel not specified, - use KDUMP_KERNELVER if it's defined in /etc/sysconfig/kdump - otherwise use current running kernel, i.e. `uname -r` - if --kernel=DEFAULT, the default boot kernel is chosen - if --kernel=ALL, all kernels would have their crashkernel reset to the default value and the /etc/default/grub is updated as well --fadump=[on|off|nocma] toggles fadump on/off for the kernel provided in KERNEL_PATH. If --fadump is omitted, the dump mode is determined by parsing the kernel command line for the kernel(s) to update. CoreOS/Atomic/Silverblue needs to be treated as a special case because, - "rpm-ostree kargs" is used to manage kernel command line parameters so --kernel doesn't make sense and there is no need to find current running kernel - "rpm-ostree kargs" itself would prompt the user to reboot the system after modifying the kernel command line parameter - POWER is not supported so we can assume the dump mode is always kdump This interface will also be called by kexec-tools RPM scriptlets [1] to reset crashkernel. Note the support of crashkernel.default is dropped. [1] https://docs.fedoraproject.org/en-US/packaging-guidelines/Scriptlets/ Reviewed-by: Pingfan Liu <piliu@redhat.com> Reviewed-by: Philipp Rudo <prudo@redhat.com> Signed-off-by: Coiby Xu <coxu@redhat.com>	2022-01-05 09:40:24 +08:00
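The option syntax described in the commit above, `[--fadump=[on|off|nocma]] [--kernel=path_to_kernel] [--reboot]`, could be parsed roughly as shown below. This is a sketch under assumptions, not kdumpctl's actual code: the function name `parse_reset_crashkernel_args` and the variables `fadump`, `kernel` and `reboot` are hypothetical.

```shell
#!/bin/bash
# Hypothetical parser for the reset-crashkernel options. Results are stored
# in the global variables fadump, kernel and reboot.
parse_reset_crashkernel_args() {
    fadump="" kernel="" reboot=no
    local opt
    for opt in "$@"; do
        case $opt in
        --fadump=*)
            fadump=${opt#--fadump=}
            case $fadump in
            on | off | nocma) ;;
            *)
                echo "invalid --fadump value: $fadump" >&2
                return 1
                ;;
            esac
            ;;
        --kernel=*)
            # May be a kernel path or one of the special values DEFAULT / ALL.
            kernel=${opt#--kernel=}
            ;;
        --reboot)
            reboot=yes
            ;;
        *)
            echo "unknown option: $opt" >&2
            return 1
            ;;
        esac
    done
}
```

An empty `kernel` after parsing corresponds to the case where --kernel is not specified, in which the commit says KDUMP_KERNELVER or the running kernel is used instead.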


237 Commits