Commit Graph

226 Commits

Author SHA1 Message Date
Coiby Xu
fdad7d9869 Skip reading /etc/defaut/grub for s390x
Currently, updating kexec-tools on s390x gives the warning
sed: can't read /etc/default/grub: No such file or directory

This happens because s390x doesn't use GRUB and /etc/default/grub
doesn't exist. We need to skip both reading and writing to
/etc/default/grub.

Reported-by: Jie Li <jieli@redhat.com>
Reviewed-by: Philipp Rudo <prudo@redhat.com>
Signed-off-by: Coiby Xu <coxu@redhat.com>
2022-10-27 14:42:27 +08:00
Coiby Xu
6ce4b85bb3 Include the memory overhead cost of cryptsetup when estimating the memory requirement for LUKS-encrypted target
Currently, "kdumpctl estimate" neglects the memory overhead cost of
cryptsetup itself. Unfortunately, there is no golden formula to
calculate the overhead cost [1]. So estimate the overhead cost as 50M
for aarch64 and 20M for other architectures based on the following
empirical data,

| Overhead (M) | OS                                        | arch    |
| ------------ | ----------------------------------------- | ------- |
| 14.1         | RHEL-9.2.0-20220829.d.1                   | ppc64le |
| 14           | Fedora-37-20220830.n.0 Everything ppc64le | ppc64le |
| 17           | Fedora 36                                 | ppc64le |
| 8.8          | Fedora 35                                 | s390x   |
| 10.1         | Fedora-Rawhide-20220829.n.0, fc38         | s390x   |
| 42           | Fedora-Rawhide-20220829.n.0, fc38         | arch64  |
| 40           | F35                                       | arch64  |
| 42           | F36                                       | arch64  |
| 42           | Fedora-Rawhide-20220901.n.0               | arch64  |
| 10           | F35                                       | x86_64  |
| 10           | Fedora-Rawhide-20220901.n.0               | x86_64  |
| 11           | Fedora-Rawhide-20220901.n.0               | x86_64  |

[1] https://lore.kernel.org/cryptsetup/20220616044339.376qlipk5h2omhx2@Rk/T/#u

Fixes: e9e6a2c ("kdumpctl: Add kdumpctl estimate")
Signed-off-by: Coiby Xu <coxu@redhat.com>
Reviewed-by: Philipp Rudo <prudo@redhat.com>
2022-10-26 15:38:21 +08:00
Coiby Xu
50a8461fc7 Choosing the most memory-consuming key slot when estimating the
memory requirement for LUKS-encrypted target

When there are multiple key slots, "kdumpctl estimate" uses the least
memory-consuming key slot. For example, when there are two memory slots
created with --pbkdf-memory=1048576 (1G) and --pbkdf-memory=524288 (512M),
"kdumpctl estimate" thinks the extra memory requirement is only 512M.
This will of course lead to OOM if the user uses the more
memory-consuming key slot. Fix it by sorting in reverse order.

Fixes: e9e6a2c ("kdumpctl: Add kdumpctl estimate")
Signed-off-by: Coiby Xu <coxu@redhat.com>

Reviewed-by: Lichen Liu <lichliu@redhat.com>
Signed-off-by: Coiby Xu <coxu@redhat.com>
2022-10-26 15:34:08 +08:00
Coiby Xu
15122b3f98 Fix grep warnings "grep: warning: stray \ before -"
Latest grep (3.8) warnings about unneeded backslashes when building
kdump initrd [1],
    kdump: Rebuilding /boot/initramfs-6.0.0-0.rc5.a335366bad13.40.test.fc38.aarch64kdump.img
    grep: warning: stray \ before -
    grep: warning: stray \ before -
    grep: warning: stray \ before -
    grep: warning: stray \ before -
    grep: warning: stray \ before -

Some warnings can be avoided by using "sed -n" to remove grep and the
others can use the -- argument.

[1] https://s3.us-east-1.amazonaws.com/arr-cki-prod-datawarehouse-public/datawarehouse-public/2022/09/17/redhat:643020269/build_aarch64_redhat:643020269_aarch64/tests/4/results_0001/job.01/recipes/12617739/tasks/5/logs/taskout.log

Reported-by: Baoquan He <bhe@redhat.com>
Signed-off-by: Coiby Xu <coxu@redhat.com>
Suggested-by: Philipp Rudo <prudo@redhat.com>
Reviewed-by: Philipp Rudo <prudo@redhat.com>
2022-10-26 14:16:04 +08:00
Coiby Xu
e218128e28 Only try to reset crashkernel for osbuild during package install
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=2060319

Currently, kexec-tools tries to reset crashkernel when using anaconda to
install the system. But grubby isn't ready and complains that,
  10:33:17,631 INF packaging: Configuring (running scriptlet for): kernel-core-5.14.0-70.el9.x86_64 1645746534 03dcd32db234b72440ee6764d59b32347c5f0cd98ac3fb55beb47214a76f33b4
  10:34:16,696 INF dnf.rpm: grep: /boot/grub2/grubenv: No such file or directory
  grep: /boot/grub2/grubenv: No such file or directory

We only need to try resetting crashkernel for osbuild. Skip it for other
cases. To tell if it's package install instead of package upgrade, make
use of %pre to write a file /tmp/kexec-tools-install when "$1 == 1" [1].

[1] https://docs.fedoraproject.org/en-US/packaging-guidelines/Scriptlets/#_syntax

Reported-by: Jan Stodola <jstodola@redhat.com>
Signed-off-by: Coiby Xu <coxu@redhat.com>
Reviewed-by: Lichen Liu <lichenliu@redhat.com>
2022-10-20 13:54:10 +08:00
Coiby Xu
a7ead187a4 Prefix reset-crashkernel-{for-installed_kernel,after-update} with underscore
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=2048690

To indicate they are for internal use only, underscore them.

Reported-by: rcheerla@redhat.com
Signed-off-by: Coiby Xu <coxu@redhat.com>
Reviewed-by: Lichen Liu <lichenliu@redhat.com>
2022-10-20 13:54:10 +08:00
Tao Liu
c743881ae6 virtiofs support for kexec-tools
This patch add virtiofs support for kexec-tools by introducing a new option
for /etc/kdump.conf:

virtiofs myfs

Where myfs is a variable tag name specified in qemu cmdline
"-device vhost-user-fs-pci,tag=myfs".

The patch covers the following cases:
1) Dumping VM's vmcore to a virtiofs shared directory;
2) When the VM's rootfs is a virtiofs shared directory and dumping the
   VM's vmcore to its subdirectory, such as /var/crash;
3) The combination of case 1 & 2: The VM's rootfs is a virtiofs shared
   directory and dumping the VM's vmcore to another virtiofs shared
   directory.

Case 2 & 3 need dracut >= 057, otherwise VM cannot boot from virtiofs
shared rootfs. But it is not the issue of kexec-tools.

Reviewed-by: Philipp Rudo <prudo@redhat.com>
Signed-off-by: Tao Liu <ltao@redhat.com>
2022-09-29 12:22:49 +08:00
Lichen Liu
4edcd9a400 kdumpctl: make the kdump.log root-readable-only
Decrease the risk that of leaking information that could potentially
be used to exploit the crash further (think location of keys).

Signed-off-by: Lichen Liu <lichliu@redhat.com>
Acked-by: Coiby Xu <coxu@redhat.com>
2022-09-06 20:21:31 +08:00
Coiby Xu
58eef4582a remove useless --zipl when calling grubby to update kernel command line
"grubby --zipl" only takes effect when setting default kernel. It's
useless to add "--zipl" when updating kernel command line. Also rename
_update_grub to _update_kernel_cmdline since s390x doesn't use GRUB.

Reviewed-by: Philipp Rudo <prudo@redhat.com>
Signed-off-by: Coiby Xu <coxu@redhat.com>
2022-08-03 11:09:45 +08:00
Coiby Xu
e8ae897595 skip updating /etc/default/grub for s390x
Resolves: bz2104534

When running "kdumpctl reset-crashkernel --kernel=ALL" on s390x,
sed: can't read /etc/default/grub: No such file or directory
sed: can't read /etc/default/grub: No such file or directory

This happens because s390x doesn't use the grub bootloader and
/etc/default/grub doesn't exist.

Reported-by: smitterl@redhat.com
Reviewed-by: Philipp Rudo <prudo@redhat.com>
Signed-off-by: Coiby Xu <coxu@redhat.com>
2022-08-03 11:09:37 +08:00
Coiby Xu
da0ca0d205 Allow to update kexec-tools using virt-customize for cloud base image
Resolves: bz2089871

Currently, kexec-tools can't be updated using virt-customize because
older version of kdumpctl can't acquire instance lock for the
get-default-crashkernel subcommand. The reason is /var/lock is linked to
/run/lock which however doesn't exist in the case of virt-customize.

This patch fixes this problem by using /tmp/kdump.lock as the lock
file if /run/lock doesn't exist.

Note
1. The lock file is now created in /run/lock instead of /var/run/lock since
   Fedora has adopted adopted /run [2] since F15.
2. %pre scriptlet now always return success since package update won't
   be blocked

[1] https://fedoraproject.org/wiki/Features/var-run-tmpfs

Fixes: 0adb0f4 ("try to reset kernel crashkernel when kexec-tools updates the default crashkernel value")

Reported-by: Nicolas Hicher <nhicher@redhat.com>
Suggested-by: Laszlo Ersek <lersek@redhat.com>
Suggested-by: Philipp Rudo <prudo@redhat.com>
Signed-off-by: Coiby Xu <coxu@redhat.com>
Reviewed-by: Philipp Rudo <prudo@redhat.com>
2022-08-02 18:36:34 +08:00
Pingfan Liu
d593bfa6fc KDUMP_COMMANDLINE: remove irqpoll parameter on aws aarch64 platform
Currently, kdump may experience failure on some aws aarch64 platform.
The final scenario is:

    [   79.145089] printk: console [ttyS0] disabled
Then the system has no response any more. And after reboot, there is no
vmcore generated under /var/crash/. More detail [1].

In a short word, it is caused by the irqpoll policy and some unknown
acpi issue. The serial device is hot-removed as a pci device.

More detailed, the irqpoll policy demands to iterate over all interrupt
handler, if the interrupt line is shared, then the handler is
dispatched. And acpi handler acpi_irq() is on a shared interrupt line,
so it is called.  But for some unknown reason, the acpi hardware regs
hold wrong state, and the acpi driver decides that a hot-removed event
happens on a pci slot, which finally removes the pci serial device.

To tackle this issue by removing the irqpoll parameter on aws aarch64
platform, until the real root cause in acpi is found and resolved.

[1]: https://bugzilla.redhat.com/show_bug.cgi?id=2080468#c0

Signed-off-by: Pingfan Liu <piliu@redhat.com>

Acked-by: Coiby Xu <coxu@redhat.com>
2022-07-21 19:03:37 +08:00
Dusty Mabe
980f10aa40 kdump-lib: clear up references to Atomic/CoreOS
There are many variants on OSTree based systems these days so
we should probably refer to the class of systems as "OSTree
based systems". Also, Atomic Host is dead.

Signed-off-by: Dusty Mabe <dusty@dustymabe.com>
Acked-by: Coiby Xu <coxu@redhat.com>
2022-06-30 16:00:06 +08:00
Coiby Xu
b97310428f unit tests: prepare for kdumpctl and kdump-lib.sh to be unit-tested
Currently there are two issues with unit-testing the functions defined
in kdumpctl and other shell scripts after sourcing them,
  - kdumpctl would call main which requires root permission and would
    create single instance lock (/var/lock/kdump)
  - kdumpctl and other shell scripts directly source files under /usr/lib/kdump/

When ShellSpec load a script via "Include", it defines the__SOURCED__
variable. By making use of __SOURCED__, we can
1. let kdumpctl not call main when kdumpctl is "Include"d by ShellSpec
2. instruct kdumpctl and kdump-lib.sh to source the files in the repo
   when running ShelSpec tests

Note coverage/ is added to .gitignore because ShellSpec generates code
coverage results in this folder.

Reviewed-by: Philipp Rudo <prudo@redhat.com>
Signed-off-by: Coiby Xu <coxu@redhat.com>
2022-04-14 11:44:12 +08:00
Philipp Rudo
55b5c4e2b0 kdumpctl: simplify local_fs_dump_target
Make use of the new ${OPT[]} array and simplify local_fs_dump_target to
remove one more file operations.

While at it rename the local_fs_dump_target to is_local_target

Signed-off-by: Philipp Rudo <prudo@redhat.com>
Reviewed-by: Tao Liu <ltao@redhat.com>
Reviewed-by: Coiby Xu <coxu@redhat.com>
2022-04-02 16:24:32 +08:00
Philipp Rudo
ac5968218f kdumpctl: remove kdump_get_conf_val in save_raw
With the introduction of ${OPT[fstype]} this call to kdump_get_conf_val
can be removed now as well.

Signed-off-by: Philipp Rudo <prudo@redhat.com>
Reviewed-by: Tao Liu <ltao@redhat.com>
Reviewed-by: Coiby Xu <coxu@redhat.com>
2022-04-02 16:24:32 +08:00
Philipp Rudo
5118daf2ff kdumpctl: drop DUMP_TARGET variable
The variable is only used for ssh dump targets. Furthermore it is
identical to the value stored in ${OPT[_target]}. Thus drop DUMP_TARGET and
use ${OPT[_target]} instead.

In order to be able to distinguish between the different target types
introduce the internal ${OPT[_fstype]}.

Signed-off-by: Philipp Rudo <prudo@redhat.com>
Reviewed-by: Tao Liu <ltao@redhat.com>
Reviewed-by: Coiby Xu <coxu@redhat.com>
2022-04-02 16:24:32 +08:00
Philipp Rudo
a859abe365 kdumpctl: drop SSH_KEY_LOCATION variable
The variable is only used for ssh dump targets. Furthermore it is
identical to the value stored in ${OPT[sshkey]}. Thus drop
SSH_KEY_LOCATION and use ${OPT[sshkey]} instead.

Signed-off-by: Philipp Rudo <prudo@redhat.com>
Reviewed-by: Tao Liu <ltao@redhat.com>
Reviewed-by: Coiby Xu <coxu@redhat.com>
2022-04-02 16:24:32 +08:00
Philipp Rudo
0460f0a768 kdumpctl: drop SAVE_PATH variable
The variable is only used for ssh dump targets. Furthermore it is
identical to the value stored in ${OPT[path]}. Thus drop SAVE_PATH and
use ${OPT[path]} instead.

Also make sure that ${OPT[path]} is always set to the default value when
no entry in kdump.conf is found.

Signed-off-by: Philipp Rudo <prudo@redhat.com>
Reviewed-by: Tao Liu <ltao@redhat.com>
Reviewed-by: Coiby Xu <coxu@redhat.com>
2022-04-02 16:24:32 +08:00
Philipp Rudo
edb1d04425 kdumpctl: reduce file operations on kdump.conf
Every call to kdump_get_conf_val parses kdump.conf although the file has
already been parsed in check_config. Thus store the values parsed in
check_config in an array and use them later instead of re-parsing the
file over and over again.

While at it rename check_config to parse_config.

Signed-off-by: Philipp Rudo <prudo@redhat.com>
Reviewed-by: Tao Liu <ltao@redhat.com>
Reviewed-by: Coiby Xu <coxu@redhat.com>
2022-04-02 16:24:32 +08:00
Philipp Rudo
4adf6d3cc8 kdumpctl: merge check_ssh_config into check_config
check_config and check_ssh_config both parse /etc/kdump.conf and are
usually used together. The difference between both is that
check_ssh_config does some extra checks on the format of the provided
ssh destination but ignores invalid or deprecated options in the config.
Thus merge check_ssh_config into check_config. Leave the additional
checks on the ssh destination in check_ssh_config but treat it like the
checks done for e.g. the failure_action.

This slightly changes the behavior of 'kdumpctl propagate', which now
fails if kdump.conf contains an invalid value unrelated to ssh. This
change in behavior isn't problematic because 'kdumpctl propagate' always
needs to be followed by a 'kdumpctl start' to have a working kdump
environment. For the situations where 'propagate' fails now the 'start'
would have failed in the past. So the failure only moved one step ahead
in the sequence.

While at it drop check_ssh_target and call check_and_wait_network_ready
directly.

Signed-off-by: Philipp Rudo <prudo@redhat.com>
Reviewed-by: Tao Liu <ltao@redhat.com>
Reviewed-by: Coiby Xu <coxu@redhat.com>
2022-04-02 16:24:32 +08:00
Philipp Rudo
e3fa367840 kdumpctl: simplify propagate_ssh_key
The function has multiple problems:

1) SSH_{USER,SERVER} aren't defined local
2) Weird use of cut and sed to parse the DUMP_TARGET for the user and
   host although check_ssh_config guarantees that it has the format
   <user>@<host>.
3) Unnecessary use of a variable for the return value
4) Weird behavior to first unpack the DUMP_TARGET to SSH_USER and
   SSH_SERVER and then putting it back together again
5) Definition of variable errmsg that is only used once but breaks
   grep-ability of error message.
6) Wrong order when redirecting output of ssh-keygen, see SC2069 [1]

Fix them now.

While at it also improve the error messages in the function.

[1] https://www.shellcheck.net/wiki/SC2069

Signed-off-by: Philipp Rudo <prudo@redhat.com>
Reviewed-by: Tao Liu <ltao@redhat.com>
Reviewed-by: Coiby Xu <coxu@redhat.com>
2022-04-02 16:24:32 +08:00
Philipp Rudo
b802dbff9f kdumpctl: forbid aliases from ssh config
For ssh targets kdumpctl only verifies that the config value has the
correct <user>@<host> format itself. For all other tests, e.g. if the
destination can be reached, it relies on ssh. This allows users to
provide a <host> that isn't the proper hostname but an alias defined in
the ssh_config without failing the tests. If this is done
dracut-module-setup.sh:kdump_get_remote_ip will fail to obtain the
targets ip address. This failure is not detected and thus will not fail
the initramfs creation. The resulting initramfs however doesn't have the
necessary information for setting up the network and thus will fail to
boot.

Prevent the use of alias hostnames by verifying that the given hostname
is the same one ssh would use after parsing the ssh_config.

Note: Don't use getent ahosts to verify that the given host can be
resolved as this requires the network to be up which cannot be
guaranteed when the kdump.conf is parsed.

Signed-off-by: Philipp Rudo <prudo@redhat.com>
Reviewed-by: Tao Liu <ltao@redhat.com>
Reviewed-by: Coiby Xu <coxu@redhat.com>
2022-04-02 16:24:32 +08:00
Philipp Rudo
247b3dd297 kdumpctl: fix comment in check_and_wait_network_ready
The time out was increased to 180 seconds in 680c0d3 ("kdumpctl:
distinguish the failed reason of ssh"). Update the comment to reflect
that change.

Signed-off-by: Philipp Rudo <prudo@redhat.com>
Reviewed-by: Tao Liu <ltao@redhat.com>
Reviewed-by: Coiby Xu <coxu@redhat.com>
2022-04-02 16:24:32 +08:00
Philipp Rudo
7cd3f232d5 kdump-lib-initramfs: merge definitions for default ssh key
There are currently three identical definitions for the default ssh key.
Combine them into one in kdump-lib-initramfs.sh.

Signed-off-by: Philipp Rudo <prudo@redhat.com>
Reviewed-by: Tao Liu <ltao@redhat.com>
Reviewed-by: Coiby Xu <coxu@redhat.com>
2022-04-02 16:24:32 +08:00
Philipp Rudo
b49083126f kdumpctl: remove unnecessary uses of $?
Signed-off-by: Philipp Rudo <prudo@redhat.com>
Reviewed-by: Tao Liu <ltao@redhat.com>
Reviewed-by: Coiby Xu <coxu@redhat.com>
2022-04-02 16:24:32 +08:00
Philipp Rudo
8736aa5bb3 kdumpctl/estimate: Fix unnecessary warning
do_estimate prints the warning that the reserved crashkernel is lower
than the recommended one even then when both values are identical. This
might cause confusion. So omit printing the warning when both values are
equal.

Signed-off-by: Philipp Rudo <prudo@redhat.com>
Acked-by: Coiby Xu <coxu@redhat.com>
2022-04-02 11:32:49 +08:00
Lichen Liu
7141d044c8 kdumpctl: sync the $TARGET_INITRD after rebuild
There is a system-wide sync call at the end of mkdumprd, move it to
kdumpctl after rebuild initrd and add another one for mkfadumprd.
Sync only the $TARGET_INITRD to avoid a system-wide sync taking too
long on a system with high disk activity.

Also update the sync in kdumpctl:restore_default_initrd which will
mv the $DEFAULT_INITRD_BAK to $DEFAULT_INITRD.

Signed-off-by: Lichen Liu <lichliu@redhat.com>
Reviewed-by: Philipp Rudo <prudo@redhat.com>
2022-03-01 17:54:29 +08:00
Coiby Xu
6d4062a936 try to update the crashkernel in GRUB_ETC_DEFAULT after kexec-tools updates the default crashkernel value
If GRUB_ETC_DEFAULT use crashkernel=auto or
crashkernel=OLD_DEFAULT_CRASHKERNEL, it should be updated as well.

Add a helper function to read kernel cmdline parameter from
GRUB_ETC_DEFAULT. This function is used to read kernel cmdline
parameter like fadump or crashkernel.

Reviewed-by: Philipp Rudo <prudo@redhat.com>
2022-03-01 10:29:20 +08:00
Coiby Xu
37f4f2c1f6 address the case where there are multiple values for the same kernel arg
There is the case where there are multiple entries of the same parameter on
the command line, e.g.
GRUB_CMDLINE_LINUX="crashkernel=110M crashkernel=220M fadump=on crashkernel=330M".

In such an situation _update_kernel_cmdline_in_grub_etc_default only
updates/removes the last entry which is usually not what you want as the
kernel (for crashkernel) takes the last entry it can find.

Thus make sure the case with multiple entries of the same parameter is
handled properly by removing all occurrences of given parameter first.

Note
1. sed command group and conditional control has been used to get rid of
   grep.
2. Fully supporting kernel cmdline as documented in
   Documentation/admin-guide/kernel-parameters.rst is complex and in
   foreseeable future a full implementation is not needed. So simply
   document the unsupported cases instead.

Fixes: 140da74 ("rewrite reset_crashkernel to support fadump and to used by RPM scriptlet")

Reported-by: Philipp Rudo <prudo@redhat.com>
Suggested-by: Philipp Rudo <prudo@redhat.com>
Reviewed-by: Philipp Rudo <prudo@redhat.com>
2022-03-01 10:28:53 +08:00
Coiby Xu
41b8f9528c fix incorrect usage of _get_all_kernels_from_grubby
It's found that the kernel cmdline crashkernel=auto doesn't get updated
when upgrading kexec-tools. This happens because _get_all_kernels_from_grubby
is called with no argument by reset_crashkernel_after_update. When retrieving
all kernel paths on the system, "grubby --info ALL" should be used. Fix this
error by passing "ALL" argument.

Fixes: 0adb0f4 ("try to reset kernel crashkernel when kexec-tools updates the default crashkernel value")

Reported-by: Jie Li <jieli@redhat.com>
Signed-off-by: Coiby Xu <coxu@redhat.com>
Reviewed-by: Tao Liu <ltao@redhat.com>
2022-02-14 10:34:55 +08:00
Coiby Xu
5111c01334 fix the mistake of swapping function parameters of read_proc_environ_var
_is_osbuild fails because it expects the 1st and 2nd function parameter
to be the environment variable and environ file path respectively. Fix
it by swapping the parameters in read_proc_environ_var.

Note the osbuild environ file path is defined in _OSBUILD_ENVIRON_PATH
so _is_osbuild can be unit-tested by overwriting _OSBUILD_ENVIRON_PATH.

Fixes: 6a3ce83 ("fix the error of parsing the container environ variable for osbuild")
Signed-off-by: Coiby Xu <coxu@redhat.com>
Reviewed-by: Pingfan Liu <piliu@redhat.com>
2022-02-08 10:42:40 +08:00
Coiby Xu
6a3ce83a60 fix the error of parsing the container environ variable for osbuild
The environment variable entries in /proc/[pid]/environ are separated by
null bytes instead of by spaces. Update the sed regex to fix this issue.

Note that,
  1. this patch also fixes a issue which is kdumpctl would try to reset
     crashkernel even osbuild has provided custom crashkernel value.
  2. kernel hook 92-crashkernel.install installed by kexec-tools is
     guaranteed to be ran by kernel-install. kexec-tools doesn't recommend
     kernel so there is no guarantee kernel is installed after kexec-tools.
     But dnf invokes kernel-install in the posttrans scriptlet (of kernel-core)
     which is always ran after all packages including kexec-tools and kernel
     in a dnf transaction.
  3. To be able to do unit tests, the logic of reading environment variable
     has been extracted as a separate function.

Fixes: ddd428a ("set up kernel crashkernel for osbuild in kernel hook")
Signed-off-by: Coiby Xu <coxu@redhat.com>
Reviewed-by: Pingfan Liu <piliu@redhat.com>
Reviewed-by: Philipp Rudo <prudo@redhat.com>
2022-01-26 08:32:06 +08:00
Coiby Xu
ae0cbdf34a fix "kdump: Invalid kdump config option auto_reset_crashkernel" error
kdumpctl only accepts a specified set of options. Add
auto_reset_crashkernel to this set.

Fixes: 73ced7f ("introduce the auto_reset_crashkernel option to kdump.conf")
Signed-off-by: Coiby Xu <coxu@redhat.com>
Acked-by: Tao Liu <ltao@redhat.com>
2022-01-07 12:20:37 +08:00
Coiby Xu
d5c31605f3 use grep -s to suppress error messages about nonexistent or unreadable files
When a file doesn't exist or isn't readable, grep complains as follows,

grep: /proc/cmdline: No such file or directory
grep: /etc/kernel/cmdline: No such file or directory

/proc/cmdline doesn't exist when installing package for an OS image and
/etc/kernel/cmdline may not exist if osbuild doesn't want set custom
kernel cmdline.

Use "-s" to suppress the error messages.

Fixes: 0adb0f4 ("try to reset kernel crashkernel when kexec-tools updates the default crashkernel value")
Fixes: ddd428a ("set up kernel crashkernel for osbuild in kernel hook")
Signed-off-by: Coiby Xu <coxu@redhat.com>
Acked-by: Tao Liu <ltao@redhat.com>
2022-01-07 12:20:21 +08:00
Coiby Xu
ddd428a1d0 set up kernel crashkernel for osbuild in kernel hook
osbuild is a tool to build OS images. It uses bwrap to install packages
inside a sandbox/container. Since the kernel package recommends
kexec-tools which in turn recommends grubby, the installation order would
be grubby -> kexec-tools -> kernel. So we can use the kernel hook
92-crashkernel.install provided by kexec-tools to set up kernel
crashkernel for the target OS image. But in osbuild's case, there is no
current running kernel and running `uname -r` in the container/sandbox
actually returns the host kernel release. To set up kernel crashkernel for
the OS image built by osbuild, a different logic is needed.

We will check if kernel hook is running inside the osbuild container
then set up kernel crashkernel only if osbuild hasn't specified a
custome value. osbuild exposes [1] the container=bwrap-osbuild environment
variable. According to [2], the environment variable is not inherited down
the process tree, so we need to check /proc/1/environ to detect this
environment variable to tell if the kernel hook is running inside a
bwrap-osbuild container. After that we need to know if osbuild wants to use
custom crashkernel value. This is done by checking if /etc/kernel/cmdline
has crashkernel set [3]. /etc/kernel/cmdline is written before packages
are installed.

[1] https://github.com/osbuild/osbuild/pull/926
[2] https://systemd.io/CONTAINER_INTERFACE/
[3] https://bugzilla.redhat.com/show_bug.cgi?id=2024976#c5

Reviewed-by: Pingfan Liu <piliu@redhat.com>
Reviewed-by: Philipp Rudo <prudo@redhat.com>
Signed-off-by: Coiby Xu <coxu@redhat.com>
2022-01-05 09:40:24 +08:00
Coiby Xu
5e8c751c39 reset kernel crashkernel for the special case where the kernel is updated right after kexec-tools
When kexec-tools updates the default crashkernel value, it will try to
reset the existing installed kernels including the currently running
kernel. So the running kernel could have different kernel cmdline
parameters from /proc/cmdline. When installing a kernel after updating
kexec-tools, /usr/lib/kernel/install.d/20-grub.install would be called
by kernel-install [1] which would use /proc/cmdline to set up new kernel's
cmdline. To address this special case, reset the new kernel's crashkernel
and fadump value to the value that would be used by running kernel after
rebooting by the installation hook. One side effect of this commit is it
would reset the installed kernel's crashkernel even currently running kernel
don't use the default crashkernel value after rebooting. But I think this
side effect is a benefit for the user.

The implementation depends on kernel-install which run the scripts in
/usr/lib/kernel/install.d passing the following arguments,

  add KERNEL-VERSION $BOOT/MACHINE-ID/KERNEL-VERSION/ KERNEL-IMAGE [INITRD-FILE ...]

An concrete example is given as follows,
  add 5.11.12-300.fc34.x86_64 /boot/e986846f63134c7295458cf36300ba5b/5.11.12-300.fc34.x86_64 /lib/modules/5.11.12-300.fc34.x86_64/vmlinuz

kernel-install could be started by the kernel package's RPM scriplet [2].
As mentioned in previous commit "try to reset kernel crashkernel when
kexec-tools updates the default crashkernel value", kdumpctl has difficulty
running in RPM scriptlet fore CoreOS. But rpm-ostree ignores all kernel hooks,
there is no need to disable the kernel hook for CoreOS/Atomic/Silverblue. But a
collaboration between rpm-ostree and kexec-tools is needed [3] to take care
of this special case.

Note the crashkernel.default support is dropped.

[1] https://www.freedesktop.org/software/systemd/man/kernel-install.html
[2] https://src.fedoraproject.org/rpms/kernel/blob/rawhide/f/kernel.spec#_2680
[3] https://github.com/coreos/rpm-ostree/issues/2894

Reviewed-by: Pingfan Liu <piliu@redhat.com>
Reviewed-by: Philipp Rudo <prudo@redhat.com>
Signed-off-by: Coiby Xu <coxu@redhat.com>
2022-01-05 09:40:24 +08:00
Coiby Xu
0adb0f4a8c try to reset kernel crashkernel when kexec-tools updates the default crashkernel value
kexec-tools could update the default crashkernel value.
When auto_reset_crashkernel=yes, reset kernel to new crashkernel
value in the following two cases,
 - crashkernel=auto is found in the kernel cmdline
 - the kernel crashkernel was previously set by kexec-tools i.e.
   the kernel is using old default crashkernel value

To tell if the user is using a custom value for the kernel crashkernel
or not, we assume the user would never use the default crashkernel value
as custom value. When kexec-tools gets updated,
 1. save the default crashkernel value of the older package to
    /tmp/crashkernel (for POWER system, /tmp/crashkernel_fadump is saved
    as well).
 2. If auto_reset_crashkernel=yes, iterate all installed kernels.
    For each kernel, compare its crashkernel value with the old
    default crashkernel and reset it if yes

The implementation makes use of two RPM scriptlets [2],
 - %pre is run before a package is installed so we can use it to save
   old default crashkernel value
 - %post is run after a package installed so we can use it to try to reset
   kernel crashkernel

There are several problems when running kdumpctl in the RPM scripts
for CoreOS/Atomic/Silverblue, for example, the lock can't be acquired by
kdumpctl, "rpm-ostree kargs" can't be run and etc.. So don't enable this
feature for CoreOS/Atomic/Silverblue.

Note latest shellcheck (0.8.0) gives false positives about the
associative array as of this commit. And Fedora's shellcheck is 0.7.2
and can't even correctly parse the shell code because of the associative
array.

[1] https://github.com/koalaman/shellcheck/issues/2399
[2] https://docs.fedoraproject.org/en-US/packaging-guidelines/Scriptlets/

Reviewed-by: Pingfan Liu <piliu@redhat.com>
Reviewed-by: Philipp Rudo <prudo@redhat.com>
Signed-off-by: Coiby Xu <coxu@redhat.com>
2022-01-05 09:40:24 +08:00
Coiby Xu
140da74a34 rewrite reset_crashkernel to support fadump and to used by RPM scriptlet
Rewrite kdumpctl reset-crashkernel KERNEL_PATH as
kdumpctl reset-crashkernel [--fadump=[on|off|nocma]]  [--kernel=path_to_kernel] [--reboot]

This interface would reset a specific kernel to the default crashkernel value
given the kernel path. And it also supports grubby's syntax so there are the
following special cases,
 - if --kernel not specified,
    - use KDUMP_KERNELVER if it's defined in /etc/sysconfig/kdump
    - otherwise use current running kernel, i.e. `uname -r`
 - if --kernel=DEFAULT, the default boot kernel is chosen
 - if --kernel=ALL, all kernels would have its crashkernel reset to the
   default value and the /etc/default/grub is updated as well

--fadump=[on|off|nocma] toggles fadump on/off for the kernel provided
in KERNEL_PATH. If --fadump is omitted, the dump mode is determined by
parsing the kernel command line for the kernel(s) to update.

CoreOS/Atomic/Silverblue needs to be treated as a special case because,
 - "rpm-ostree kargs" is used to manage kernel command line parameters
    so --kernel doesn't make sense and there is no need to find current
    running kernel
 - "rpm-ostree kargs" itself would prompt the user to reboot the system
   after modify the kernel command line parameter
 - POWER is not supported so we can assume the dump mode is always kdump

This interface will also be called by kexec-tools RPM scriptlets [1]
to reset crashkernel.

Note the support of crashkenrel.default is dropped.

[1] https://docs.fedoraproject.org/en-US/packaging-guidelines/Scriptlets/

Reviewed-by: Pingfan Liu <piliu@redhat.com>
Reviewed-by: Philipp Rudo <prudo@redhat.com>
Signed-off-by: Coiby Xu <coxu@redhat.com>
2022-01-05 09:40:24 +08:00
Coiby Xu
12ecbce359 fix incorrect usage of rpm-ostree to update kernel command line parameters
CoreOS/Atomic/Silverblue use "rpm-ostree kargs" to manage kernel command
line parameters.

Fixes: 86130ec ("kdumpctl: Add kdumpctl reset-crashkernel")

Reviewed-by: Pingfan Liu <piliu@redhat.com>
Reviewed-by: Philipp Rudo <prudo@redhat.com>
Signed-off-by: Coiby Xu <coxu@redhat.com>
2022-01-05 09:40:24 +08:00
Coiby Xu
945cbbd59b add helper functions to get kernel path by kernel release and the path of current running kernel
grubby --info=kernel-path or --add-kernel=kernel-path accepts a kernel
path (e.g. /boot/vmlinuz-5.14.14-200.fc34.x86_64) instead of kernel release
(e.g 5.14.14-200.fc34.x86_64). So we need to know the kernel path given
a kernel release. Although for Fedora/RHEL, the kernel path is
"/boot/vmlinuz-<KERNEL_RELEASE>", a path kernel could also be
/boot/<machine-id>/<KERNEL_RELEASE>/vmlinuz. So the most reliable way to
find the kernel path given a kernel release is to use "grubby --info".

For osbuild, a kernel path may not yet exist but it's valid for
"grubby --update-kernel=KERNEL_PATH". For example, "grubby -info" may
output something as follows,

index=0
kernel="/var/cache/osbuild-worker/osbuild-store/tmp/tmp2prywdy5object/tree/boot/vmlinuz-5.15.10-100.fc34.x86_64"
args="ro no_timer_check net.ifnames=0 console=tty1 console=ttyS0,115200n8"
root="UUID=76a22bf4-f153-4541-b6c7-0332c0dfaeac"
initrd="/var/cache/osbuild-worker/osbuild-store/tmp/tmp2prywdy5object/tree/boot/initramfs-5.15.10-100.fc34.x86_64.img"

There is no need to check if path like
/var/cache/osbuild-worker/osbuild-store/tmp/tmp2prywdy5object/tree/boot/vmlinuz-5.15.10-100.fc34.x86_64
physically exists.

Note these helper functions doesn't support CoreOS/Atomic/Silverblue
since grubby isn't used by them.

Reviewed-by: Pingfan Liu <piliu@redhat.com>
Reviewed-by: Philipp Rudo <prudo@redhat.com>
Signed-off-by: Coiby Xu <coxu@redhat.com>
2022-01-05 09:40:24 +08:00
Coiby Xu
3d2079c31c add helper functions to get dump mode
Add a helper function to get dump mode. The dump mode would be
 - fadump if fadump=on or fadump=nocma
 - kdump if fadump=off or empty fadump

Otherwise return 1.

Also add another helper function to return a kernel's dump mode.

Reviewed-by: Pingfan Liu <piliu@redhat.com>
Reviewed-by: Philipp Rudo <prudo@redhat.com>
Signed-off-by: Coiby Xu <coxu@redhat.com>
2022-01-05 09:40:24 +08:00
Coiby Xu
fb9e6838ab add a helper function to read kernel cmdline parameter from grubby --info
This helper function will be used to retrieve the value of kernel
cmdline parameters including crashkernel, fadump, swiotlb and etc.

Suggested-by: Philipp Rudo <prudo@redhat.com>
Reviewed-by: Pingfan Liu <piliu@redhat.com>
Signed-off-by: Coiby Xu <coxu@redhat.com>
2022-01-05 09:40:24 +08:00
Coiby Xu
796d0f6fd2 provide kdumpctl get-default-crashkernel for kdump_anaconda_addon and RPM scriptlet
Provide "kdumpctl get-default-crashkernel" for kdump_anaconda_addon
so crashkernel.default isn't needed.

When fadump is on, kdump_anaconda_addon would need to specify the dump
mode, i.e. "kdumpctl get-default-crashkernel fadump".

This interface would also be used by RPM scriptlet [1] to fetch default
crashkernel value.

[1] https://docs.fedoraproject.org/en-US/packaging-guidelines/Scriptlets/

Reviewed-by: Pingfan Liu <piliu@redhat.com>
Reviewed-by: Philipp Rudo <prudo@redhat.com>
Signed-off-by: Coiby Xu <coxu@redhat.com>
2022-01-05 09:40:24 +08:00
Kairui Song
546c81a205 kdumpctl: remove some legacy code
It seems the save_core function and vmcore detection was used a long
time ago when kdump shares same userspace in first and second kernel.
It's now heavily deprecated (only support cp, hardcoded path, dumpoops
no longer exists) and not used.

Now vmcore will never show up in first kernel for both kdump and fadump
case, and kdumpctl is only used in first kernel, so just remove them.

Signed-off-by: Kairui Song <kasong@tencent.com>
Acked-by: Coiby Xu <coxu@redhat.com>
2021-12-31 11:37:19 +08:00
Kairui Song
0e4b66b1ab bash scripts: reformat with shfmt
This is a batch update done with:
shfmt -s -w mkfadumprd mkdumprd kdumpctl *-module-setup.sh

Clean up code style and reduce code base size, no behaviour change.

Signed-off-by: Kairui Song <kasong@redhat.com>
Acked-by: Philipp Rudo <prudo@redhat.com>
2021-09-14 03:25:29 +08:00
Kairui Song
4f75e16700 bash scripts: declare and assign separately
Declare and assign separately to avoid masking return values:
https://github.com/koalaman/shellcheck/wiki/SC2155

Signed-off-by: Kairui Song <kasong@redhat.com>
Acked-by: Philipp Rudo <prudo@redhat.com>
2021-09-14 03:25:29 +08:00
Kairui Song
a4648fc851 bash scripts: fix redundant exit code check
As suggested by:
https://github.com/koalaman/shellcheck/wiki/SC2181

Signed-off-by: Kairui Song <kasong@redhat.com>
Acked-by: Philipp Rudo <prudo@redhat.com>
2021-09-14 03:25:29 +08:00
Kairui Song
86538ca6e2 bash scripts: fix variable quoting issue
Fixed quoting issues found by shellcheck, no feature
change. This should fix many errors when there is space
in any shell variables, eg. dump target's name/path/id.

False positives are marked with "# shellcheck disable=SCXXXX", for
example, args are expected to split so it should not be quoted.

And replaced some `cut -d ' ' -fX` with `awk '{print $X}'` since cut
is fragile, and doesn't work well with any quoted strings that have
redundant space.

Following quoting related issues are fixed (check the link
for example code and what could go wrong):

https://github.com/koalaman/shellcheck/wiki/SC2046
https://github.com/koalaman/shellcheck/wiki/SC2053
https://github.com/koalaman/shellcheck/wiki/SC2068
https://github.com/koalaman/shellcheck/wiki/SC2086
https://github.com/koalaman/shellcheck/wiki/SC2206

Signed-off-by: Kairui Song <kasong@redhat.com>
Acked-by: Philipp Rudo <prudo@redhat.com>
2021-09-14 03:25:29 +08:00
Kairui Song
70978c00e5 bash scripts: replace '[ ]' with '[[ ]]' for bash scripts
kdumpctl, mkdumprd, *-module-setup.sh only target bash, since they
only run in first kernel and depend on dracut, and dracut depends
on bash. So use '[[ ]]' to replace '[ ]'.

This is a batch update done with following command:
`sed -i -e 's/\(\s\)\[\s\([^]]*\)\s\]/\1\[\[\ \2 \]\]/g' kdumpctl, mkdumprd, *-module-setup.sh`
and replaced [ ... -a ... ] with [[ ... ]] && [[ ... ]] manually.

See https://tldp.org/LDP/abs/html/testconstructs.html for more details
on '[[ ]]', it's more versatile, safer, and slightly faster than '[ ]'.

This will also help shfmt to clean up the code in later commits.

Signed-off-by: Kairui Song <kasong@redhat.com>
Acked-by: Philipp Rudo <prudo@redhat.com>
2021-09-14 03:25:29 +08:00