Commit Graph

267 Commits

Author SHA1 Message Date
Coiby Xu
e8ae897595 skip updating /etc/default/grub for s390x
Resolves: bz2104534

When running "kdumpctl reset-crashkernel --kernel=ALL" on s390x,
sed: can't read /etc/default/grub: No such file or directory
sed: can't read /etc/default/grub: No such file or directory

This happens because s390x doesn't use the grub bootloader and
/etc/default/grub doesn't exist.

Reported-by: smitterl@redhat.com
Reviewed-by: Philipp Rudo <prudo@redhat.com>
Signed-off-by: Coiby Xu <coxu@redhat.com>
2022-08-03 11:09:37 +08:00
Coiby Xu
da0ca0d205 Allow to update kexec-tools using virt-customize for cloud base image
Resolves: bz2089871

Currently, kexec-tools can't be updated using virt-customize because
older version of kdumpctl can't acquire instance lock for the
get-default-crashkernel subcommand. The reason is /var/lock is linked to
/run/lock which however doesn't exist in the case of virt-customize.

This patch fixes this problem by using /tmp/kdump.lock as the lock
file if /run/lock doesn't exist.

Note
1. The lock file is now created in /run/lock instead of /var/run/lock since
   Fedora has adopted adopted /run [2] since F15.
2. %pre scriptlet now always return success since package update won't
   be blocked

[1] https://fedoraproject.org/wiki/Features/var-run-tmpfs

Fixes: 0adb0f4 ("try to reset kernel crashkernel when kexec-tools updates the default crashkernel value")

Reported-by: Nicolas Hicher <nhicher@redhat.com>
Suggested-by: Laszlo Ersek <lersek@redhat.com>
Suggested-by: Philipp Rudo <prudo@redhat.com>
Signed-off-by: Coiby Xu <coxu@redhat.com>
Reviewed-by: Philipp Rudo <prudo@redhat.com>
2022-08-02 18:36:34 +08:00
Pingfan Liu
d593bfa6fc KDUMP_COMMANDLINE: remove irqpoll parameter on aws aarch64 platform
Currently, kdump may experience failure on some aws aarch64 platform.
The final scenario is:

    [   79.145089] printk: console [ttyS0] disabled
Then the system has no response any more. And after reboot, there is no
vmcore generated under /var/crash/. More detail [1].

In a short word, it is caused by the irqpoll policy and some unknown
acpi issue. The serial device is hot-removed as a pci device.

More detailed, the irqpoll policy demands to iterate over all interrupt
handler, if the interrupt line is shared, then the handler is
dispatched. And acpi handler acpi_irq() is on a shared interrupt line,
so it is called.  But for some unknown reason, the acpi hardware regs
hold wrong state, and the acpi driver decides that a hot-removed event
happens on a pci slot, which finally removes the pci serial device.

To tackle this issue by removing the irqpoll parameter on aws aarch64
platform, until the real root cause in acpi is found and resolved.

[1]: https://bugzilla.redhat.com/show_bug.cgi?id=2080468#c0

Signed-off-by: Pingfan Liu <piliu@redhat.com>

Acked-by: Coiby Xu <coxu@redhat.com>
2022-07-21 19:03:37 +08:00
Dusty Mabe
980f10aa40 kdump-lib: clear up references to Atomic/CoreOS
There are many variants on OSTree based systems these days so
we should probably refer to the class of systems as "OSTree
based systems". Also, Atomic Host is dead.

Signed-off-by: Dusty Mabe <dusty@dustymabe.com>
Acked-by: Coiby Xu <coxu@redhat.com>
2022-06-30 16:00:06 +08:00
Coiby Xu
b97310428f unit tests: prepare for kdumpctl and kdump-lib.sh to be unit-tested
Currently there are two issues with unit-testing the functions defined
in kdumpctl and other shell scripts after sourcing them,
  - kdumpctl would call main which requires root permission and would
    create single instance lock (/var/lock/kdump)
  - kdumpctl and other shell scripts directly source files under /usr/lib/kdump/

When ShellSpec load a script via "Include", it defines the__SOURCED__
variable. By making use of __SOURCED__, we can
1. let kdumpctl not call main when kdumpctl is "Include"d by ShellSpec
2. instruct kdumpctl and kdump-lib.sh to source the files in the repo
   when running ShelSpec tests

Note coverage/ is added to .gitignore because ShellSpec generates code
coverage results in this folder.

Reviewed-by: Philipp Rudo <prudo@redhat.com>
Signed-off-by: Coiby Xu <coxu@redhat.com>
2022-04-14 11:44:12 +08:00
Philipp Rudo
55b5c4e2b0 kdumpctl: simplify local_fs_dump_target
Make use of the new ${OPT[]} array and simplify local_fs_dump_target to
remove one more file operations.

While at it rename the local_fs_dump_target to is_local_target

Signed-off-by: Philipp Rudo <prudo@redhat.com>
Reviewed-by: Tao Liu <ltao@redhat.com>
Reviewed-by: Coiby Xu <coxu@redhat.com>
2022-04-02 16:24:32 +08:00
Philipp Rudo
ac5968218f kdumpctl: remove kdump_get_conf_val in save_raw
With the introduction of ${OPT[fstype]} this call to kdump_get_conf_val
can be removed now as well.

Signed-off-by: Philipp Rudo <prudo@redhat.com>
Reviewed-by: Tao Liu <ltao@redhat.com>
Reviewed-by: Coiby Xu <coxu@redhat.com>
2022-04-02 16:24:32 +08:00
Philipp Rudo
5118daf2ff kdumpctl: drop DUMP_TARGET variable
The variable is only used for ssh dump targets. Furthermore it is
identical to the value stored in ${OPT[_target]}. Thus drop DUMP_TARGET and
use ${OPT[_target]} instead.

In order to be able to distinguish between the different target types
introduce the internal ${OPT[_fstype]}.

Signed-off-by: Philipp Rudo <prudo@redhat.com>
Reviewed-by: Tao Liu <ltao@redhat.com>
Reviewed-by: Coiby Xu <coxu@redhat.com>
2022-04-02 16:24:32 +08:00
Philipp Rudo
a859abe365 kdumpctl: drop SSH_KEY_LOCATION variable
The variable is only used for ssh dump targets. Furthermore it is
identical to the value stored in ${OPT[sshkey]}. Thus drop
SSH_KEY_LOCATION and use ${OPT[sshkey]} instead.

Signed-off-by: Philipp Rudo <prudo@redhat.com>
Reviewed-by: Tao Liu <ltao@redhat.com>
Reviewed-by: Coiby Xu <coxu@redhat.com>
2022-04-02 16:24:32 +08:00
Philipp Rudo
0460f0a768 kdumpctl: drop SAVE_PATH variable
The variable is only used for ssh dump targets. Furthermore it is
identical to the value stored in ${OPT[path]}. Thus drop SAVE_PATH and
use ${OPT[path]} instead.

Also make sure that ${OPT[path]} is always set to the default value when
no entry in kdump.conf is found.

Signed-off-by: Philipp Rudo <prudo@redhat.com>
Reviewed-by: Tao Liu <ltao@redhat.com>
Reviewed-by: Coiby Xu <coxu@redhat.com>
2022-04-02 16:24:32 +08:00
Philipp Rudo
edb1d04425 kdumpctl: reduce file operations on kdump.conf
Every call to kdump_get_conf_val parses kdump.conf although the file has
already been parsed in check_config. Thus store the values parsed in
check_config in an array and use them later instead of re-parsing the
file over and over again.

While at it rename check_config to parse_config.

Signed-off-by: Philipp Rudo <prudo@redhat.com>
Reviewed-by: Tao Liu <ltao@redhat.com>
Reviewed-by: Coiby Xu <coxu@redhat.com>
2022-04-02 16:24:32 +08:00
Philipp Rudo
4adf6d3cc8 kdumpctl: merge check_ssh_config into check_config
check_config and check_ssh_config both parse /etc/kdump.conf and are
usually used together. The difference between both is that
check_ssh_config does some extra checks on the format of the provided
ssh destination but ignores invalid or deprecated options in the config.
Thus merge check_ssh_config into check_config. Leave the additional
checks on the ssh destination in check_ssh_config but treat it like the
checks done for e.g. the failure_action.

This slightly changes the behavior of 'kdumpctl propagate', which now
fails if kdump.conf contains an invalid value unrelated to ssh. This
change in behavior isn't problematic because 'kdumpctl propagate' always
needs to be followed by a 'kdumpctl start' to have a working kdump
environment. For the situations where 'propagate' fails now the 'start'
would have failed in the past. So the failure only moved one step ahead
in the sequence.

While at it drop check_ssh_target and call check_and_wait_network_ready
directly.

Signed-off-by: Philipp Rudo <prudo@redhat.com>
Reviewed-by: Tao Liu <ltao@redhat.com>
Reviewed-by: Coiby Xu <coxu@redhat.com>
2022-04-02 16:24:32 +08:00
Philipp Rudo
e3fa367840 kdumpctl: simplify propagate_ssh_key
The function has multiple problems:

1) SSH_{USER,SERVER} aren't defined local
2) Weird use of cut and sed to parse the DUMP_TARGET for the user and
   host although check_ssh_config guarantees that it has the format
   <user>@<host>.
3) Unnecessary use of a variable for the return value
4) Weird behavior to first unpack the DUMP_TARGET to SSH_USER and
   SSH_SERVER and then putting it back together again
5) Definition of variable errmsg that is only used once but breaks
   grep-ability of error message.
6) Wrong order when redirecting output of ssh-keygen, see SC2069 [1]

Fix them now.

While at it also improve the error messages in the function.

[1] https://www.shellcheck.net/wiki/SC2069

Signed-off-by: Philipp Rudo <prudo@redhat.com>
Reviewed-by: Tao Liu <ltao@redhat.com>
Reviewed-by: Coiby Xu <coxu@redhat.com>
2022-04-02 16:24:32 +08:00
Philipp Rudo
b802dbff9f kdumpctl: forbid aliases from ssh config
For ssh targets kdumpctl only verifies that the config value has the
correct <user>@<host> format itself. For all other tests, e.g. if the
destination can be reached, it relies on ssh. This allows users to
provide a <host> that isn't the proper hostname but an alias defined in
the ssh_config without failing the tests. If this is done
dracut-module-setup.sh:kdump_get_remote_ip will fail to obtain the
targets ip address. This failure is not detected and thus will not fail
the initramfs creation. The resulting initramfs however doesn't have the
necessary information for setting up the network and thus will fail to
boot.

Prevent the use of alias hostnames by verifying that the given hostname
is the same one ssh would use after parsing the ssh_config.

Note: Don't use getent ahosts to verify that the given host can be
resolved as this requires the network to be up which cannot be
guaranteed when the kdump.conf is parsed.

Signed-off-by: Philipp Rudo <prudo@redhat.com>
Reviewed-by: Tao Liu <ltao@redhat.com>
Reviewed-by: Coiby Xu <coxu@redhat.com>
2022-04-02 16:24:32 +08:00
Philipp Rudo
247b3dd297 kdumpctl: fix comment in check_and_wait_network_ready
The time out was increased to 180 seconds in 680c0d3 ("kdumpctl:
distinguish the failed reason of ssh"). Update the comment to reflect
that change.

Signed-off-by: Philipp Rudo <prudo@redhat.com>
Reviewed-by: Tao Liu <ltao@redhat.com>
Reviewed-by: Coiby Xu <coxu@redhat.com>
2022-04-02 16:24:32 +08:00
Philipp Rudo
7cd3f232d5 kdump-lib-initramfs: merge definitions for default ssh key
There are currently three identical definitions for the default ssh key.
Combine them into one in kdump-lib-initramfs.sh.

Signed-off-by: Philipp Rudo <prudo@redhat.com>
Reviewed-by: Tao Liu <ltao@redhat.com>
Reviewed-by: Coiby Xu <coxu@redhat.com>
2022-04-02 16:24:32 +08:00
Philipp Rudo
b49083126f kdumpctl: remove unnecessary uses of $?
Signed-off-by: Philipp Rudo <prudo@redhat.com>
Reviewed-by: Tao Liu <ltao@redhat.com>
Reviewed-by: Coiby Xu <coxu@redhat.com>
2022-04-02 16:24:32 +08:00
Philipp Rudo
8736aa5bb3 kdumpctl/estimate: Fix unnecessary warning
do_estimate prints the warning that the reserved crashkernel is lower
than the recommended one even then when both values are identical. This
might cause confusion. So omit printing the warning when both values are
equal.

Signed-off-by: Philipp Rudo <prudo@redhat.com>
Acked-by: Coiby Xu <coxu@redhat.com>
2022-04-02 11:32:49 +08:00
Lichen Liu
7141d044c8 kdumpctl: sync the $TARGET_INITRD after rebuild
There is a system-wide sync call at the end of mkdumprd, move it to
kdumpctl after rebuild initrd and add another one for mkfadumprd.
Sync only the $TARGET_INITRD to avoid a system-wide sync taking too
long on a system with high disk activity.

Also update the sync in kdumpctl:restore_default_initrd which will
mv the $DEFAULT_INITRD_BAK to $DEFAULT_INITRD.

Signed-off-by: Lichen Liu <lichliu@redhat.com>
Reviewed-by: Philipp Rudo <prudo@redhat.com>
2022-03-01 17:54:29 +08:00
Coiby Xu
6d4062a936 try to update the crashkernel in GRUB_ETC_DEFAULT after kexec-tools updates the default crashkernel value
If GRUB_ETC_DEFAULT use crashkernel=auto or
crashkernel=OLD_DEFAULT_CRASHKERNEL, it should be updated as well.

Add a helper function to read kernel cmdline parameter from
GRUB_ETC_DEFAULT. This function is used to read kernel cmdline
parameter like fadump or crashkernel.

Reviewed-by: Philipp Rudo <prudo@redhat.com>
2022-03-01 10:29:20 +08:00
Coiby Xu
37f4f2c1f6 address the case where there are multiple values for the same kernel arg
There is the case where there are multiple entries of the same parameter on
the command line, e.g.
GRUB_CMDLINE_LINUX="crashkernel=110M crashkernel=220M fadump=on crashkernel=330M".

In such an situation _update_kernel_cmdline_in_grub_etc_default only
updates/removes the last entry which is usually not what you want as the
kernel (for crashkernel) takes the last entry it can find.

Thus make sure the case with multiple entries of the same parameter is
handled properly by removing all occurrences of given parameter first.

Note
1. sed command group and conditional control has been used to get rid of
   grep.
2. Fully supporting kernel cmdline as documented in
   Documentation/admin-guide/kernel-parameters.rst is complex and in
   foreseeable future a full implementation is not needed. So simply
   document the unsupported cases instead.

Fixes: 140da74 ("rewrite reset_crashkernel to support fadump and to used by RPM scriptlet")

Reported-by: Philipp Rudo <prudo@redhat.com>
Suggested-by: Philipp Rudo <prudo@redhat.com>
Reviewed-by: Philipp Rudo <prudo@redhat.com>
2022-03-01 10:28:53 +08:00
Coiby Xu
41b8f9528c fix incorrect usage of _get_all_kernels_from_grubby
It's found that the kernel cmdline crashkernel=auto doesn't get updated
when upgrading kexec-tools. This happens because _get_all_kernels_from_grubby
is called with no argument by reset_crashkernel_after_update. When retrieving
all kernel paths on the system, "grubby --info ALL" should be used. Fix this
error by passing "ALL" argument.

Fixes: 0adb0f4 ("try to reset kernel crashkernel when kexec-tools updates the default crashkernel value")

Reported-by: Jie Li <jieli@redhat.com>
Signed-off-by: Coiby Xu <coxu@redhat.com>
Reviewed-by: Tao Liu <ltao@redhat.com>
2022-02-14 10:34:55 +08:00
Coiby Xu
5111c01334 fix the mistake of swapping function parameters of read_proc_environ_var
_is_osbuild fails because it expects the 1st and 2nd function parameter
to be the environment variable and environ file path respectively. Fix
it by swapping the parameters in read_proc_environ_var.

Note the osbuild environ file path is defined in _OSBUILD_ENVIRON_PATH
so _is_osbuild can be unit-tested by overwriting _OSBUILD_ENVIRON_PATH.

Fixes: 6a3ce83 ("fix the error of parsing the container environ variable for osbuild")
Signed-off-by: Coiby Xu <coxu@redhat.com>
Reviewed-by: Pingfan Liu <piliu@redhat.com>
2022-02-08 10:42:40 +08:00
Coiby Xu
6a3ce83a60 fix the error of parsing the container environ variable for osbuild
The environment variable entries in /proc/[pid]/environ are separated by
null bytes instead of by spaces. Update the sed regex to fix this issue.

Note that,
  1. this patch also fixes a issue which is kdumpctl would try to reset
     crashkernel even osbuild has provided custom crashkernel value.
  2. kernel hook 92-crashkernel.install installed by kexec-tools is
     guaranteed to be ran by kernel-install. kexec-tools doesn't recommend
     kernel so there is no guarantee kernel is installed after kexec-tools.
     But dnf invokes kernel-install in the posttrans scriptlet (of kernel-core)
     which is always ran after all packages including kexec-tools and kernel
     in a dnf transaction.
  3. To be able to do unit tests, the logic of reading environment variable
     has been extracted as a separate function.

Fixes: ddd428a ("set up kernel crashkernel for osbuild in kernel hook")
Signed-off-by: Coiby Xu <coxu@redhat.com>
Reviewed-by: Pingfan Liu <piliu@redhat.com>
Reviewed-by: Philipp Rudo <prudo@redhat.com>
2022-01-26 08:32:06 +08:00
Coiby Xu
ae0cbdf34a fix "kdump: Invalid kdump config option auto_reset_crashkernel" error
kdumpctl only accepts a specified set of options. Add
auto_reset_crashkernel to this set.

Fixes: 73ced7f ("introduce the auto_reset_crashkernel option to kdump.conf")
Signed-off-by: Coiby Xu <coxu@redhat.com>
Acked-by: Tao Liu <ltao@redhat.com>
2022-01-07 12:20:37 +08:00
Coiby Xu
d5c31605f3 use grep -s to suppress error messages about nonexistent or unreadable files
When a file doesn't exist or isn't readable, grep complains as follows,

grep: /proc/cmdline: No such file or directory
grep: /etc/kernel/cmdline: No such file or directory

/proc/cmdline doesn't exist when installing package for an OS image and
/etc/kernel/cmdline may not exist if osbuild doesn't want set custom
kernel cmdline.

Use "-s" to suppress the error messages.

Fixes: 0adb0f4 ("try to reset kernel crashkernel when kexec-tools updates the default crashkernel value")
Fixes: ddd428a ("set up kernel crashkernel for osbuild in kernel hook")
Signed-off-by: Coiby Xu <coxu@redhat.com>
Acked-by: Tao Liu <ltao@redhat.com>
2022-01-07 12:20:21 +08:00
Coiby Xu
ddd428a1d0 set up kernel crashkernel for osbuild in kernel hook
osbuild is a tool to build OS images. It uses bwrap to install packages
inside a sandbox/container. Since the kernel package recommends
kexec-tools which in turn recommends grubby, the installation order would
be grubby -> kexec-tools -> kernel. So we can use the kernel hook
92-crashkernel.install provided by kexec-tools to set up kernel
crashkernel for the target OS image. But in osbuild's case, there is no
current running kernel and running `uname -r` in the container/sandbox
actually returns the host kernel release. To set up kernel crashkernel for
the OS image built by osbuild, a different logic is needed.

We will check if kernel hook is running inside the osbuild container
then set up kernel crashkernel only if osbuild hasn't specified a
custome value. osbuild exposes [1] the container=bwrap-osbuild environment
variable. According to [2], the environment variable is not inherited down
the process tree, so we need to check /proc/1/environ to detect this
environment variable to tell if the kernel hook is running inside a
bwrap-osbuild container. After that we need to know if osbuild wants to use
custom crashkernel value. This is done by checking if /etc/kernel/cmdline
has crashkernel set [3]. /etc/kernel/cmdline is written before packages
are installed.

[1] https://github.com/osbuild/osbuild/pull/926
[2] https://systemd.io/CONTAINER_INTERFACE/
[3] https://bugzilla.redhat.com/show_bug.cgi?id=2024976#c5

Reviewed-by: Pingfan Liu <piliu@redhat.com>
Reviewed-by: Philipp Rudo <prudo@redhat.com>
Signed-off-by: Coiby Xu <coxu@redhat.com>
2022-01-05 09:40:24 +08:00
Coiby Xu
5e8c751c39 reset kernel crashkernel for the special case where the kernel is updated right after kexec-tools
When kexec-tools updates the default crashkernel value, it will try to
reset the existing installed kernels including the currently running
kernel. So the running kernel could have different kernel cmdline
parameters from /proc/cmdline. When installing a kernel after updating
kexec-tools, /usr/lib/kernel/install.d/20-grub.install would be called
by kernel-install [1] which would use /proc/cmdline to set up new kernel's
cmdline. To address this special case, reset the new kernel's crashkernel
and fadump value to the value that would be used by running kernel after
rebooting by the installation hook. One side effect of this commit is it
would reset the installed kernel's crashkernel even currently running kernel
don't use the default crashkernel value after rebooting. But I think this
side effect is a benefit for the user.

The implementation depends on kernel-install which run the scripts in
/usr/lib/kernel/install.d passing the following arguments,

  add KERNEL-VERSION $BOOT/MACHINE-ID/KERNEL-VERSION/ KERNEL-IMAGE [INITRD-FILE ...]

An concrete example is given as follows,
  add 5.11.12-300.fc34.x86_64 /boot/e986846f63134c7295458cf36300ba5b/5.11.12-300.fc34.x86_64 /lib/modules/5.11.12-300.fc34.x86_64/vmlinuz

kernel-install could be started by the kernel package's RPM scriplet [2].
As mentioned in previous commit "try to reset kernel crashkernel when
kexec-tools updates the default crashkernel value", kdumpctl has difficulty
running in RPM scriptlet fore CoreOS. But rpm-ostree ignores all kernel hooks,
there is no need to disable the kernel hook for CoreOS/Atomic/Silverblue. But a
collaboration between rpm-ostree and kexec-tools is needed [3] to take care
of this special case.

Note the crashkernel.default support is dropped.

[1] https://www.freedesktop.org/software/systemd/man/kernel-install.html
[2] https://src.fedoraproject.org/rpms/kernel/blob/rawhide/f/kernel.spec#_2680
[3] https://github.com/coreos/rpm-ostree/issues/2894

Reviewed-by: Pingfan Liu <piliu@redhat.com>
Reviewed-by: Philipp Rudo <prudo@redhat.com>
Signed-off-by: Coiby Xu <coxu@redhat.com>
2022-01-05 09:40:24 +08:00
Coiby Xu
0adb0f4a8c try to reset kernel crashkernel when kexec-tools updates the default crashkernel value
kexec-tools could update the default crashkernel value.
When auto_reset_crashkernel=yes, reset kernel to new crashkernel
value in the following two cases,
 - crashkernel=auto is found in the kernel cmdline
 - the kernel crashkernel was previously set by kexec-tools i.e.
   the kernel is using old default crashkernel value

To tell if the user is using a custom value for the kernel crashkernel
or not, we assume the user would never use the default crashkernel value
as custom value. When kexec-tools gets updated,
 1. save the default crashkernel value of the older package to
    /tmp/crashkernel (for POWER system, /tmp/crashkernel_fadump is saved
    as well).
 2. If auto_reset_crashkernel=yes, iterate all installed kernels.
    For each kernel, compare its crashkernel value with the old
    default crashkernel and reset it if yes

The implementation makes use of two RPM scriptlets [2],
 - %pre is run before a package is installed so we can use it to save
   old default crashkernel value
 - %post is run after a package installed so we can use it to try to reset
   kernel crashkernel

There are several problems when running kdumpctl in the RPM scripts
for CoreOS/Atomic/Silverblue, for example, the lock can't be acquired by
kdumpctl, "rpm-ostree kargs" can't be run and etc.. So don't enable this
feature for CoreOS/Atomic/Silverblue.

Note latest shellcheck (0.8.0) gives false positives about the
associative array as of this commit. And Fedora's shellcheck is 0.7.2
and can't even correctly parse the shell code because of the associative
array.

[1] https://github.com/koalaman/shellcheck/issues/2399
[2] https://docs.fedoraproject.org/en-US/packaging-guidelines/Scriptlets/

Reviewed-by: Pingfan Liu <piliu@redhat.com>
Reviewed-by: Philipp Rudo <prudo@redhat.com>
Signed-off-by: Coiby Xu <coxu@redhat.com>
2022-01-05 09:40:24 +08:00
Coiby Xu
140da74a34 rewrite reset_crashkernel to support fadump and to used by RPM scriptlet
Rewrite kdumpctl reset-crashkernel KERNEL_PATH as
kdumpctl reset-crashkernel [--fadump=[on|off|nocma]]  [--kernel=path_to_kernel] [--reboot]

This interface would reset a specific kernel to the default crashkernel value
given the kernel path. And it also supports grubby's syntax so there are the
following special cases,
 - if --kernel not specified,
    - use KDUMP_KERNELVER if it's defined in /etc/sysconfig/kdump
    - otherwise use current running kernel, i.e. `uname -r`
 - if --kernel=DEFAULT, the default boot kernel is chosen
 - if --kernel=ALL, all kernels would have its crashkernel reset to the
   default value and the /etc/default/grub is updated as well

--fadump=[on|off|nocma] toggles fadump on/off for the kernel provided
in KERNEL_PATH. If --fadump is omitted, the dump mode is determined by
parsing the kernel command line for the kernel(s) to update.

CoreOS/Atomic/Silverblue needs to be treated as a special case because,
 - "rpm-ostree kargs" is used to manage kernel command line parameters
    so --kernel doesn't make sense and there is no need to find current
    running kernel
 - "rpm-ostree kargs" itself would prompt the user to reboot the system
   after modify the kernel command line parameter
 - POWER is not supported so we can assume the dump mode is always kdump

This interface will also be called by kexec-tools RPM scriptlets [1]
to reset crashkernel.

Note the support of crashkenrel.default is dropped.

[1] https://docs.fedoraproject.org/en-US/packaging-guidelines/Scriptlets/

Reviewed-by: Pingfan Liu <piliu@redhat.com>
Reviewed-by: Philipp Rudo <prudo@redhat.com>
Signed-off-by: Coiby Xu <coxu@redhat.com>
2022-01-05 09:40:24 +08:00
Coiby Xu
12ecbce359 fix incorrect usage of rpm-ostree to update kernel command line parameters
CoreOS/Atomic/Silverblue use "rpm-ostree kargs" to manage kernel command
line parameters.

Fixes: 86130ec ("kdumpctl: Add kdumpctl reset-crashkernel")

Reviewed-by: Pingfan Liu <piliu@redhat.com>
Reviewed-by: Philipp Rudo <prudo@redhat.com>
Signed-off-by: Coiby Xu <coxu@redhat.com>
2022-01-05 09:40:24 +08:00
Coiby Xu
945cbbd59b add helper functions to get kernel path by kernel release and the path of current running kernel
grubby --info=kernel-path or --add-kernel=kernel-path accepts a kernel
path (e.g. /boot/vmlinuz-5.14.14-200.fc34.x86_64) instead of kernel release
(e.g 5.14.14-200.fc34.x86_64). So we need to know the kernel path given
a kernel release. Although for Fedora/RHEL, the kernel path is
"/boot/vmlinuz-<KERNEL_RELEASE>", a path kernel could also be
/boot/<machine-id>/<KERNEL_RELEASE>/vmlinuz. So the most reliable way to
find the kernel path given a kernel release is to use "grubby --info".

For osbuild, a kernel path may not yet exist but it's valid for
"grubby --update-kernel=KERNEL_PATH". For example, "grubby -info" may
output something as follows,

index=0
kernel="/var/cache/osbuild-worker/osbuild-store/tmp/tmp2prywdy5object/tree/boot/vmlinuz-5.15.10-100.fc34.x86_64"
args="ro no_timer_check net.ifnames=0 console=tty1 console=ttyS0,115200n8"
root="UUID=76a22bf4-f153-4541-b6c7-0332c0dfaeac"
initrd="/var/cache/osbuild-worker/osbuild-store/tmp/tmp2prywdy5object/tree/boot/initramfs-5.15.10-100.fc34.x86_64.img"

There is no need to check if path like
/var/cache/osbuild-worker/osbuild-store/tmp/tmp2prywdy5object/tree/boot/vmlinuz-5.15.10-100.fc34.x86_64
physically exists.

Note these helper functions doesn't support CoreOS/Atomic/Silverblue
since grubby isn't used by them.

Reviewed-by: Pingfan Liu <piliu@redhat.com>
Reviewed-by: Philipp Rudo <prudo@redhat.com>
Signed-off-by: Coiby Xu <coxu@redhat.com>
2022-01-05 09:40:24 +08:00
Coiby Xu
3d2079c31c add helper functions to get dump mode
Add a helper function to get dump mode. The dump mode would be
 - fadump if fadump=on or fadump=nocma
 - kdump if fadump=off or empty fadump

Otherwise return 1.

Also add another helper function to return a kernel's dump mode.

Reviewed-by: Pingfan Liu <piliu@redhat.com>
Reviewed-by: Philipp Rudo <prudo@redhat.com>
Signed-off-by: Coiby Xu <coxu@redhat.com>
2022-01-05 09:40:24 +08:00
Coiby Xu
fb9e6838ab add a helper function to read kernel cmdline parameter from grubby --info
This helper function will be used to retrieve the value of kernel
cmdline parameters including crashkernel, fadump, swiotlb and etc.

Suggested-by: Philipp Rudo <prudo@redhat.com>
Reviewed-by: Pingfan Liu <piliu@redhat.com>
Signed-off-by: Coiby Xu <coxu@redhat.com>
2022-01-05 09:40:24 +08:00
Coiby Xu
796d0f6fd2 provide kdumpctl get-default-crashkernel for kdump_anaconda_addon and RPM scriptlet
Provide "kdumpctl get-default-crashkernel" for kdump_anaconda_addon
so crashkernel.default isn't needed.

When fadump is on, kdump_anaconda_addon would need to specify the dump
mode, i.e. "kdumpctl get-default-crashkernel fadump".

This interface would also be used by RPM scriptlet [1] to fetch default
crashkernel value.

[1] https://docs.fedoraproject.org/en-US/packaging-guidelines/Scriptlets/

Reviewed-by: Pingfan Liu <piliu@redhat.com>
Reviewed-by: Philipp Rudo <prudo@redhat.com>
Signed-off-by: Coiby Xu <coxu@redhat.com>
2022-01-05 09:40:24 +08:00
Kairui Song
546c81a205 kdumpctl: remove some legacy code
It seems the save_core function and vmcore detection was used a long
time ago when kdump shares same userspace in first and second kernel.
It's now heavily deprecated (only support cp, hardcoded path, dumpoops
no longer exists) and not used.

Now vmcore will never show up in first kernel for both kdump and fadump
case, and kdumpctl is only used in first kernel, so just remove them.

Signed-off-by: Kairui Song <kasong@tencent.com>
Acked-by: Coiby Xu <coxu@redhat.com>
2021-12-31 11:37:19 +08:00
Kairui Song
0e4b66b1ab bash scripts: reformat with shfmt
This is a batch update done with:
shfmt -s -w mkfadumprd mkdumprd kdumpctl *-module-setup.sh

Clean up code style and reduce code base size, no behaviour change.

Signed-off-by: Kairui Song <kasong@redhat.com>
Acked-by: Philipp Rudo <prudo@redhat.com>
2021-09-14 03:25:29 +08:00
Kairui Song
4f75e16700 bash scripts: declare and assign separately
Declare and assign separately to avoid masking return values:
https://github.com/koalaman/shellcheck/wiki/SC2155

Signed-off-by: Kairui Song <kasong@redhat.com>
Acked-by: Philipp Rudo <prudo@redhat.com>
2021-09-14 03:25:29 +08:00
Kairui Song
a4648fc851 bash scripts: fix redundant exit code check
As suggested by:
https://github.com/koalaman/shellcheck/wiki/SC2181

Signed-off-by: Kairui Song <kasong@redhat.com>
Acked-by: Philipp Rudo <prudo@redhat.com>
2021-09-14 03:25:29 +08:00
Kairui Song
86538ca6e2 bash scripts: fix variable quoting issue
Fixed quoting issues found by shellcheck, no feature
change. This should fix many errors when there is space
in any shell variables, eg. dump target's name/path/id.

False positives are marked with "# shellcheck disable=SCXXXX", for
example, args are expected to split so it should not be quoted.

And replaced some `cut -d ' ' -fX` with `awk '{print $X}'` since cut
is fragile, and doesn't work well with any quoted strings that have
redundant space.

Following quoting related issues are fixed (check the link
for example code and what could go wrong):

https://github.com/koalaman/shellcheck/wiki/SC2046
https://github.com/koalaman/shellcheck/wiki/SC2053
https://github.com/koalaman/shellcheck/wiki/SC2068
https://github.com/koalaman/shellcheck/wiki/SC2086
https://github.com/koalaman/shellcheck/wiki/SC2206

Signed-off-by: Kairui Song <kasong@redhat.com>
Acked-by: Philipp Rudo <prudo@redhat.com>
2021-09-14 03:25:29 +08:00
Kairui Song
70978c00e5 bash scripts: replace '[ ]' with '[[ ]]' for bash scripts
kdumpctl, mkdumprd, *-module-setup.sh only target bash, since they
only run in first kernel and depend on dracut, and dracut depends
on bash. So use '[[ ]]' to replace '[ ]'.

This is a batch update done with following command:
`sed -i -e 's/\(\s\)\[\s\([^]]*\)\s\]/\1\[\[\ \2 \]\]/g' kdumpctl, mkdumprd, *-module-setup.sh`
and replaced [ ... -a ... ] with [[ ... ]] && [[ ... ]] manually.

See https://tldp.org/LDP/abs/html/testconstructs.html for more details
on '[[ ]]', it's more versatile, safer, and slightly faster than '[ ]'.

This will also help shfmt to clean up the code in later commits.

Signed-off-by: Kairui Song <kasong@redhat.com>
Acked-by: Philipp Rudo <prudo@redhat.com>
2021-09-14 03:25:29 +08:00
Kairui Song
54cc5c44be bash scripts: use $(...) notation instead of legacy ...
This is a batch update done with following command:

`sed -i -e 's/`\([^`]*\)`/\$(\1)/g' mkfadumprd mkdumprd \
 kdumpctl dracut-module-setup.sh dracut-fadump-module-setup.sh \
 dracut-early-kdump-module-setup.sh`

And manually converted some corner cases. This fixes
all related issues detected by shellcheck.
Make it easier to do clean up in later commits.

Check following link for reasons to switch to the new syntax:
https://github.com/koalaman/shellcheck/wiki/SC2006

Signed-off-by: Kairui Song <kasong@redhat.com>
Acked-by: Philipp Rudo <prudo@redhat.com>
2021-09-14 03:25:29 +08:00
Kairui Song
a416930706 bash scripts: always use "read -r"
This helps to strip spaces and avoid mangling backslashes:

https://github.com/koalaman/shellcheck/wiki/SC2162

Signed-off-by: Kairui Song <kasong@redhat.com>
Acked-by: Philipp Rudo <prudo@redhat.com>
2021-09-14 03:25:29 +08:00
Kairui Song
c4d85142be bash scripts: get rid of expr and let
As suggested by:
https://github.com/koalaman/shellcheck/wiki/SC2219

Signed-off-by: Kairui Song <kasong@redhat.com>
Acked-by: Philipp Rudo <prudo@redhat.com>
2021-09-14 03:25:29 +08:00
Kairui Song
6d45257cc1 bash scripts: remove useless cat
Some `cat` calls are useless, remove them to make it cleaner.
See: https://github.com/koalaman/shellcheck/wiki/SC2002

Signed-off-by: Kairui Song <kasong@redhat.com>
Acked-by: Philipp Rudo <prudo@redhat.com>
2021-09-14 03:25:29 +08:00
Kairui Song
80525aface kdumpctl: refine grep usage
Use `grep -q` instead of redirect to /dev/null.

Use `grep -c` instead, as suggested in:
https://github.com/koalaman/shellcheck/wiki/SC2126

Use `grep -E` instead of `egrep`.
https://github.com/koalaman/shellcheck/wiki/SC2196

Signed-off-by: Kairui Song <kasong@redhat.com>
2021-09-14 03:25:29 +08:00
Kairui Song
dfb76467c9 kdumpctl: fix fragile loops over find output
For loops over find output are fragile, use a while read loop:
https://github.com/koalaman/shellcheck/wiki/SC2044

Signed-off-by: Kairui Song <kasong@redhat.com>
Acked-by: Philipp Rudo <prudo@redhat.com
2021-09-14 03:25:29 +08:00
Kairui Song
01613b7211 kdumpctl: use kdump_get_conf_val to read config values
Also fixed kdumpctl, use `awk` instead of `cut` to read
core_collector's executable name correctly when its arguments
are not seperated by space.

Signed-off-by: Kairui Song <kasong@redhat.com>
Acked-by: Philipp Rudo <prudo@redhat.com>
2021-09-14 03:25:29 +08:00
Kairui Song
09ccf88405 kdump-lib.sh: add a config value retrive helper
Add a helper kdump_get_conf_val to replace get_option_value.

It can help cover more corner cases in the code, like when there are
multiple spaces in config file, config value separated by a tab,
heading spaces, or trailing comments.

And this uses "sed group command" and "sed hold buffer", make it much
faster than previous `grep <config> | tail -1`.

This helper is supposed to provide a universal way for kexec-tools
scripts to read in config value. Currently, different scripts are
reading the config in many different fragile ways.

For example, following codes are found in kexec-tools script code base:
  1. grep ^force_rebuild $KDUMP_CONFIG_FILE
     echo $_force_rebuild | cut -d' '  -f2

  2. grep ^kdump_post $KDUMP_CONFIG_FILE | cut -d\  -f2

  3. awk '/^sshkey/ {print $2}' $conf_file

  4. grep ^path $KDUMP_CONFIG_FILE | cut -d' '  -f2-

1, 2, and 4 will fail if the space is replaced by, e.g. a tab

1 and 2 might fail if there are multiple spaces between config name
and config value:
"kdump_post  /var/crash/scripts/kdump-post.sh"
A space will be read instead of config value.

1, 2, 3 will fail if there are space in file path, like:
"kdump_post /var/crash/scripts dir/kdump-post.sh"

4 will fail if there are trailing comments:
"path /var/crash # some comment here"

And all will fail if there are heading space,
" path /var/crash"

And all will most likely cause problems if the config file contains
the same option more than once.

And all of them are slower than the new sed call. Old get_option_value
is also very slow and doesn't handle heading space.

Although we never claim to support heading space or tailing comments
before, it's harmless to be more robust on config reading, and many
conf files in /etc support heading spaces. And have a faster and
safer config reading helper makes it easier to clean up the code.

Signed-off-by: Kairui Song <kasong@redhat.com>
Acked-by: Philipp Rudo <prudo@redhat.com>
2021-09-14 03:25:29 +08:00
Kairui Song
a0282ab22c kdump-lib.sh: add a config format and read helper
Add a helper `kdump_read_conf` to replace read_strip_comments.
`kdump_read_conf` does a few more things:

  - remove trailing spaces.
  - format the content, remove duplicated spaces between name and value.
  - read from KDUMP_CONFIG_FILE (/etc/kdump.conf) directly, avoid pasting
    "/etc/kdump.conf" path everywhere in the code.
  - check if config file exists, just in case.

Also unify the environmental variable, now KDUMP_CONFIG_FILE stands for
the default config location.

This helps avoid some shell pitfalls about spaces when reading config.

Signed-off-by: Kairui Song <kasong@redhat.com>
Acked-by: Philipp Rudo <prudo@redhat.com>
2021-09-14 03:25:29 +08:00
Kairui Song
097059dedc Clear old crashkernl=auto in comment and doc
Acked-by: Pingfan Liu <piliu@redhat.com>
Signed-off-by: Kairui Song <kasong@redhat.com>
2021-08-05 17:54:20 +08:00
Kairui Song
bcd8d6a47b kdumpctl: fix a typo
Recommanded -> Recommended

Signed-off-by: Kairui Song <kasong@redhat.com>
Acked-by: Coiby Xu <coxu@redhat.com>
2021-07-20 15:57:05 +08:00
Kairui Song
86130ec10f kdumpctl: Add kdumpctl reset-crashkernel
In newer kernel, crashkernel.default will contain the default
crashkernel value of a kernel build. So introduce a new sub command
to help user reset kernel crashkernel size to the default value.

Signed-off-by: Kairui Song <kasong@redhat.com>
Acked-by: Baoquan He <bhe@redhat.com>
2021-07-08 15:18:45 +08:00
Kairui Song
bf6671b60d fadump: kdumpctl should check the modules used by the fadump initramfs
After fadump embedded the fadump initramfs in the normal initramfs,
kdumpctl will mistakenly rebuild the initramfs everytime.

kdumpctl checks the hostonly-kernel-modules.txt file in initramfs
to check if required drivers are included, but the normal initramfs
is built in non-hostonly mode, so it doesn't have a
hostonly-kernel-modules.txt file. The check will always fail.

So let mkfadumprd make a copy of the hostonly-kernel-modules.txt in the
fadump initramfs and let kdumpctl check that file instead.

Signed-off-by: Kairui Song <kasong@redhat.com>
Acked-by: Hari Bathini <hbathini@linux.ibm.com>
2021-06-30 17:27:02 +08:00
Hari Bathini
fa9201b240 fadump: isolate fadump initramfs image within the default one
In case of fadump, the initramfs image has to be built to boot into
the production environment as well as to offload the active crash dump
to the specified dump target (for boot after crash). As the same image
would be used for both boot scenarios, it could not be built optimally
while accommodating both cases.

Use --include to include the initramfs image built for offloading
active crash dump to the specified dump target. Also, introduce a new
out-of-tree dracut module (99zz-fadumpinit) that installs a customized
init program while moving the default /init to /init.dracut. This
customized init program is leveraged to isolate fadump image within
the default initramfs image by kicking off default boot process
(exec /init.dracut) for regular boot scenario and activating fadump
initramfs image, if the system is booting after a crash.

If squash is available, ensure default initramfs image is also built
with squash module to reduce memory consumption in capture kernel.

Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
Signed-off-by: Kairui Song <kasong@redhat.com>
Acked-by: Kairui Song <kasong@redhat.com>
2021-06-29 21:35:58 +08:00
Kairui Song
e9e6a2c745 kdumpctl: Add kdumpctl estimate
Add a rough esitimation support, currently, following memory usage are
checked by this sub command:

- System RAM
- Kdump Initramfs size
- Kdump Kernel image size
- Kdump Kernel module size
- Kdump userspace user and other runtime allocated memory (currently
  simply using a fixed value: 64M)
- LUKS encryption memory usage

The output of kdumpctl estimate looks like this:
  # kdumpctl estimate
  Reserved crashkernel:    256M
  Recommanded crashkernel: 160M

  Kernel image size:   47M
  Kernel modules size: 12M
  Initramfs size:      19M
  Runtime reservation: 64M
  Large modules:
      xfs: 1892352
      nouveau: 2318336

And if the kdump target is encrypted:
  # kdumpctl estimate
  Encrypted kdump target requires extra memory, assuming using the keyslot with minimun memory requirement

  Reserved crashkernel:    256M
  Recommanded crashkernel: 655M

  Kernel image size:   47M
  Kernel modules size: 12M
  Initramfs size:      19M
  Runtime reservation: 64M
  LUKS required size:  512M
  Large modules:
      xfs: 1892352
      nouveau: 2318336
  WARNING: Current crashkernel size is lower than recommanded size 655M.

The "Recommanded" value is calculated based on memory usages mentioned
above, and will be adjusted accodingly to be no less than the value provided
by kdump_get_arch_recommend_size.

Signed-off-by: Kairui Song <kasong@redhat.com>
Acked-by: Pingfan Liu <piliu@redhat.com>
2021-05-19 15:27:43 +08:00
Kairui Song
6137956f79 kdumpctl: fix check_config error when kdump.conf is empty
Kdump scirpt already have default values for core_collector, path in
many other place. Empty kdump.conf still works. Fix this corner case and
fix the error message.

Signed-off-by: Kairui Song <kasong@redhat.com>
Acked-by: Pingfan Liu <piliu@redhat.com>
2021-04-28 18:05:12 +08:00
Kelvin Fan
75bdcb7399 Write to /var/lib/kdump if $KDUMP_BOOTDIR not writable
The `/boot` directory on some operating systems might be read-only.
If we cannot write to `$KDUMP_BOOTDIR` when generating the kdump
initrd, attempt to place the generated initrd at `/var/lib/kdump`
instead.

Signed-off by: Kelvin Fan <kelvinfan001@gmail.com>
Acked-by: Kairui Song <kasong@redhat.com>
2021-04-19 16:11:17 +08:00
Pingfan Liu
596fa0a07f kdumpctl: enable secure boot on ppc64le LPARs
On ppc64le LPAR, secure-boot is a little different from bare metal,
Where
  host secure boot: /ibm,secure-boot/os-secureboot-enforcing DT property exists
while
  guest secure boot: /ibm,secure-boot >= 2

Make kexec-tools adapt to LPAR

Signed-off-by: Pingfan Liu <piliu@redhat.com>
Acked-by: Kairui Song <kasong@redhat.com>
2021-02-23 09:45:54 +08:00
Kairui Song
02202aa70f logger: source the logger file individually
Sourcing logger file in kdump-lib.sh will leak kdump helper to dracut,
because module-setup.sh will source kdump-lib.sh. This will make kdump's
function override dracut's ones, and lead to unexpected behaviours.

So include kdump-logger.sh individually and only source it where it really
needed. for module-setup.sh, simply use dracut's logger helper is good
enough so just source kdump-logger.sh in kdump only scripts.

Signed-off-by: Kairui Song <kasong@redhat.com>
Acked-by: Lianbo Jiang <lijiang@redhat.com>
2021-01-20 14:13:44 +08:00
Pingfan Liu
0bd0c5b9f1 kdumpctl: fix a variable expansion in check_fence_kdump_config()
Both $ipaddrs and $node can hold multiple strings, so use "" to brace them.

Signed-off-by: Pingfan Liu <piliu@redhat.com>
Acked-by: Kairui Song <kasong@redhat.com>
2021-01-06 13:28:46 +08:00
Kairui Song
4464bcf8f3 kdump-lib.sh: Use a more generic helper to detect omitted dracut module
Signed-off-by: Kairui Song <kasong@redhat.com>
Acked-by: Lianbo Jiang <lijiang@redhat.com>
2020-11-30 15:25:26 +08:00
Kairui Song
647aa56b53 Fix the watchdog drivers detection code
Currently the watchdog detection code is broken already, it
get the list of active watchdog drivers, then check if they are
set in the /etc/cmdline.d/* as preload module. But after we
switched to use squash module, /etc/cmdline.d/* is not directly visible.

So just detect whether current needed driver is installed.

Signed-off-by: Kairui Song <kasong@redhat.com>
Acked-by: Lianbo Jiang <lijiang@redhat.com>
2020-11-30 15:25:19 +08:00
Kairui Song
276de0f810 Remove a redundant nfs check
In check_fs_modified, is_nfs_dump_target is already called, the dump
target can't be nfs. No need to check here.

Signed-off-by: Kairui Song <kasong@redhat.com>
Acked-by: Lianbo Jiang <lijiang@redhat.com>
2020-11-30 15:25:06 +08:00
Kairui Song
d54e5bab0f kdumpctl: split the driver detection from fs dection function
The driver detection have nothing to do with fs detection, and currently
if the dump target is raw, the block driver detection is skipped which
is wrong. Just split it out and run the block driver detection when dump
target is fs or raw.

Also simplfied the code a bit.

Signed-off-by: Kairui Song <kasong@redhat.com>
Acked-by: Lianbo Jiang <lijiang@redhat.com>
2020-11-30 15:24:45 +08:00
Lianbo Jiang
cd85fe9165 Add code comments to help better understanding
Let's add some code comments to help better understanding, and
no code changes.

Signed-off-by: Lianbo Jiang <lijiang@redhat.com>
Acked-by: Baoquan He <bhe@redhat.com>
2020-11-12 13:59:21 +08:00
Kairui Song
b9a1f461a8 Fix error when using raw target with opalcore
Commit 08276e9 wrongly raise this warning message to error level, fix
this.

Fixes: 08276e9 ('Rework check_config and warn on any duplicated option')
Signed-off-by: Kairui Song <kasong@redhat.com>
2020-10-27 17:37:14 +08:00
Lianbo Jiang
88a8b94de9 kdumpctl: add the '-d' option to enable the kexec loading debugging messages
Currently, the kexec option '--debug/-d' is not enabled by default, which
means that users need to set it manually and wait for the next failure to
capture the additional information.

Therefore, let's enable the option '-d' for kexec loading by default.

Signed-off-by: Lianbo Jiang <lijiang@redhat.com>
Acked-by: Kairui Song <kasong@redhat.com>
2020-10-27 17:34:03 +08:00
Lianbo Jiang
3b743ae6ae enable the logger for kdump
Since the logger was introduced into kdump, let's enable it for kdump
so that we can output kdump messages according the log level and save
these messages for debugging.

Signed-off-by: Lianbo Jiang <lijiang@redhat.com>
Acked-by: Kairui Song <kasong@redhat.com>
2020-10-27 17:33:54 +08:00
Kairui Song
08276e9f7a Rework check_config and warn on any duplicated option
Instead of read and parse the kdump.conf multiple times, only read once
and use a single loop to handle the error check, which is faster.

Also check for any duplicated config otion, and error out if there are
duplicated ones.

Now it checks for following errors, most are unchanged from before:
 - Any duplicated config options. (New added)
 - Deprecated/Invalid kdump config option.
 - Duplicated kdump target, will have a different error message of
   other duplicated config options.
 - Duplicated --mount options in dracut_args.
 - Empty config values. All kdump configs should be in
   "<config_opt> <config_value>" format.
 - Check If raw target is used in fadump mode.

And removed detect of lines start with space, it will not break kdump
anyway.

The performance is measurable better than before for the check_config
function.

Signed-off-by: Kairui Song <kasong@redhat.com>
Acked-by: Pingfan Liu <piliu@redhat.com>
2020-10-27 17:07:54 +08:00
Kairui Song
a37f36ad4d Refactor kernel image and initrd detection code
kernel installation is not always in a fixed location /boot, there are
multiple different style of kernel installation, and initramfs location
changes with kernel. The two files should be detected together and adapt
to different style.

To do so we use a list of known installation destinations, and a list
of possible kernel image and initrd names. Iterate the two list to
detect the installation location of the two files. If GRUB is in use,
the BOOT_IMAGE= cmdline from GRUB will also be considered. And also
prefers user specified config if given.

Previous atomic workaround is no longer needed as the new detection
method can cover that case.

Signed-off-by: Kairui Song <kasong@redhat.com>
Acked-by: Dave Young <dyoung@redhat.com>
2020-08-27 11:29:17 +08:00
Pingfan Liu
f96172d353 kdumpctl: exit if either pre.d or post.d is missing
It is hard to detect the time that /etc/kdump is removed. And this failure
may cause out-of-date kdump.initrd.  To keep things simple, just exit if
/etc/kdump/pre.d and post.d does not exist.

Signed-off-by: Pingfan Liu <piliu@redhat.com>
Acked-by: Kairui Song <kasong@redhat.com>
2020-07-30 16:47:10 +08:00
Pingfan Liu
25824d64cd kdumpctl: detect modification of scripts by its directory's timestamp
Checking modification against a file can not detect a removing file in
"/etc/kdump/post.d/ /etc/kdump/pre.d/".  Hence it also needs the
modified time of directory to detect such changes.

Signed-off-by: Pingfan Liu <piliu@redhat.com>
Acked-by: Kairui Song <kasong@redhat.com>
2020-07-20 16:18:42 +08:00
Lianbo Jiang
073646998f Revert "kdump-lib: switch to the kexec_file_load() syscall on x86_64 by default"
This reverts commit 6a20bd5447.

Let's restore the logic of secureboot status check, and remove the
option 'KDUMP_FILE_LOAD=on|off'. We will use the option KEXEC_ARGS="-s"
to enable the kexec file load later, which can avoid failures when
the secureboot is enabled.

Signed-off-by: Lianbo Jiang <lijiang@redhat.com>
Acked-by: Dave Young <dyoung@redhat.com>
2020-07-01 17:07:46 +08:00
Kairui Song
a29de38da5 Always wrap up call to dracut get_persistent_dev function
Dracut get_persistent_dev function don't recognize UUID= or LABEL=
format, so caller should conver it to the path to the block device
before calling it. There is already such a helper
"kdump_get_persistent_dev", just move it to kdump-lib.sh and rename
it to reuse it,

Signed-off-by: Kairui Song <kasong@redhat.com>
Acked-by: Dave Young <dyoung@redhat.com>
2020-06-22 19:58:08 +08:00
onitsuka.shinic@fujitsu.com
bdd57a5864 kdumpctl: Check the update of the binary and script files in /etc/kdump/{pre.d,post.d}
This patch adds the binary and script files in /etc/kdump/{pre.d,post.d}
to modified checklist in order to update kdump initramfs when one adds
new scripts or binaries or removes the existing ones under
/etc/kdump/{pre.d, post.d}.

Signed-off-by: Shinichi Onitsuka <onitsuka.shinic@fujitsu.com>
Acked-by: Pingfan Liu <piliu@redhat.com>
2020-06-11 12:59:15 +08:00
Kairui Song
61e016939c User get_mount_info to replace findmnt calls
Use get_mount_info so that fstab is used as a failback when look for
mount info.

Signed-off-by: Kairui Song <kasong@redhat.com>
Acked-by: Pingfan Liu <piliu@redhat.com>
2020-05-22 16:14:02 +08:00
Kairui Song
70deeb474b Allow calling mkdumprd from kdumpctl even if targat not mounted
Ignore mount check in kdumpctl, mkdumprd will still fail building and
exit if target is not mounted.

Signed-off-by: Kairui Song <kasong@redhat.com>
Acked-by: Pingfan Liu <piliu@redhat.com>
2020-05-22 16:13:49 +08:00
Kairui Song
0624148414 Add a is_mounted helper
Use is_mounted helper instaed of calling findmnt directly or checking if
"mount" value is empty.

If findmnt looks for fstab as well, some non mounted entry will also
return value. Required to support non-mounted target.

Signed-off-by: Kairui Song <kasong@redhat.com>
Acked-by: Pingfan Liu <piliu@redhat.com>
2020-05-22 16:13:24 +08:00
Kairui Song
43ea36b3e8 Introduce get_kdump_mntpoint_from_target and fix duplicated /
User a helper to get the path to mount dump target in kdump kernel, and
fix duplicated '/' in the mount path problem.

Fixes: bz1785371
Signed-off-by: Kairui Song <kasong@redhat.com>
Acked-by: Dave Young <dyoung@redhat.com>
2020-05-22 16:13:02 +08:00
Kairui Song
c1c7f004c8 Remove is_dump_target_configured
It's basically same with is_user_configured_dump_target and only have
one caller. And the name is confusing, the dump target is always
configured, it's either user configured or path based.

Signed-off-by: Kairui Song <kasong@redhat.com>
Acked-by: Lianbo Jiang <lijiang@redhat.com>
2020-03-30 22:05:31 +08:00
Kairui Song
632c369ec2 kdumpctl: fix driver change detection on latest Fedora
Now modinfo will return "(builtin)" instead of empty string for builtin
module. Sync the code logic.

Signed-off-by: Kairui Song <kasong@redhat.com>
Acked-by: Pingfan Liu <piliu@redhat.com>
2020-03-23 10:24:57 +08:00
Kairui Song
2fbcdf41e3 kdumpctl: check hostonly-kernel-modules.txt for kernel module
Since Dracut commit a0d9ad6 loaded-kernel-modules is renamed to
hostonly-kernel-modules and contains all hostonly modules. So check
hostonly-kernel-modules instead for module change.

Signed-off-by: Kairui Song <kasong@redhat.com>
Acked-by: Pingfan Liu <piliu@redhat.com>
2020-03-18 15:11:43 +08:00
Hari Bathini
e3f2f926dd powerpc: enable the scripts to capture dump on POWERNV platform
With FADump support added on POWERNV paltform, enable the scripts to
capture /proc/vmcore. Also, if CONFIG_OPAL_CORE is enabled, OPAL core
is preserved and exported on POWERNV platform. So, offload OPAL core,
if it is available.

Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
Acked-by: Kairui Song <kasong@redhat.com>
2020-02-06 22:13:06 +08:00
Lianbo Jiang
6a20bd5447 kdump-lib: switch to the kexec_file_load() syscall on x86_64 by default
UEFI Secure boot is a signature verification mechanism, designed to
prevent malicious code being loaded and executed at the early boot
stage. This makes sure that code executed is trusted by firmware.

Previously, with kexec_file_load() interface, kernel prevents unsigned
kernel image from being loaded if secure boot is enabled. So kdump will
detect whether secure boot is enabled firstly, then decide which interface
is chosen to execute, kexec_load() or kexec_file_load(). Otherwise unsigned
kernel loading will fail if secure boot enabled, and kexec_file_load() is
entered.

Now, the implementation of kexec_file_load() is adjusted in below commit.
With this change, if CONFIG_KEXEC_SIG_FORCE is not set, unsigned kernel
still has a chance to be allowed to load under some conditions.

commit 99d5cadfde2b ("kexec_file: split KEXEC_VERIFY_SIG into KEXEC_SIG
and KEXEC_SIG_FORCE")

And in the current Fedora, the CONFIG_KEXEC_SIG_FORCE is not set, only the
CONFIG_KEXEC_SIG and CONFIG_BZIMAGE_VERIFY_SIG are set on x86_64 by default.
It's time to spread kexec_file_load() onto all systems of x86_64, including
Secure-boot platforms and legacy platforms. Please refer to the following
form.

.----------------------------------------------------------------------.
| .                    |     signed kernel     |    unsigned kernel    |
|    .      types      |-----------------------|-----------------------|
|       .              |Secure boot|  Legacy   |Secure boot|  Legacy   |
|          .           |-----------|-----------|-----------|-----------|
| options     .        | prev| now | prev| now |     |     | prev| now |
|                .     |(file|(file|(only|(file| prev| now |(only|(file|
|                    . |load)|load)|load)|load)|     |     |load)|load)|
|----------------------|-----|-----|-----|-----|-----|-----|-----|-----|
|KEXEC_SIG=y           |     |     |     |     |     |     |     |     |
|SIG_FORCE is not set  |succ |succ |succ |succ |  X  |  X  |succ |succ |
|BZIMAGE_VERIFY_SIG=y  |     |     |     |     |     |     |     |     |
|----------------------|-----|-----|-----|-----|-----|-----|-----|-----|
|KEXEC_SIG=y           |     |     |     |     |     |     |     |     |
|SIG_FORCE is not set  |     |     |     |     |     |     |     |     |
|BZIMAGE_VERIFY_SIG is |fail |fail |succ |fail |  X  |  X  |succ |fail |
|not set               |     |     |     |     |     |     |     |     |
|----------------------|-----|-----|-----|-----|-----|-----|-----|-----|
|KEXEC_SIG=y           |     |     |     |     |     |     |     |     |
|SIG_FORCE=y           |succ |succ |succ |fail |  X  |  X  |succ |fail |
|BZIMAGE_VERIFY_SIG=y  |     |     |     |     |     |     |     |     |
|----------------------|-----|-----|-----|-----|-----|-----|-----|-----|
|KEXEC_SIG=y           |     |     |     |     |     |     |     |     |
|SIG_FORCE=y           |     |     |     |     |     |     |     |     |
|BZIMAGE_VERIFY_SIG is |fail |fail |succ |fail |  X  |  X  |succ |fail |
|not set               |     |     |     |     |     |     |     |     |
|----------------------|-----|-----|-----|-----|-----|-----|-----|-----|
|KEXEC_SIG is not set  |     |     |     |     |     |     |     |     |
|SIG_FORCE is not set  |     |     |     |     |     |     |     |     |
|BZIMAGE_VERIFY_SIG is |fail |fail |succ |succ |  X  |  X  |succ |succ |
|not set               |     |     |     |     |     |     |     |     |
 ----------------------------------------------------------------------
Note:
[1] The 'X' indicates that the 1st kernel(unsigned) can not boot when the
    Secure boot is enabled.

Hence, in this patch, if on x86_64, let's use the kexec_file_load() only.
See if anything wrong happened in this case, in Fedora firstly for the
time being.

Signed-off-by: Lianbo Jiang <lijiang@redhat.com>
Acked-by: Kairui Song <kasong@redhat.com>
2020-02-06 21:57:14 +08:00
Hari Bathini
0a9aabaadd kdumpctl: make reload fail proof
When large amount of memory, about 1TB, is removed with DLPAR memory
remove operation, kdump reload could fail due to race condition with
device tree property update. In such scenario, the subsequent kdump
reload requests would also fail as reload() only proceeds if current
load status is active. Since the possibility of this race condition
couldn't be wished away due to the nature of the scenario, workaround
it by proceeding to load even if current load status is not active as
long as kdump service is active, which kdump udev rules already check
for.

Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
Acked-by: Kairui Song <kasong@redhat.com>
2019-11-12 13:22:52 +08:00
Pingfan Liu
72ed97683f kdumpctl: bail out immediately if host key verification failed
In kdump.conf, if sshkey points to an invalid ssh key, 'kdumpctl restart'
can bail out immediately instead of retry.

Signed-off-by: Pingfan Liu <piliu@redhat.com>
Acked-by: Kairui Song <kasong@redhat.com>
2019-10-22 15:14:37 +08:00
Pingfan Liu
e07fc3e071 kdumpctl: echo msg when waiting for connection
Print some message during the long wait period to reflect the process.
The message will look like:

  Network dump target is not usable, waiting for it to be ready
  ...

Signed-off-by: Pingfan Liu <piliu@redhat.com>
Acked-by: Kairui Song <kasong@redhat.com>
2019-09-24 13:17:16 +08:00
Pingfan Liu
680c0d3414 kdumpctl: distinguish the failed reason of ssh
On a host with ipaddr not ready before kdump service, ssh return errno 255.
While if no ssh-key, ssh also return errno 255. For both of cases, the
current kdump code promote user to run 'kdumpctl propagate'. This confuses
user who already installs ssh-key.

In order to tell these two cases from each other, the ssh warning message
should be involved, and parsed.

For the no ssh-key case , warning message is "Permission denied" or "No
such file or directory". For the other, warning message is "Network
Unreachable"

This patch also does a slight change to enlarge the timeout from 60s to
180s. This value can meet test at the time being

Signed-off-by: Pingfan Liu <piliu@redhat.com>
Acked-by: Kairui Song <kasong@redhat.com>
2019-09-02 17:06:21 +08:00
Pingfan Liu
c1a06343df kdumpctl: wait a while for network ready if dump target is ssh
If dump target is ipv6 address, a host should have ipv6 address ready
before starting kdump service. Otherwise, kdump service fails to start due
to the failure "ssh dump_server_ip mkdir -p $SAVE_PATH".
And user can see message like:
 "Could not create root@2620:52:0:10da:46a8:42ff:fe23:3272/var/crash"

I observe a long period (about 30s) on some machine before they got ipv6
address dynamiclly, which is never seen on ipv4 host.

Hence kdump service has a dependency on ipv6 address. But there is no good
way to resolve it. One way is asking user to run the cmd "nmcli connection
modify eth0 ipv6.may-fail false". But this will block systemd until ipv6
address is ready. Despite doing so, kdump can try its best (wait 1 minutes
after it starts up) before failure.

How to implement the wait is arguable. It will involve too many technique
details if explicitly waiting on ipv6 address, instead, just lean on 'ssh'
return value to see the availability of network.

Signed-off-by: Pingfan Liu <piliu@redhat.com>
Acked-by: Kairui Song <kasong@redhat.com>
2019-08-12 16:13:08 +08:00
Kairui Song
f0fa5c8e91 kdumpctl: check for ssh path availability when rebuild
Currently kdumpctl rebuild will simply rebuild the initramfs, and
only perform basic config syntax check. But it should also check if the
target path is available when using SSH target, else kdump may fail.
is second kernel. kdumpctl rebuild should cover this case, and create
the path if it doesn't exist.

This patch make rebuild and restart behaves the same, rebuild is
now equal to restart, except it won't check config change or reload
kdump resource.

Signed-off-by: Kairui Song <kasong@redhat.com>
Acked-by: Dave Young <dyoung@redhat.com>
2019-05-27 16:13:29 +08:00
Kairui Song
43c26b7312 kdumpctl: Check kdump.conf for error when rebuild is called
Although "kdumpctl rebuild" is introduced to help user rebuild the
initramfs without modifying the kdump.conf, if the kdump.conf is
modified and "kdumpctl rebuild" is called, a initramfs with a faulty
kdump.conf will be built.

Kdump will refuse to load the initramfs when restarted, but kdumpctl
reload may load the faulty initramfs. So need to make sure the faulty
build won't be generate in the first place.

Check for kdump.conf error before building the initramfs to ensure such
failure won't happen.

Signed-off-by: Kairui Song <kasong@redhat.com>
Acked-by: Dave Young <dyoung@redhat.com>
2019-05-27 13:57:55 +08:00
Kairui Song
2efc0f1854 kdumpctl: don't always rebuild when extra_modules is set
We don't necessarily have to always rebuild the initramfs when
extra_modules is set. Instead, just detect if any module is updated,
and only rebuild initramfs if found any updated kernel module.

Tested with in-tree kernel modules, out-of-tree kernel modules, weak
modules, all worked as expected.

Signed-off-by: Kairui Song <kasong@redhat.com>
Acked-by: Dave Young <dyoung@redhat.com>
2019-05-20 17:01:25 +08:00
Kairui Song
30913fd667 kdumpctl: follow symlink when checking for modified files
Previously only the symlink's timestamp is used for checking if file are
modified, this will not trigger a rebuild if the symlink target it
modified.

So check both symlink timestamp and symlink target timestamp, rebuild
the initramfs on both symlink changed and target changed.

Also give a proper error message if the file doesn't exist.

Signed-off-by: Kairui Song <kasong@redhat.com>
Acked-by: Dave Young <dyoung@redhat.com>
2019-05-20 16:56:31 +08:00
Kairui Song
75d9132417 Get rid of duplicated strip_comments when reading config
When reading kdump configs, a single parsing should be enough and this
saves a lot of duplicated striping call which speed up the total load
speed.

Speed up about 2 second when building and 0.1 second for reload in my
tests.

Signed-off-by: Kairui Song <kasong@redhat.com>
Acked-by: Dave Young <dyoung@redhat.com>
2019-05-20 16:56:28 +08:00
Lianbo Jiang
9529191d95 earlykdump: provide a prompt message after the rebuilding of kdump initramfs.
Early kdump inherits the settings of normal kdump, so any changes that
caused normal kdump rebuilding also require rebuilding the system initramfs
to make sure that the changes take effect for early kdump.

Therefore, when the early kdump is enabled, provide a prompt message after
the rebuilding of kdump initramfs is completed.

Signed-off-by: Lianbo Jiang <lijiang@redhat.com>
Acked-by: Dave Young <dyoung@redhat.com>
2019-05-20 16:56:19 +08:00
Kairui Song
1c1159a586 kdumpctl: Detect block device driver change for initramfs rebuild
Previous we rebuild the initramfs when kenrel load module list changed,
but this is not very stable as some async services may load/unload
kernel modules, and cause unnecessary initramfs rebuild.

Instead, it's better to just check if the module required to dump to
the dump target is loaded or not, and rebuild if not loaded. This
avoids most false-positives, and ensure local target change is always
covered.

Currently only local fs dump target is covered, because this check
requires the dump target to be mounted when building the initramfs,
this guarantee that the module is in the loaded kernel module list,
else we may still get some false positive.

dracut-install could be leveraged to combine the modalias list with
kernel loaded module list as a more stable module list in the initramfs,
but upstream dracut change need to be done first.

Passed test on a KVM VM, changing the storage between SATA/USB/VirtIO
will trigger initramfs rebuild and didn't notice any false-positive.
Also passed test on my laptop with no false-positive.

Signed-off-by: Kairui Song <kasong@redhat.com>
Acked-by: Dave Young <dyoung@redhat.com>
2019-05-08 17:51:18 +08:00
Kairui Song
09f50350d9 Revert "kdumpctl: Rebuild initramfs if loaded kernel modules changed"
This reverts commit 6b479b6572.

Check initramfs rebuild by looking at if there is any change of load
kernel modules list is not very stable after all. Previously we are
counting on udev to settle before kdump is started to ensure all modules
is ready, but actually any service may cause a kernel module load, even
after udev is settled.

The previous commit is trying to workaround an issue that VM created
with disk snapshot may fail in the kdump initramfs. The better fix is to
not include the kdump initramfs in the disk snapshot at all, as the
kdump initramfs is not generated for a generic use. And With new added
"kdumpctl reload" command, admins could rebuild the image easily, and
should rebuild the initramfs on hardware change manually.

Signed-off-by: Kairui Song <kasong@redhat.com>
Acked-by: Dave Young <dyoung@redhat.com>
2019-05-06 17:54:26 +08:00
Kairui Song
594ac119c5 kdumpctl: add rebuild support
Use "kdumpctl rebuild" to rebuild the image directly. This could help
admins to rebuild kdump image directly.

Also merge fadump related initramfs backup/restore into setup_initrd,
and do permission only when actually trying to rebuild the image.

Signed-off-by: Kairui Song <kasong@redhat.com>
Acked-by: Dave Young <dyoung@redhat.com>
2019-04-05 02:02:43 +08:00
Hari Bathini
689fca5af3 fadump: leverage kernel support to re-regisgter FADump
With kernel commit 0823c68b054b ("powerpc/fadump: re-register firmware-
assisted dump if already registered") support is enabled to re-register
when FADump is alredy registered. Leverage that option in kdump scripts.

Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
Acked-by: Kairui Song <kasong@redhat.com>
2019-03-07 13:33:17 +08:00