Commit Graph

1689 Commits

Author SHA1 Message Date
Coiby Xu
311b5b100b update kernel crashkernel in posttrans RPM scriptlet when updating kexec-tools
When doing in-place upgrading using leapp on x86_64, kdumpcl can't
acquire instance lock when running in %post RPM scriplet on x86_64,
  localhost upgrade[1306]: /bin/kdumpctl: line 49: /var/lock/kdump: No such file or directory
  localhost upgrade[1306]: kdump: Create file lock failed

and running "touch /var/lock/dkump" also fails with
"No such file or directory". Thus kdumpctl can't be run in %post
scriptlet. But kdumpctl can be run in %posttrans RPM scriplet.

Besides, it's better to update crashkernel after the kernel has been
updated. So let's update kernel crashkernel in the %posttrans
scriptlet which will be run in the end of a transaction i.e. after
the kernel has been updated.

Note for %posttrans scriptlet, "$1 == 1" means both installing a new
package and upgrading a package.

[1] https://github.com/apptainer/singularity/issues/2386#issuecomment-474747054

Reported-by: Jie Li <jieli@redhat.com>
Signed-off-by: Coiby Xu <coxu@redhat.com>
Reviewed-by: Philipp Rudo <prudo@redhat.com>
2022-02-18 08:56:59 +08:00
Tao Liu
2bbc7512a2 kdump-lib.sh: Check the output of blkid with sed instead of eval
Previously the output of blkid is not checked. If the output
is empty, the eval will report the following error message:

    /lib/kdump/kdump-lib.sh: eval: line 925: syntax error near unexpected token `;'
    /lib/kdump/kdump-lib.sh: eval: line 925: `; echo $TYPE'

For example, we can observe such a failing when blkid is invoked
against a lvm thinpool block device:

    $ blkid -u filesystem,crypto -o export -- "/dev/block/253\:2"
    $ echo $?
    2
    $ udevadm info /dev/block/253\:2|grep S\:
    S: mapper/vg00-thinpoll_tmeta

In this patch, we will use sed instead of eval, to output the
fstype of block device if any.

Signed-off-by: Tao Liu <ltao@redhat.com>
Reviewed-by: Philipp Rudo <prudo@redhat.com>
2022-02-17 15:34:53 +08:00
Coiby Xu
59fcb8ae5b Release 2.0.23-5
Signed-off-by: Coiby Xu <coxu@redhat.com>
2022-02-14 12:07:08 +08:00
Coiby Xu
41b8f9528c fix incorrect usage of _get_all_kernels_from_grubby
It's found that the kernel cmdline crashkernel=auto doesn't get updated
when upgrading kexec-tools. This happens because _get_all_kernels_from_grubby
is called with no argument by reset_crashkernel_after_update. When retrieving
all kernel paths on the system, "grubby --info ALL" should be used. Fix this
error by passing "ALL" argument.

Fixes: 0adb0f4 ("try to reset kernel crashkernel when kexec-tools updates the default crashkernel value")

Reported-by: Jie Li <jieli@redhat.com>
Signed-off-by: Coiby Xu <coxu@redhat.com>
Reviewed-by: Tao Liu <ltao@redhat.com>
2022-02-14 10:34:55 +08:00
Coiby Xu
5111c01334 fix the mistake of swapping function parameters of read_proc_environ_var
_is_osbuild fails because it expects the 1st and 2nd function parameter
to be the environment variable and environ file path respectively. Fix
it by swapping the parameters in read_proc_environ_var.

Note the osbuild environ file path is defined in _OSBUILD_ENVIRON_PATH
so _is_osbuild can be unit-tested by overwriting _OSBUILD_ENVIRON_PATH.

Fixes: 6a3ce83 ("fix the error of parsing the container environ variable for osbuild")
Signed-off-by: Coiby Xu <coxu@redhat.com>
Reviewed-by: Pingfan Liu <piliu@redhat.com>
2022-02-08 10:42:40 +08:00
Coiby Xu
d6298a1dec Release 2.0.23-4 2022-01-26 15:10:50 +08:00
Coiby Xu
2df55984f6 fix broken kdump_get_arch_recommend_size
shellcheck finds the following problem,
$ shellcheck kdump-lib.sh
In kdump-lib.sh line 876:
        get_recommend_size "$sys_mem" "$ck_cmdline"
                                       ^---------^ SC2154: ck_cmdline is referenced but not assigned (did you mean '_ck_cmdline'?).

s/ck_cmdline/_ck_cmdline to fix kdump_get_arch_recommend_size.

Note s/sys_mem/_sys_mem as well to make the changes consistent.

Fixes: 105c016 ("factor out kdump_get_arch_recommend_crashkernel")
Signed-off-by: Coiby Xu <coxu@redhat.com>
Acked-by: Tao Liu <ltao@redhat.com>
2022-01-26 14:33:50 +08:00
Coiby Xu
c67a836cde remove the upper bound of 102400T for the range in default crashkernel
This patch makes the default crashkernel value consistent with previous
one.

Fixes: 105c016 ("factor out kdump_get_arch_recommend_crashkernel")
Signed-off-by: Coiby Xu <coxu@redhat.com>
Reviewed-by: Philipp Rudo <prudo@redhat.com>
2022-01-26 08:33:21 +08:00
Coiby Xu
6a3ce83a60 fix the error of parsing the container environ variable for osbuild
The environment variable entries in /proc/[pid]/environ are separated by
null bytes instead of by spaces. Update the sed regex to fix this issue.

Note that,
  1. this patch also fixes a issue which is kdumpctl would try to reset
     crashkernel even osbuild has provided custom crashkernel value.
  2. kernel hook 92-crashkernel.install installed by kexec-tools is
     guaranteed to be ran by kernel-install. kexec-tools doesn't recommend
     kernel so there is no guarantee kernel is installed after kexec-tools.
     But dnf invokes kernel-install in the posttrans scriptlet (of kernel-core)
     which is always ran after all packages including kexec-tools and kernel
     in a dnf transaction.
  3. To be able to do unit tests, the logic of reading environment variable
     has been extracted as a separate function.

Fixes: ddd428a ("set up kernel crashkernel for osbuild in kernel hook")
Signed-off-by: Coiby Xu <coxu@redhat.com>
Reviewed-by: Pingfan Liu <piliu@redhat.com>
Reviewed-by: Philipp Rudo <prudo@redhat.com>
2022-01-26 08:32:06 +08:00
Philipp Rudo
ca5a33855f s390: handle R_390_PLT32DBL reloc entries in machine_apply_elf_rel()
Resolves: bz2025860
Upstream: git.kernel.org/pub/scm/utils/kernel/kexec/kexec-tools.git

commit 186e7b0752d8fce1618fa37519671c834c46340e
Author: Alexander Egorenkov <egorenar@linux.ibm.com>
Date:   Wed Dec 15 18:48:53 2021 +0100

    s390: handle R_390_PLT32DBL reloc entries in machine_apply_elf_rel()

    Starting with gcc 11.3, the C compiler will generate PLT-relative function
    calls even if they are local and do not require it. Later on during linking,
    the linker will replace all PLT-relative calls to local functions with
    PC-relative ones. Unfortunately, the purgatory code of kexec/kdump is
    not being linked as a regular executable or shared library would have been,
    and therefore, all PLT-relative addresses remain in the generated purgatory
    object code unresolved. This in turn lets kexec-tools fail with
    "Unknown rela relocation: 0x14 0x73c0901c" for such relocation types.

    Furthermore, the clang C compiler has always behaved like described above
    and this commit should fix the purgatory code built with the latter.

    Because the purgatory code is no regular executable or shared library,
    contains only calls to local functions and has no PLT, all R_390_PLT32DBL
    relocation entries can be resolved just like a R_390_PC32DBL one.

    * https://refspecs.linuxfoundation.org/ELF/zSeries/lzsabi0_zSeries/x1633.html#AEN1699

    Relocation entries of purgatory code generated with gcc 11.3
    ------------------------------------------------------------

    $ readelf -r purgatory/purgatory.o

    Relocation section '.rela.text' at offset 0x6e8 contains 27 entries:
      Offset          Info           Type           Sym. Value    Sym. Name + Addend
    00000000000c  000300000013 R_390_PC32DBL     0000000000000000 .data + 2
    00000000001a  001000000014 R_390_PLT32DBL    0000000000000000 sha256_starts + 2
    000000000030  001100000014 R_390_PLT32DBL    0000000000000000 sha256_update + 2
    000000000046  001200000014 R_390_PLT32DBL    0000000000000000 sha256_finish + 2
    000000000050  000300000013 R_390_PC32DBL     0000000000000000 .data + 102
    00000000005a  001300000014 R_390_PLT32DBL    0000000000000000 memcmp + 2
    ...
    000000000118  001600000014 R_390_PLT32DBL    0000000000000000 setup_arch + 2
    00000000011e  000300000013 R_390_PC32DBL     0000000000000000 .data + 2
    00000000012c  000f00000014 R_390_PLT32DBL    0000000000000000 verify_sha256_digest + 2
    000000000142  001700000014 R_390_PLT32DBL    0000000000000000
    post_verification[...] + 2

    Relocation entries of purgatory code generated with gcc 11.2
    ------------------------------------------------------------

    $ readelf -r purgatory/purgatory.o

    Relocation section '.rela.text' at offset 0x6e8 contains 27 entries:
      Offset          Info           Type           Sym. Value    Sym. Name + Addend
    00000000000e  000300000013 R_390_PC32DBL     0000000000000000 .data + 2
    00000000001c  001000000013 R_390_PC32DBL     0000000000000000 sha256_starts + 2
    000000000036  001100000013 R_390_PC32DBL     0000000000000000 sha256_update + 2
    000000000048  001200000013 R_390_PC32DBL     0000000000000000 sha256_finish + 2
    000000000052  000300000013 R_390_PC32DBL     0000000000000000 .data + 102
    00000000005c  001300000013 R_390_PC32DBL     0000000000000000 memcmp + 2
    ...
    00000000011a  001600000013 R_390_PC32DBL     0000000000000000 setup_arch + 2
    000000000120  000300000013 R_390_PC32DBL     0000000000000000 .data + 122
    000000000130  000f00000013 R_390_PC32DBL     0000000000000000 verify_sha256_digest + 2
    000000000146  001700000013 R_390_PC32DBL     0000000000000000 post_verification[...] + 2

    Corresponding s390 kernel discussion:
    * https://lore.kernel.org/linux-s390/20211208105801.188140-1-egorenar@linux.ibm.com/T/#u

    Signed-off-by: Alexander Egorenkov <egorenar@linux.ibm.com>
    Reported-by: Tao Liu <ltao@redhat.com>
    Suggested-by: Philipp Rudo <prudo@redhat.com>
    Reviewed-by: Philipp Rudo <prudo@redhat.com>
    [hca@linux.ibm.com: changed commit message as requested by Philipp Rudo]
    Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
    Signed-off-by: Simon Horman <horms@verge.net.au>

v2:
   - Moved patch 601 -> 401

Signed-off-by: Philipp Rudo <prudo@redhat.com>
Acked-by: Coiby Xu <coxu@redhat.com>
2022-01-26 08:25:43 +08:00
Tao Liu
99de77bba7 Revert "Remove trace_buf_size and trace_event from the kernel bootparameters of the kdump kernel"
There is a mechanism to keep memory consumption minimum, i.e. equal
to trace_buf_size=1, until tracing by ftrace is actually started:

    tracing: keep ring buffer to minimum size till used
    73c5162aa3

Since ftrace is usually never used in the kdump 2nd kernel, the kdump
2nd kernel behaves in the same way with or without trace_buf_size=1.
So the issue which the patch want to solve never exists. Let's revert
the patch for better maintainance and avoid confusion.

ref link: https://bugzilla.redhat.com/show_bug.cgi?id=2034501#c20

This reverts commit f39000f.

Signed-off-by: Tao Liu <ltao@redhat.com>
Acked-by: Coiby Xu <coxu@redhat.com>
2022-01-26 08:23:11 +08:00
Kairui Song
748eb3a2a6 spec: only install mkfadumprd for ppc
fadump is a ppc only feature, mkfadumprd is only needed for fadump, drop
it for other arch.

Reviewed-by: Philipp Rudo <prudo@redhat.com>
Signed-off-by: Kairui Song <kasong@tencent.com>
2022-01-24 13:00:14 +08:00
Tao Liu
aa9e70349b selftest: Add early kdump test
This patch will introduce early kdump test.

It reuses the code of nfs kdump test, in order to setup 2 seperated VMs,
one(the client) for trigger the early kdump and crash, the other(the server)
for saving vmcore and check if vmcore exists. In order to minimize the
repetted code, a soft link is made to copy the same server side code.

Signed-off-by: Tao Liu <ltao@redhat.com>
Acked-by: Coiby Xu <coxu@redhat.com>
2022-01-24 11:18:00 +08:00
Tao Liu
f4ab396574 selftest: run-test.sh: wait for subprocess instead of kill it
When run tests with 2 VMs, for example nfs/ssh kdump tests, client VM will do the
crash and dump, server VM will do vmcore saving and if-vmcore-exists
check.

Previously, when client VM finishes running, run-test.sh will kill the lead background
process, and then check if server VM has outputted "TEST PASSED" or "TEST FAILED" string.
However it didn't wait for server VM to finish. As a result, the server VM's final
outputs are not collected and checked, leaving the test result as "TEST RESULT NOT FOUND"
sometimes.

For example, the following is the pstree status of $(jobs -p) before it
gets killed. We can see the server VM is still running:

run-test.sh,172455 /root/kexec-tools/tests/scripts/run-test.sh --console nfs-early-kdump
  └─run-test.sh,172457 /root/kexec-tools/tests/scripts/run-test.sh --console...
      └─timeout,172480 --foreground 10m /root/kexec-tools/tests/scripts/run-qemu...
          └─qemu-system-x86,172481 -enable-kvm -cpu host -nodefaults...
              ├─{qemu-system-x86},172489
              ├─{qemu-system-x86},172492
              ├─{qemu-system-x86},172493
              ├─{qemu-system-x86},172628
              └─{qemu-system-x86},172629

In this patch, we will wait for $(jobs -p) to finish, in order to get
the complete output of test results.

Signed-off-by: Tao Liu <ltao@redhat.com>
Acked-by: Coiby Xu <coxu@redhat.com>
2022-01-24 11:17:52 +08:00
Fedora Release Engineering
bb380a92fa - Rebuilt for https://fedoraproject.org/wiki/Fedora_36_Mass_Rebuild
Signed-off-by: Fedora Release Engineering <releng@fedoraproject.org>
2022-01-20 14:26:44 +00:00
Pingfan Liu
c480be7ccf spec: add hostname.rpm into Recommends list
kexec-tools runs hostname binary in the case of fence_kdump. Since this
is a trival dependency and should not block the kexec-tools installation
if non-existent, using weak-dependency to resolve it.

Signed-off-by: Pingfan Liu <piliu@redhat.com>
Acked-by: Tao Liu <ltao@redhat.com>
2022-01-18 14:29:09 +08:00
Pingfan Liu
72ef6929a6 move variable FENCE_KDUMP_SEND from kdump-lib.sh to kdump-lib-initramfs.sh
Since kdump-lib-initramfs.sh is included by kdump-lib.sh, and
FENCE_KDUMP_SEND is used by both 1st and 2nd kernel, moving
FENCE_KDUMP_SEND from kdump-lib.sh to kdump-lib-initramfs.sh.

Signed-off-by: Pingfan Liu <piliu@redhat.com>
Acked-by: Tao Liu <ltao@redhat.com>
2022-01-18 14:22:24 +08:00
Coiby Xu
1e569fd8a8 Release 2.0.23-2 2022-01-13 15:00:59 +08:00
Tao Liu
7de4a0d6c8 Set zstd as recommented for kexec-tools
This patch will make zstd as recommended instead of required for
kexec-tools. If zstd command/package is unavaliable, it can failback to invoke
gzip when making kdump initramfs.

Fixes: 0311f6e ("Set zstd as the default compression method for kdump initrd")

Signed-off-by: Tao Liu <ltao@redhat.com>
Acked-by: Coiby Xu <coxu@redhat.com>
2022-01-11 10:11:21 +08:00
Coiby Xu
ae0cbdf34a fix "kdump: Invalid kdump config option auto_reset_crashkernel" error
kdumpctl only accepts a specified set of options. Add
auto_reset_crashkernel to this set.

Fixes: 73ced7f ("introduce the auto_reset_crashkernel option to kdump.conf")
Signed-off-by: Coiby Xu <coxu@redhat.com>
Acked-by: Tao Liu <ltao@redhat.com>
2022-01-07 12:20:37 +08:00
Coiby Xu
d5c31605f3 use grep -s to suppress error messages about nonexistent or unreadable files
When a file doesn't exist or isn't readable, grep complains as follows,

grep: /proc/cmdline: No such file or directory
grep: /etc/kernel/cmdline: No such file or directory

/proc/cmdline doesn't exist when installing package for an OS image and
/etc/kernel/cmdline may not exist if osbuild doesn't want set custom
kernel cmdline.

Use "-s" to suppress the error messages.

Fixes: 0adb0f4 ("try to reset kernel crashkernel when kexec-tools updates the default crashkernel value")
Fixes: ddd428a ("set up kernel crashkernel for osbuild in kernel hook")
Signed-off-by: Coiby Xu <coxu@redhat.com>
Acked-by: Tao Liu <ltao@redhat.com>
2022-01-07 12:20:21 +08:00
Tao Liu
2bd59ee156 kdump-lib.sh: Escape '|' for 'failure_action|default' in is_dump_to_rootfs
The '|' in 'failure_action|default' should be replaced with '\|' when
passed to kdump_get_conf_val function. Because '|' needs to be escaped
to mean OR operation in sed regex, otherwise it will consider
'failure_action|default' as a whole string.

Fixes: ab1ef78 ("kdump-lib.sh: use kdump_get_conf_val to read config values")

v1 -> v2:
Rephased the commit message.
Replaced " with '.

Signed-off-by: Tao Liu <ltao@redhat.com>
Acked-by: Coiby Xu <coxu@redhat.com>
2022-01-06 12:23:12 +08:00
Tao Liu
0311f6e25b Set zstd as the default compression method for kdump initrd
zstd has better compression ratio and time consumption balance.
When no customized compression method specified in kdump.conf,
we will use zstd as the default compression method.

**The test method:

I installed kexec-tools with and without the patch, executing the following
command for 4 times, and calculate the averange time:

$ rm -f /boot/initramfs-*kdump.img && time kdumpctl rebuild && \
  ls -ail /boot/initramfs-*kdump.img

**The test result:

Bare metal x86_64 machine:
        dracut with squash module
         zlib     lzo      xz       lz4        zstd
real     10.6282  11.0398  11.395   8.6424    10.1676
user      9.8932  11.9072  14.2304  2.8286     8.6468
sys       3.523    3.4626   3.6028  3.5        3.4942
size of
kdump.img 30575616 31419392 27102208 36666368 29236224

        dracut without squash module
        zlib      lzo      xz       lz4        zstd
real     9.509    19.4876  11.6724  9.0338    10.267
user    10.6028   14.516   17.8662  4.0476     9.0936
sys      2.942     2.9184   3.0662  2.9232     3.0662
size of
kdump.img 19247949 19958120 14505056 21112544 17007764

PowerVM hosted ppc64le VM:
        dracut with squash module | dracut without sqaush module
         zlib        zstd         |  zlib          zstd
real     10.6742     10.7572      |   9.7676       10.5722
user     18.754      19.8338      |  20.7932       13.179
sys       1.8358      1.864       |   1.637         1.663
                                  |
size of                           |
kdump.img 36917248   35467264     |  21441323      19007108

**discussion

zstd has a better compression ratio and time consumption balance.

v1 -> v2:
Use kdump_get_conf_val() to get dracut_args values of kdump.conf

v2 -> v3:
Attached testing benchmark

v3 -> v4:
Re-measured and re-attached the testing benchmark of x86_64 and ppc64le.
Changed regex '.*[[:space:]]' to '(^|[[:space:]])'

v4 -> v5:
Attacked lzo/xz/lz4 testing benchmark.

v5 -> v6:
Add zstd as required in kexec-tools.spec

Hello Coiby, you may use "RELEASE=34 make test-run", for
CONFIG_RD_ZSTD is enabled since fc-cloud-34

Acked-by: Coiby Xu <coxu@redhat.com>
Signed-off-by: Tao Liu <ltao@redhat.com>
2022-01-06 08:16:27 +08:00
Coiby Xu
0e162120b6 update crashkernel-howto
Update crashkernel-howto since crashkernel.default has been removed. The
documentation is also simplified as a result.

Reviewed-by: Pingfan Liu <piliu@redhat.com>
Reviewed-by: Philipp Rudo <prudo@redhat.com>
Signed-off-by: Coiby Xu <coxu@redhat.com>
2022-01-05 09:40:24 +08:00
Coiby Xu
ddd428a1d0 set up kernel crashkernel for osbuild in kernel hook
osbuild is a tool to build OS images. It uses bwrap to install packages
inside a sandbox/container. Since the kernel package recommends
kexec-tools which in turn recommends grubby, the installation order would
be grubby -> kexec-tools -> kernel. So we can use the kernel hook
92-crashkernel.install provided by kexec-tools to set up kernel
crashkernel for the target OS image. But in osbuild's case, there is no
current running kernel and running `uname -r` in the container/sandbox
actually returns the host kernel release. To set up kernel crashkernel for
the OS image built by osbuild, a different logic is needed.

We will check if kernel hook is running inside the osbuild container
then set up kernel crashkernel only if osbuild hasn't specified a
custome value. osbuild exposes [1] the container=bwrap-osbuild environment
variable. According to [2], the environment variable is not inherited down
the process tree, so we need to check /proc/1/environ to detect this
environment variable to tell if the kernel hook is running inside a
bwrap-osbuild container. After that we need to know if osbuild wants to use
custom crashkernel value. This is done by checking if /etc/kernel/cmdline
has crashkernel set [3]. /etc/kernel/cmdline is written before packages
are installed.

[1] https://github.com/osbuild/osbuild/pull/926
[2] https://systemd.io/CONTAINER_INTERFACE/
[3] https://bugzilla.redhat.com/show_bug.cgi?id=2024976#c5

Reviewed-by: Pingfan Liu <piliu@redhat.com>
Reviewed-by: Philipp Rudo <prudo@redhat.com>
Signed-off-by: Coiby Xu <coxu@redhat.com>
2022-01-05 09:40:24 +08:00
Coiby Xu
5e8c751c39 reset kernel crashkernel for the special case where the kernel is updated right after kexec-tools
When kexec-tools updates the default crashkernel value, it will try to
reset the existing installed kernels including the currently running
kernel. So the running kernel could have different kernel cmdline
parameters from /proc/cmdline. When installing a kernel after updating
kexec-tools, /usr/lib/kernel/install.d/20-grub.install would be called
by kernel-install [1] which would use /proc/cmdline to set up new kernel's
cmdline. To address this special case, reset the new kernel's crashkernel
and fadump value to the value that would be used by running kernel after
rebooting by the installation hook. One side effect of this commit is it
would reset the installed kernel's crashkernel even currently running kernel
don't use the default crashkernel value after rebooting. But I think this
side effect is a benefit for the user.

The implementation depends on kernel-install which run the scripts in
/usr/lib/kernel/install.d passing the following arguments,

  add KERNEL-VERSION $BOOT/MACHINE-ID/KERNEL-VERSION/ KERNEL-IMAGE [INITRD-FILE ...]

An concrete example is given as follows,
  add 5.11.12-300.fc34.x86_64 /boot/e986846f63134c7295458cf36300ba5b/5.11.12-300.fc34.x86_64 /lib/modules/5.11.12-300.fc34.x86_64/vmlinuz

kernel-install could be started by the kernel package's RPM scriplet [2].
As mentioned in previous commit "try to reset kernel crashkernel when
kexec-tools updates the default crashkernel value", kdumpctl has difficulty
running in RPM scriptlet fore CoreOS. But rpm-ostree ignores all kernel hooks,
there is no need to disable the kernel hook for CoreOS/Atomic/Silverblue. But a
collaboration between rpm-ostree and kexec-tools is needed [3] to take care
of this special case.

Note the crashkernel.default support is dropped.

[1] https://www.freedesktop.org/software/systemd/man/kernel-install.html
[2] https://src.fedoraproject.org/rpms/kernel/blob/rawhide/f/kernel.spec#_2680
[3] https://github.com/coreos/rpm-ostree/issues/2894

Reviewed-by: Pingfan Liu <piliu@redhat.com>
Reviewed-by: Philipp Rudo <prudo@redhat.com>
Signed-off-by: Coiby Xu <coxu@redhat.com>
2022-01-05 09:40:24 +08:00
Coiby Xu
0adb0f4a8c try to reset kernel crashkernel when kexec-tools updates the default crashkernel value
kexec-tools could update the default crashkernel value.
When auto_reset_crashkernel=yes, reset kernel to new crashkernel
value in the following two cases,
 - crashkernel=auto is found in the kernel cmdline
 - the kernel crashkernel was previously set by kexec-tools i.e.
   the kernel is using old default crashkernel value

To tell if the user is using a custom value for the kernel crashkernel
or not, we assume the user would never use the default crashkernel value
as custom value. When kexec-tools gets updated,
 1. save the default crashkernel value of the older package to
    /tmp/crashkernel (for POWER system, /tmp/crashkernel_fadump is saved
    as well).
 2. If auto_reset_crashkernel=yes, iterate all installed kernels.
    For each kernel, compare its crashkernel value with the old
    default crashkernel and reset it if yes

The implementation makes use of two RPM scriptlets [2],
 - %pre is run before a package is installed so we can use it to save
   old default crashkernel value
 - %post is run after a package installed so we can use it to try to reset
   kernel crashkernel

There are several problems when running kdumpctl in the RPM scripts
for CoreOS/Atomic/Silverblue, for example, the lock can't be acquired by
kdumpctl, "rpm-ostree kargs" can't be run and etc.. So don't enable this
feature for CoreOS/Atomic/Silverblue.

Note latest shellcheck (0.8.0) gives false positives about the
associative array as of this commit. And Fedora's shellcheck is 0.7.2
and can't even correctly parse the shell code because of the associative
array.

[1] https://github.com/koalaman/shellcheck/issues/2399
[2] https://docs.fedoraproject.org/en-US/packaging-guidelines/Scriptlets/

Reviewed-by: Pingfan Liu <piliu@redhat.com>
Reviewed-by: Philipp Rudo <prudo@redhat.com>
Signed-off-by: Coiby Xu <coxu@redhat.com>
2022-01-05 09:40:24 +08:00
Coiby Xu
73ced7f451 introduce the auto_reset_crashkernel option to kdump.conf
This option will determine whether to reset kernel crashkernel
to new default value or not when kexec-tools updates the default
crashkernel value and existing kernels using the old default kernel
crashkernel value. Default to yes.

Reviewed-by: Pingfan Liu <piliu@redhat.com>
Reviewed-by: Philipp Rudo <prudo@redhat.com>
Signed-off-by: Coiby Xu <coxu@redhat.com>
2022-01-05 09:40:24 +08:00
Coiby Xu
140da74a34 rewrite reset_crashkernel to support fadump and to used by RPM scriptlet
Rewrite kdumpctl reset-crashkernel KERNEL_PATH as
kdumpctl reset-crashkernel [--fadump=[on|off|nocma]]  [--kernel=path_to_kernel] [--reboot]

This interface would reset a specific kernel to the default crashkernel value
given the kernel path. And it also supports grubby's syntax so there are the
following special cases,
 - if --kernel not specified,
    - use KDUMP_KERNELVER if it's defined in /etc/sysconfig/kdump
    - otherwise use current running kernel, i.e. `uname -r`
 - if --kernel=DEFAULT, the default boot kernel is chosen
 - if --kernel=ALL, all kernels would have its crashkernel reset to the
   default value and the /etc/default/grub is updated as well

--fadump=[on|off|nocma] toggles fadump on/off for the kernel provided
in KERNEL_PATH. If --fadump is omitted, the dump mode is determined by
parsing the kernel command line for the kernel(s) to update.

CoreOS/Atomic/Silverblue needs to be treated as a special case because,
 - "rpm-ostree kargs" is used to manage kernel command line parameters
    so --kernel doesn't make sense and there is no need to find current
    running kernel
 - "rpm-ostree kargs" itself would prompt the user to reboot the system
   after modify the kernel command line parameter
 - POWER is not supported so we can assume the dump mode is always kdump

This interface will also be called by kexec-tools RPM scriptlets [1]
to reset crashkernel.

Note the support of crashkenrel.default is dropped.

[1] https://docs.fedoraproject.org/en-US/packaging-guidelines/Scriptlets/

Reviewed-by: Pingfan Liu <piliu@redhat.com>
Reviewed-by: Philipp Rudo <prudo@redhat.com>
Signed-off-by: Coiby Xu <coxu@redhat.com>
2022-01-05 09:40:24 +08:00
Coiby Xu
12ecbce359 fix incorrect usage of rpm-ostree to update kernel command line parameters
CoreOS/Atomic/Silverblue use "rpm-ostree kargs" to manage kernel command
line parameters.

Fixes: 86130ec ("kdumpctl: Add kdumpctl reset-crashkernel")

Reviewed-by: Pingfan Liu <piliu@redhat.com>
Reviewed-by: Philipp Rudo <prudo@redhat.com>
Signed-off-by: Coiby Xu <coxu@redhat.com>
2022-01-05 09:40:24 +08:00
Coiby Xu
945cbbd59b add helper functions to get kernel path by kernel release and the path of current running kernel
grubby --info=kernel-path or --add-kernel=kernel-path accepts a kernel
path (e.g. /boot/vmlinuz-5.14.14-200.fc34.x86_64) instead of kernel release
(e.g 5.14.14-200.fc34.x86_64). So we need to know the kernel path given
a kernel release. Although for Fedora/RHEL, the kernel path is
"/boot/vmlinuz-<KERNEL_RELEASE>", a path kernel could also be
/boot/<machine-id>/<KERNEL_RELEASE>/vmlinuz. So the most reliable way to
find the kernel path given a kernel release is to use "grubby --info".

For osbuild, a kernel path may not yet exist but it's valid for
"grubby --update-kernel=KERNEL_PATH". For example, "grubby -info" may
output something as follows,

index=0
kernel="/var/cache/osbuild-worker/osbuild-store/tmp/tmp2prywdy5object/tree/boot/vmlinuz-5.15.10-100.fc34.x86_64"
args="ro no_timer_check net.ifnames=0 console=tty1 console=ttyS0,115200n8"
root="UUID=76a22bf4-f153-4541-b6c7-0332c0dfaeac"
initrd="/var/cache/osbuild-worker/osbuild-store/tmp/tmp2prywdy5object/tree/boot/initramfs-5.15.10-100.fc34.x86_64.img"

There is no need to check if path like
/var/cache/osbuild-worker/osbuild-store/tmp/tmp2prywdy5object/tree/boot/vmlinuz-5.15.10-100.fc34.x86_64
physically exists.

Note these helper functions doesn't support CoreOS/Atomic/Silverblue
since grubby isn't used by them.

Reviewed-by: Pingfan Liu <piliu@redhat.com>
Reviewed-by: Philipp Rudo <prudo@redhat.com>
Signed-off-by: Coiby Xu <coxu@redhat.com>
2022-01-05 09:40:24 +08:00
Coiby Xu
3d2079c31c add helper functions to get dump mode
Add a helper function to get dump mode. The dump mode would be
 - fadump if fadump=on or fadump=nocma
 - kdump if fadump=off or empty fadump

Otherwise return 1.

Also add another helper function to return a kernel's dump mode.

Reviewed-by: Pingfan Liu <piliu@redhat.com>
Reviewed-by: Philipp Rudo <prudo@redhat.com>
Signed-off-by: Coiby Xu <coxu@redhat.com>
2022-01-05 09:40:24 +08:00
Coiby Xu
fb9e6838ab add a helper function to read kernel cmdline parameter from grubby --info
This helper function will be used to retrieve the value of kernel
cmdline parameters including crashkernel, fadump, swiotlb and etc.

Suggested-by: Philipp Rudo <prudo@redhat.com>
Reviewed-by: Pingfan Liu <piliu@redhat.com>
Signed-off-by: Coiby Xu <coxu@redhat.com>
2022-01-05 09:40:24 +08:00
Coiby Xu
796d0f6fd2 provide kdumpctl get-default-crashkernel for kdump_anaconda_addon and RPM scriptlet
Provide "kdumpctl get-default-crashkernel" for kdump_anaconda_addon
so crashkernel.default isn't needed.

When fadump is on, kdump_anaconda_addon would need to specify the dump
mode, i.e. "kdumpctl get-default-crashkernel fadump".

This interface would also be used by RPM scriptlet [1] to fetch default
crashkernel value.

[1] https://docs.fedoraproject.org/en-US/packaging-guidelines/Scriptlets/

Reviewed-by: Pingfan Liu <piliu@redhat.com>
Reviewed-by: Philipp Rudo <prudo@redhat.com>
Signed-off-by: Coiby Xu <coxu@redhat.com>
2022-01-05 09:40:24 +08:00
Coiby Xu
105c01691a factor out kdump_get_arch_recommend_crashkernel
Factor out kdump_get_arch_recommend_crashkernel to prepare for
kdump-anaconda-plugin for example to retrieve the default crashkernel
value.

Note the support of crashkenrel.default is dropped.

Reviewed-by: Pingfan Liu <piliu@redhat.com>
Reviewed-by: Philipp Rudo <prudo@redhat.com>
Signed-off-by: Coiby Xu <coxu@redhat.com>
2022-01-05 09:40:24 +08:00
Coiby Xu
34d27c4c30 update default crashkernel value
It has been decided to increase default crashkernel value to reduce the
possibility of OOM.

Fixes: 7b7ddab ("kdump-lib.sh: kdump_get_arch_recommend_size uses crashkernel.default")

Reviewed-by: Pingfan Liu <piliu@redhat.com>
Reviewed-by: Philipp Rudo <prudo@redhat.com>
Signed-off-by: Coiby Xu <coxu@redhat.com>
2022-01-05 09:40:13 +08:00
Kairui Song
546c81a205 kdumpctl: remove some legacy code
It seems the save_core function and vmcore detection was used a long
time ago when kdump shares same userspace in first and second kernel.
It's now heavily deprecated (only support cp, hardcoded path, dumpoops
no longer exists) and not used.

Now vmcore will never show up in first kernel for both kdump and fadump
case, and kdumpctl is only used in first kernel, so just remove them.

Signed-off-by: Kairui Song <kasong@tencent.com>
Acked-by: Coiby Xu <coxu@redhat.com>
2021-12-31 11:37:19 +08:00
Tao Liu
004daebeff dracut-early-kdump-module-setup.sh: install xargs and kdump-lib-initramfs.sh
For earlykdump, kdump-lib-initramfs.sh is sourced by kdump-lib.sh,
however it is not installed in dracut-early-kdump-module-setup.sh. Same
as xargs, which is used by kdump-lib.sh. Otherwise earlykdump will report
file not found errors.

Fixes: a5faa052d4
       ("kdump-lib-initramfs.sh: prepare to be a POSIX compatible lib")
Fixes: 4f01cb1b0a
       ("kdump-lib.sh: fix variable quoting issue")

Signed-off-by: Tao Liu <ltao@redhat.com>
Acked-by: Coiby Xu <coxu@redhat.com>
2021-12-27 09:16:19 +08:00
Pingfan Liu
163c02970e ppc64/ppc64le: drop cpu online rule in 40-redhat.rules in kdump initramfs
Onlining secondary cpus breaks kdump completely on KVM on Power hosts
Though we use maxcpus=1 by default but 40-redhat.rules will bring up all
possible cpus by default.

Thus before we get the kernel fix and the systemd rule fix let's remove
the cpu rule in 40-redhat.rules for ppc64/ppc64le kdump initramfs.

This is back ported from RHEL, and original credit goes to Dave Young
<dyoung@redhat.com>

Signed-off-by: Pingfan Liu <piliu@redhat.com>
Acked-by: Tao Liu <ltao@redhat.com>
2021-12-23 15:40:23 +08:00
Coiby Xu
f0892eeceb kdump/ppc64: suppress the error message "Could not find a registered notification tool" from servicelog_notify
When kexec-tools is newly installed, kdump migration action hasn't
registered and the following error could occur,
  INF dnf.rpm: Could not find a registered notification tool with the specified command ('/usr/lib/kdump/kdump-migrate-action.sh').

"servicelog_notify --list" could list registered notification tools for
a command but it outputs the above error as well. So simply redirect the
error to /dev/null when running "servicelog_notify --remove".

Fixes: commit 146f662622
       ("kdump/ppc64: migration action registration clean up")

Acked-by: Tao Liu <ltao@redhat.com>
Acked-by: Hari Bathini <hbathini@linux.ibm.com>
Signed-off-by: Coiby Xu <coxu@redhat.com>
2021-12-03 08:29:04 +08:00
Coiby Xu
c3c8df3745 add keytuils as a weak dependency for POWER
When secureboot is enabled, kdumpctl needs to use keyctl to add/remove
a key to/from the .ima keyring.

Fixes: commit 596fa0a07f
       ("kdumpctl: enable secure boot on ppc64le LPARs")

Signed-off-by: Coiby Xu <coxu@redhat.com>
2021-11-18 15:10:40 +08:00
Pingfan Liu
8cc51f3ab9 Document/kexec-kdump-howto.txt: improve notes for kdump_pre and kdump_post scripts
Signed-off-by: Pingfan Liu <piliu@redhat.com>
Acked-by: Coiby Xu <coxu@redhat.com>
2021-11-16 13:30:29 +08:00
Pingfan Liu
c59dfe938e sysconfig: make kexec_file_load as default option on ppc64le
Signed-off-by: Pingfan Liu <piliu@redhat.com>
Acked-by: Coiby Xu <coxu@redhat.com>
2021-11-12 11:47:27 +08:00
Pingfan Liu
960a132c31 sysconfig: make kexec_file_load as default option on aarch64
Signed-off-by: Pingfan Liu <piliu@redhat.com>
Acked-by: Coiby Xu <coxu@redhat.com>
2021-11-12 11:46:31 +08:00
Tao Liu
9ffda5bc1c Enable zstd compression for makedumpfile in kexec-tools.spec
The Zstandard (zstd) compression method is not enabled:

    $ makedumpfile -v
    makedumpfile: version 1.7.0 (released on 8 Nov 2021)
    lzo	        enabled
    snappy	enabled
    zstd	disabled

This patch will enable it when building kexec-tools rpm package.

Signed-off-by: Tao Liu <ltao@redhat.com>
Acked-by: Coiby Xu <coxu@redhat.com>
2021-11-11 16:36:13 +08:00
Coiby Xu
267a088b2a Release 2.0.23-1
Signed-off-by: Coiby Xu <coxu@redhat.com>
2021-11-08 14:42:45 +08:00
Coiby Xu
8b9948df33 Update makedumpfile to 1.7.0
Signed-off-by: Coiby Xu <coxu@redhat.com>
2021-11-08 14:42:22 +08:00
Coiby Xu
6936fbc1b2 fix broken extra_bins when installing multiple binaries
When there more than one binaries, quoting "$val" would make
dracut-install treat multiple binaries as one binary. Take
"extra_bins /usr/sbin/ping /usr/sbin/ip" as an example, the
following error would occur when building initrd,

dracut-install: ERROR: installing '/usr/sbin/ping /usr/sbin/ip'
dracut: FAILED: /usr/lib/dracut/dracut-install -D /var/tmp/dracut.ODrioZ/initramfs -a /usr/sbin/ping /usr/sbin/ip

Fix it by not quoting the variable and bypassing SC2086 shellcheck.

Fixes: commit 86538ca6e2
       ("bash scripts: fix variable quoting issue")

Acked-by: Tao Liu <ltao@redhat.com>
Signed-off-by: Coiby Xu <coxu@redhat.com>
2021-11-05 17:50:45 +08:00
Tao Liu
727251e52e mkdumprd: drop mountaddr/mountproto nfs mount options
nfs service will append extra mount options to kernel mount options.
Such as mountaddr/mountproto options. These options only represent
current mounting details of the 1st kernel, but may not appropriate
for the 2nd kernel for the same reason as commit
d4f04afa47 ("mkdumprd: drop some nfs
		mount options when reading from kernel"). This patch will remove
these options.

Signed-off-by: Tao Liu <ltao@redhat.com>
Acked-by: Coiby Xu <coxu@redhat.com>
2021-11-02 10:55:11 +08:00
Coiby Xu
294c965ca3 selftest: kill VM reliably by recursively kill children processes
qemu is launched in nested subprocess and can't be killed by simply
killing the job ids,

        PID    Command
    2269634    │  ├─ sshd: root [priv]
    2269637    │  │  └─ sshd: root@pts/0
    2269638    │  │     └─ -bash
    2269744    │  │        └─ make test-run V=1
    2273117    │  │           └─ /bin/bash /root/kexec-tools-300/tests/scripts/run-test.sh
    2273712    │  │              ├─ /bin/bash /root/kexec-tools-300/tests/scripts/run-test.sh
    2273714    │  │              │  └─ /bin/bash /root/kexec-tools-300/tests/scripts/run-test.sh
    2273737    │  │              │     └─ timeout --foreground 10m /root/kexec-tools-300/tests/scripts/run-qemu -nodefaults -nographic -smp 2 -m 768M -monitor no
    2273738    │  │              │        └─ /usr/bin/qemu-system-x86_64 -enable-kvm -cpu host -nodefaults -nographic -smp 2 -m 768M -monitor none -serial stdio
    2273746    │  │              │           ├─ /usr/bin/qemu-system-x86_64 -enable-kvm -cpu host -nodefaults -nographic -smp 2 -m 768M -monitor none -serial std
    2273797    │  │              ├─ /bin/bash /root/kexec-tools-300/tests/scripts/run-test.sh
    2273798    │  │              │  └─ /bin/bash /root/kexec-tools-300/tests/scripts/run-test.sh
    2273831    │  │              │     └─ timeout --foreground 10m /root/kexec-tools-300/tests/scripts/run-qemu -nodefaults -nographic -smp 2 -m 768M -monitor no
    2273832    │  │              │        └─ /usr/bin/qemu-system-x86_64 -enable-kvm -cpu host -nodefaults -nographic -smp 2 -m 768M -monitor none -serial stdio
    2273840    │  │              │           ├─ /usr/bin/qemu-system-x86_64 -enable-kvm -cpu host -nodefaults -nographic -smp 2 -m 768M -monitor none -serial std

This led to the error "qemu-system-x86_64: can't bind ip=0.0.0.0 to
socket: Address already in use".

This patch will kill qemu by killing all the children of the job id.

Signed-off-by: Coiby Xu <coxu@redhat.com>
Acked-by: Kairui Song <kasong@redhat.com>
2021-10-15 19:20:25 +08:00