Commit Graph

1629 Commits

Author SHA1 Message Date
Philipp Rudo ea00b7db43 kdumpctl: Move temp file in get_kernel_size to global temp dir
Others will need to use a temporary files, too. In order to avoid
potential clashes of multiple trap handlers move the local temp file
into a global temp dir.

While at it make sure that the trap handler returns the correct exit
code.

Signed-off-by: Philipp Rudo <prudo@redhat.com>
Reviewed-by: Pingfan Liu <piliu@redhat.com>
Reviewed-by: Coiby Xu <coxu@redhat.com>
2023-05-16 09:21:13 +08:00
Philipp Rudo 81d89c885f kdumpctl: Move get_kernel_size to kdumpctl
The function is only used in do_estimate. Move it to kdumpctl to
prevent confusion.

Signed-off-by: Philipp Rudo <prudo@redhat.com>
Reviewed-by: Pingfan Liu <piliu@redhat.com>
Reviewed-by: Coiby Xu <coxu@redhat.com>
2023-05-16 09:20:59 +08:00
Philipp Rudo 0ff44ca6e8 kdumpctl: fix is_dracut_mod_omitted
The function is pretty broken right now. To start with the -o/--omit
option allows a quoted, space separated list of modules. But using 'set'
breaks quotation and thus only considers the first element in the list.
Furthermore dracut uses getopt internally. This means that it is also
possible to pass the list via --omit=.

Fix the function by making use of getopt for parsing the dracut_args.
While at it also add a test cases to cover the functions.

Signed-off-by: Philipp Rudo <prudo@redhat.com>
Reviewed-by: Coiby Xu <coxu@redhat.com>
2023-04-17 14:49:51 +08:00
Philipp Rudo f81e6ca8da kdump-lib: move is_dracut_mod_omitted to kdumpctl
The function is only used in kdumpctl. Thus move it there to keep
kdump-lib small and simple.

Signed-off-by: Philipp Rudo <prudo@redhat.com>
Reviewed-by: Coiby Xu <coxu@redhat.com>
2023-04-17 14:49:51 +08:00
Philipp Rudo 9eb39cda3c kdump-lib: remove get_nmcli_connection_apath_by_ifname
The function isn't used anywhere. Thus remove it to keep kdump-lib small
and simple.

Signed-off-by: Philipp Rudo <prudo@redhat.com>
Reviewed-by: Coiby Xu <coxu@redhat.com>
2023-04-17 14:49:51 +08:00
Philipp Rudo 62c41e5343 kdump-lib: remove get_nmcli_field_by_conpath
The function isn't used anywhere. Thus remove it to keep kdump-lib small
and simple.

Signed-off-by: Philipp Rudo <prudo@redhat.com>
Reviewed-by: Coiby Xu <coxu@redhat.com>
2023-04-17 14:49:51 +08:00
Philipp Rudo f9d8cabfd1 dracut-module-setup: remove dead source_ifcfg_file
With the NetworkManager rewrite this function in no longer used. This
also allows to remove a lot of dead code in kdump-lib.

Signed-off-by: Philipp Rudo <prudo@redhat.com>
Reviewed-by: Coiby Xu <coxu@redhat.com>
2023-04-17 14:49:51 +08:00
Philipp Rudo 258d285c63 kdump-lib-initramfs: remove is_fs_dump_target
The function isn't used anywhere. Thus remove it to keep
kdump-lib-initramfs small and simple.

Signed-off-by: Philipp Rudo <prudo@redhat.com>
Reviewed-by: Lichen Liu <lichliu@redhat.com>
Reviewed-by: Coiby Xu <coxu@redhat.com>
2023-04-17 14:49:51 +08:00
Philipp Rudo ca306cd403 kdump-lib-initramfs: harden is_mounted
If the device/mountpoint for findmnt is omitted findmnt will list all
mounted filesystems. In that case it will always return "true". So
explicitly check if an argument was passed to prevent false-positives.

Signed-off-by: Philipp Rudo <prudo@redhat.com>
Reviewed-by: Coiby Xu <coxu@redhat.com>
2023-04-17 14:49:51 +08:00
Coiby Xu 12d9eff9dc Show how much time kdump has waited for the network to be ready
Relates: https://bugzilla.redhat.com/show_bug.cgi?id=2151504

Currently, when the network isn't ready, kdump would repeatedly print
the same info,

    [   29.537230] kdump[671]: Bad kdump network destination: 192.123.1.21
    [   30.559418] kdump[679]: Bad kdump network destination: 192.123.1.21
    [   31.580189] kdump[687]: Bad kdump network destination: 192.123.1.21

This is not user-friendly and users may think kdump has got stuck. So
also show much time has waited for the network to be ready,

    [   29.546258] kdump[673]: Waiting for network to be ready (50s / 10min)
    ...
    [   32.608967] kdump[697]: Waiting for network to be ready (56s / 10min)

Note kdump_get_ip_route no longer prints an error message and it's up to
the caller to determine the log level and print relevant messages. And
kdump_collect_netif_usage aborts when kdump_get_ip_route fails.

Reported-by: Martin Pitt <mpitt@redhat.com>
Signed-off-by: Coiby Xu <coxu@redhat.com>
Reviewed-by: Philipp Rudo <prudo@redhat.com>
2023-04-15 06:39:17 +08:00
Coiby Xu df6f25ff20 Tell nmcli to not escape colon when getting the path of connection profile
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=2151504

When a NetworManager connection profile contains a colon in the name,
"nmcli --get-values UUID,FILENAME" by default would escape the colon
because a colon is also used for separating the values. In this case,
99kdumpbase fails to get the correct connection profile path,
	kdumpctl[5439]: cp: cannot stat '/run/NetworkManager/system-connections/static-52\\\:54\\\:01.nmconnection': No such file or directory
	kdumpctl[5440]: sed: can't read /tmp/1977-DRACUT_KDUMP_NM/ifcfg-static-52-54-01: No such file or directory
	kdumpctl[5449]: dracut-install: ERROR: installing '/tmp/1977-DRACUT_KDUMP_NM/ifcfg-static-52-54-01' to '/etc/NetworkManager/system-connections/ifcfg-static-52-54-01'

As a result, dumping vmcore to a remote nfs would fail.

In our case of getting connection profile path, there is no need to escape the
colon so pass "-escape no" to nmcli,

	[root@localhost ~]# nmcli --get-values UUID,FILENAME c show
	659e09c1-a6bd-3549-9be4-a07a1a9a8ffd:/etc/NetworkManager/system-connections/aa\:bb.nmconnection

	[root@localhost ~]# nmcli -escape no --get-values UUID,FILENAME c show
	659e09c1-a6bd-3549-9be4-a07a1a9a8ffd:/etc/NetworkManager/system-connections/aa:bb.nmconnection

Suggested-by: Beniamino Galvani <bgalvani@redhat.com>
Reported-by: Martin Pitt <mpitt@redhat.com>
Signed-off-by: Coiby Xu <coxu@redhat.com>
Reviewed-by: Philipp Rudo <prudo@redhat.com>
2023-04-14 20:22:49 +08:00
Lichen Liu d619b6dabe kdumpctl: lower the log level in reset_crashkernel_for_installed_kernel
Although upgrading the kernel with `rpm -Uvh` is not recommended, the
kexec-tools plugin prints confusing error logs when a customer upgrades the
kernel through it.

```
kdump: kernel 5.14.0-80.el9.x86_64 doesn't exist
kdump: Couldn't find current running kernel
```

Not finding the currently running kernel will only make kdump unable to copy the
grub entry parameters to the newly installed kernel, so lower the log level.

Signed-off-by: Lichen Liu <lichliu@redhat.com>
Reviewed-by: Coiby Xu <coxu@redhat.com>
2023-04-10 12:20:15 +08:00
Coiby Xu 70c7598ef0 Install nfsv4-related drivers when users specify nfs dumping via dracut_args
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=2140721

Currently, if users specify dumping to nfsv4 target via
  dracut_args --mount "<NFS-server-ip>:/var/crash /mnt nfs defaults"
it fails with the following errors,
    [    5.159760] mount[446]: mount.nfs: Protocol not supported
    [    5.164502] systemd[1]: mnt.mount: Mount process exited, code=exited, status=32/n/a
    [    5.167616] systemd[1]: mnt.mount: Failed with result 'exit-code'.
    [FAILED] Failed to mount /mnt.

This is because nfsv4-releted drivers are not installed to kdump initrd.
mkdumprd calls dracut with "--hostonly-mode strict". If nfsv4-related
drivers aren't loaded before calling dracut, they won't be installed.
When users specify nfs dumping via dracut_args, kexec-tools won't mount
the nfs fs beforehand hence nfsv4-related drivers won't be installed.
Note dracut only installs the nfs driver i.e. nfsv3 driver for "--mount
... nfs". So also install nfsv4-related drivers when users specify nfs
dumping via dracut_args. Since nfs_layout_nfsv41_files depends on nfsv4,
the nfsv4 driver will be installed automatically.

As for the reason why we support nfs dumping via dracut_args instead of
asking user to use the nfs directive, please refer to commit 74c6f464
("Support special mount information via 'dracut_args'").

Fixes: 4eedcae5 ("dracut-module-setup.sh: don't include multipath-hostonly")
Reported-by: rcheerla@redhat.com
Signed-off-by: Coiby Xu <coxu@redhat.com>
Reviewed-by: Philipp Rudo <prudo@redhat.com>
2023-03-21 15:55:57 +08:00
Philipp Rudo d9dfea12da sysconfig: add zfcp.allow_lun_scan to KDUMP_COMMANDLINE_REMOVE on s390
Probing unnecessary I/O devices wastes memory and in extreme cases can
cause the crashkernel to run OOM. That's why the s390-tools maintain
their own module, 95zdev-kdump [1], that disables auto LUN scanning and
only configures zfcp devices that can be used as dump target. So remove
zfcp.allow_lun_scan from the kernel command line to prevent that we
accidentally overwrite the default set by the module.

[1] https://github.com/ibm-s390-linux/s390-tools/blob/master/zdev/dracut/95zdev-kdump/module-setup.sh

Signed-off-by: Philipp Rudo <prudo@redhat.com>
Reviewed-by: Coiby Xu <coxu@redhat.com>
2023-03-13 15:30:00 +08:00
Coiby Xu 12e6cd2b76 Use the correct command to get architecture
`uname -m` was used by mistake. As a result, kexec-tools failed to
update crashkernel=auto during in-place upgrade from RHEL8 to RHEL9.

`uname -m` should be used to get architecture instead.

Fixes: 5951b5e2 ("Don't try to update crashkernel when bootloader is not installed")

Signed-off-by: Coiby Xu <coxu@redhat.com>
Reviewed-by: Lichen Liu <lichliu@redhat.com>
2023-02-21 11:33:06 +08:00
Coiby Xu b41cab7099 Release 2.0.26-3
Signed-off-by: Coiby Xu <coxu@redhat.com>
2023-01-30 17:39:35 +08:00
Philipp Rudo d4e877214c kdumpctl: make do_estimate more robust
At the beginning of do_estimate it currently checks whether the
TARGET_INITRD exists and if not fails with an error message. This not
only requires the user to manually trigger the build of the initrd but
also ignores all cases where the TARGET_INITRD exists but need to be
rebuild. For example when there were changes to kdump.conf or when the
system switches from kdump to fadump. All these changes will impact the
outcome of do_estimate. Thus properly check whether the initrd needs to
be rebuild and if it does trigger the rebuild automatically.

To do so move the check whether the TARGET_INITRD has fadump enabled to
is_system_modified and call this function. With this force_(no_)rebuild
options in kdump.conf are ignored to avoid unnecessary rebuilds.

While at it cleanup check_system_modified and rename it to
is_system_modified. Furthermore move printing the info that the initrd
gets rebuild to rebuild_initrd to avoid every caller has the same line.

Signed-off-by: Philipp Rudo <prudo@redhat.com>
Reviewed-by: Coiby Xu <coxu@redhat.com>
2023-01-30 17:37:23 +08:00
Philipp Rudo 37577b93ed kdumpctl: refractor check_rebuild
check_rebuild uses a bunch of local variables to store the result of the
different checks performed. At the end of the function it then evaluates
which check failed to print an appropriate info and trigger a rebuild if
needed. This not only makes the function hard to read but also requires
all checks to be executed even if an earlier one already determined that
the initrd needs to be rebuild. Thus refractor check_rebuild such that
it only checks whether the initrd needs to rebuild and trigger the
rebuild by the caller (if needed). While at it rename the function to
need_initrd_rebuild.

Furthermore also move setup_initrd to the caller so it is more consisted
with the other users of the function.

Signed-off-by: Philipp Rudo <prudo@redhat.com>
Reviewed-by: Coiby Xu <coxu@redhat.com>
2023-01-30 17:37:23 +08:00
Philipp Rudo 5eefcf2e94 kdumpctl: cleanup 'stop'
Like for 'start' move the printing of the error message to the calling
function. This not only makes the code more consistent to 'start' but
also prevents 'kdumpctl restart' to call 'start' in case 'stop' has
failed. This doesn't impact the case when 'kdumpctl restart' is run
without any crash kernel being loaded as kexec will still return success
in that case.

Signed-off-by: Philipp Rudo <prudo@redhat.com>
Reviewed-by: Coiby Xu <coxu@redhat.com>
2023-01-30 17:37:23 +08:00
Philipp Rudo 33b307af20 kdumpctl: cleanup 'start'
The function has many block of the kind

if ! cmd; then
  derror "Starting kdump: [FAILED]"
  return 1
fi

This duplicates code and makes the function hard to read. Thus move the
block to the calling function.

Signed-off-by: Philipp Rudo <prudo@redhat.com>
Reviewed-by: Coiby Xu <coxu@redhat.com>
2023-01-30 17:37:23 +08:00
Philipp Rudo 0f6ad91be8 kdump-lib: fix prepare_cmdline
A recently added unit test found that prepare_cmdline has several
problems. For example an empty remove list will remove all spaces or
when the cmdline contains a parameter with quoted values containing
spaces will only remove the beginning up to the first space. Furthermore
the old design requires lots of subshells and pipes.

This patch rewrites prepare_cmdline in a way that makes the unit test
happy and tries to use as many bash built-ins as possible.

Signed-off-by: Philipp Rudo <prudo@redhat.com>
Reviewed-by: Coiby Xu <coxu@redhat.com>
2023-01-30 17:37:23 +08:00
Philipp Rudo d55a056558 kdumpctl: move aws workaround to kdump-lib
Move the workaround for aws graviton cpus from load_kdump to
prepare_cmdline. This (1) makes the workaround available also for other
callers of prepare_cmdline (although not needed at the moment) and (2)
makes it easier to fix the problems found by the unit test included
earlier as all changes to the cmdline are done at one place now.

Signed-off-by: Philipp Rudo <prudo@redhat.com>
Reviewed-by: Coiby Xu <coxu@redhat.com>
2023-01-30 17:37:23 +08:00
Philipp Rudo 269c26972d unit tests: add tests for prepare_cmdline
prepare_cmdline is totally broken. For example if the remove list ($2)
is empty it removes all white spaces or if a parameter has a quoted
value containing a white space it only removes the first part of the
parameter up to the first space. Thus add a test case that shows what the
function should do in order to fix it in subsequent patches.

Signed-off-by: Philipp Rudo <prudo@redhat.com>
Reviewed-by: Coiby Xu <coxu@redhat.com>
2023-01-30 17:37:23 +08:00
Philipp Rudo 88919b73f0 kdump-lib: always specify version in is_squash_available
is_squash_available is only used in dracut-module-setup.sh and mkdumprd.
Neither of the two scripts calls prepare_kdump_bootinfo which determines
and sets KDUMP_KERNELVER. Thus KDUMP_KERNELVER is only non-zero if it
explicitly specified by the user in /etc/sysconfig/kdump (and the file
gets sourced, which is not the case for drachu-module-setup.sh).

In theory this can even lead to bugs. For example consider the case when
a debug kernel is running. In that case kdumpctl will try to use the
non-debug version of the kernel while is_squash_available will make its
decision based on the debug version. So in case the debug kernel has
squash available but the non-debug kernel doesn't mkdumprd will try to
add it nevertheless.

Thus factor out the kernel version detection from prepare_kdump_bootinfo
and make use of the new function when checking for the availability of
those kernel modules.

Signed-off-by: Philipp Rudo <prudo@redhat.com>
Reviewed-by: Coiby Xu <coxu@redhat.com>
2023-01-30 17:37:23 +08:00
Philipp Rudo 383cb62202 mkfadumprd: drop unset globals from debug output
mkfadumprd doesn't call prepare_kdump_bootinfo from kdump-lib.sh. Thus
both KDUMP_KERNELVER and DEFAULT_INITRD are always empty. Simply remove
them from the debug print.

Signed-off-by: Philipp Rudo <prudo@redhat.com>
Reviewed-by: Coiby Xu <coxu@redhat.com>
2023-01-30 17:37:23 +08:00
Philipp Rudo b9fd7a4076 kdumpctl: merge check_current_{kdump,fadump}_status
Both functions are almost identical. The only differences are (1) the
sysfs node the status is read from and (2) the fact the fadump version
doesn't verify if the file it's trying to read actually exists. Thus
merge the two functions and get rid of the check_current_status wrapper.

While at it rename the function to is_kernel_loaded which explains
better what the function does.

Finally, after moving FADUMP_REGISTER_SYS_NODE shellcheck can no longer
access the definition and starts complaining about it not being quoted.
Thus quote all uses of FADUMP_REGISTER_SYS_NODE.

Signed-off-by: Philipp Rudo <prudo@redhat.com>
Reviewed-by: Coiby Xu <coxu@redhat.com>
2023-01-30 17:37:23 +08:00
Philipp Rudo ed6936f9fc dracut-early-kdump: explicitly use bash
Currently dracut-early-kdump.sh claims to be POSIX compliant but it
sources kdump-lib.sh which uses bash-only syntax. Thus require bash for
dracut-early-kdump.sh as well.

Signed-off-by: Philipp Rudo <prudo@redhat.com>
Reviewed-by: Coiby Xu <coxu@redhat.com>
2023-01-30 17:37:23 +08:00
Philipp Rudo 0578245746 dracut-early-kdump: fix shellcheck findings
Fix the following issues found by shellcheck. For the second one disable
shellcheck as EARLY_KEXEC_ARGS can contain multiple arguments.

In dracut-early-kdump.sh line 9:
EARLY_KDUMP_KERNELVER=""
^-------------------^ SC2034: EARLY_KDUMP_KERNELVER appears unused. Verify use (or export if used externally).

In dracut-early-kdump.sh line 61:
    if $KEXEC $EARLY_KEXEC_ARGS $standard_kexec_args \
              ^---------------^ SC2086: Double quote to prevent globbing and word splitting.

For more information:
  https://www.shellcheck.net/wiki/SC2034
  https://www.shellcheck.net/wiki/SC2086

Signed-off-by: Philipp Rudo <prudo@redhat.com>
Reviewed-by: Coiby Xu <coxu@redhat.com>
2023-01-30 17:37:23 +08:00
Philipp Rudo d5faaee62b kdumpctl: simplify check_failure_action_config
With the deprecation of the 'default' option in kdump.conf
check_failure_action_config needed to track which option was used
(default or failure_action). This made the function quite complex.Thus
make option 'default' a true alias of 'failure_action' when parsing
kdump.conf and simplify check_failure_action_config.

Do the same simplifications for check_final_action_config as both
functions are basically identical.

Signed-off-by: Philipp Rudo <prudo@redhat.com>
Reviewed-by: Coiby Xu <coxu@redhat.com>
2023-01-30 17:37:23 +08:00
Coiby Xu f4c04c3d63 makedumpfile: Fix wrong exclusion of slab pages on Linux 6.2-rc1
Resolves: bz2155754
Upstream: https://github.com/makedumpfile/makedumpfile
Conflict: None

commit 5f17bdd2128998a3eeeb4521d136a192222fadb6
Author: Kazuhito Hagio <k-hagio-ab@nec.com>
Date:   Wed Dec 21 11:06:39 2022 +0900

    [PATCH] Fix wrong exclusion of slab pages on Linux 6.2-rc1

    * Required for kernel 6.2

    Kernel commit 130d4df57390 ("mm/sl[au]b: rearrange struct slab fields to
    allow larger rcu_head"), which is contained in Linux 6.2-rc1 and later,
    made the offset of slab.slabs equal to page.mapping's one.  As a result,
    "makedumpfile -d 8", which should exclude user data, excludes some slab
    pages wrongly because isAnon() returns true when slab.slabs is an odd
    number.  With such dumpfiles, crash can fail to start session with an
    error like this:

      # crash vmlinux dumpfile
      ...
      crash: page excluded: kernel virtual address: ffff8fa047ac2fe8 type: "xa_node shift"

    Make isAnon() check that the page is not slab to fix this.

    Signed-off-by: Kazuhito Hagio <k-hagio-ab@nec.com>

Reported-by: Baoquan He <bhe@redhat.com>
Acked-by: Baoquan He <bhe@redhat.com>
Signed-off-by: Coiby Xu <coxu@redhat.com>
2023-01-30 17:37:23 +08:00
Fedora Release Engineering 4ffc0113a6 Rebuilt for https://fedoraproject.org/wiki/Fedora_38_Mass_Rebuild
Signed-off-by: Fedora Release Engineering <releng@fedoraproject.org>
2023-01-19 14:23:04 +00:00
Coiby Xu 4895f2a2e9 Release 2.0.26-1
Signed-off-by: Coiby Xu <coxu@redhat.com>
2022-12-22 12:55:14 +08:00
Coiby Xu 5951b5e268 Don't try to update crashkernel when bootloader is not installed
Currently when using anaconda to install the OS, the following errors
occur,

    INF packaging: Configuring (running scriptlet for): kernel-core-5.14.0-70.el9.x86_64 ...
    INF dnf.rpm: grep: /boot/grub2/grubenv: No such file or directory
    grep: /boot/grub2/grubenv: No such file or directory
    grep: /boot/grub2/grubenv: No such file or directory
    grep: /boot/grub2/grubenv: No such file or directory
    ...
    INF packaging: Configuring (running scriptlet for): kexec-tools-2.0.23-9.el9.x86_64 ...
    INF dnf.rpm: grep: /boot/grub2/grubenv: No such file or directory
    grep: /boot/grub2/grubenv: No such file or directory
    grep: /boot/grub2/grubenv: No such file or directory

Or for s390, the following errors occur,

    INF packaging: Configuring (running scriptlet for): kernel-core-5.14.0-71.el9.s390x ...
    03:37:51,232 INF dnf.rpm: grep: /etc/zipl.conf: No such file or directory
    grep: /etc/zipl.conf: No such file or directory
    grep: /etc/zipl.conf: No such file or directory

    INF packaging: Configuring (running scriptlet for): kexec-tools-2.0.23-9_1.el9_0.s390x ...
    INF dnf.rpm: grep: /etc/zipl.conf: No such file or directory

This is because when anaconda installs the packages, bootloader hasn't
been installed and /boot/grub2/grubenv or /etc/zipl.conf doesn't exist.
So don't try to update crashkernel when bootloader isn't ready to avoid
the above errors.

Note this is the second attempt to fix this issue. Previously a file
/tmp/kexec_tools_package_install was created to avoid running the
related code thus to avoid the above errors but unfortunately that
approach has two issues a) somehow osbuild doesn't delete it for RHEL b)
this file could still exist if users manually remove kexec-tools.

Fixes: e218128 ("Only try to reset crashkernel for osbuild during package install")
Reported-by: Jan Stodola <jstodola@redhat.com>
Signed-off-by: Coiby Xu <coxu@redhat.com>
Reviewed-by: Philipp Rudo <prudo@redhat.com>
2022-12-22 10:33:00 +08:00
Coiby Xu bc101086e2 dracut-module-setup.sh: also install the driver of physical NIC for
Hyper-V VM with accelerated networking

Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=2151842

Currently, vmcore dumping to remote fs fails on Azure Hyper-V VM with
accelerated networking because it uses a physical NIC for accrelarated
networking [1]. In this case, the driver for this physical NIC should be
installed as well.

[1] https://learn.microsoft.com/en-us/azure/virtual-network/accelerated-networking-overview

Fixes: a65dde2d ("Reduce kdump memory consumption by only installing needed NIC drivers")

Reported-by: Xiaoqiang Xiong <xxiong@redhat.com>
Signed-off-by: Coiby Xu <coxu@redhat.com>
Reviewed-by: Philipp Rudo <prudo@redhat.com>
2022-12-20 14:04:49 +08:00
Coiby Xu 3b22cce1cb dracut-module-setup.sh: skip installing driver for the loopback
interface

Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=2151500

Currently, kdump initrd fails to be built when dumping vmcore to
localhost via ssh or nfs,

  kdumpctl[3331]: Cannot get driver information: Operation not supported
  kdumpctl[1991]: dracut: Failed to get the driver of lo
  dracut[2020]: Failed to get the driver of lo
  kdumpctl[1775]: kdump: mkdumprd: failed to make kdump initrd
  kdumpctl[1775]: kdump: Starting kdump: [FAILED]
  systemd[1]: kdump.service: Main process exited, code=exited, status=1/FAILURE
  systemd[1]: kdump.service: Failed with result 'exit-code'.
  systemd[1]: Failed to start Crash recovery kernel arming.
  systemd[1]: kdump.service: Consumed 1.710s CPU time.

This is because the loopback interface is used for transferring vmcore and
ethtool can't get the driver of the loopback interface. In fact, once
COFNIG_NET is enabled, the loopback device is enabled and there is no driver
for the loopback device. So skip installing driver for the loopback device.
The loopback interface is implemented in linux/drivers/net/loopback.c
and always has the name "lo". So we can safely tell if a network
interface is the loopback interface by its name.

Fixes: a65dde2d ("Reduce kdump memory consumption by only installing needed NIC drivers")
Reported-by: Martin Pitt <mpitt@redhat.com>
Reported-by: Rich Megginson <rmeggins@redhat.com>
Reviewed-by: Lichen Liu <lichliu@redhat.com>
Reviewed-by: Philipp Rudo <prudo@redhat.com>
Signed-off-by: Coiby Xu <coxu@redhat.com>
2022-12-20 14:03:12 +08:00
Coiby Xu bc4196afc1
Release 2.0.25-4
Signed-off-by: Coiby Xu <coxu@redhat.com>
2022-12-07 17:47:58 +08:00
Coiby Xu b45896c620 dracut-module-setup.sh: stop overwriting dracut's trap handler
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=2149246

Latest Workstation live x86_64 image has an excess increase of ~300 MB
in size. This is because kdumpbase module's trap handler overwrites
dracut's handler and DRACUT_TMPDIR which has three unpacked initramfs
files fails to be cleaned up. This patch moves kdumpbase module's
temporary folder under DRACUT_TMPDIR and lets dracut's trap handler do
the cleanup instead.

Fixes: d25b1ee3 ("Add functions to copy NetworkManage connection profiles to the initramfs")
Signed-off-by: Coiby Xu <coxu@redhat.com>
Reviewed-by: Philipp Rudo <prudo@redhat.com>
2022-12-07 17:32:32 +08:00
Hari Bathini a833624fe5 fadump: avoid status check while starting in fadump mode
With kernel commit 607451ce0aa9b ("powerpc/fadump: register for fadump
as early as possible"), 'kdumpctl start' prematurely returns with the
below message:

    "Kdump already running: [WARNING]"

instead of setting default initrd with dump capture capability as
required for fadump. Skip status check in fadump mode to avoid this
problem.

Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
Reviewed-by: Philipp Rudo <prudo@redhat.com>
2022-12-07 09:43:41 +08:00
Hari Bathini 4a2dcab26a fadump: add a kernel install hook to clean up fadump initramfs
Kdump service will create fadump initramfs when needed, but it won't
clean up the fadump initramfs on kernel uninstall. So create a kernel
install hook to do the clean up job.

Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
Reviewed-by: Philipp Rudo <prudo@redhat.com>
2022-12-07 09:42:29 +08:00
Hari Bathini 25411da966 fadump: fix default initrd backup and restore logic
In case of fadump, default initrd is rebuilt with dump capturing
capability, as the same initrd is used for booting production kernel
as well as capture kernel.

The original initrd file is backed up with a checksum, to restore
it as the default initrd when fadump is disabled. As the checksum
file is not kernel version specific, switching between different
kernel versions and kdump/fadump dump mode breaks the default initrd
backup/restore logic. Fix this by having a kernel version specific
checksum file.

Also, if backing up initrd fails, retaining the checksum file isn't
useful. Remove it.

Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
Reviewed-by: Philipp Rudo <prudo@redhat.com>
2022-12-07 09:42:29 +08:00
Hari Bathini f98bd5895e fadump: use 'zstd' as the default compression method
If available, use 'zstd' compression method to optimize the size of
the initrd built with fadump support. Also, 'squash+zstd' is not
preferred because more disk space is consumed with 'squash+zstd' due
to the additional binaries needed for fadump with squash case.

Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
Acked-by: Tao Liu <ltao@redhat.com>
Reviewed-by: Philipp Rudo <prudo@redhat.com>
2022-12-07 09:42:29 +08:00
Coiby Xu e96e441b36 Release 2.0.25-3
Signed-off-by: Coiby Xu <coxu@redhat.com>
2022-11-25 17:47:20 +08:00
Lichen Liu 5eb77ee3fa kdumpctl: Optimize _find_kernel_path_by_release regex string
Currently _find_kernel_path_by_release uses grubby and grep to
find the kernel path, if both the normal kernel and it's debug
varient exist, the grep will give more than one kernel strings.

```
kernel="/boot/vmlinuz-5.14.0-139.kpq0.el9.s390x+debug"
kernel="/boot/vmlinuz-5.14.0-139.kpq0.el9.s390x"
```

This will cause an error when installing debug kernel.

```
The param "/boot/vmlinuz-5.14.0-139.kpq0.el9.s390x+debug
/boot/vmlinuz-5.14.0-139.kpq0.el9.s390x" is incorrect
```

Fixes: 945cbbd ("add helper functions to get kernel path by kernel release and the path of current running kernel")

Signed-off-by: Lichen Liu <lichliu@redhat.com>
Reviewed-by: Philipp Rudo <prudo@redhat.com>
2022-11-25 17:27:15 +08:00
Pingfan Liu 0414386cb8 unit tests: adapt check_config to gen-kdump-conf.sh
Now, kdump.conf is generated by gen-kdump-conf.sh, hence adapting
check_config to run that script firstly then check the generated file.

Signed-off-by: Pingfan Liu <piliu@redhat.com>
Reviewed-by: Coiby Xu <coxu@redhat.com>
2022-11-25 17:27:14 +08:00
Pingfan Liu 787b041aab kdump.conf: use a simple generator script to maintain
This commit has the same motivation as the commit 677da8a "sysconfig:
use a simple generator script to maintain".

At present, only the kdump.conf generated for s390x has a slight
difference from the other arches, where the core_collector asks the
makedumpfile to use "-c" option to compress dump data by each page using
zlib, which is more efficient than lzo on s390x.

Signed-off-by: Pingfan Liu <piliu@redhat.com>
Reviewed-by: Philipp Rudo <prudo@redhat.com>
2022-11-25 17:16:09 +08:00
Coiby Xu 523cda8f34 Don't run kdump_check_setup_iscsi in a subshell in order to collect needed
network interfaces

Currently, dumping to iSCSI target fails because the global array
(unique_netifs) that stores the network interfaces needed by kdump is
empty. The root cause is change of the array made in a subshell (a child
process) is inaccessible to the parent process. So don't run
kdump_check_setup_iscsi in a subshell.

Fixes: 63c3805c ("Set up kdump network by directly copying NM connection profile to initrd")
Signed-off-by: Coiby Xu <coxu@redhat.com>
Reviewed-by: Pingfan Liu <piliu@redhat.com>
2022-11-25 13:54:11 +08:00
Coiby Xu b5577c163a Simplify setup_znet by copying connection profile to initrd
/usr/lib/udev/ccw_init [1] shipped by s390utils extracts the values of
SUBCHANNELS, NETTYPE and LAYER2 from /etc/sysconfig/network-scripts/ifcfg-*
or /etc/NetworkManager/system-connections/*.nmconnection to activate znet
network device. If the connection profile is copied to initrd,
there is no need to set up the "rd.znet" dracut cmdline parameter.

There are two cases addressed by this commit,
 1. znet network interface is a slave of bonding/teaming/vlan/bridging
    network. The connection profile has been copied to initrd by
    kdump_copy_nmconnection_file and it contains the info needed by
    ccw_init.
 2. znet network interface is a slave of bonding/teaming/vlan/bridging
    network. The corresponding ifcfg-*/*.nmconnection file may not contain
    info like SUBCHANNELS [2]. In this case, copy the ifcfg-*/*.nmconnection
    file that has this info to the kdump initrd. Also to prevent the copied
    connection profile from being chosen by NM, set
    connection.autoconnect=false for this connection profile.

With this implementation, there is also no need to check if znet is
used beforehand.

Note
1. ccw_init doesn't care if SUBCHANNELS, NETTYPE and LAYER2 comes from
   an active NM profile or not. If an inactive NM profile contains this
   info, it needs to be copied to the kdump initrd as well.
2. "rd.znet_ifname=$_netdev:${SUBCHANNELS}" is no longer needed needed
   because now there is no renaming of s390x network interfaces when
   reusing NetworkManager profiles. rd.znet_ifname was introduced in
   commit ce0305d ("Add a new option 'rd.znet_ifname' in order to use it
   in udev rules") to address the special case of non-persistent
   MAC address by renaming a network interface by SUBCHANNELS.

[1] https://src.fedoraproject.org/rpms/s390utils/blob/rawhide/f/ccw_init
[2] https://bugzilla.redhat.com/show_bug.cgi?id=2064708

Signed-off-by: Coiby Xu <coxu@redhat.com>
Reviewed-by: Thomas Haller <thaller@redhat.com>
Reviewed-by: Philipp Rudo <prudo@redhat.com>
2022-11-23 06:39:27 +08:00
Coiby Xu 9792994f2f Wait for the network to be truly ready before dumping vmcore
nm-wait-online-initrd.service installed by dracut's 35-networkmanager
module calls nm-online with "-s" which means it returns immediately when
NetworkManager logs "startup complete". Thus it doesn't truly wait for
network connectivity to be established [1]. Wait for the network to be
truly ready before dumping vmcore. There are two benefits brought by
this approach,
  - ssh/nfs dumping won't fail because of that the network is not
   ready e.g. [2][3]
  - users don't need to use workarounds like rd.net.carrier.timeout to
    make sure the network is ready

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1485712
[2] https://bugzilla.redhat.com/show_bug.cgi?id=1909014
[3] https://bugzilla.redhat.com/show_bug.cgi?id=2035451

Signed-off-by: Coiby Xu <coxu@redhat.com>
Reviewed-by: Thomas Haller <thaller@redhat.com>
Reviewed-by: Philipp Rudo <prudo@redhat.com>
2022-11-23 06:39:27 +08:00
Coiby Xu 568623e69a Address the cases where a NIC has a different name in kdump kernel
A NIC may get a different name in the kdump kernel from 1st kernel
in cases like,
 - kernel assigned network interface names are not persistent e.g. [1]
 - there is an udev rule to rename the NIC in the 1st kernel but the
   kdump initrd may not have that rule e.g. [2]

If NM tries to match a NIC with a connection profile based on NIC name
i.e. connection.interface-name, it will fail the above bases. A simple
solution is to ask NM to match a connection profile by MAC address.
Note we don't need to do this for user-created NICs like vlan, bridge and
bond.

An remaining issue is passing the name of a NIC via the kdumpnic dracut
command line parameter which requires passing ifname=<interface>:<MAC> to
have fixed NIC name. But we can simply drop this requirement. kdumpnic
is needed because kdump needs to get the IP by NIC name and use the IP
to created a dumping folder named "{IP}-{DATE}". We can simply pass the
IP to the kdump kernel directly via a new dracut command line parameter
kdumpip instead. In addition to the benefit of simplifying the code,
there are other three benefits brought by this approach,
  - make use of whatever network to transfer the vmcore. Because  as long
    as we have the network to we don't care which NIC is active.
  - if obtained IP in the kdump kernel is different from the one in the
    1st kernel. "{IP}-{DATE}" would better tell where the dumped vmcore
    comes from.
  - without passing ifname=<interface>:<MAC> to kdump initrd, the
    issue of there are two interfaces with the same MAC address for
    Azure Hyper-V NIC SR-IOV [3] is resolved automatically.

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1121778
[2] https://bugzilla.redhat.com/show_bug.cgi?id=810107
[3] https://bugzilla.redhat.com/show_bug.cgi?id=1962421

Signed-off-by: Coiby Xu <coxu@redhat.com>
Reviewed-by: Thomas Haller <thaller@redhat.com>
Reviewed-by: Philipp Rudo <prudo@redhat.com>
2022-11-23 06:39:27 +08:00
Coiby Xu a65dde2d10 Reduce kdump memory consumption by only installing needed NIC drivers
Even after having asked NM to stop managing a unneeded NIC, a NIC driver
may still waste memory. For example, mlx5_core uses a substantial amount
of memory during driver initialization,

======== Report format module_summary: ========
Module mlx5_core using 350.2MB (89650 pages), peak allocation 367.4MB (94056 pages)
Module squashfs using 13.1MB (3360 pages), peak allocation 13.1MB (3360 pages)
Module overlay using 2.1MB (550 pages), peak allocation 2.2MB (555 pages)
Module dns_resolver using 0.9MB (219 pages), peak allocation 5.2MB (1338 pages)
Module mlxfw using 0.7MB (172 pages), peak allocation 5.3MB (1349 pages)
======== Report format module_summary END ========

======== Report format module_top: ========
Top stack usage of module mlx5_core:
  (null) Pages: 89650 (peak: 94056)
    ret_from_fork (0xffffda088b4165f8) Pages: 60007 (peak: 60007)
      kthread (0xffffda088b4bd7e4) Pages: 60007 (peak: 60007)
        worker_thread (0xffffda088b4b48d0) Pages: 60007 (peak: 60007)
          process_one_work (0xffffda088b4b3f40) Pages: 60007 (peak: 60007)
            work_for_cpu_fn (0xffffda088b4aef00) Pages: 53906 (peak: 53906)
              local_pci_probe (0xffffda088b9e1e44) Pages: 53906 (peak: 53906)
                probe_one mlx5_core (0xffffda084f899cc8) Pages: 53518 (peak: 53518)
                  mlx5_init_one mlx5_core (0xffffda084f8994ac) Pages: 49756 (peak: 49756)
                    mlx5_function_setup.constprop.0 mlx5_core (0xffffda084f899100) Pages: 44434 (eak: 44434)
                      mlx5_satisfy_startup_pages mlx5_core (0xffffda084f8a4f24) Pages: 44434 (peak: 44434)
                    mlx5_function_setup.constprop.0 mlx5_core (0xffffda084f899078) Pages: 5285 (peak: 5285)
                      mlx5_cmd_init mlx5_core (0xffffda084f89e414) Pages: 4818 (peak: 4818)
                        mlx5_alloc_cmd_msg mlx5_core (0xffffda084f89aaa0) Pages: 4403 (peak: 4403)

This memory consumption is completely unnecessary when kdump doesn't need
this NIC. Only install needed NIC drivers to prevent this kind of waste.

Note
1. this patch depends on [1] to ask dracut to not install NIC drivers.
2. "ethtool -i" somehow fails to get the vlan driver
3. team.ko doesn't depend on the team mode drivers so we need to install
   the team mode drivers manually.

[1] https://github.com/dracutdevs/dracut/pull/1789

Signed-off-by: Coiby Xu <coxu@redhat.com>
Reviewed-by: Thomas Haller <thaller@redhat.com>
Reviewed-by: Philipp Rudo <prudo@redhat.com>
2022-11-23 06:39:27 +08:00