Commit Graph

1597 Commits

Author SHA1 Message Date
Coiby Xu 5951b5e268 Don't try to update crashkernel when bootloader is not installed
Currently when using anaconda to install the OS, the following errors
occur,

    INF packaging: Configuring (running scriptlet for): kernel-core-5.14.0-70.el9.x86_64 ...
    INF dnf.rpm: grep: /boot/grub2/grubenv: No such file or directory
    grep: /boot/grub2/grubenv: No such file or directory
    grep: /boot/grub2/grubenv: No such file or directory
    grep: /boot/grub2/grubenv: No such file or directory
    ...
    INF packaging: Configuring (running scriptlet for): kexec-tools-2.0.23-9.el9.x86_64 ...
    INF dnf.rpm: grep: /boot/grub2/grubenv: No such file or directory
    grep: /boot/grub2/grubenv: No such file or directory
    grep: /boot/grub2/grubenv: No such file or directory

Or for s390, the following errors occur,

    INF packaging: Configuring (running scriptlet for): kernel-core-5.14.0-71.el9.s390x ...
    03:37:51,232 INF dnf.rpm: grep: /etc/zipl.conf: No such file or directory
    grep: /etc/zipl.conf: No such file or directory
    grep: /etc/zipl.conf: No such file or directory

    INF packaging: Configuring (running scriptlet for): kexec-tools-2.0.23-9_1.el9_0.s390x ...
    INF dnf.rpm: grep: /etc/zipl.conf: No such file or directory

This is because when anaconda installs the packages, bootloader hasn't
been installed and /boot/grub2/grubenv or /etc/zipl.conf doesn't exist.
So don't try to update crashkernel when bootloader isn't ready to avoid
the above errors.

Note this is the second attempt to fix this issue. Previously a file
/tmp/kexec_tools_package_install was created to avoid running the
related code thus to avoid the above errors but unfortunately that
approach has two issues a) somehow osbuild doesn't delete it for RHEL b)
this file could still exist if users manually remove kexec-tools.

Fixes: e218128 ("Only try to reset crashkernel for osbuild during package install")
Reported-by: Jan Stodola <jstodola@redhat.com>
Signed-off-by: Coiby Xu <coxu@redhat.com>
Reviewed-by: Philipp Rudo <prudo@redhat.com>
2022-12-22 10:33:00 +08:00
Coiby Xu bc101086e2 dracut-module-setup.sh: also install the driver of physical NIC for
Hyper-V VM with accelerated networking

Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=2151842

Currently, vmcore dumping to remote fs fails on Azure Hyper-V VM with
accelerated networking because it uses a physical NIC for accrelarated
networking [1]. In this case, the driver for this physical NIC should be
installed as well.

[1] https://learn.microsoft.com/en-us/azure/virtual-network/accelerated-networking-overview

Fixes: a65dde2d ("Reduce kdump memory consumption by only installing needed NIC drivers")

Reported-by: Xiaoqiang Xiong <xxiong@redhat.com>
Signed-off-by: Coiby Xu <coxu@redhat.com>
Reviewed-by: Philipp Rudo <prudo@redhat.com>
2022-12-20 14:04:49 +08:00
Coiby Xu 3b22cce1cb dracut-module-setup.sh: skip installing driver for the loopback
interface

Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=2151500

Currently, kdump initrd fails to be built when dumping vmcore to
localhost via ssh or nfs,

  kdumpctl[3331]: Cannot get driver information: Operation not supported
  kdumpctl[1991]: dracut: Failed to get the driver of lo
  dracut[2020]: Failed to get the driver of lo
  kdumpctl[1775]: kdump: mkdumprd: failed to make kdump initrd
  kdumpctl[1775]: kdump: Starting kdump: [FAILED]
  systemd[1]: kdump.service: Main process exited, code=exited, status=1/FAILURE
  systemd[1]: kdump.service: Failed with result 'exit-code'.
  systemd[1]: Failed to start Crash recovery kernel arming.
  systemd[1]: kdump.service: Consumed 1.710s CPU time.

This is because the loopback interface is used for transferring vmcore and
ethtool can't get the driver of the loopback interface. In fact, once
COFNIG_NET is enabled, the loopback device is enabled and there is no driver
for the loopback device. So skip installing driver for the loopback device.
The loopback interface is implemented in linux/drivers/net/loopback.c
and always has the name "lo". So we can safely tell if a network
interface is the loopback interface by its name.

Fixes: a65dde2d ("Reduce kdump memory consumption by only installing needed NIC drivers")
Reported-by: Martin Pitt <mpitt@redhat.com>
Reported-by: Rich Megginson <rmeggins@redhat.com>
Reviewed-by: Lichen Liu <lichliu@redhat.com>
Reviewed-by: Philipp Rudo <prudo@redhat.com>
Signed-off-by: Coiby Xu <coxu@redhat.com>
2022-12-20 14:03:12 +08:00
Coiby Xu bc4196afc1
Release 2.0.25-4
Signed-off-by: Coiby Xu <coxu@redhat.com>
2022-12-07 17:47:58 +08:00
Coiby Xu b45896c620 dracut-module-setup.sh: stop overwriting dracut's trap handler
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=2149246

Latest Workstation live x86_64 image has an excess increase of ~300 MB
in size. This is because kdumpbase module's trap handler overwrites
dracut's handler and DRACUT_TMPDIR which has three unpacked initramfs
files fails to be cleaned up. This patch moves kdumpbase module's
temporary folder under DRACUT_TMPDIR and lets dracut's trap handler do
the cleanup instead.

Fixes: d25b1ee3 ("Add functions to copy NetworkManage connection profiles to the initramfs")
Signed-off-by: Coiby Xu <coxu@redhat.com>
Reviewed-by: Philipp Rudo <prudo@redhat.com>
2022-12-07 17:32:32 +08:00
Hari Bathini a833624fe5 fadump: avoid status check while starting in fadump mode
With kernel commit 607451ce0aa9b ("powerpc/fadump: register for fadump
as early as possible"), 'kdumpctl start' prematurely returns with the
below message:

    "Kdump already running: [WARNING]"

instead of setting default initrd with dump capture capability as
required for fadump. Skip status check in fadump mode to avoid this
problem.

Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
Reviewed-by: Philipp Rudo <prudo@redhat.com>
2022-12-07 09:43:41 +08:00
Hari Bathini 4a2dcab26a fadump: add a kernel install hook to clean up fadump initramfs
Kdump service will create fadump initramfs when needed, but it won't
clean up the fadump initramfs on kernel uninstall. So create a kernel
install hook to do the clean up job.

Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
Reviewed-by: Philipp Rudo <prudo@redhat.com>
2022-12-07 09:42:29 +08:00
Hari Bathini 25411da966 fadump: fix default initrd backup and restore logic
In case of fadump, default initrd is rebuilt with dump capturing
capability, as the same initrd is used for booting production kernel
as well as capture kernel.

The original initrd file is backed up with a checksum, to restore
it as the default initrd when fadump is disabled. As the checksum
file is not kernel version specific, switching between different
kernel versions and kdump/fadump dump mode breaks the default initrd
backup/restore logic. Fix this by having a kernel version specific
checksum file.

Also, if backing up initrd fails, retaining the checksum file isn't
useful. Remove it.

Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
Reviewed-by: Philipp Rudo <prudo@redhat.com>
2022-12-07 09:42:29 +08:00
Hari Bathini f98bd5895e fadump: use 'zstd' as the default compression method
If available, use 'zstd' compression method to optimize the size of
the initrd built with fadump support. Also, 'squash+zstd' is not
preferred because more disk space is consumed with 'squash+zstd' due
to the additional binaries needed for fadump with squash case.

Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
Acked-by: Tao Liu <ltao@redhat.com>
Reviewed-by: Philipp Rudo <prudo@redhat.com>
2022-12-07 09:42:29 +08:00
Coiby Xu e96e441b36 Release 2.0.25-3
Signed-off-by: Coiby Xu <coxu@redhat.com>
2022-11-25 17:47:20 +08:00
Lichen Liu 5eb77ee3fa kdumpctl: Optimize _find_kernel_path_by_release regex string
Currently _find_kernel_path_by_release uses grubby and grep to
find the kernel path, if both the normal kernel and it's debug
varient exist, the grep will give more than one kernel strings.

```
kernel="/boot/vmlinuz-5.14.0-139.kpq0.el9.s390x+debug"
kernel="/boot/vmlinuz-5.14.0-139.kpq0.el9.s390x"
```

This will cause an error when installing debug kernel.

```
The param "/boot/vmlinuz-5.14.0-139.kpq0.el9.s390x+debug
/boot/vmlinuz-5.14.0-139.kpq0.el9.s390x" is incorrect
```

Fixes: 945cbbd ("add helper functions to get kernel path by kernel release and the path of current running kernel")

Signed-off-by: Lichen Liu <lichliu@redhat.com>
Reviewed-by: Philipp Rudo <prudo@redhat.com>
2022-11-25 17:27:15 +08:00
Pingfan Liu 0414386cb8 unit tests: adapt check_config to gen-kdump-conf.sh
Now, kdump.conf is generated by gen-kdump-conf.sh, hence adapting
check_config to run that script firstly then check the generated file.

Signed-off-by: Pingfan Liu <piliu@redhat.com>
Reviewed-by: Coiby Xu <coxu@redhat.com>
2022-11-25 17:27:14 +08:00
Pingfan Liu 787b041aab kdump.conf: use a simple generator script to maintain
This commit has the same motivation as the commit 677da8a "sysconfig:
use a simple generator script to maintain".

At present, only the kdump.conf generated for s390x has a slight
difference from the other arches, where the core_collector asks the
makedumpfile to use "-c" option to compress dump data by each page using
zlib, which is more efficient than lzo on s390x.

Signed-off-by: Pingfan Liu <piliu@redhat.com>
Reviewed-by: Philipp Rudo <prudo@redhat.com>
2022-11-25 17:16:09 +08:00
Coiby Xu 523cda8f34 Don't run kdump_check_setup_iscsi in a subshell in order to collect needed
network interfaces

Currently, dumping to iSCSI target fails because the global array
(unique_netifs) that stores the network interfaces needed by kdump is
empty. The root cause is change of the array made in a subshell (a child
process) is inaccessible to the parent process. So don't run
kdump_check_setup_iscsi in a subshell.

Fixes: 63c3805c ("Set up kdump network by directly copying NM connection profile to initrd")
Signed-off-by: Coiby Xu <coxu@redhat.com>
Reviewed-by: Pingfan Liu <piliu@redhat.com>
2022-11-25 13:54:11 +08:00
Coiby Xu b5577c163a Simplify setup_znet by copying connection profile to initrd
/usr/lib/udev/ccw_init [1] shipped by s390utils extracts the values of
SUBCHANNELS, NETTYPE and LAYER2 from /etc/sysconfig/network-scripts/ifcfg-*
or /etc/NetworkManager/system-connections/*.nmconnection to activate znet
network device. If the connection profile is copied to initrd,
there is no need to set up the "rd.znet" dracut cmdline parameter.

There are two cases addressed by this commit,
 1. znet network interface is a slave of bonding/teaming/vlan/bridging
    network. The connection profile has been copied to initrd by
    kdump_copy_nmconnection_file and it contains the info needed by
    ccw_init.
 2. znet network interface is a slave of bonding/teaming/vlan/bridging
    network. The corresponding ifcfg-*/*.nmconnection file may not contain
    info like SUBCHANNELS [2]. In this case, copy the ifcfg-*/*.nmconnection
    file that has this info to the kdump initrd. Also to prevent the copied
    connection profile from being chosen by NM, set
    connection.autoconnect=false for this connection profile.

With this implementation, there is also no need to check if znet is
used beforehand.

Note
1. ccw_init doesn't care if SUBCHANNELS, NETTYPE and LAYER2 comes from
   an active NM profile or not. If an inactive NM profile contains this
   info, it needs to be copied to the kdump initrd as well.
2. "rd.znet_ifname=$_netdev:${SUBCHANNELS}" is no longer needed needed
   because now there is no renaming of s390x network interfaces when
   reusing NetworkManager profiles. rd.znet_ifname was introduced in
   commit ce0305d ("Add a new option 'rd.znet_ifname' in order to use it
   in udev rules") to address the special case of non-persistent
   MAC address by renaming a network interface by SUBCHANNELS.

[1] https://src.fedoraproject.org/rpms/s390utils/blob/rawhide/f/ccw_init
[2] https://bugzilla.redhat.com/show_bug.cgi?id=2064708

Signed-off-by: Coiby Xu <coxu@redhat.com>
Reviewed-by: Thomas Haller <thaller@redhat.com>
Reviewed-by: Philipp Rudo <prudo@redhat.com>
2022-11-23 06:39:27 +08:00
Coiby Xu 9792994f2f Wait for the network to be truly ready before dumping vmcore
nm-wait-online-initrd.service installed by dracut's 35-networkmanager
module calls nm-online with "-s" which means it returns immediately when
NetworkManager logs "startup complete". Thus it doesn't truly wait for
network connectivity to be established [1]. Wait for the network to be
truly ready before dumping vmcore. There are two benefits brought by
this approach,
  - ssh/nfs dumping won't fail because of that the network is not
   ready e.g. [2][3]
  - users don't need to use workarounds like rd.net.carrier.timeout to
    make sure the network is ready

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1485712
[2] https://bugzilla.redhat.com/show_bug.cgi?id=1909014
[3] https://bugzilla.redhat.com/show_bug.cgi?id=2035451

Signed-off-by: Coiby Xu <coxu@redhat.com>
Reviewed-by: Thomas Haller <thaller@redhat.com>
Reviewed-by: Philipp Rudo <prudo@redhat.com>
2022-11-23 06:39:27 +08:00
Coiby Xu 568623e69a Address the cases where a NIC has a different name in kdump kernel
A NIC may get a different name in the kdump kernel from 1st kernel
in cases like,
 - kernel assigned network interface names are not persistent e.g. [1]
 - there is an udev rule to rename the NIC in the 1st kernel but the
   kdump initrd may not have that rule e.g. [2]

If NM tries to match a NIC with a connection profile based on NIC name
i.e. connection.interface-name, it will fail the above bases. A simple
solution is to ask NM to match a connection profile by MAC address.
Note we don't need to do this for user-created NICs like vlan, bridge and
bond.

An remaining issue is passing the name of a NIC via the kdumpnic dracut
command line parameter which requires passing ifname=<interface>:<MAC> to
have fixed NIC name. But we can simply drop this requirement. kdumpnic
is needed because kdump needs to get the IP by NIC name and use the IP
to created a dumping folder named "{IP}-{DATE}". We can simply pass the
IP to the kdump kernel directly via a new dracut command line parameter
kdumpip instead. In addition to the benefit of simplifying the code,
there are other three benefits brought by this approach,
  - make use of whatever network to transfer the vmcore. Because  as long
    as we have the network to we don't care which NIC is active.
  - if obtained IP in the kdump kernel is different from the one in the
    1st kernel. "{IP}-{DATE}" would better tell where the dumped vmcore
    comes from.
  - without passing ifname=<interface>:<MAC> to kdump initrd, the
    issue of there are two interfaces with the same MAC address for
    Azure Hyper-V NIC SR-IOV [3] is resolved automatically.

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1121778
[2] https://bugzilla.redhat.com/show_bug.cgi?id=810107
[3] https://bugzilla.redhat.com/show_bug.cgi?id=1962421

Signed-off-by: Coiby Xu <coxu@redhat.com>
Reviewed-by: Thomas Haller <thaller@redhat.com>
Reviewed-by: Philipp Rudo <prudo@redhat.com>
2022-11-23 06:39:27 +08:00
Coiby Xu a65dde2d10 Reduce kdump memory consumption by only installing needed NIC drivers
Even after having asked NM to stop managing a unneeded NIC, a NIC driver
may still waste memory. For example, mlx5_core uses a substantial amount
of memory during driver initialization,

======== Report format module_summary: ========
Module mlx5_core using 350.2MB (89650 pages), peak allocation 367.4MB (94056 pages)
Module squashfs using 13.1MB (3360 pages), peak allocation 13.1MB (3360 pages)
Module overlay using 2.1MB (550 pages), peak allocation 2.2MB (555 pages)
Module dns_resolver using 0.9MB (219 pages), peak allocation 5.2MB (1338 pages)
Module mlxfw using 0.7MB (172 pages), peak allocation 5.3MB (1349 pages)
======== Report format module_summary END ========

======== Report format module_top: ========
Top stack usage of module mlx5_core:
  (null) Pages: 89650 (peak: 94056)
    ret_from_fork (0xffffda088b4165f8) Pages: 60007 (peak: 60007)
      kthread (0xffffda088b4bd7e4) Pages: 60007 (peak: 60007)
        worker_thread (0xffffda088b4b48d0) Pages: 60007 (peak: 60007)
          process_one_work (0xffffda088b4b3f40) Pages: 60007 (peak: 60007)
            work_for_cpu_fn (0xffffda088b4aef00) Pages: 53906 (peak: 53906)
              local_pci_probe (0xffffda088b9e1e44) Pages: 53906 (peak: 53906)
                probe_one mlx5_core (0xffffda084f899cc8) Pages: 53518 (peak: 53518)
                  mlx5_init_one mlx5_core (0xffffda084f8994ac) Pages: 49756 (peak: 49756)
                    mlx5_function_setup.constprop.0 mlx5_core (0xffffda084f899100) Pages: 44434 (eak: 44434)
                      mlx5_satisfy_startup_pages mlx5_core (0xffffda084f8a4f24) Pages: 44434 (peak: 44434)
                    mlx5_function_setup.constprop.0 mlx5_core (0xffffda084f899078) Pages: 5285 (peak: 5285)
                      mlx5_cmd_init mlx5_core (0xffffda084f89e414) Pages: 4818 (peak: 4818)
                        mlx5_alloc_cmd_msg mlx5_core (0xffffda084f89aaa0) Pages: 4403 (peak: 4403)

This memory consumption is completely unnecessary when kdump doesn't need
this NIC. Only install needed NIC drivers to prevent this kind of waste.

Note
1. this patch depends on [1] to ask dracut to not install NIC drivers.
2. "ethtool -i" somehow fails to get the vlan driver
3. team.ko doesn't depend on the team mode drivers so we need to install
   the team mode drivers manually.

[1] https://github.com/dracutdevs/dracut/pull/1789

Signed-off-by: Coiby Xu <coxu@redhat.com>
Reviewed-by: Thomas Haller <thaller@redhat.com>
Reviewed-by: Philipp Rudo <prudo@redhat.com>
2022-11-23 06:39:27 +08:00
Coiby Xu 586fe410aa Reduce kdump memory consumption by not letting NetworkManager manage unneeded network interfaces
By default, NetworkManger will manage all the network interfaces and
try to set interface IFF_UP to get carrier state. Regardless of whether
the network interface is connected to a cable or not, the NIC driver
will allocate memory resources for e.g. ring buffers when setting IFF_UP.
This could be a waste of memory. For example it's found i40e consumes ~15GB
on a power machine. On this machine, i40e manages four interfaces but only
one interface is valid. This patch use "managed=false" to tell
NetworkManager to not manage network interfaces that are not needed by
kdump by putting 10-kdump-netif_allowlist.conf in the initramfs.

Signed-off-by: Coiby Xu <coxu@redhat.com>
Reviewed-by: Thomas Haller <thaller@redhat.com>
Reviewed-by: Philipp Rudo <prudo@redhat.com>
2022-11-23 06:39:27 +08:00
Coiby Xu 63c3805c48 Set up kdump network by directly copying NM connection profile to initrd
This patch setup kdump network by directly copying NM connection profile(s)
for different network setup including bond, bridge, vlan, and team. For
vlan network, rename phydev to parent_netif to improve code readability.

With the new approach, the related code to build up dracut cmdline
parameter such rd.route, ip and etc can be cleaned up. And there is no
need to setup dns when copying .nmconnection directly to initrd
either. Note the bootdev dracut command line parameter is only used by
dracut's 35network-legacy and network-manager doesn't use it, remove
related code as well.

Note
1. kdump_setup_vlan/bond/... are no longer called in subshells in order
   to modify global variables like unique_netifs
2. The original kdump_install_net is renamed to better reflect its
   current function

Signed-off-by: Coiby Xu <coxu@redhat.com>
Reviewed-by: Thomas Haller <thaller@redhat.com>
Reviewed-by: Philipp Rudo <prudo@redhat.com>
2022-11-23 06:39:27 +08:00
Coiby Xu 62355ebe5a Stop dracut 35network-manager from running nm-initrd-generator
kexec-tools depends on dracut's 35network-manager module which will
call nm-initrd-generator. We don't want nm-initrd-generator to generate
connection profiles since we  will copy them from 1st kernel to
kdump kernel initramfs. NetworkManager >= 1.35.2 won't generate connection
profiles if there's a connection dir with rd.neednet. For Fedora/RHEL,
this connection dir is /etc/NetworkManager/system-connections. For the
details, please refer to the NetworkManager commit 79885656d3
("initrd: don't add a connection if there's a connection dir with
rd.neednet") [1]. Before the release of NetworkManager >= 1.35.2, we
need to mask /usr/libexec/nm-initrd-generator.

[1] https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/merge_requests/1010

Signed-off-by: Coiby Xu <coxu@redhat.com>
Reviewed-by: Thomas Haller <thaller@redhat.com>
Reviewed-by: Philipp Rudo <prudo@redhat.com>
2022-11-23 06:39:27 +08:00
Coiby Xu 6b586a9036 Apply the timeout configuration of nm-initrd-generator
nm-wait-online-initrd.service installed by dracut's 35-networkmanager
module calls nm-online with "-s" which means it returns immediately when
NetworkManager logs "startup complete" after certain timeouts are
reached. "startup complete" doesn't necessarily network connectivity has
been established. nm-initrd-generator has a set of timeouts that in most
of cases when applied, "startup-complete" means network connectivity has
been established. So apply it when setting up kdump network.

Suggested-by: Thomas Haller <thaller@redhat.com>
Signed-off-by: Coiby Xu <coxu@redhat.com>
Reviewed-by: Thomas Haller <thaller@redhat.com>
Reviewed-by: Philipp Rudo <prudo@redhat.com>
2022-11-23 06:39:27 +08:00
Coiby Xu 9dfcacf72d Determine whether IPv4 or IPv6 is needed
According to `man nm-online`,
  "By default, connections have the ipv4.may-fail and
  ipv6.may-fail properties set to yes; this means that
  NetworkManager waits for one of the two address families to
  complete configuration before considering the connection
  activated. If you need a specific address family configured
  before network-online.target is reached, set the corresponding
  may-fail property to no."

If a NIC has an IPv4 or IPv6 address, set the corresponding may-fail
property to no. Otherwise, dumping vmcore over IPv6 could fail because
only IPv4 network is ready or vice versa.

Also disable IPv6 if only IPv4 is used and vice versa.

Signed-off-by: Coiby Xu <coxu@redhat.com>
Reviewed-by: Thomas Haller <thaller@redhat.com>
Reviewed-by: Philipp Rudo <prudo@redhat.com>
2022-11-23 06:39:27 +08:00
Coiby Xu d25b1ee31c Add functions to copy NetworkManage connection profiles to the initramfs
Each network interface is manged by a NM connection. Given a list of
network interface names, copy the NetworkManager (NM) connection
profiles i.e. .nmconnection files to the kdump initramfs.

Before copying a connection file, clone it to automatically convert a
legacy ifcfg-*[1] file to a .nmconnection file and for the convenience of
editing the connection profile.

[1] https://fedoraproject.org/wiki/Changes/NetworkManager_keyfile_instead_of_ifcfg_rh

Signed-off-by: Coiby Xu <coxu@redhat.com>
Reviewed-by: Thomas Haller <thaller@redhat.com>
Reviewed-by: Philipp Rudo <prudo@redhat.com>
2022-11-23 06:39:27 +08:00
Coiby Xu b7e58619d1 Fix error for vlan over team network interface
6f9235887f ("module-setup.sh: enable
vlan on team interface") skips establishing teaming network by mistake.
Although it could use one of slave netifs to establish connection
to transfer vmcore to remote fs, it breaks the implicit assumption of
creating an identical network topology to the 1st kernel.

Fixes: 6f92358 ("module-setup.sh: enable vlan on team interface")
Signed-off-by: Coiby Xu <coxu@redhat.com>
Reviewed-by: Thomas Haller <thaller@redhat.com>
Reviewed-by: Philipp Rudo <prudo@redhat.com>
2022-11-23 06:39:27 +08:00
Coiby Xu a3da46d6c4 Skip reset_crashkernel_after_update during package install
Currently, kexec-tools tries to reset crashkernel when using anaconda to
install the system. But grubby isn't ready and complains that,
  10:34:17,014 INF packaging: Configuring (running scriptlet for): kexec-tools-2.0.23-9.el9.x86_64 1646034766 53ff7158f8808774f4e3c3c87e504aa7a6d677b537754dac86c87925c8f0a397
  10:34:17,205 INF dnf.rpm: grep: /boot/grub2/grubenv: No such file or directory
  grep: /boot/grub2/grubenv: No such file or directory
  grep: /boot/grub2/grubenv: No such file or directory

kexec-tools is supposed to update the kernel crashkernel parameter after
package upgrade. Unfortunately, the posttrans RPM scriptlet doesn't
distinguish between package install and upgrade. This patch skips
reset_crashkernel_after_update as similar to e218128e ("Only try to
reset crashkernel for osbuild during package install").

Reported-by: Jan Stodola <jstodola@redhat.com>
Signed-off-by: Coiby Xu <coxu@redhat.com>
2022-11-18 17:22:39 +08:00
Tao Liu 3ae8cf8876 Don't check fs modified when dump target is lvm2 thinp
When the dump target is lvm2 thinp, if we didn't mount
the dump target first, get_fs_type_from_target will get
empty output:

Before mount:
$ get_fs_type_from_target /dev/vg00/thinlv

After mount:
$ mount /dev/vg00/thinlv /mnt
$ get_fs_type_from_target /dev/vg00/thinlv
ext4

As a result, kdumpctl start will fail with:
$ kdumpctl start
kdump: Dump target is invalid
kdump: Starting kdump: [FAILED]

This patch fix the issue by bypassing check_fs_modified
when the dump target is lvm2 thinp.

Signed-off-by: Tao Liu <ltao@redhat.com>
Reviewed-by: Coiby Xu <prudo@redhat.com>
2022-11-11 10:29:02 +08:00
Coiby Xu cea74a7b3e tests: use .nmconnection to set up test network
F36 has dropped support on ifcfg and as a result current network tests
fails. Use .nmconnection to set up test network instead.

Signed-off-by: Coiby Xu <coxu@redhat.com>
Reviewed-by: Philipp Rudo <prudo@redhat.com>
2022-11-09 14:07:29 +08:00
Hari Bathini f33c99e347 fadump: preserve file modification time to help with hardlinking
With commit fa9201b2 ("fadump: isolate fadump initramfs image within
the default one"), initramfs image gets to hold two images, one for
production kernel boot purpose and the other for capture kernel boot.
Most files are common among the two images. Retain file modification
time to replace duplicate files with hardlinks and save space. Also,
avoid unnecessarily compressing fadump image that is decompressed
immediately anyway.

Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
Reviewed-by: Philipp Rudo <prudo@redhat.com>
2022-11-09 14:07:29 +08:00
Hari Bathini 55b0dd03b3 fadump: do not use squash to reduce image size
With commit fa9201b2 ("fadump: isolate fadump initramfs image within
the default one"), initramfs image gets to hold two squash images, one
for production kernel boot purpose and the other for capture kernel
boot. Having separate images improved reliability for both production
kernel and capture kernel boot scenarios, but the size of initramfs
image became considerably larger.

Instead of having squash images, compressing $initdir without using
squash images reduced the size of initramfs image for fadump case by
around 30%. So, avoid using squash for fadump case.

Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
Reviewed-by: Philipp Rudo <prudo@redhat.com>
2022-11-09 14:07:29 +08:00
Tao Liu 6d2c22bb81 selftest: Add lvm2 thin provision for kdump test
Signed-off-by: Tao Liu <ltao@redhat.com>
Reviewed-by: Philipp Rudo <prudo@redhat.com>
2022-11-01 12:20:34 +08:00
Tao Liu 68978f9241 selftest: Only iterate the .sh files for test execution
Previously, all files within $TESTCASEDIR/$test_case are regarded
as shell script files for testing. However there might be config
files under the directory. So let's only iterate the .sh files.

Signed-off-by: Tao Liu <ltao@redhat.com>
Reviewed-by: Philipp Rudo <prudo@redhat.com>
2022-11-01 12:20:34 +08:00
Tao Liu f11721077a Add dependency of dracut lvmthinpool-monitor module
The 80lvmthinpool-monitor module is needed for monitor and
autoextend the size of thin pool in 2nd kernel. The module was
integrated in dracut version 057.

If lvmthinpool-monitor module is not found, we will print a warning.
Because we don't want to block the kdump process when the thin pool
capacity is enough and no monitor-and-autoextend actually needed.

Signed-off-by: Tao Liu <ltao@redhat.com>
Reviewed-by: Philipp Rudo <prudo@redhat.com>
2022-11-01 12:20:34 +08:00
Tao Liu 10ca970940 lvm.conf should be check modified if lvm2 thinp enabled
lvm2 relies on /etc/lvm/lvm.conf to determine its behaviour. The
important configs such as thin_pool_autoextend_threshold and
thin_pool_autoextend_percent will be used during kdump in 2nd
kernel. So if the file is modified, the initramfs should be
rebuild to include the latest.

Signed-off-by: Tao Liu <ltao@redhat.com>
Reviewed-by: Philipp Rudo <prudo@redhat.com>
2022-11-01 12:20:34 +08:00
Tao Liu 0a5b71d123 Add lvm2 thin provision dump target checker
We need to check if a directory or a device is lvm2 thinp target.

First, we use get_block_dump_target() to convert dump path into
block device, then we check if the device is lvm2 thinp target by
cmd lvs.

is_lvm2_thinp_device is now located in kdump-lib-initramfs.sh, for it
will be used in 2nd kernel. is_lvm2_thinp_dump_target is located in
kdump-lib.sh, for it is only used in 1st kernel, and it has dependencies
which exist in kdump-lib.sh.

Signed-off-by: Tao Liu <ltao@redhat.com>
Reviewed-by: Philipp Rudo <prudo@redhat.com>
2022-11-01 12:20:34 +08:00
Coiby Xu 995ee24903 Release 2.0.25-2
Signed-off-by: Coiby Xu <coxu@redhat.com>
2022-10-27 16:00:11 +08:00
Coiby Xu fdad7d9869 Skip reading /etc/defaut/grub for s390x
Currently, updating kexec-tools on s390x gives the warning
sed: can't read /etc/default/grub: No such file or directory

This happens because s390x doesn't use GRUB and /etc/default/grub
doesn't exist. We need to skip both reading and writing to
/etc/default/grub.

Reported-by: Jie Li <jieli@redhat.com>
Reviewed-by: Philipp Rudo <prudo@redhat.com>
Signed-off-by: Coiby Xu <coxu@redhat.com>
2022-10-27 14:42:27 +08:00
Coiby Xu 6ce4b85bb3 Include the memory overhead cost of cryptsetup when estimating the memory requirement for LUKS-encrypted target
Currently, "kdumpctl estimate" neglects the memory overhead cost of
cryptsetup itself. Unfortunately, there is no golden formula to
calculate the overhead cost [1]. So estimate the overhead cost as 50M
for aarch64 and 20M for other architectures based on the following
empirical data,

| Overhead (M) | OS                                        | arch    |
| ------------ | ----------------------------------------- | ------- |
| 14.1         | RHEL-9.2.0-20220829.d.1                   | ppc64le |
| 14           | Fedora-37-20220830.n.0 Everything ppc64le | ppc64le |
| 17           | Fedora 36                                 | ppc64le |
| 8.8          | Fedora 35                                 | s390x   |
| 10.1         | Fedora-Rawhide-20220829.n.0, fc38         | s390x   |
| 42           | Fedora-Rawhide-20220829.n.0, fc38         | arch64  |
| 40           | F35                                       | arch64  |
| 42           | F36                                       | arch64  |
| 42           | Fedora-Rawhide-20220901.n.0               | arch64  |
| 10           | F35                                       | x86_64  |
| 10           | Fedora-Rawhide-20220901.n.0               | x86_64  |
| 11           | Fedora-Rawhide-20220901.n.0               | x86_64  |

[1] https://lore.kernel.org/cryptsetup/20220616044339.376qlipk5h2omhx2@Rk/T/#u

Fixes: e9e6a2c ("kdumpctl: Add kdumpctl estimate")
Signed-off-by: Coiby Xu <coxu@redhat.com>
Reviewed-by: Philipp Rudo <prudo@redhat.com>
2022-10-26 15:38:21 +08:00
Coiby Xu 50a8461fc7 Choosing the most memory-consuming key slot when estimating the
memory requirement for LUKS-encrypted target

When there are multiple key slots, "kdumpctl estimate" uses the least
memory-consuming key slot. For example, when there are two memory slots
created with --pbkdf-memory=1048576 (1G) and --pbkdf-memory=524288 (512M),
"kdumpctl estimate" thinks the extra memory requirement is only 512M.
This will of course lead to OOM if the user uses the more
memory-consuming key slot. Fix it by sorting in reverse order.

Fixes: e9e6a2c ("kdumpctl: Add kdumpctl estimate")
Signed-off-by: Coiby Xu <coxu@redhat.com>

Reviewed-by: Lichen Liu <lichliu@redhat.com>
Signed-off-by: Coiby Xu <coxu@redhat.com>
2022-10-26 15:34:08 +08:00
Coiby Xu 15122b3f98 Fix grep warnings "grep: warning: stray \ before -"
Latest grep (3.8) warnings about unneeded backslashes when building
kdump initrd [1],
    kdump: Rebuilding /boot/initramfs-6.0.0-0.rc5.a335366bad13.40.test.fc38.aarch64kdump.img
    grep: warning: stray \ before -
    grep: warning: stray \ before -
    grep: warning: stray \ before -
    grep: warning: stray \ before -
    grep: warning: stray \ before -

Some warnings can be avoided by using "sed -n" to remove grep and the
others can use the -- argument.

[1] https://s3.us-east-1.amazonaws.com/arr-cki-prod-datawarehouse-public/datawarehouse-public/2022/09/17/redhat:643020269/build_aarch64_redhat:643020269_aarch64/tests/4/results_0001/job.01/recipes/12617739/tasks/5/logs/taskout.log

Reported-by: Baoquan He <bhe@redhat.com>
Signed-off-by: Coiby Xu <coxu@redhat.com>
Suggested-by: Philipp Rudo <prudo@redhat.com>
Reviewed-by: Philipp Rudo <prudo@redhat.com>
2022-10-26 14:16:04 +08:00
Coiby Xu e218128e28 Only try to reset crashkernel for osbuild during package install
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=2060319

Currently, kexec-tools tries to reset crashkernel when using anaconda to
install the system. But grubby isn't ready and complains that,
  10:33:17,631 INF packaging: Configuring (running scriptlet for): kernel-core-5.14.0-70.el9.x86_64 1645746534 03dcd32db234b72440ee6764d59b32347c5f0cd98ac3fb55beb47214a76f33b4
  10:34:16,696 INF dnf.rpm: grep: /boot/grub2/grubenv: No such file or directory
  grep: /boot/grub2/grubenv: No such file or directory

We only need to try resetting crashkernel for osbuild. Skip it for other
cases. To tell if it's package install instead of package upgrade, make
use of %pre to write a file /tmp/kexec-tools-install when "$1 == 1" [1].

[1] https://docs.fedoraproject.org/en-US/packaging-guidelines/Scriptlets/#_syntax

Reported-by: Jan Stodola <jstodola@redhat.com>
Signed-off-by: Coiby Xu <coxu@redhat.com>
Reviewed-by: Lichen Liu <lichenliu@redhat.com>
2022-10-20 13:54:10 +08:00
Coiby Xu a7ead187a4 Prefix reset-crashkernel-{for-installed_kernel,after-update} with underscore
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=2048690

To indicate they are for internal use only, underscore them.

Reported-by: rcheerla@redhat.com
Signed-off-by: Coiby Xu <coxu@redhat.com>
Reviewed-by: Lichen Liu <lichenliu@redhat.com>
2022-10-20 13:54:10 +08:00
Tao Liu fc1c79ffd2 Seperate dracut and dracut-squash compressor for zstd
Previously kexec-tools will pass "--compress zstd" to dracut. It
will make dracut to decide whether: a) call mksquashfs to make a
zstd format squash-root.img, b) call cmd zstd to make a initramfs.

Since dracut(>= 057) has decoupled the compressor for dracut and
dracut-squash, So in this patch, we will pass the compressor seperately.

Note:

The is_squash_available && !dracut_has_option --squash-compressor
&& !is_zsdt_command_available case is left unprocessed on purpose.

Actually, the situation when we want to call zstd compression is:
1) If squash function OK, we want dracut to invoke mksquashfs to make
a zstd format squash-root.img within initramfs.
2) If squash function is not OK, and cmd zstd presents, we want dracut
to invoke cmd zstd to make a zstd format initramfs.

is_zstd_command_available check can handle case 2 completely.

However, for the is_squash_available check, it cannot handle case 1
completely. It only checks if the kernel supports squashfs, it doesn't
check whether the squash module has been added by dracut when making
initramfs. In fact, in kexec-tools we are unable to do the check,
there are multiple ways to forbit dracut to load a module, such as
"dracut -o module" and "omit_dracutmodules in dracut.conf".

When squash dracut module is omitted, is_squash_available check will
still pass, so "--compress zstd" will be appended to dracut cmdline,
and it will call cmd zstd to do the compression. However cmd zstd may
not exist, so it fails.

The previous "--compress zstd" is ambiguous, after the intro of
"--squash-compressor", "--squash-compressor" only effect for
mksquashfs and "--compress" only effect for specific cmd.

So for the is_squash_available && !dracut_has_option
--squash-compressor && !is_zsdt_command_available case, we just leave
it to be handled the default way.

Reviewed-by: Philipp Rudo <prudo@redhat.com>
Signed-off-by: Tao Liu <ltao@redhat.com>
2022-10-20 12:26:37 +08:00
Tao Liu bea6143178 Fix the sync issue for dump_fs
Previously the sync for dump_fs is problematic, it always
return success according to man 2 sync. So it cannot detect
the error of the dump target is full and not all of vmcore
data been written back the disk, which will leave the vmcore
imcomplete and report misleading log as "saving vmcore
complete".

In this patch, we will use "sync -f vmcore" instead, which
will return error if syncfs on the dump target fails. In
this way, vmcore sync related failures, such as autoextend
of lvm2 thinpool fails, can be detected and handled properly.

Signed-off-by: Tao Liu <ltao@redhat.com>
Reviewed-by: Coiby Xu <coxu@redhat.com>
2022-10-08 15:46:11 +08:00
Tao Liu c743881ae6 virtiofs support for kexec-tools
This patch add virtiofs support for kexec-tools by introducing a new option
for /etc/kdump.conf:

virtiofs myfs

Where myfs is a variable tag name specified in qemu cmdline
"-device vhost-user-fs-pci,tag=myfs".

The patch covers the following cases:
1) Dumping VM's vmcore to a virtiofs shared directory;
2) When the VM's rootfs is a virtiofs shared directory and dumping the
   VM's vmcore to its subdirectory, such as /var/crash;
3) The combination of case 1 & 2: The VM's rootfs is a virtiofs shared
   directory and dumping the VM's vmcore to another virtiofs shared
   directory.

Case 2 & 3 need dracut >= 057, otherwise VM cannot boot from virtiofs
shared rootfs. But it is not the issue of kexec-tools.

Reviewed-by: Philipp Rudo <prudo@redhat.com>
Signed-off-by: Tao Liu <ltao@redhat.com>
2022-09-29 12:22:49 +08:00
Hari Bathini d905d49c08 fadump: avoid non-debug kernel use for fadump case
Since commit c5bdd2d8f1 ("kdump-lib: use non-debug kernels first"),
non-debug kernel is preferred, over the debug variant, as dump capture
kernel to reduce memory consumption. This works alright for kdump as
the capture kernel is loaded using kexec.

In case of fadump, regular boot loader is used to load the capture
kernel. So, the default kernel needs to be used as capture kernel as
well. But with commit c5bdd2d8f1, initrd of a different kernel is
made dump capture capable, breaking fadump's ability to capture dump
properly. Fix this by sticking with the debug variant in case of
fadump.

Fixes: c5bdd2d8f1 ("kdump-lib: use non-debug kernels first")

Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
Acked-by: Lichen Liu <lichliu@redhat.com>
Acked-by: Coiby Xu <coxu@redhat.com>
2022-09-23 14:23:38 +08:00
Lichen Liu 4d52b7d548 mkdumprd: Improve error messages on non-existing NFS target directories
When kdump is configured with a NFS location, and the remote directory does
not exist, kdump.service fails with a confusing error message.

    kdumpctl[2172]: kdump: Dump path "/tmp/mkdumprd.ftWhOF/target/dumps"
    does not exist in dump target "10.111.113.2:/srv/kdump"

We just need to print the remote directory "dumps" in such case, because
"/tmp/mkdumprd.ftWhOF/target" is the local temporary mount point.

Signed-off-by: Lichen Liu <lichliu@redhat.com>
Reviewed-by: Coiby Xu<coxu@redhat.com>
2022-09-21 13:30:14 +08:00
Lichen Liu 4edcd9a400 kdumpctl: make the kdump.log root-readable-only
Decrease the risk that of leaking information that could potentially
be used to exploit the crash further (think location of keys).

Signed-off-by: Lichen Liu <lichliu@redhat.com>
Acked-by: Coiby Xu <coxu@redhat.com>
2022-09-06 20:21:31 +08:00
Kairui Song 677da8a59b sysconfig: use a simple generator script to maintain
These kdump.sysconfig.* files are almost identical with a bit difference
in several parameters, just use a simple script to generate them upon
packaging. This should make it easier to maintain, updating a comment or
param for a certain arch can be done in one place.

There are only some comment or empty option differences with the generated
version because some arch's sysconfig is not up-to-dated, this actually
fixes the issue, I used the following script to check these differences:

 # for arch in aarch64 i386 ppc64 ppc64le s390x x86_64; do
   ./gen-kdump-sysconfig.sh $arch > kdump.sysconfig.$arch.new
   git checkout HEAD^ kdump.sysconfig.$arch &>/dev/null
   echo "$arch:"
   diff kdump.sysconfig.$arch kdump.sysconfig.$arch.new; echo ""
   done; git reset;

Signed-off-by: Kairui Song <kasong@tencent.com>
Reviewed-by: Philipp Rudo <prudo@redhat.com>
2022-08-16 14:35:35 +08:00
Coiby Xu aa84244346 Release 2.0.25-1
Signed-off-by: Coiby Xu <coxu@redhat.com>
2022-08-03 20:14:23 +08:00