Commit Graph

236 Commits

Author SHA1 Message Date
Coiby Xu
9726838241
Try to install PHY and MDIO bus drivers explicitly
Resolves: https://issues.redhat.com/browse/RHEL-7028

Currently, nfs dumping fails on some machines that has a dedicated PHY
driver (dealing with the physical layer) or MDIO bus (connecting the MAC
to PHY devices) driver. This is because kexec-tools doesn't install
dedicated PHY or MDIO driver explicitly. Usually a NIC driver shouldn't
specify the dependency on the needed PHY or MDIO driver because it
shouldn't a NIC (medium access control, MAC) driver is for dealing with
the Data link layer and a PHY driver is for physical layer. So as long
as a MAC driver can talk to the PHY layer via APIs, it shouldn't care
which PHY driver or device it's talking to. So when the
dependency on a PHY driver or MDIO driver is not found by dracut's
instmods, the PHY or MDIO driver won't be installed.

This patch passes =drivers/net/phy and =drivers/net/mdio to dracut's
instmods which will only install in-use PHY or MDIO driver(s).

Note ideally we should find out which PHY driver is used by a NIC but
unfortunately currently no universal way can be found
(/sys/class/net/NIC_NAME/phydev/driver/module can be used to find the
 name of the PHY driver for some NICs but it doesn't exist for some NICs
like Qualcomm Atheros AR8031). So is it for a MDIO bus driver.
Fortunately currently no huge memory consumption is found for a PHY or
MDIO driver.

Fixes: a65dde2d ("Reduce kdump memory consumption by only installing needed NIC drivers")
Reported-by: Doreen Alongi <dalongi@redhat.com>
Signed-off-by: Coiby Xu <coxu@redhat.com>
2024-05-31 11:32:25 +08:00
Philipp Rudo
18522479e8
dracut-module-setup: Fix missing systemd/system.conf error
There is a bug report for RHEL10 about a grep error reading

  grep: /var/tmp/dracut.DiZuKp/initramfs/etc/systemd/system.conf*: No such file or directory

that shows up when rebuilding the initrd. This is caused by systemd
v255 that allows installing the default systemd config files to
/usr/lib/systemd instead of /etc/systemd [1][2] which is done for RHEL.
So unless a user manually adds /etc/systemd/system.conf the file no
longer exists.

However the test that requires the call to grep is somewhat wonky. IIUC
the test is there so we don't overwrite a setting the user might have
made. In my opinion this only makes sense as long as the timeout set is
larger than what we would set. But this part of the logic is missing.
So fix the error message by removing the test and add our config
unconditionally.

While at it rename the created drop-ins to 99-kdump.conf to follow
the recommended naming convention and to make sure that our value takes
precedence.

Note: In case the test is still needed we can fall back to use
'systemd-analyse cat-config' that automatically considers all potential
locations for the config and its drop-ins.

[1] 6495361c7d ("meson: add build option for install path of main config files")
[2] 6378f257e7 ("various: use new config loader instead of config_parse_config_file()")

Signed-off-by: Philipp Rudo <prudo@redhat.com>
2024-05-31 11:32:24 +08:00
Lichen Liu
ace7b80a5b
Fix potential-bashisms in monitor_dd_progress
Resolves: RHEL-29044
Upstream: Fedora
Conflict: None

commit a5c17afe7e871f7a2593d04b97e8e77fb434642c
Author: Coiby Xu <coxu@redhat.com>
Date:   Fri Jan 12 20:00:53 2024 +0800

    Fix potential-bashisms in monitor_dd_progress

    As suggested by Carl [1],
    > /usr/lib/dracut/modules.d/99kdumpbase/monitor_dd_progress has some
    > inconsistencies with other scripts in that directory.  It is missing the
    > .sh extension and is not executable.  The latter is resulting in an
    > rpmlint error.

    [1] https://bugzilla.redhat.com/show_bug.cgi?id=2239566#c2

    Suggested-by: Carl George <carl@redhat.com>
    Signed-off-by: Coiby Xu <coxu@redhat.com>
    Reviewed-by: Philipp Rudo <prudo@redhat.com>
    Reviewed-by: Dave Young <dyoung@redhat.com>

Signed-off-by: Lichen Liu <lichliu@redhat.com>
2024-04-29 10:26:51 +08:00
Lichen Liu
468336700d dracut-module-setup: Skip initrd-cleanup and initrd-parse-etc in kdump
When using multipath devices as the target for kdump, if user_friendly_name
is also specified, devices default to names like "mpath*", e.g., mpatha.
In dracut, we obtain a persistent device name via get_persistent_dev. However,
dracut currently believes using /dev/mapper/mpath* could cause issues, thus
alternatively names are used, here it's /dev/disk/by-uuid/<FS_UUID>.

During the kdump boot progress, the /dev/disk/by-uuid/<FS_UUID> will exist as
soon as one of the path devices exists, but it won't be usable by systemd,
since multipathd will claim that device as a path device. Then multipathd will
get stopped before it can create the multipath device.

Without user_friendly_name, /dev/mapper/<WWID> is considered a persistent
device name, avoiding the issue.

The exit of multipathd is due to two dependencies in the current dracut module
90multipath/multipathd.service, "Before=initrd-cleanup.service" and
"Conflicts=initrd-cleanup.service".

As per man 5 systemd.unit, if A.service has "Conflicts=B.service", starting
B.service will stop A.service.

This is useful during normal boot. However, we will never switch-root after
capturing vmcore in kdump.

We need to ensure that multipathd is not killed due to such dependency issue.
Without modifying multipathd.service, we add ConditionPathExists=!/proc/vmcore
to skip initrd-cleanup.service in kdump. This approach is beneficial as
it avoid the potential termination of other services that conflict with
initrd-cleanup.service. Also skip initrd-parse-etc.service as it will try to
start initrd-cleanup.service. Both of these services are used for switch root,
so they can be safely skipped in kdump.

Suggested-by: Benjamin Marzinski <bmarzins@redhat.com>
Suggested-by: Dave Young <dyoung@redhat.com>
Signed-off-by: Lichen Liu <lichliu@redhat.com>
Reviewed-by: Philipp Rudo <prudo@redhat.com>
2024-01-24 15:13:31 +08:00
Steffen Maier
73c9eb71e9 dracut-module-setup: remove old s390 network device config (#1937048)
Since the previous commit reworks znet configuration to be based on the
active system configuration, there is no dependency on any existing
persistent configuration any more. Hence the old code handling systems with
exactly one s390-specific nmconnection as persistent configuration can be
removed.

Migration of the old persistent device configuration mechanism with
nmconnections (or ifcfg) to zdev is handled independently in s390utils.
[https://github.com/steffen-maier/s390utils/pull/1/commits
 ("znet: migrate to consolidated persistent device config with zdev
   (#1937046,#1937048))"]

Signed-off-by: Steffen Maier <maier@linux.ibm.com>
Reviewed-by: Philipp Rudo <prudo@redhat.com>
Reviewed-by: Coiby Xu <coxu@redhat.com>
2024-01-17 11:59:58 +08:00
Steffen Maier
0d90d580b4 dracut-module-setup: consolidate s390 network device config (#1937048)
This is a preparation for consolidating s390 network device config with
https://github.com/dracutdevs/dracut/pull/2534
  ("feat(znet): use zdev for consolidated device configuration")
https://github.com/steffen-maier/s390utils/pull/1/commits
  ("znet: migrate to consolidated persistent device config with zdev
    (#1937046,#1937048))"
  ("znet: clean up old deprecated persistent device config
    (#1937046,#1937048)").
With above consolidation, s390-specific low-level configuration information
will no longer be in NetworkManager connections (nor ifcfg files), but in
the persistent configuration database of chzdev from s390-tools.

Since the kdump dracut module here depends on the "znet" dracut module [1]
and "znet" will copy all persistent configuration into initrd as of above
commit, all s390-specific information would already be in the kdump initrd.
[1] 08de712528 ("Move some dracut module dependencies checks to
module-setup.sh"), 7148c0a30d ("add s390x netdev setup")

However, it is more appropriate and also removes the copy dependency from
"znet" to introduce the consolidated zdev mechanism for importing just the
required network device config from the current active system
configuration. It does not depend on any of the pull requests above.
It does not depend on any existing persistent configuration
and can replace the old function code. This is similar to dracut block
device dependency handling in s390-tools zdev/dracut/95zdev-kdump.

The old code only seems to work if there is exactly one s390-specific
nmconnection (or ifcfg file). Related commits:
b5577c163a ("Simplify setup_znet by copying connection profile to initrd"),
7d47251568 ("Iterate /sys/bus/ccwgroup/devices to tell if we should set up rd.znet"),
8b08b4f17b ("Set up s390 znet cmdline by "nmcli --get-values""),
ce0305d4f9 ("Add a new option 'rd.znet_ifname' in order to use it in udev rules"),
7148c0a30d ("add s390x netdev setup").

A bonding or teaming setup would have multiple following network
interfaces, each of which would need a low-level config if they're s390
channel-attached network devices. The new code should be able to handle
that by iterating the involved network interfaces. Chzdev only exports
something if it's a device type it deems itself responsible for.

Additional debugging output can be generated with e.g. dracut option
"--stdlog 5" (or short -L5). It shows the chzdev export result, the output
of chzdev export and import, and an overview of the resulting persistent
config within the initrd. On systems, which default to using dracut option
"--quiet", you might need an additional "--verbose" to counter "--quiet" so
-L5 has effect. Typically combined with "--debug" to get a shell trace from
building an initrd (Note: --debug does not increase the log levels).

Signed-off-by: Steffen Maier <maier@linux.ibm.com>
Reviewed-by: Philipp Rudo <prudo@redhat.com>
Reviewed-by: Coiby Xu <coxu@redhat.com>
2024-01-17 11:59:58 +08:00
Coiby Xu
38d9990389 Use the same /etc/resolve.conf in kdump initrd if it's managed manually
Resolves: https://issues.redhat.com/browse/RHEL-11897

Previously fix 0177e248 ("Use the same /etc/resolve.conf in kdump initrd
if it's managed manually") is problematic,
   1) it generated .conf file unrecognized by NetowrkManager ;
   2) this .conf file was installed to current file system instead of to the kdump initrd;
   3) this incorrect .conf file prevented the starting of NetworkManager.

This patch fixes the above issues and also suppresses a harmless warning
when systemd-resolved.service doesn't exist,

    # systemctl -q is-enabled systemd-resolved
    Failed to get unit file state for systemd-resolved.service: No such file or directory

Fixes: 0177e248 ("Use the same /etc/resolve.conf in kdump initrd if it's managed manually")
Reported-by: Jie Li <jieli@redhat.com>
Cc: Dave Young <dyoung@redhat.com>
Reviewed-by: Dave Young <dyoung@redhat.com>
Signed-off-by: Coiby Xu <coxu@redhat.com>
2023-12-26 12:44:14 +08:00
Coiby Xu
0177e24832 Use the same /etc/resolve.conf in kdump initrd if it's managed manually
Resolves: https://issues.redhat.com/browse/RHEL-11897

Some users may choose to manage /etc/resolve.conf manually [1]
by setting dns=none or use a symbolic link resolve.conf [2].
In this case, network dumping will not work because DNS resolution
fails. Use the same /etc/resolve.conf in kdump initrd to fix this
problem.

[1] https://bugzilla.gnome.org/show_bug.cgi?id=690404
[2] https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/8/html/configuring_and_managing_networking/manually-configuring-the-etc-resolv-conf-file_configuring-and-managing-networking

Fixes: 63c3805c ("Set up kdump network by directly copying NM connection profile to initrd")
Reported-by: Curtis Taylor <cutaylor@redhat.com>
Cc: Jie Li <jieli@redhat.com>
Signed-off-by: Coiby Xu <coxu@redhat.com>
Reviewed-by: Dave Young <dyoung@redhat.com>
2023-11-24 15:44:02 +08:00
Philipp Rudo
f9d8cabfd1 dracut-module-setup: remove dead source_ifcfg_file
With the NetworkManager rewrite this function in no longer used. This
also allows to remove a lot of dead code in kdump-lib.

Signed-off-by: Philipp Rudo <prudo@redhat.com>
Reviewed-by: Coiby Xu <coxu@redhat.com>
2023-04-17 14:49:51 +08:00
Coiby Xu
12d9eff9dc Show how much time kdump has waited for the network to be ready
Relates: https://bugzilla.redhat.com/show_bug.cgi?id=2151504

Currently, when the network isn't ready, kdump would repeatedly print
the same info,

    [   29.537230] kdump[671]: Bad kdump network destination: 192.123.1.21
    [   30.559418] kdump[679]: Bad kdump network destination: 192.123.1.21
    [   31.580189] kdump[687]: Bad kdump network destination: 192.123.1.21

This is not user-friendly and users may think kdump has got stuck. So
also show much time has waited for the network to be ready,

    [   29.546258] kdump[673]: Waiting for network to be ready (50s / 10min)
    ...
    [   32.608967] kdump[697]: Waiting for network to be ready (56s / 10min)

Note kdump_get_ip_route no longer prints an error message and it's up to
the caller to determine the log level and print relevant messages. And
kdump_collect_netif_usage aborts when kdump_get_ip_route fails.

Reported-by: Martin Pitt <mpitt@redhat.com>
Signed-off-by: Coiby Xu <coxu@redhat.com>
Reviewed-by: Philipp Rudo <prudo@redhat.com>
2023-04-15 06:39:17 +08:00
Coiby Xu
df6f25ff20 Tell nmcli to not escape colon when getting the path of connection profile
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=2151504

When a NetworManager connection profile contains a colon in the name,
"nmcli --get-values UUID,FILENAME" by default would escape the colon
because a colon is also used for separating the values. In this case,
99kdumpbase fails to get the correct connection profile path,
	kdumpctl[5439]: cp: cannot stat '/run/NetworkManager/system-connections/static-52\\\:54\\\:01.nmconnection': No such file or directory
	kdumpctl[5440]: sed: can't read /tmp/1977-DRACUT_KDUMP_NM/ifcfg-static-52-54-01: No such file or directory
	kdumpctl[5449]: dracut-install: ERROR: installing '/tmp/1977-DRACUT_KDUMP_NM/ifcfg-static-52-54-01' to '/etc/NetworkManager/system-connections/ifcfg-static-52-54-01'

As a result, dumping vmcore to a remote nfs would fail.

In our case of getting connection profile path, there is no need to escape the
colon so pass "-escape no" to nmcli,

	[root@localhost ~]# nmcli --get-values UUID,FILENAME c show
	659e09c1-a6bd-3549-9be4-a07a1a9a8ffd:/etc/NetworkManager/system-connections/aa\:bb.nmconnection

	[root@localhost ~]# nmcli -escape no --get-values UUID,FILENAME c show
	659e09c1-a6bd-3549-9be4-a07a1a9a8ffd:/etc/NetworkManager/system-connections/aa:bb.nmconnection

Suggested-by: Beniamino Galvani <bgalvani@redhat.com>
Reported-by: Martin Pitt <mpitt@redhat.com>
Signed-off-by: Coiby Xu <coxu@redhat.com>
Reviewed-by: Philipp Rudo <prudo@redhat.com>
2023-04-14 20:22:49 +08:00
Coiby Xu
bc101086e2 dracut-module-setup.sh: also install the driver of physical NIC for
Hyper-V VM with accelerated networking

Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=2151842

Currently, vmcore dumping to remote fs fails on Azure Hyper-V VM with
accelerated networking because it uses a physical NIC for accrelarated
networking [1]. In this case, the driver for this physical NIC should be
installed as well.

[1] https://learn.microsoft.com/en-us/azure/virtual-network/accelerated-networking-overview

Fixes: a65dde2d ("Reduce kdump memory consumption by only installing needed NIC drivers")

Reported-by: Xiaoqiang Xiong <xxiong@redhat.com>
Signed-off-by: Coiby Xu <coxu@redhat.com>
Reviewed-by: Philipp Rudo <prudo@redhat.com>
2022-12-20 14:04:49 +08:00
Coiby Xu
3b22cce1cb dracut-module-setup.sh: skip installing driver for the loopback
interface

Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=2151500

Currently, kdump initrd fails to be built when dumping vmcore to
localhost via ssh or nfs,

  kdumpctl[3331]: Cannot get driver information: Operation not supported
  kdumpctl[1991]: dracut: Failed to get the driver of lo
  dracut[2020]: Failed to get the driver of lo
  kdumpctl[1775]: kdump: mkdumprd: failed to make kdump initrd
  kdumpctl[1775]: kdump: Starting kdump: [FAILED]
  systemd[1]: kdump.service: Main process exited, code=exited, status=1/FAILURE
  systemd[1]: kdump.service: Failed with result 'exit-code'.
  systemd[1]: Failed to start Crash recovery kernel arming.
  systemd[1]: kdump.service: Consumed 1.710s CPU time.

This is because the loopback interface is used for transferring vmcore and
ethtool can't get the driver of the loopback interface. In fact, once
COFNIG_NET is enabled, the loopback device is enabled and there is no driver
for the loopback device. So skip installing driver for the loopback device.
The loopback interface is implemented in linux/drivers/net/loopback.c
and always has the name "lo". So we can safely tell if a network
interface is the loopback interface by its name.

Fixes: a65dde2d ("Reduce kdump memory consumption by only installing needed NIC drivers")
Reported-by: Martin Pitt <mpitt@redhat.com>
Reported-by: Rich Megginson <rmeggins@redhat.com>
Reviewed-by: Lichen Liu <lichliu@redhat.com>
Reviewed-by: Philipp Rudo <prudo@redhat.com>
Signed-off-by: Coiby Xu <coxu@redhat.com>
2022-12-20 14:03:12 +08:00
Coiby Xu
b45896c620 dracut-module-setup.sh: stop overwriting dracut's trap handler
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=2149246

Latest Workstation live x86_64 image has an excess increase of ~300 MB
in size. This is because kdumpbase module's trap handler overwrites
dracut's handler and DRACUT_TMPDIR which has three unpacked initramfs
files fails to be cleaned up. This patch moves kdumpbase module's
temporary folder under DRACUT_TMPDIR and lets dracut's trap handler do
the cleanup instead.

Fixes: d25b1ee3 ("Add functions to copy NetworkManage connection profiles to the initramfs")
Signed-off-by: Coiby Xu <coxu@redhat.com>
Reviewed-by: Philipp Rudo <prudo@redhat.com>
2022-12-07 17:32:32 +08:00
Coiby Xu
523cda8f34 Don't run kdump_check_setup_iscsi in a subshell in order to collect needed
network interfaces

Currently, dumping to iSCSI target fails because the global array
(unique_netifs) that stores the network interfaces needed by kdump is
empty. The root cause is change of the array made in a subshell (a child
process) is inaccessible to the parent process. So don't run
kdump_check_setup_iscsi in a subshell.

Fixes: 63c3805c ("Set up kdump network by directly copying NM connection profile to initrd")
Signed-off-by: Coiby Xu <coxu@redhat.com>
Reviewed-by: Pingfan Liu <piliu@redhat.com>
2022-11-25 13:54:11 +08:00
Coiby Xu
b5577c163a Simplify setup_znet by copying connection profile to initrd
/usr/lib/udev/ccw_init [1] shipped by s390utils extracts the values of
SUBCHANNELS, NETTYPE and LAYER2 from /etc/sysconfig/network-scripts/ifcfg-*
or /etc/NetworkManager/system-connections/*.nmconnection to activate znet
network device. If the connection profile is copied to initrd,
there is no need to set up the "rd.znet" dracut cmdline parameter.

There are two cases addressed by this commit,
 1. znet network interface is a slave of bonding/teaming/vlan/bridging
    network. The connection profile has been copied to initrd by
    kdump_copy_nmconnection_file and it contains the info needed by
    ccw_init.
 2. znet network interface is a slave of bonding/teaming/vlan/bridging
    network. The corresponding ifcfg-*/*.nmconnection file may not contain
    info like SUBCHANNELS [2]. In this case, copy the ifcfg-*/*.nmconnection
    file that has this info to the kdump initrd. Also to prevent the copied
    connection profile from being chosen by NM, set
    connection.autoconnect=false for this connection profile.

With this implementation, there is also no need to check if znet is
used beforehand.

Note
1. ccw_init doesn't care if SUBCHANNELS, NETTYPE and LAYER2 comes from
   an active NM profile or not. If an inactive NM profile contains this
   info, it needs to be copied to the kdump initrd as well.
2. "rd.znet_ifname=$_netdev:${SUBCHANNELS}" is no longer needed needed
   because now there is no renaming of s390x network interfaces when
   reusing NetworkManager profiles. rd.znet_ifname was introduced in
   commit ce0305d ("Add a new option 'rd.znet_ifname' in order to use it
   in udev rules") to address the special case of non-persistent
   MAC address by renaming a network interface by SUBCHANNELS.

[1] https://src.fedoraproject.org/rpms/s390utils/blob/rawhide/f/ccw_init
[2] https://bugzilla.redhat.com/show_bug.cgi?id=2064708

Signed-off-by: Coiby Xu <coxu@redhat.com>
Reviewed-by: Thomas Haller <thaller@redhat.com>
Reviewed-by: Philipp Rudo <prudo@redhat.com>
2022-11-23 06:39:27 +08:00
Coiby Xu
568623e69a Address the cases where a NIC has a different name in kdump kernel
A NIC may get a different name in the kdump kernel from 1st kernel
in cases like,
 - kernel assigned network interface names are not persistent e.g. [1]
 - there is an udev rule to rename the NIC in the 1st kernel but the
   kdump initrd may not have that rule e.g. [2]

If NM tries to match a NIC with a connection profile based on NIC name
i.e. connection.interface-name, it will fail the above bases. A simple
solution is to ask NM to match a connection profile by MAC address.
Note we don't need to do this for user-created NICs like vlan, bridge and
bond.

An remaining issue is passing the name of a NIC via the kdumpnic dracut
command line parameter which requires passing ifname=<interface>:<MAC> to
have fixed NIC name. But we can simply drop this requirement. kdumpnic
is needed because kdump needs to get the IP by NIC name and use the IP
to created a dumping folder named "{IP}-{DATE}". We can simply pass the
IP to the kdump kernel directly via a new dracut command line parameter
kdumpip instead. In addition to the benefit of simplifying the code,
there are other three benefits brought by this approach,
  - make use of whatever network to transfer the vmcore. Because  as long
    as we have the network to we don't care which NIC is active.
  - if obtained IP in the kdump kernel is different from the one in the
    1st kernel. "{IP}-{DATE}" would better tell where the dumped vmcore
    comes from.
  - without passing ifname=<interface>:<MAC> to kdump initrd, the
    issue of there are two interfaces with the same MAC address for
    Azure Hyper-V NIC SR-IOV [3] is resolved automatically.

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1121778
[2] https://bugzilla.redhat.com/show_bug.cgi?id=810107
[3] https://bugzilla.redhat.com/show_bug.cgi?id=1962421

Signed-off-by: Coiby Xu <coxu@redhat.com>
Reviewed-by: Thomas Haller <thaller@redhat.com>
Reviewed-by: Philipp Rudo <prudo@redhat.com>
2022-11-23 06:39:27 +08:00
Coiby Xu
a65dde2d10 Reduce kdump memory consumption by only installing needed NIC drivers
Even after having asked NM to stop managing a unneeded NIC, a NIC driver
may still waste memory. For example, mlx5_core uses a substantial amount
of memory during driver initialization,

======== Report format module_summary: ========
Module mlx5_core using 350.2MB (89650 pages), peak allocation 367.4MB (94056 pages)
Module squashfs using 13.1MB (3360 pages), peak allocation 13.1MB (3360 pages)
Module overlay using 2.1MB (550 pages), peak allocation 2.2MB (555 pages)
Module dns_resolver using 0.9MB (219 pages), peak allocation 5.2MB (1338 pages)
Module mlxfw using 0.7MB (172 pages), peak allocation 5.3MB (1349 pages)
======== Report format module_summary END ========

======== Report format module_top: ========
Top stack usage of module mlx5_core:
  (null) Pages: 89650 (peak: 94056)
    ret_from_fork (0xffffda088b4165f8) Pages: 60007 (peak: 60007)
      kthread (0xffffda088b4bd7e4) Pages: 60007 (peak: 60007)
        worker_thread (0xffffda088b4b48d0) Pages: 60007 (peak: 60007)
          process_one_work (0xffffda088b4b3f40) Pages: 60007 (peak: 60007)
            work_for_cpu_fn (0xffffda088b4aef00) Pages: 53906 (peak: 53906)
              local_pci_probe (0xffffda088b9e1e44) Pages: 53906 (peak: 53906)
                probe_one mlx5_core (0xffffda084f899cc8) Pages: 53518 (peak: 53518)
                  mlx5_init_one mlx5_core (0xffffda084f8994ac) Pages: 49756 (peak: 49756)
                    mlx5_function_setup.constprop.0 mlx5_core (0xffffda084f899100) Pages: 44434 (eak: 44434)
                      mlx5_satisfy_startup_pages mlx5_core (0xffffda084f8a4f24) Pages: 44434 (peak: 44434)
                    mlx5_function_setup.constprop.0 mlx5_core (0xffffda084f899078) Pages: 5285 (peak: 5285)
                      mlx5_cmd_init mlx5_core (0xffffda084f89e414) Pages: 4818 (peak: 4818)
                        mlx5_alloc_cmd_msg mlx5_core (0xffffda084f89aaa0) Pages: 4403 (peak: 4403)

This memory consumption is completely unnecessary when kdump doesn't need
this NIC. Only install needed NIC drivers to prevent this kind of waste.

Note
1. this patch depends on [1] to ask dracut to not install NIC drivers.
2. "ethtool -i" somehow fails to get the vlan driver
3. team.ko doesn't depend on the team mode drivers so we need to install
   the team mode drivers manually.

[1] https://github.com/dracutdevs/dracut/pull/1789

Signed-off-by: Coiby Xu <coxu@redhat.com>
Reviewed-by: Thomas Haller <thaller@redhat.com>
Reviewed-by: Philipp Rudo <prudo@redhat.com>
2022-11-23 06:39:27 +08:00
Coiby Xu
586fe410aa Reduce kdump memory consumption by not letting NetworkManager manage unneeded network interfaces
By default, NetworkManger will manage all the network interfaces and
try to set interface IFF_UP to get carrier state. Regardless of whether
the network interface is connected to a cable or not, the NIC driver
will allocate memory resources for e.g. ring buffers when setting IFF_UP.
This could be a waste of memory. For example it's found i40e consumes ~15GB
on a power machine. On this machine, i40e manages four interfaces but only
one interface is valid. This patch use "managed=false" to tell
NetworkManager to not manage network interfaces that are not needed by
kdump by putting 10-kdump-netif_allowlist.conf in the initramfs.

Signed-off-by: Coiby Xu <coxu@redhat.com>
Reviewed-by: Thomas Haller <thaller@redhat.com>
Reviewed-by: Philipp Rudo <prudo@redhat.com>
2022-11-23 06:39:27 +08:00
Coiby Xu
63c3805c48 Set up kdump network by directly copying NM connection profile to initrd
This patch setup kdump network by directly copying NM connection profile(s)
for different network setup including bond, bridge, vlan, and team. For
vlan network, rename phydev to parent_netif to improve code readability.

With the new approach, the related code to build up dracut cmdline
parameter such rd.route, ip and etc can be cleaned up. And there is no
need to setup dns when copying .nmconnection directly to initrd
either. Note the bootdev dracut command line parameter is only used by
dracut's 35network-legacy and network-manager doesn't use it, remove
related code as well.

Note
1. kdump_setup_vlan/bond/... are no longer called in subshells in order
   to modify global variables like unique_netifs
2. The original kdump_install_net is renamed to better reflect its
   current function

Signed-off-by: Coiby Xu <coxu@redhat.com>
Reviewed-by: Thomas Haller <thaller@redhat.com>
Reviewed-by: Philipp Rudo <prudo@redhat.com>
2022-11-23 06:39:27 +08:00
Coiby Xu
62355ebe5a Stop dracut 35network-manager from running nm-initrd-generator
kexec-tools depends on dracut's 35network-manager module which will
call nm-initrd-generator. We don't want nm-initrd-generator to generate
connection profiles since we  will copy them from 1st kernel to
kdump kernel initramfs. NetworkManager >= 1.35.2 won't generate connection
profiles if there's a connection dir with rd.neednet. For Fedora/RHEL,
this connection dir is /etc/NetworkManager/system-connections. For the
details, please refer to the NetworkManager commit 79885656d3
("initrd: don't add a connection if there's a connection dir with
rd.neednet") [1]. Before the release of NetworkManager >= 1.35.2, we
need to mask /usr/libexec/nm-initrd-generator.

[1] https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/merge_requests/1010

Signed-off-by: Coiby Xu <coxu@redhat.com>
Reviewed-by: Thomas Haller <thaller@redhat.com>
Reviewed-by: Philipp Rudo <prudo@redhat.com>
2022-11-23 06:39:27 +08:00
Coiby Xu
6b586a9036 Apply the timeout configuration of nm-initrd-generator
nm-wait-online-initrd.service installed by dracut's 35-networkmanager
module calls nm-online with "-s" which means it returns immediately when
NetworkManager logs "startup complete" after certain timeouts are
reached. "startup complete" doesn't necessarily network connectivity has
been established. nm-initrd-generator has a set of timeouts that in most
of cases when applied, "startup-complete" means network connectivity has
been established. So apply it when setting up kdump network.

Suggested-by: Thomas Haller <thaller@redhat.com>
Signed-off-by: Coiby Xu <coxu@redhat.com>
Reviewed-by: Thomas Haller <thaller@redhat.com>
Reviewed-by: Philipp Rudo <prudo@redhat.com>
2022-11-23 06:39:27 +08:00
Coiby Xu
9dfcacf72d Determine whether IPv4 or IPv6 is needed
According to `man nm-online`,
  "By default, connections have the ipv4.may-fail and
  ipv6.may-fail properties set to yes; this means that
  NetworkManager waits for one of the two address families to
  complete configuration before considering the connection
  activated. If you need a specific address family configured
  before network-online.target is reached, set the corresponding
  may-fail property to no."

If a NIC has an IPv4 or IPv6 address, set the corresponding may-fail
property to no. Otherwise, dumping vmcore over IPv6 could fail because
only IPv4 network is ready or vice versa.

Also disable IPv6 if only IPv4 is used and vice versa.

Signed-off-by: Coiby Xu <coxu@redhat.com>
Reviewed-by: Thomas Haller <thaller@redhat.com>
Reviewed-by: Philipp Rudo <prudo@redhat.com>
2022-11-23 06:39:27 +08:00
Coiby Xu
d25b1ee31c Add functions to copy NetworkManage connection profiles to the initramfs
Each network interface is manged by a NM connection. Given a list of
network interface names, copy the NetworkManager (NM) connection
profiles i.e. .nmconnection files to the kdump initramfs.

Before copying a connection file, clone it to automatically convert a
legacy ifcfg-*[1] file to a .nmconnection file and for the convenience of
editing the connection profile.

[1] https://fedoraproject.org/wiki/Changes/NetworkManager_keyfile_instead_of_ifcfg_rh

Signed-off-by: Coiby Xu <coxu@redhat.com>
Reviewed-by: Thomas Haller <thaller@redhat.com>
Reviewed-by: Philipp Rudo <prudo@redhat.com>
2022-11-23 06:39:27 +08:00
Coiby Xu
b7e58619d1 Fix error for vlan over team network interface
6f9235887f ("module-setup.sh: enable
vlan on team interface") skips establishing teaming network by mistake.
Although it could use one of slave netifs to establish connection
to transfer vmcore to remote fs, it breaks the implicit assumption of
creating an identical network topology to the 1st kernel.

Fixes: 6f92358 ("module-setup.sh: enable vlan on team interface")
Signed-off-by: Coiby Xu <coxu@redhat.com>
Reviewed-by: Thomas Haller <thaller@redhat.com>
Reviewed-by: Philipp Rudo <prudo@redhat.com>
2022-11-23 06:39:27 +08:00
Tao Liu
f11721077a Add dependency of dracut lvmthinpool-monitor module
The 80lvmthinpool-monitor module is needed for monitor and
autoextend the size of thin pool in 2nd kernel. The module was
integrated in dracut version 057.

If lvmthinpool-monitor module is not found, we will print a warning.
Because we don't want to block the kdump process when the thin pool
capacity is enough and no monitor-and-autoextend actually needed.

Signed-off-by: Tao Liu <ltao@redhat.com>
Reviewed-by: Philipp Rudo <prudo@redhat.com>
2022-11-01 12:20:34 +08:00
Tao Liu
c743881ae6 virtiofs support for kexec-tools
This patch add virtiofs support for kexec-tools by introducing a new option
for /etc/kdump.conf:

virtiofs myfs

Where myfs is a variable tag name specified in qemu cmdline
"-device vhost-user-fs-pci,tag=myfs".

The patch covers the following cases:
1) Dumping VM's vmcore to a virtiofs shared directory;
2) When the VM's rootfs is a virtiofs shared directory and dumping the
   VM's vmcore to its subdirectory, such as /var/crash;
3) The combination of case 1 & 2: The VM's rootfs is a virtiofs shared
   directory and dumping the VM's vmcore to another virtiofs shared
   directory.

Case 2 & 3 need dracut >= 057, otherwise VM cannot boot from virtiofs
shared rootfs. But it is not the issue of kexec-tools.

Reviewed-by: Philipp Rudo <prudo@redhat.com>
Signed-off-by: Tao Liu <ltao@redhat.com>
2022-09-29 12:22:49 +08:00
Baoquan He
1913ea9118 Checking the existence of 40-redhat.rules before modifying
Resolves: bz2106645

The code of commit 163c02970e takes effect in rhel firstly, later
pulled to Fedora. However, Fedora OS doesn't have 40-redhat.rules
in systemd-udev package. With this commit applied, a false positive
warning message can always been seen as below.

So fixing it by checking if 40-redhat.rules exists before handling.
With this change, the false warning is gone.

[root@ ~]# kdumpctl restart
kdump: kexec: unloaded kdump kernel
kdump: Stopping kdump: [OK]
kdump: No kdump initial ramdisk found.
kdump: Rebuilding /boot/initramfs-5.19.0-rc6+kdump.img
sed: can't read /var/tmp/dracut.NnAV2g/initramfs/usr/lib/udev/rules.d/40-redhat.rules: No such file or directory
kdump: kexec: loaded kdump kernel
kdump: Starting kdump: [OK]

Signed-off-by: Baoquan He <bhe@redhat.com>
Acked-by: Pingfan Liu <piliu@redhat.com>
2022-07-21 15:02:32 +08:00
Pingfan Liu
163c02970e ppc64/ppc64le: drop cpu online rule in 40-redhat.rules in kdump initramfs
Onlining secondary cpus breaks kdump completely on KVM on Power hosts
Though we use maxcpus=1 by default but 40-redhat.rules will bring up all
possible cpus by default.

Thus before we get the kernel fix and the systemd rule fix let's remove
the cpu rule in 40-redhat.rules for ppc64/ppc64le kdump initramfs.

This is back ported from RHEL, and original credit goes to Dave Young
<dyoung@redhat.com>

Signed-off-by: Pingfan Liu <piliu@redhat.com>
Acked-by: Tao Liu <ltao@redhat.com>
2021-12-23 15:40:23 +08:00
Coiby Xu
6936fbc1b2 fix broken extra_bins when installing multiple binaries
When there more than one binaries, quoting "$val" would make
dracut-install treat multiple binaries as one binary. Take
"extra_bins /usr/sbin/ping /usr/sbin/ip" as an example, the
following error would occur when building initrd,

dracut-install: ERROR: installing '/usr/sbin/ping /usr/sbin/ip'
dracut: FAILED: /usr/lib/dracut/dracut-install -D /var/tmp/dracut.ODrioZ/initramfs -a /usr/sbin/ping /usr/sbin/ip

Fix it by not quoting the variable and bypassing SC2086 shellcheck.

Fixes: commit 86538ca6e2
       ("bash scripts: fix variable quoting issue")

Acked-by: Tao Liu <ltao@redhat.com>
Signed-off-by: Coiby Xu <coxu@redhat.com>
2021-11-05 17:50:45 +08:00
Kairui Song
58d3e6db3a kdump-lib.sh: rework nmcli related functions
This fixes word splitting issue with nmcli args. Current kexec-tools
scripts won't call nmcli with correct arguments when there are space in
network interface name.

nmcli expects multiple parameters, but get_nmcli_value_by_field only
accepts two params and depends on shell word splitting to split the
_nm_show_cmd into multiple params, which is very fragile.
So switch the param order, simplified this function and now multiple
params can be used properly.

And get_nmcli_connection_show_cmd_by_ifname returns multiple
nmcli params in a single variable, it depend on shell word splitting to
split the words when calling nmcli. But this is very fragile and break
easily when there are any special character in the connection path.

This function is only introduced to get and cache the nmcli command
which contains the "connection name".

Actually only cache the "connection path" is enough. Callers should
just call get_nmcli_connection_apath_by_ifname to cache the path, and
a new helper get_nmcli_field_by_conpath is introduced here to get value
from nmcli. This way "connection path" can contain any character.

Also get rid of another nmcli_cmd usage in
get_nmcli_connection_apath_by_ifname which stores multiple params in a
single bash variable separated by space.

Signed-off-by: Kairui Song <kasong@redhat.com>
Acked-by: Philipp Rudo <prudo@redhat.com>
2021-09-15 23:11:37 +08:00
Kairui Song
b1c794a2cf dracut-kdump.sh: Use stat instead of ls to get vmcore size
ls output is fragile, so use stat instead.

Signed-off-by: Kairui Song <kasong@redhat.com>
Acked-by: Philipp Rudo <prudo@redhat.com>
2021-09-14 03:25:54 +08:00
Kairui Song
e7118d1de8 Merge kdump-error-handler.sh into kdump.sh
kdump-error-handler.sh does nothing except calling three functions,
it can be easily merged into kdump.sh by using a parameter to run the
error handling routine.

kdump-lib-initramfs.sh was created to hold the three shared functions
and related code, so by merging these two files, kdump-lib-initramfs.sh
can be simplified by a lot.

Following up commits will clean up kdump-lib-initramfs.sh.

Signed-off-by: Kairui Song <kasong@redhat.com>
Acked-by: Philipp Rudo <prudo@redhat.com>
2021-09-14 03:25:54 +08:00
Kairui Song
a5faa052d4 kdump-lib-initramfs.sh: prepare to be a POSIX compatible lib
Move all functions needed in the second kernel from kdump-lib.sh
to kdump-lib-initramfs.sh, and update shebang headers.

Now, kdump-lib-initramfs.sh is an independent lib script, no longer
depend on kdump-lib.sh, and kdump-lib.sh is no longer needed for
the second kernel.

In later commits, functions in kdump-lib-initramfs.sh will be reworked
to be POSIX compatible, kdump-lib.sh will contain bash only functions.

POSIX shell have very limited features, eg. `local` keyword doesn't
exist in POSIX but we rely on that heavily. So kdump-lib.sh will
use bash syntax and contain the most complex helper and codes.

kdump-lib-initramfs.sh will contain the minimum set of helpers,
and be shared by both the first and second kernel.

Signed-off-by: Kairui Song <kasong@redhat.com>
Acked-by: Philipp Rudo <prudo@redhat.com>
2021-09-14 03:25:46 +08:00
Kairui Song
0e4b66b1ab bash scripts: reformat with shfmt
This is a batch update done with:
shfmt -s -w mkfadumprd mkdumprd kdumpctl *-module-setup.sh

Clean up code style and reduce code base size, no behaviour change.

Signed-off-by: Kairui Song <kasong@redhat.com>
Acked-by: Philipp Rudo <prudo@redhat.com>
2021-09-14 03:25:29 +08:00
Kairui Song
4f75e16700 bash scripts: declare and assign separately
Declare and assign separately to avoid masking return values:
https://github.com/koalaman/shellcheck/wiki/SC2155

Signed-off-by: Kairui Song <kasong@redhat.com>
Acked-by: Philipp Rudo <prudo@redhat.com>
2021-09-14 03:25:29 +08:00
Kairui Song
a4648fc851 bash scripts: fix redundant exit code check
As suggested by:
https://github.com/koalaman/shellcheck/wiki/SC2181

Signed-off-by: Kairui Song <kasong@redhat.com>
Acked-by: Philipp Rudo <prudo@redhat.com>
2021-09-14 03:25:29 +08:00
Kairui Song
86538ca6e2 bash scripts: fix variable quoting issue
Fixed quoting issues found by shellcheck, no feature
change. This should fix many errors when there is space
in any shell variables, eg. dump target's name/path/id.

False positives are marked with "# shellcheck disable=SCXXXX", for
example, args are expected to split so it should not be quoted.

And replaced some `cut -d ' ' -fX` with `awk '{print $X}'` since cut
is fragile, and doesn't work well with any quoted strings that have
redundant space.

Following quoting related issues are fixed (check the link
for example code and what could go wrong):

https://github.com/koalaman/shellcheck/wiki/SC2046
https://github.com/koalaman/shellcheck/wiki/SC2053
https://github.com/koalaman/shellcheck/wiki/SC2068
https://github.com/koalaman/shellcheck/wiki/SC2086
https://github.com/koalaman/shellcheck/wiki/SC2206

Signed-off-by: Kairui Song <kasong@redhat.com>
Acked-by: Philipp Rudo <prudo@redhat.com>
2021-09-14 03:25:29 +08:00
Kairui Song
70978c00e5 bash scripts: replace '[ ]' with '[[ ]]' for bash scripts
kdumpctl, mkdumprd, *-module-setup.sh only target bash, since they
only run in first kernel and depend on dracut, and dracut depends
on bash. So use '[[ ]]' to replace '[ ]'.

This is a batch update done with following command:
`sed -i -e 's/\(\s\)\[\s\([^]]*\)\s\]/\1\[\[\ \2 \]\]/g' kdumpctl, mkdumprd, *-module-setup.sh`
and replaced [ ... -a ... ] with [[ ... ]] && [[ ... ]] manually.

See https://tldp.org/LDP/abs/html/testconstructs.html for more details
on '[[ ]]', it's more versatile, safer, and slightly faster than '[ ]'.

This will also help shfmt to clean up the code in later commits.

Signed-off-by: Kairui Song <kasong@redhat.com>
Acked-by: Philipp Rudo <prudo@redhat.com>
2021-09-14 03:25:29 +08:00
Kairui Song
54cc5c44be bash scripts: use $(...) notation instead of legacy ...
This is a batch update done with following command:

`sed -i -e 's/`\([^`]*\)`/\$(\1)/g' mkfadumprd mkdumprd \
 kdumpctl dracut-module-setup.sh dracut-fadump-module-setup.sh \
 dracut-early-kdump-module-setup.sh`

And manually converted some corner cases. This fixes
all related issues detected by shellcheck.
Make it easier to do clean up in later commits.

Check following link for reasons to switch to the new syntax:
https://github.com/koalaman/shellcheck/wiki/SC2006

Signed-off-by: Kairui Song <kasong@redhat.com>
Acked-by: Philipp Rudo <prudo@redhat.com>
2021-09-14 03:25:29 +08:00
Kairui Song
a416930706 bash scripts: always use "read -r"
This helps to strip spaces and avoid mangling backslashes:

https://github.com/koalaman/shellcheck/wiki/SC2162

Signed-off-by: Kairui Song <kasong@redhat.com>
Acked-by: Philipp Rudo <prudo@redhat.com>
2021-09-14 03:25:29 +08:00
Kairui Song
fdfad3102e bash scripts: get rid of unnecessary sed calls
Use bash builtin string substitution instead, as suggested by:
https://github.com/koalaman/shellcheck/wiki/SC2001

Signed-off-by: Kairui Song <kasong@redhat.com>
Acked-by: Philipp Rudo <prudo@redhat.com>
2021-09-14 03:25:29 +08:00
Kairui Song
c4d85142be bash scripts: get rid of expr and let
As suggested by:
https://github.com/koalaman/shellcheck/wiki/SC2219

Signed-off-by: Kairui Song <kasong@redhat.com>
Acked-by: Philipp Rudo <prudo@redhat.com>
2021-09-14 03:25:29 +08:00
Kairui Song
6d45257cc1 bash scripts: remove useless cat
Some `cat` calls are useless, remove them to make it cleaner.
See: https://github.com/koalaman/shellcheck/wiki/SC2002

Signed-off-by: Kairui Song <kasong@redhat.com>
Acked-by: Philipp Rudo <prudo@redhat.com>
2021-09-14 03:25:29 +08:00
Kairui Song
3b0157197b dracut-module-setup.sh: remove surrounding $() for subshell
Some functions are executed in subshell to avoid variable environment
pollution. But the surrounding $() is not needed, and it may lead to
executing output which is unexpected here.

See: https://github.com/koalaman/shellcheck/wiki/SC2091

Signed-off-by: Kairui Song <kasong@redhat.com>
Suggested-by: Coiby Xu <coxu@redhat.com>
2021-09-14 03:25:29 +08:00
Kairui Song
67e559a6b9 dracut-module-setup.sh: make iscsi check fail early if cd failed
As suggested by:
https://github.com/koalaman/shellcheck/wiki/SC2164

Signed-off-by: Kairui Song <kasong@redhat.com>
Acked-by: Philipp Rudo <prudo@redhat.com>
2021-09-14 03:25:29 +08:00
Kairui Song
3b2fa982bb dracut-module-setup.sh: fix a loop over ls issue
Iterating over ls output is fragile:
https://github.com/koalaman/shellcheck/wiki/SC2045

Signed-off-by: Kairui Song <kasong@redhat.com>
Acked-by: Philipp Rudo <prudo@redhat.com>
2021-09-14 03:25:29 +08:00
Kairui Song
dfe7555323 dracut-module-setup.sh: fix a ambiguous variable reference
Wrap the variable with {...}, else it may get interpreted as array due
to the '[' char next to it.

Signed-off-by: Kairui Song <kasong@redhat.com>
Acked-by: Philipp Rudo <prudo@redhat.com>
2021-09-14 03:25:29 +08:00
Kairui Song
da3ad9cbda dracut-module-setup.sh: use "*" to expend array as string
As suggested by:
https://github.com/koalaman/shellcheck/wiki/SC2199
The array is not quoted here but implicitly concatenate still happens,
could be harmless but shellcheck complains about it so fix it.

Signed-off-by: Kairui Song <kasong@redhat.com>
Acked-by: Philipp Rudo <prudo@redhat.com>
2021-09-14 03:25:29 +08:00
Kairui Song
49dd4fcdbb dracut-module-setup.sh: fix _bondoptions wrong references
Signed-off-by: Kairui Song <kasong@redhat.com>
Acked-by: Philipp Rudo <prudo@redhat.com>
2021-09-14 03:25:29 +08:00