Commit Graph

76 Commits

Author SHA1 Message Date
Tao Liu
c5aa460992 Introduce vmcore creation notification to kdump
Upstream: fedora
Resolves: RHEL-32060
Conflict: Yes, there are several conflicts. 1) Upstream has moved
          dracut-kdump.sh into kdump-utils/dracut/99kdumpbase/kdump.sh,
          so the target files have changed. 2) Several patchsets
          ([1] [2]) have not been backported to rhel9, so some
          formatting conflicts were encountered. No functional
          change has been made in backporting the patch.

[1]: https://github.com/rhkdump/kdump-utils/pull/18/commits
[2]: https://github.com/rhkdump/kdump-utils/pull/33/commits

commit 88525ebf5e43cc86aea66dc75ec83db58233883b
Author: Tao Liu <ltao@redhat.com>
Date:   Thu Sep 5 15:49:07 2024 +1200

    Introduce vmcore creation notification to kdump

    Motivation
    ==========

    People may forget to recheck that kdump still works, with the result
    that no vmcore is generated after a real system crash. That is an
    unexpected outcome for kdump.

    It is highly recommended that people recheck kdump after any system
    modification, such as:

    a. after kernel patching or a whole yum update, as it might break something
       on which kdump depends, for example by introducing a new bug.
    b. after any change at the hardware level, such as storage, networking
       or a firmware upgrade.
    c. after deploying any new application, such as one that involves 3rd
       party modules.

    Though these are beyond the scope of kdump, a simple vmcore creation
    status notification is good to have for now.

    Design
    ======

    Kdump currently checks whether any related files/fs/drivers have been
    modified to determine if the initrd should be rebuilt on (re)start. A
    rebuild is an indicator of such a modification, and kdump needs to be
    rechecked. A rebuild therefore clears the vmcore creation status
    recorded in $VMCORE_CREATION_STATUS.

    The vmcore creation check happens at "kdumpctl (re)start/status" and
    reports the creation success/fail status to users. A "success" status
    indicates that a vmcore has previously been generated successfully in the
    current env, so it is more likely that a vmcore will be generated when a
    real crash happens. A "fail" status indicates that no vmcore has been
    generated, or that a vmcore creation has failed, in the current env. Users
    should check the 2nd kernel log or the kexec-dmesg.log for the reason of
    the failure.

    $VMCORE_CREATION_STATUS is used for recording the vmcore creation status of
    the current env. The format will be like:

       success 1718682002

    This means that a vmcore was generated successfully at this
    timestamp for the current env.
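
The status format above can be read back with a few lines of shell. This is a minimal sketch and not the actual kdumpctl code: the function name report_vmcore_status and the wording of the "fail" message are assumptions made for illustration, while the other two messages are taken from the usage output in this commit message. It relies on GNU date for the "@timestamp" form.

```shell
#!/bin/bash
# Hypothetical helper: read "<status> <timestamp>" from a status file and
# print a notice. Not the real kdumpctl implementation.
report_vmcore_status() {
    local status_file=$1 status timestamp

    # A missing or empty file means no vmcore creation test was recorded.
    if [[ ! -s $status_file ]]; then
        echo "Notice: No vmcore creation test performed!"
        return 0
    fi

    read -r status timestamp < "$status_file"
    case $status in
        success)
            echo "Notice: Last successful vmcore creation on $(date -d "@$timestamp")"
            ;;
        fail)
            # Message wording is an assumption, not taken from kdumpctl.
            echo "Warning: Last vmcore creation failed on $(date -d "@$timestamp")"
            ;;
        *)
            echo "Warning: Unknown vmcore creation status: $status"
            return 1
            ;;
    esac
}
```

Calling report_vmcore_status with the path stored in $VMCORE_CREATION_STATUS would then print one of the notices, depending on what the file contains.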

    Usage
    =====

    [root@localhost ~]# kdumpctl restart
    kdump: kexec: unloaded kdump kernel
    kdump: Stopping kdump: [OK]
    kdump: kexec: loaded kdump kernel
    kdump: Starting kdump: [OK]
    kdump: Notice: No vmcore creation test performed!

    [root@localhost ~]# kdumpctl test

    [root@localhost ~]# kdumpctl status
    kdump: Kdump is operational
    kdump: Notice: Last successful vmcore creation on Tue Jun 18 16:39:10 CST 2024

    [root@localhost ~]# kdumpctl restart
    kdump: kexec: unloaded kdump kernel
    kdump: Stopping kdump: [OK]
    kdump: kexec: loaded kdump kernel
    kdump: Starting kdump: [OK]
    kdump: Notice: Last successful vmcore creation on Tue Jun 18 16:39:10 CST 2024

    The notification for kdumpctl (re)start/status can be disabled by
    setting VMCORE_CREATION_NOTIFICATION in /etc/sysconfig/kdump

    Signed-off-by: Tao Liu <ltao@redhat.com>

Signed-off-by: Tao Liu <ltao@redhat.com>
2024-10-08 18:23:12 +13:00
Coiby Xu
bf947239de Support setting up Open vSwitch (Ovs) Bridge network
Resolves: https://issues.redhat.com/browse/RHEL-33465
Conflict: C9S misses the following two commits,
          - 1397006 ("dracut-module-setup: Remove remove_cpu_online_rule() since PowerPC uses nr_cpus")
          - 73c9eb7 ("dracut-module-setup: remove old s390 network device config (#1937048)")

Upstream Status: git@github.com:rhkdump/kdump-utils.git

commit 224d3102c54749eae98bfa1af8932aade8e4d2da
Author: Coiby Xu <coxu@redhat.com>
Date:   Mon Apr 22 15:02:42 2024 +0800

    Support setting up Open vSwitch (Ovs) Bridge network

    Resolves: https://issues.redhat.com/browse/RHEL-33465

    This patch supports setting up an Ovs bridge in kdump initrd. An Ovs
    bridge is similar to a classic Linux bridge but we use ovs-vsctl to find
    out the Ethernet device (having the same MAC address as the bridge) added
    to an Ovs bridge. Once we copy all the needed NetworkManager (NM) connection
    profiles and all the necessary files to kdump initrd, NM will create an Ovs
    bridge automatically in kdump initrd.

    In the case of OpenShift Container Platform (OCP),
    ovs-configuration.service [1] is responsible for setting up an Ovs bridge.
    In theory, we can also try to bring up the original physical network
    interface before ovs-configuration.service. But this approach is
    cumbersome because it breaks our assumption that we should bring up the
    same network in kdump initrd as in the 1st kernel (establishing the same
    network in kdump initrd only requires copying the needed NM connection
    profiles, so we don't need to learn how different network setups work
    under the hood).

    How to test this patch with the help of configure-ovs.sh?
    =========================================================

    1. Extract configure-ovs.sh from [2]

    2. Install necessary packages for configure-ovs.sh
        dnf install openvswitch -yq
        dnf install NetworkManager-ovs nmap-ncat -yq

        systemctl enable --now openvswitch

        # restart NM so the ovs plugin can be activated
        systemctl restart NetworkManager

    3. Assume the network interface used for creating an Ovs bridge is
       "ens2", use configure-ovs.sh to create an Ovs bridge,

        interface=ens2
        mkdir -p /etc/ovnk
        echo $interface > /etc/ovnk/iface_default_hint
        bash configure-ovs.sh OVNKubernetes

    4. (Optional) If you want the created Ovs bridge to survive a
       reboot, simply make the NM connections created by
       configure-ovs.sh persist,

        cp /run/NetworkManager/system-connections/ovs-* /etc/NetworkManager/system-connections/

    If you need to create an Ovs bridge on top of a bonding network, use the
    following commands for step 3,

        nmcli con add type bond ifname bond0
        nmcli con add type ethernet ifname eth0 master bond0
        nmcli con add type ethernet ifname eth1 master bond0

        echo bond0 > /etc/ovnk/iface_default_hint
        bash configure-ovs.sh OVNKubernetes

    Note
    1. For RHEL, openvswitch3.3 may be installed so we need to get the
       package name by "rpm -qf /usr/lib/systemd/system/openvswitch.service"

    2. For RHEL9, openvswitch package needs to be installed from another repo,
        cat << 'EOF' > /etc/yum.repos.d/ovs.repo
        [rhosp-rhel-9-fdp-cdn]
        name=Red Hat Enterprise Linux Fast Datapath $releasever - $basearch cdn
        baseurl=http://rhsm-pulp.corp.redhat.com/content/dist/layered/rhel9/$basearch/fast-datapath/os/
        enabled=1
        gpgcheck=0
        EOF

        dnf install openvswitch3.3 -yq

    3.  We instruct ovsdb-server to ignore NM connection file changes by
        "--ovsdb-server-options='--disable-file-column-diff'". In the
        future, this may not be needed if we simply copy all active NM
        connection profiles to kdump initrd without changing them, after
        coming up with different solutions for the following cases,
        1. Some environments, like some Azure machines, don't use persistent
           NIC names. The current solution is to modify an NM connection
           profile to match a device by MAC address (for details check
           commit 568623e).

        2. If a NIC has an IPv4 or IPv6 address, set the corresponding
           may-fail property to no. Otherwise, dumping vmcore over IPv6
           could fail because only IPv4 network is ready or vice versa. The
           current solution is to disable IPv6 if only IPv4 is used and vice
           versa (for details check commit 9dfcacf).

        3. Some NICs need a longer connection.wait-device-timeout, otherwise
           the connection will fail to be established (commit 6b586a9).

    [1] https://github.com/openshift/machine-config-operator/blob/master/templates/common/_base/units/ovs-configuration.service.yaml
    [2] https://github.com/openshift/machine-config-operator/blob/master/templates/common/_base/files/configure-ovs-network.yaml

    Signed-off-by: Coiby Xu <coxu@redhat.com>

Signed-off-by: Coiby Xu <coxu@redhat.com>
2024-07-29 17:25:44 +08:00
Lichen Liu
be56205d06 dracut: Disable ostree-prepare-root
Resolves: RHEL-35885

commit 42cdc05a8c99a2c0834377faca04b583404cb86f
Author: Colin Walters <walters@verbum.org>
Date:   Fri Jul 19 14:23:39 2024 -0400

    dracut: Disable ostree-prepare-root

    In some images such as the recent fedora/rhel bootc base image,
    the ostree dracut module is statically enabled:
    40df0eb382/tier-0/initramfs.yaml (L9)

    And also recently, we changed the ostree systemd unit
    to enter emergency.target if it fails in:
    05b3b66275

    These two things combined mean we'll fail before kdump gets
    a chance to run.

    For our use case we don't need ostree in the initrd.

    I tried to override this with `--omit=ostree` in our dracut
    invocation, but that causes an error (dracut doesn't let the
    cmdline override static config).

    For now, let's just mask the service in our initrd.

Signed-off-by: Lichen Liu <lichliu@redhat.com>
2024-07-24 15:42:33 +12:00
Coiby Xu
74e1022fae Install the driver of physical device for a SR-IOV virtual device
Related: https://issues.redhat.com/browse/RHEL-7028
Conflict: None

Upstream Status: git@github.com:rhkdump/kdump-utils.git

commit 7a8edc8de67dccae23b01461bc3b17c0ad42aa5f
Author: Coiby Xu <coxu@redhat.com>
Date:   Wed Sep 27 09:31:39 2023 +0800

    Install the driver of physical device for a SR-IOV virtual device

    Currently, network dumping fails over a NIC that is a Single Root I/O
    Virtualization (SR-IOV) virtual device. Usually the driver of the
    virtual device won't specify the dependency on the driver of the
    physical device. So to fix this issue, the driver of the physical device
    needs to be found and installed as well.

    Fixes: a65dde2d ("Reduce kdump memory consumption by only installing needed NIC drivers")
    Signed-off-by: Coiby Xu <coxu@redhat.com>

Signed-off-by: Coiby Xu <coxu@redhat.com>
2024-05-09 13:41:05 +08:00
Coiby Xu
ce8f796720 Try to install PHY and MDIO bus drivers explicitly
Resolves: https://issues.redhat.com/browse/RHEL-7028
Conflict: None

Upstream Status: git@github.com:rhkdump/kdump-utils.git

commit d057153a1c3c36612a14143b29c0ff0be34e4fc2
Author: Coiby Xu <coxu@redhat.com>
Date:   Thu Sep 21 11:50:14 2023 +0800

    Try to install PHY and MDIO bus drivers explicitly

    Resolves: https://issues.redhat.com/browse/RHEL-7028

    Currently, nfs dumping fails on some machines that have a dedicated PHY
    driver (dealing with the physical layer) or MDIO bus (connecting the MAC
    to PHY devices) driver. This is because kexec-tools doesn't install
    dedicated PHY or MDIO drivers explicitly. Usually a NIC driver doesn't
    specify the dependency on the needed PHY or MDIO driver, and it
    shouldn't: a NIC (medium access control, MAC) driver deals with
    the data link layer and a PHY driver with the physical layer. As long
    as a MAC driver can talk to the PHY layer via APIs, it shouldn't care
    which PHY driver or device it's talking to. So when the
    dependency on a PHY driver or MDIO driver is not found by dracut's
    instmods, the PHY or MDIO driver won't be installed.

    This patch passes =drivers/net/phy and =drivers/net/mdio to dracut's
    instmods which will only install in-use PHY or MDIO driver(s).

    Note ideally we should find out which PHY driver is used by a NIC but
    unfortunately no universal way has been found so far
    (/sys/class/net/NIC_NAME/phydev/driver/module can be used to find the
    name of the PHY driver for some NICs but it doesn't exist for some NICs
    like Qualcomm Atheros AR8031). The same holds for an MDIO bus driver.
    Fortunately, no huge memory consumption has been found so far for a PHY
    or MDIO driver.
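
The sysfs lookup mentioned in the note can be sketched as follows. This is only an illustration of that one path, not code from the patch: the function name phy_driver_of and the optional sysfs-root argument (handy for trying it against a fake tree) are assumptions. As the note says, the link is absent for some NICs, in which case the function simply returns non-zero.

```shell
#!/bin/bash
# Hypothetical helper: print the kernel module behind a NIC's PHY driver
# by resolving /sys/class/net/<NIC>/phydev/driver/module.
# $1 = NIC name, $2 = sysfs root (defaults to /sys).
phy_driver_of() {
    local nic=$1 sysfs=${2:-/sys} link

    link=$sysfs/class/net/$nic/phydev/driver/module
    # Not every NIC exposes this link, so report failure instead of guessing.
    [[ -e $link ]] || return 1

    # The link points at the module directory; its last component is the name.
    basename "$(readlink -f "$link")"
}
```

Because the lookup is not universal, the patch passes whole driver directories to instmods instead of relying on a per-NIC query like this one.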

    Fixes: a65dde2d ("Reduce kdump memory consumption by only installing needed NIC drivers")
    Reported-by: Doreen Alongi <dalongi@redhat.com>
    Signed-off-by: Coiby Xu <coxu@redhat.com>

Signed-off-by: Coiby Xu <coxu@redhat.com>
2024-05-09 13:40:43 +08:00
Lichen Liu
78e19be071 dracut-module-setup: Skip initrd-cleanup and initrd-parse-etc in kdump
Resolves: https://issues.redhat.com/browse/RHEL-13996
Upstream: Fedora
Conflict: None

commit 468336700d
Author: Lichen Liu <lichliu@redhat.com>
Date:   Mon Jan 22 15:59:09 2024 +0800

    dracut-module-setup: Skip initrd-cleanup and initrd-parse-etc in kdump

    When using multipath devices as the target for kdump, if user_friendly_name
    is also specified, devices default to names like "mpath*", e.g., mpatha.
    In dracut, we obtain a persistent device name via get_persistent_dev. However,
    dracut currently believes using /dev/mapper/mpath* could cause issues, so
    alternative names are used; here it's /dev/disk/by-uuid/<FS_UUID>.

    During the kdump boot process, /dev/disk/by-uuid/<FS_UUID> will exist as
    soon as one of the path devices exists, but it won't be usable by systemd,
    since multipathd will claim that device as a path device. Then multipathd will
    get stopped before it can create the multipath device.

    Without user_friendly_name, /dev/mapper/<WWID> is considered a persistent
    device name, avoiding the issue.

    The exit of multipathd is due to two dependencies in the current dracut module
    90multipath/multipathd.service, "Before=initrd-cleanup.service" and
    "Conflicts=initrd-cleanup.service".

    As per man 5 systemd.unit, if A.service has "Conflicts=B.service", starting
    B.service will stop A.service.

    This is useful during normal boot. However, we will never switch-root after
    capturing vmcore in kdump.

    We need to ensure that multipathd is not killed due to such dependency issue.
    Without modifying multipathd.service, we add ConditionPathExists=!/proc/vmcore
    to skip initrd-cleanup.service in kdump. This approach is beneficial as
    it avoids the potential termination of other services that conflict with
    initrd-cleanup.service. Also skip initrd-parse-etc.service as it will try to
    start initrd-cleanup.service. Both of these services are used for switch root,
    so they can be safely skipped in kdump.
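
One way to picture the added condition is as a systemd drop-in written into the initrd. The sketch below is an assumption about the mechanism, not a copy of the patch: the function name skip_unit_in_kdump and the drop-in file name skip-in-kdump.conf are hypothetical. Only the ConditionPathExists=!/proc/vmcore line and the two unit names come from the commit message.

```shell
#!/bin/bash
# Hypothetical helper: write a drop-in that makes a unit a no-op when
# /proc/vmcore exists, i.e. when running in the kdump kernel.
# $1 = directory holding systemd units, $2 = unit name.
skip_unit_in_kdump() {
    local unit_dir=$1 unit=$2

    mkdir -p "$unit_dir/$unit.d"
    printf '[Unit]\nConditionPathExists=!/proc/vmcore\n' \
        > "$unit_dir/$unit.d/skip-in-kdump.conf"
}

# Example: apply it to both services named in the commit message,
# using a scratch directory in place of the initrd's unit directory.
demo_dir=$(mktemp -d)
for unit in initrd-cleanup.service initrd-parse-etc.service; do
    skip_unit_in_kdump "$demo_dir" "$unit"
done
```

Since a unit whose condition fails is skipped rather than started, its Conflicts= relationship never gets a chance to stop multipathd.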

    Suggested-by: Benjamin Marzinski <bmarzins@redhat.com>
    Suggested-by: Dave Young <dyoung@redhat.com>
    Signed-off-by: Lichen Liu <lichliu@redhat.com>
    Reviewed-by: Philipp Rudo <prudo@redhat.com>

Signed-off-by: Lichen Liu <lichliu@redhat.com>
2024-01-24 15:51:18 +08:00
Coiby Xu
c179f45e05 Use the same /etc/resolve.conf in kdump initrd if it's managed manually
Resolves: https://issues.redhat.com/browse/RHEL-11897
Upstream: Fedora
Conflict: None

commit 38d9990389
Author: Coiby Xu <coxu@redhat.com>
Date:   Tue Dec 26 11:17:29 2023 +0800

    Use the same /etc/resolve.conf in kdump initrd if it's managed manually

    Resolves: https://issues.redhat.com/browse/RHEL-11897

    The previous fix 0177e248 ("Use the same /etc/resolve.conf in kdump initrd
    if it's managed manually") is problematic:
       1) it generated a .conf file unrecognized by NetworkManager;
       2) this .conf file was installed to the current file system instead of to the kdump initrd;
       3) this incorrect .conf file prevented NetworkManager from starting.

    This patch fixes the above issues and also suppresses a harmless warning
    when systemd-resolved.service doesn't exist,

        # systemctl -q is-enabled systemd-resolved
        Failed to get unit file state for systemd-resolved.service: No such file or directory

    Fixes: 0177e248 ("Use the same /etc/resolve.conf in kdump initrd if it's managed manually")
    Reported-by: Jie Li <jieli@redhat.com>
    Cc: Dave Young <dyoung@redhat.com>
    Reviewed-by: Dave Young <dyoung@redhat.com>
    Signed-off-by: Coiby Xu <coxu@redhat.com>

Signed-off-by: Coiby Xu <coxu@redhat.com>
2023-12-26 12:46:10 +08:00
Coiby Xu
6e359f067f Use the same /etc/resolve.conf in kdump initrd if it's managed manually
Resolves: https://issues.redhat.com/browse/RHEL-11897
Upstream: Fedora
Conflict: None

commit 0177e24832
Author: Coiby Xu <coxu@redhat.com>
Date:   Tue Nov 7 08:28:59 2023 +0800

    Use the same /etc/resolve.conf in kdump initrd if it's managed manually

    Resolves: https://issues.redhat.com/browse/RHEL-11897

    Some users may choose to manage /etc/resolv.conf manually [1]
    by setting dns=none or by using a symbolic link resolv.conf [2].
    In this case, network dumping will not work because DNS resolution
    fails. Use the same /etc/resolv.conf in kdump initrd to fix this
    problem.

    [1] https://bugzilla.gnome.org/show_bug.cgi?id=690404
    [2] https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/8/html/configuring_and_managing_networking/manually-configuring-the-etc-resolv-conf-file_configuring-and-managing-networking

    Fixes: 63c3805c ("Set up kdump network by directly copying NM connection profile to initrd")
    Reported-by: Curtis Taylor <cutaylor@redhat.com>
    Cc: Jie Li <jieli@redhat.com>
    Signed-off-by: Coiby Xu <coxu@redhat.com>
    Reviewed-by: Dave Young <dyoung@redhat.com>

Signed-off-by: Coiby Xu <coxu@redhat.com>
2023-12-05 18:01:01 +08:00
Coiby Xu
a0f7f2ecdf Show how much time kdump has waited for the network to be ready
Related: bz2151504
Upstream: Fedora
Conflict: None

commit 12d9eff9dc
Author: Coiby Xu <coxu@redhat.com>
Date:   Tue Mar 28 16:33:34 2023 +0800

    Show how much time kdump has waited for the network to be ready

    Relates: https://bugzilla.redhat.com/show_bug.cgi?id=2151504

    Currently, when the network isn't ready, kdump would repeatedly print
    the same info,

        [   29.537230] kdump[671]: Bad kdump network destination: 192.123.1.21
        [   30.559418] kdump[679]: Bad kdump network destination: 192.123.1.21
        [   31.580189] kdump[687]: Bad kdump network destination: 192.123.1.21

    This is not user-friendly and users may think kdump has got stuck. So
    also show how much time kdump has waited for the network to be ready,

        [   29.546258] kdump[673]: Waiting for network to be ready (50s / 10min)
        ...
        [   32.608967] kdump[697]: Waiting for network to be ready (56s / 10min)

    Note kdump_get_ip_route no longer prints an error message and it's up to
    the caller to determine the log level and print relevant messages. And
    kdump_collect_netif_usage aborts when kdump_get_ip_route fails.

    Reported-by: Martin Pitt <mpitt@redhat.com>
    Signed-off-by: Coiby Xu <coxu@redhat.com>
    Reviewed-by: Philipp Rudo <prudo@redhat.com>

Signed-off-by: Coiby Xu <coxu@redhat.com>
2023-04-18 15:26:17 +08:00
Coiby Xu
c28d6fa950 Tell nmcli to not escape colon when getting the path of connection profile
Resolves: bz2151504
Upstream: Fedora
Conflict: None

commit df6f25ff20
Author: Coiby Xu <coxu@redhat.com>
Date:   Mon Mar 27 13:17:32 2023 +0800

    Tell nmcli to not escape colon when getting the path of connection profile

    Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=2151504

    When a NetworkManager connection profile contains a colon in the name,
    "nmcli --get-values UUID,FILENAME" by default would escape the colon
    because a colon is also used for separating the values. In this case,
    99kdumpbase fails to get the correct connection profile path,
            kdumpctl[5439]: cp: cannot stat '/run/NetworkManager/system-connections/static-52\\\:54\\\:01.nmconnection': No such file or directory
            kdumpctl[5440]: sed: can't read /tmp/1977-DRACUT_KDUMP_NM/ifcfg-static-52-54-01: No such file or directory
            kdumpctl[5449]: dracut-install: ERROR: installing '/tmp/1977-DRACUT_KDUMP_NM/ifcfg-static-52-54-01' to '/etc/NetworkManager/system-connections/ifcfg-static-52-54-01'

    As a result, dumping vmcore to a remote nfs would fail.

    In our case of getting connection profile path, there is no need to escape the
    colon so pass "-escape no" to nmcli,

            [root@localhost ~]# nmcli --get-values UUID,FILENAME c show
            659e09c1-a6bd-3549-9be4-a07a1a9a8ffd:/etc/NetworkManager/system-connections/aa\:bb.nmconnection

            [root@localhost ~]# nmcli -escape no --get-values UUID,FILENAME c show
            659e09c1-a6bd-3549-9be4-a07a1a9a8ffd:/etc/NetworkManager/system-connections/aa:bb.nmconnection
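
With escaping turned off, each output line can be split at the first colon, because a UUID never contains one. The following is a minimal sketch of that split using plain parameter expansion. The helper names profile_uuid and profile_path are assumptions, and the sample line is the unescaped output shown above.

```shell
#!/bin/bash
# Hypothetical helpers: split one "UUID:FILENAME" line as printed by
#   nmcli -escape no --get-values UUID,FILENAME c show
# Everything before the first colon is the UUID, the rest is the path,
# so colons inside the file name are preserved.
profile_uuid() { printf '%s\n' "${1%%:*}"; }
profile_path() { printf '%s\n' "${1#*:}"; }

line='659e09c1-a6bd-3549-9be4-a07a1a9a8ffd:/etc/NetworkManager/system-connections/aa:bb.nmconnection'
profile_uuid "$line"   # 659e09c1-a6bd-3549-9be4-a07a1a9a8ffd
profile_path "$line"   # /etc/NetworkManager/system-connections/aa:bb.nmconnection
```

Had the escaped form been used instead, the path would still carry the backslash and would not match any file on disk, which is the failure shown in the log above.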

    Suggested-by: Beniamino Galvani <bgalvani@redhat.com>
    Reported-by: Martin Pitt <mpitt@redhat.com>
    Signed-off-by: Coiby Xu <coxu@redhat.com>
    Reviewed-by: Philipp Rudo <prudo@redhat.com>

Signed-off-by: Coiby Xu <coxu@redhat.com>
2023-04-18 15:25:48 +08:00
Lichen Liu
5e6d9d2679 dracut-module-setup.sh: skip installing driver for the loopback interface
Resolves: bz2151500
Upstream: Fedora
Conflict: None

commit 3b22cce1cb
Author: Coiby Xu <coxu@redhat.com>
Date:   Wed Dec 14 10:12:17 2022 +0800

    dracut-module-setup.sh: skip installing driver for the loopback
    interface

    Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=2151500

    Currently, kdump initrd fails to be built when dumping vmcore to
    localhost via ssh or nfs,

      kdumpctl[3331]: Cannot get driver information: Operation not supported
      kdumpctl[1991]: dracut: Failed to get the driver of lo
      dracut[2020]: Failed to get the driver of lo
      kdumpctl[1775]: kdump: mkdumprd: failed to make kdump initrd
      kdumpctl[1775]: kdump: Starting kdump: [FAILED]
      systemd[1]: kdump.service: Main process exited, code=exited, status=1/FAILURE
      systemd[1]: kdump.service: Failed with result 'exit-code'.
      systemd[1]: Failed to start Crash recovery kernel arming.
      systemd[1]: kdump.service: Consumed 1.710s CPU time.

    This is because the loopback interface is used for transferring vmcore and
    ethtool can't get the driver of the loopback interface. In fact, once
    CONFIG_NET is enabled, the loopback device is enabled and there is no driver
    for the loopback device. So skip installing a driver for the loopback device.
    The loopback interface is implemented in linux/drivers/net/loopback.c
    and always has the name "lo". So we can safely tell if a network
    interface is the loopback interface by its name.
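
The name-based check described above amounts to a one-line filter. This sketch is illustrative only: the function name netifs_needing_driver is an assumption and the interface names are examples.

```shell
#!/bin/bash
# Hypothetical helper: print the interfaces whose driver should be
# installed, skipping the loopback interface, which is always named "lo"
# and has no separate driver module.
netifs_needing_driver() {
    local netif

    for netif in "$@"; do
        [[ $netif == lo ]] && continue
        echo "$netif"
    done
}

netifs_needing_driver lo eth0 bond0
# prints:
# eth0
# bond0
```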

    Fixes: a65dde2d ("Reduce kdump memory consumption by only installing needed NIC drivers")
    Reported-by: Martin Pitt <mpitt@redhat.com>
    Reported-by: Rich Megginson <rmeggins@redhat.com>
    Reviewed-by: Lichen Liu <lichliu@redhat.com>
    Reviewed-by: Philipp Rudo <prudo@redhat.com>
    Signed-off-by: Coiby Xu <coxu@redhat.com>

Signed-off-by: Lichen Liu <lichliu@redhat.com>
2022-12-27 05:31:11 +00:00
Coiby Xu
06ddf8d90d dracut-module-setup.sh: also install the driver of physical NIC for Hyper-V VM with accelerated networking
Resolves: bz2151842
Upstream: Fedora
Conflict: None

commit bc101086e2
Author: Coiby Xu <coxu@redhat.com>
Date:   Mon Dec 12 18:37:25 2022 +0800

    dracut-module-setup.sh: also install the driver of physical NIC for
    Hyper-V VM with accelerated networking

    Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=2151842

    Currently, vmcore dumping to remote fs fails on Azure Hyper-V VM with
    accelerated networking because it uses a physical NIC for accelerated
    networking [1]. In this case, the driver for this physical NIC should be
    installed as well.

    [1] https://learn.microsoft.com/en-us/azure/virtual-network/accelerated-networking-overview

    Fixes: a65dde2d ("Reduce kdump memory consumption by only installing needed NIC drivers")

    Reported-by: Xiaoqiang Xiong <xxiong@redhat.com>
    Signed-off-by: Coiby Xu <coxu@redhat.com>
    Reviewed-by: Philipp Rudo <prudo@redhat.com>

Signed-off-by: Coiby Xu <coxu@redhat.com>
2022-12-27 02:45:59 +00:00
Coiby Xu
0aaa053cc3 dracut-module-setup.sh: stop overwriting dracut's trap handler
Resolves: bz2151832
Upstream: Fedora
Conflict: None

commit b45896c620
Author: Coiby Xu <coxu@redhat.com>
Date:   Tue Dec 6 18:18:32 2022 +0800

    dracut-module-setup.sh: stop overwriting dracut's trap handler

    Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=2149246

    Latest Workstation live x86_64 image has an excess increase of ~300 MB
    in size. This is because kdumpbase module's trap handler overwrites
    dracut's handler and DRACUT_TMPDIR which has three unpacked initramfs
    files fails to be cleaned up. This patch moves kdumpbase module's
    temporary folder under DRACUT_TMPDIR and lets dracut's trap handler do
    the cleanup instead.

    Fixes: d25b1ee3 ("Add functions to copy NetworkManage connection profiles to the initramfs")
    Signed-off-by: Coiby Xu <coxu@redhat.com>
    Reviewed-by: Philipp Rudo <prudo@redhat.com>

Signed-off-by: Coiby Xu <coxu@redhat.com>
2022-12-14 10:02:00 +08:00
Coiby Xu
fa2f8fc244 Don't run kdump_check_setup_iscsi in a subshell in order to collect needed network interfaces
Resolves: bz2076416
Upstream: Fedora
Conflict: None

commit 523cda8f34
Author: Coiby Xu <coxu@redhat.com>
Date:   Fri Nov 25 12:07:25 2022 +0800

    Don't run kdump_check_setup_iscsi in a subshell in order to collect needed
    network interfaces

    Currently, dumping to an iSCSI target fails because the global array
    (unique_netifs) that stores the network interfaces needed by kdump is
    empty. The root cause is that a change to the array made in a subshell
    (a child process) is not visible to the parent process. So don't run
    kdump_check_setup_iscsi in a subshell.
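
The subshell pitfall is easy to reproduce in isolation. The snippet below is a minimal sketch, not code from the patch: the array name matches the one in the commit message, but add_netif is a hypothetical stand-in for the real collection logic.

```shell
#!/bin/bash
# A global array, as in the commit message.
unique_netifs=()

# Hypothetical stand-in for code that records a needed interface.
add_netif() { unique_netifs+=("$1"); }

# Parentheses run the call in a subshell: the child appends to its own
# copy of the array, which is discarded when the child exits.
(add_netif eth0)
echo "after subshell: ${#unique_netifs[@]}"      # prints "after subshell: 0"

# Called directly, the function modifies the parent's array.
add_netif eth0
echo "after direct call: ${#unique_netifs[@]}"   # prints "after direct call: 1"
```

Command substitution and pipelines create subshells in the same way, so capturing a function's output with $(...) also loses any array updates it makes.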

    Fixes: 63c3805c ("Set up kdump network by directly copying NM connection profile to initrd")
    Signed-off-by: Coiby Xu <coxu@redhat.com>
    Reviewed-by: Pingfan Liu <piliu@redhat.com>

Signed-off-by: Coiby Xu <coxu@redhat.com>
2022-11-25 13:57:32 +08:00
Coiby Xu
afbb32a83c Simplify setup_znet by copying connection profile to initrd
Resolves: bz2076416
Upstream: Fedora
Conflict: None

commit b5577c163a
Author: Coiby Xu <coxu@redhat.com>
Date:   Thu Sep 23 15:26:00 2021 +0800

    Simplify setup_znet by copying connection profile to initrd

    /usr/lib/udev/ccw_init [1] shipped by s390utils extracts the values of
    SUBCHANNELS, NETTYPE and LAYER2 from /etc/sysconfig/network-scripts/ifcfg-*
    or /etc/NetworkManager/system-connections/*.nmconnection to activate znet
    network device. If the connection profile is copied to initrd,
    there is no need to set up the "rd.znet" dracut cmdline parameter.

    There are two cases addressed by this commit,
     1. znet network interface is a slave of bonding/teaming/vlan/bridging
        network. The connection profile has been copied to initrd by
        kdump_copy_nmconnection_file and it contains the info needed by
        ccw_init.
     2. znet network interface is a slave of bonding/teaming/vlan/bridging
        network. The corresponding ifcfg-*/*.nmconnection file may not contain
        info like SUBCHANNELS [2]. In this case, copy the ifcfg-*/*.nmconnection
        file that has this info to the kdump initrd. Also to prevent the copied
        connection profile from being chosen by NM, set
        connection.autoconnect=false for this connection profile.

    With this implementation, there is also no need to check if znet is
    used beforehand.

    Note
    1. ccw_init doesn't care if SUBCHANNELS, NETTYPE and LAYER2 comes from
       an active NM profile or not. If an inactive NM profile contains this
       info, it needs to be copied to the kdump initrd as well.
    2. "rd.znet_ifname=$_netdev:${SUBCHANNELS}" is no longer needed
       because now there is no renaming of s390x network interfaces when
       reusing NetworkManager profiles. rd.znet_ifname was introduced in
       commit ce0305d ("Add a new option 'rd.znet_ifname' in order to use it
       in udev rules") to address the special case of non-persistent
       MAC address by renaming a network interface by SUBCHANNELS.

    [1] https://src.fedoraproject.org/rpms/s390utils/blob/rawhide/f/ccw_init
    [2] https://bugzilla.redhat.com/show_bug.cgi?id=2064708

    Signed-off-by: Coiby Xu <coxu@redhat.com>
    Reviewed-by: Thomas Haller <thaller@redhat.com>
    Reviewed-by: Philipp Rudo <prudo@redhat.com>

Signed-off-by: Coiby Xu <coxu@redhat.com>
2022-11-23 09:42:33 +08:00
Coiby Xu
d22786bb5a Address the cases where a NIC has a different name in kdump kernel
Resolves: bz2076416
Upstream: Fedora
Conflict: None

commit 568623e69a
Author: Coiby Xu <coxu@redhat.com>
Date:   Thu Sep 23 14:25:01 2021 +0800

    Address the cases where a NIC has a different name in kdump kernel

    A NIC may get a different name in the kdump kernel from the 1st kernel
    in cases like,
     - kernel assigned network interface names are not persistent e.g. [1]
     - there is a udev rule to rename the NIC in the 1st kernel but the
       kdump initrd may not have that rule e.g. [2]

    If NM tries to match a NIC with a connection profile based on NIC name
    i.e. connection.interface-name, it will fail in the above cases. A simple
    solution is to ask NM to match a connection profile by MAC address.
    Note we don't need to do this for user-created NICs like vlan, bridge and
    bond.

    A remaining issue is passing the name of a NIC via the kdumpnic dracut
    command line parameter, which requires passing ifname=<interface>:<MAC> to
    have a fixed NIC name. But we can simply drop this requirement. kdumpnic
    is needed because kdump needs to get the IP by NIC name and use the IP
    to create a dumping folder named "{IP}-{DATE}". We can simply pass the
    IP to the kdump kernel directly via a new dracut command line parameter
    kdumpip instead. In addition to simplifying the code, this approach
    brings three other benefits,
      - make use of whatever network is available to transfer the vmcore,
        because as long as we have a working network we don't care which
        NIC is active.
      - if the IP obtained in the kdump kernel is different from the one in
        the 1st kernel, "{IP}-{DATE}" would better tell where the dumped
        vmcore comes from.
      - without passing ifname=<interface>:<MAC> to kdump initrd, the
        issue of two interfaces having the same MAC address for
        Azure Hyper-V NIC SR-IOV [3] is resolved automatically.

    [1] https://bugzilla.redhat.com/show_bug.cgi?id=1121778
    [2] https://bugzilla.redhat.com/show_bug.cgi?id=810107
    [3] https://bugzilla.redhat.com/show_bug.cgi?id=1962421

    Signed-off-by: Coiby Xu <coxu@redhat.com>
    Reviewed-by: Thomas Haller <thaller@redhat.com>
    Reviewed-by: Philipp Rudo <prudo@redhat.com>

Signed-off-by: Coiby Xu <coxu@redhat.com>
2022-11-23 09:42:18 +08:00
Coiby Xu
81b414d100 Reduce kdump memory consumption by only installing needed NIC drivers
Resolves: bz2076416
Upstream: Fedora
Conflict: None

commit a65dde2d10
Author: Coiby Xu <coxu@redhat.com>
Date:   Thu May 19 11:39:25 2022 +0800

    Reduce kdump memory consumption by only installing needed NIC drivers

    Even after having asked NM to stop managing an unneeded NIC, a NIC driver
    may still waste memory. For example, mlx5_core uses a substantial amount
    of memory during driver initialization,

    ======== Report format module_summary: ========
    Module mlx5_core using 350.2MB (89650 pages), peak allocation 367.4MB (94056 pages)
    Module squashfs using 13.1MB (3360 pages), peak allocation 13.1MB (3360 pages)
    Module overlay using 2.1MB (550 pages), peak allocation 2.2MB (555 pages)
    Module dns_resolver using 0.9MB (219 pages), peak allocation 5.2MB (1338 pages)
    Module mlxfw using 0.7MB (172 pages), peak allocation 5.3MB (1349 pages)
    ======== Report format module_summary END ========

    ======== Report format module_top: ========
    Top stack usage of module mlx5_core:
      (null) Pages: 89650 (peak: 94056)
        ret_from_fork (0xffffda088b4165f8) Pages: 60007 (peak: 60007)
          kthread (0xffffda088b4bd7e4) Pages: 60007 (peak: 60007)
            worker_thread (0xffffda088b4b48d0) Pages: 60007 (peak: 60007)
              process_one_work (0xffffda088b4b3f40) Pages: 60007 (peak: 60007)
                work_for_cpu_fn (0xffffda088b4aef00) Pages: 53906 (peak: 53906)
                  local_pci_probe (0xffffda088b9e1e44) Pages: 53906 (peak: 53906)
                    probe_one mlx5_core (0xffffda084f899cc8) Pages: 53518 (peak: 53518)
                      mlx5_init_one mlx5_core (0xffffda084f8994ac) Pages: 49756 (peak: 49756)
                        mlx5_function_setup.constprop.0 mlx5_core (0xffffda084f899100) Pages: 44434 (peak: 44434)
                          mlx5_satisfy_startup_pages mlx5_core (0xffffda084f8a4f24) Pages: 44434 (peak: 44434)
                        mlx5_function_setup.constprop.0 mlx5_core (0xffffda084f899078) Pages: 5285 (peak: 5285)
                          mlx5_cmd_init mlx5_core (0xffffda084f89e414) Pages: 4818 (peak: 4818)
                            mlx5_alloc_cmd_msg mlx5_core (0xffffda084f89aaa0) Pages: 4403 (peak: 4403)

    This memory consumption is completely unnecessary when kdump doesn't need
    this NIC. Only install needed NIC drivers to prevent this kind of waste.

    Note
    1. this patch depends on [1] to ask dracut to not install NIC drivers.
    2. "ethtool -i" somehow fails to get the vlan driver
    3. team.ko doesn't depend on the team mode drivers so we need to install
       the team mode drivers manually.

    [1] https://github.com/dracutdevs/dracut/pull/1789

    Signed-off-by: Coiby Xu <coxu@redhat.com>
    Reviewed-by: Thomas Haller <thaller@redhat.com>
    Reviewed-by: Philipp Rudo <prudo@redhat.com>

Signed-off-by: Coiby Xu <coxu@redhat.com>
2022-11-23 09:42:18 +08:00
Coiby Xu
95a39f602b Reduce kdump memory consumption by not letting NetworkManager manage unneeded network interfaces
Resolves: bz2076416
Upstream: Fedora
Conflict: None

commit 586fe410aa
Author: Coiby Xu <coxu@redhat.com>
Date:   Thu Sep 9 11:50:00 2021 +0800

    Reduce kdump memory consumption by not letting NetworkManager manage unneeded network interfaces

    By default, NetworkManager will manage all the network interfaces and
    try to set interface IFF_UP to get carrier state. Regardless of whether
    the network interface is connected to a cable or not, the NIC driver
    will allocate memory resources for e.g. ring buffers when setting IFF_UP.
    This could be a waste of memory. For example, i40e was found to consume
    ~15GB on a power machine. On this machine, i40e manages four interfaces
    but only one interface is valid. This patch uses "managed=false" to tell
    NetworkManager not to manage network interfaces that are not needed by
    kdump by putting 10-kdump-netif_allowlist.conf in the initramfs.

    Signed-off-by: Coiby Xu <coxu@redhat.com>
    Reviewed-by: Thomas Haller <thaller@redhat.com>
    Reviewed-by: Philipp Rudo <prudo@redhat.com>

Signed-off-by: Coiby Xu <coxu@redhat.com>
2022-11-23 09:42:18 +08:00
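
As a minimal sketch of the "managed=false" idea in the commit above, the function below prints a NetworkManager configuration snippet that leaves every device unmanaged except the listed interfaces. The function name `gen_netif_allowlist_conf` and the section name are assumptions; the real 10-kdump-netif_allowlist.conf may be generated differently (for example by matching on MAC address).

```shell
# Hypothetical generator: every interface passed as an argument is excluded
# from the "unmanaged" match, so NetworkManager keeps managing only those.
gen_netif_allowlist_conf() {
    local netif spec=""
    for netif in "$@"; do
        spec+="except:interface-name:${netif};"
    done
    printf '[device-others]\nmatch-device=%s\nmanaged=false\n' "$spec"
}

# Print the snippet for a kdump setup that only needs eth0.
gen_netif_allowlist_conf eth0
```
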
Coiby Xu
420f55c096 Set up kdump network by directly copying NM connection profile to initrd
Resolves: bz2076416
Upstream: Fedora
Conflict: None

commit 63c3805c48
Author: Coiby Xu <coxu@redhat.com>
Date:   Fri Sep 17 13:02:07 2021 +0800

    Set up kdump network by directly copying NM connection profile to initrd

    This patch sets up the kdump network by directly copying NM connection
    profile(s) for different network setups including bond, bridge, vlan, and
    team. For vlan networks, rename phydev to parent_netif to improve code
    readability.

    With the new approach, the related code that builds up dracut cmdline
    parameters such as rd.route and ip can be cleaned up. There is also no
    need to set up DNS when copying .nmconnection directly to the initrd.
    Note that the bootdev dracut command line parameter is only used by
    dracut's 35network-legacy and network-manager doesn't use it, so remove
    the related code as well.

    Note
    1. kdump_setup_vlan/bond/... are no longer called in subshells in order
       to modify global variables like unique_netifs
    2. The original kdump_install_net is renamed to better reflect its
       current function

    Signed-off-by: Coiby Xu <coxu@redhat.com>
    Reviewed-by: Thomas Haller <thaller@redhat.com>
    Reviewed-by: Philipp Rudo <prudo@redhat.com>

Signed-off-by: Coiby Xu <coxu@redhat.com>
2022-11-23 09:42:18 +08:00
Coiby Xu
1141e03fa1 Stop dracut 35network-manager from running nm-initrd-generator
Resolves: bz2076416
Upstream: Fedora
Conflict: None

commit 62355ebe5a
Author: Coiby Xu <coxu@redhat.com>
Date:   Fri Sep 23 22:16:49 2022 +0800

    Stop dracut 35network-manager from running nm-initrd-generator

    kexec-tools depends on dracut's 35network-manager module which will
    call nm-initrd-generator. We don't want nm-initrd-generator to generate
    connection profiles since we will copy them from 1st kernel to
    kdump kernel initramfs. NetworkManager >= 1.35.2 won't generate connection
    profiles if there's a connection dir with rd.neednet. For Fedora/RHEL,
    this connection dir is /etc/NetworkManager/system-connections. For the
    details, please refer to the NetworkManager commit 79885656d3
    ("initrd: don't add a connection if there's a connection dir with
    rd.neednet") [1]. Before the release of NetworkManager >= 1.35.2, we
    need to mask /usr/libexec/nm-initrd-generator.

    [1] https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/merge_requests/1010

    Signed-off-by: Coiby Xu <coxu@redhat.com>
    Reviewed-by: Thomas Haller <thaller@redhat.com>
    Reviewed-by: Philipp Rudo <prudo@redhat.com>

Signed-off-by: Coiby Xu <coxu@redhat.com>
2022-11-23 09:42:18 +08:00
Coiby Xu
214e9d0bef Apply the timeout configuration of nm-initrd-generator
Resolves: bz2076416
Upstream: Fedora
Conflict: None

commit 6b586a9036
Author: Coiby Xu <coxu@redhat.com>
Date:   Thu Sep 22 22:08:43 2022 +0800

    Apply the timeout configuration of nm-initrd-generator

    nm-wait-online-initrd.service installed by dracut's 35network-manager
    module calls nm-online with "-s" which means it returns immediately when
    NetworkManager logs "startup complete" after certain timeouts are
    reached. "startup complete" doesn't necessarily mean network connectivity
    has been established. nm-initrd-generator has a set of timeouts that, in
    most cases when applied, make "startup complete" mean network connectivity
    has been established. So apply it when setting up kdump network.

    Suggested-by: Thomas Haller <thaller@redhat.com>
    Signed-off-by: Coiby Xu <coxu@redhat.com>
    Reviewed-by: Thomas Haller <thaller@redhat.com>
    Reviewed-by: Philipp Rudo <prudo@redhat.com>

Signed-off-by: Coiby Xu <coxu@redhat.com>
2022-11-23 09:42:18 +08:00
Coiby Xu
668875e186 Determine whether IPv4 or IPv6 is needed
Resolves: bz2076416
Upstream: Fedora
Conflict: None

commit 9dfcacf72d
Author: Coiby Xu <coxu@redhat.com>
Date:   Thu Sep 8 17:06:19 2022 +0800

    Determine whether IPv4 or IPv6 is needed

    According to `man nm-online`,
      "By default, connections have the ipv4.may-fail and
      ipv6.may-fail properties set to yes; this means that
      NetworkManager waits for one of the two address families to
      complete configuration before considering the connection
      activated. If you need a specific address family configured
      before network-online.target is reached, set the corresponding
      may-fail property to no."

    If a NIC has an IPv4 or IPv6 address, set the corresponding may-fail
    property to no. Otherwise, dumping vmcore over IPv6 could fail because
    only IPv4 network is ready or vice versa.

    Also disable IPv6 if only IPv4 is used and vice versa.

    Signed-off-by: Coiby Xu <coxu@redhat.com>
    Reviewed-by: Thomas Haller <thaller@redhat.com>
    Reviewed-by: Philipp Rudo <prudo@redhat.com>

Signed-off-by: Coiby Xu <coxu@redhat.com>
2022-11-23 09:42:18 +08:00
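
The may-fail rule from the commit above can be illustrated with a small sketch. The helpers `addr_family` and `may_fail_cmd` are hypothetical and only print the nmcli command rather than running it; how kexec-tools actually detects the address family is not shown in the commit message.

```shell
# Hypothetical helper: classify an address string as ipv4 or ipv6.
addr_family() {
    case $1 in
        *:*) echo ipv6 ;;
        *.*) echo ipv4 ;;
        *) return 1 ;;
    esac
}

# Print (do not run) the nmcli command that would require the given
# address family to be configured before the connection counts as up.
may_fail_cmd() {
    local con=$1 family
    family=$(addr_family "$2") || return 1
    printf 'nmcli connection modify %q %s.may-fail no\n' "$con" "$family"
}

may_fail_cmd eth0 192.0.2.10
may_fail_cmd eth0 2001:db8::1
```
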
Coiby Xu
94f8eed573 Add functions to copy NetworkManager connection profiles to the initramfs
Resolves: bz2076416
Upstream: Fedora
Conflict: None

commit d25b1ee31c
Author: Coiby Xu <coxu@redhat.com>
Date:   Thu Sep 9 11:35:52 2021 +0800

    Add functions to copy NetworkManager connection profiles to the initramfs

    Each network interface is managed by a NM connection. Given a list of
    network interface names, copy the NetworkManager (NM) connection
    profiles i.e. .nmconnection files to the kdump initramfs.

    Before copying a connection file, clone it to automatically convert a
    legacy ifcfg-*[1] file to a .nmconnection file and for the convenience of
    editing the connection profile.

    [1] https://fedoraproject.org/wiki/Changes/NetworkManager_keyfile_instead_of_ifcfg_rh

    Signed-off-by: Coiby Xu <coxu@redhat.com>
    Reviewed-by: Thomas Haller <thaller@redhat.com>
    Reviewed-by: Philipp Rudo <prudo@redhat.com>

Signed-off-by: Coiby Xu <coxu@redhat.com>
2022-11-23 09:42:18 +08:00
Coiby Xu
ec669a8e8b Fix error for vlan over team network interface
Related: bz2076416
Upstream: Fedora
Conflict: None

commit b7e58619d1
Author: Coiby Xu <coxu@redhat.com>
Date:   Mon Sep 13 22:13:44 2021 +0800

    Fix error for vlan over team network interface

    6f9235887f ("module-setup.sh: enable
    vlan on team interface") skips establishing teaming network by mistake.
    Although it could use one of slave netifs to establish connection
    to transfer vmcore to remote fs, it breaks the implicit assumption of
    creating an identical network topology to the 1st kernel.

    Fixes: 6f92358 ("module-setup.sh: enable vlan on team interface")
    Signed-off-by: Coiby Xu <coxu@redhat.com>
    Reviewed-by: Thomas Haller <thaller@redhat.com>
    Reviewed-by: Philipp Rudo <prudo@redhat.com>

Signed-off-by: Coiby Xu <coxu@redhat.com>
2022-11-23 09:42:18 +08:00
Tao Liu
10e0a513a4 Add dependency of dracut lvmthinpool-monitor module
upstream: fedora
resolves: bz2083475
conflict: Yes, use "grep -q <<< $(cmd)" instead of
          "cmd | grep -q", because the latter
          fails for a strange reason.

commit f11721077a
Author: Tao Liu <ltao@redhat.com>
Date:   Sat Oct 8 15:41:41 2022 +0800

    Add dependency of dracut lvmthinpool-monitor module

    The 80lvmthinpool-monitor module is needed to monitor and
    autoextend the size of the thin pool in the 2nd kernel. The module was
    integrated in dracut version 057.

    If the lvmthinpool-monitor module is not found, we only print a warning,
    because we don't want to block the kdump process when the thin pool
    capacity is sufficient and no monitor-and-autoextend is actually needed.

    Signed-off-by: Tao Liu <ltao@redhat.com>
    Reviewed-by: Philipp Rudo <prudo@redhat.com>

Signed-off-by: Tao Liu <ltao@redhat.com>
2022-11-11 11:49:32 +08:00
Tao Liu
dcaec956e8 virtiofs support for kexec-tools
upstream: fedora
resolves: bz2085347
conflict: yes, small conflict due to patch
          "kdumpctl: drop DUMP_TARGET variable" not
          backported to rhel9.

commit c743881ae6
Author: Tao Liu <ltao@redhat.com>
Date:   Fri Sep 23 18:13:11 2022 +0800

    virtiofs support for kexec-tools

    This patch adds virtiofs support for kexec-tools by introducing a new
    option for /etc/kdump.conf:

    virtiofs myfs

    Where myfs is a variable tag name specified in qemu cmdline
    "-device vhost-user-fs-pci,tag=myfs".

    The patch covers the following cases:
    1) Dumping VM's vmcore to a virtiofs shared directory;
    2) When the VM's rootfs is a virtiofs shared directory and dumping the
       VM's vmcore to its subdirectory, such as /var/crash;
    3) The combination of case 1 & 2: The VM's rootfs is a virtiofs shared
       directory and dumping the VM's vmcore to another virtiofs shared
       directory.

    Cases 2 & 3 need dracut >= 057, otherwise the VM cannot boot from a
    virtiofs shared rootfs. That is not a kexec-tools issue, though.

    Reviewed-by: Philipp Rudo <prudo@redhat.com>
    Signed-off-by: Tao Liu <ltao@redhat.com>

Signed-off-by: Tao Liu <ltao@redhat.com>
2022-10-26 10:24:57 +08:00
Pingfan Liu
5ac720fc20 ppc64/ppc64le: drop cpu online rule in 40-redhat.rules in kdump initramfs
Resolves: bz2023165
Upstream: Fedora
Conflict: None

commit a3c1e70fc1c0e4bab4149f617cbd629e89bd5ca0 (HEAD -> main)
Author: Pingfan Liu <piliu@redhat.com>
Date:   Wed Dec 8 10:46:38 2021 +0800

    ppc64/ppc64le: drop cpu online rule in 40-redhat.rules in kdump initramfs

    Onlining secondary cpus breaks kdump completely on KVM on Power hosts.
    Though we use maxcpus=1 by default, 40-redhat.rules will bring up all
    possible cpus by default.

    Thus, before we get the kernel fix and the systemd rule fix, let's remove
    the cpu rule in 40-redhat.rules for ppc64/ppc64le kdump initramfs.

    This is backported from RHEL, and original credit goes to Dave Young
    <dyoung@redhat.com>

    Signed-off-by: Pingfan Liu <piliu@redhat.com>

Signed-off-by: Pingfan Liu <piliu@redhat.com>
2021-12-29 11:21:42 +08:00
Tao Liu
6a373dffde fix broken extra_bins when installing multiple binaries
upstream: fedora
resolves: bz2003832
conflict: none

commit 6936fbc1b2
Author: Coiby Xu <coxu@redhat.com>
Date:   Mon Nov 1 14:13:16 2021 +0800

    fix broken extra_bins when installing multiple binaries

    When there is more than one binary, quoting "$val" would make
    dracut-install treat multiple binaries as one binary. Take
    "extra_bins /usr/sbin/ping /usr/sbin/ip" as an example: the
    following error would occur when building the initrd,

    dracut-install: ERROR: installing '/usr/sbin/ping /usr/sbin/ip'
    dracut: FAILED: /usr/lib/dracut/dracut-install -D /var/tmp/dracut.ODrioZ/initramfs -a /usr/sbin/ping /usr/sbin/ip

    Fix it by not quoting the variable and bypassing SC2086 shellcheck.

    Fixes: commit 86538ca6e2
           ("bash scripts: fix variable quoting issue")

    Acked-by: Tao Liu <ltao@redhat.com>
    Signed-off-by: Coiby Xu <coxu@redhat.com>

Signed-off-by: Tao Liu <ltao@redhat.com>
2021-11-10 10:27:18 +08:00
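
The quoting problem in the commit above comes down to how many arguments the callee receives. A minimal sketch, where `count_args` is a hypothetical stand-in for dracut-install:

```shell
# Stand-in for dracut-install: report how many arguments were received.
count_args() { echo "$#"; }

val="/usr/sbin/ping /usr/sbin/ip"

count_args "$val"   # quoted: one argument containing a space
# shellcheck disable=SC2086
count_args $val     # unquoted: word splitting yields two arguments
```

The first call prints 1 and the second prints 2, which is why the variable is deliberately left unquoted and the SC2086 warning is suppressed.
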
Tao Liu
c9f583baa4 kdump-lib.sh: rework nmcli related functions
upstream: fedora
resolves: bz2003832
conflict: none

commit 58d3e6db3a
Author: Kairui Song <kasong@redhat.com>
Date:   Wed Sep 8 15:20:42 2021 +0800

    kdump-lib.sh: rework nmcli related functions

    This fixes a word splitting issue with nmcli args. Current kexec-tools
    scripts won't call nmcli with correct arguments when there are spaces in
    the network interface name.

    nmcli expects multiple parameters, but get_nmcli_value_by_field only
    accepts two params and depends on shell word splitting to split
    _nm_show_cmd into multiple params, which is very fragile.
    So switch the param order and simplify this function so that multiple
    params can be used properly.

    Also, get_nmcli_connection_show_cmd_by_ifname returns multiple
    nmcli params in a single variable and depends on shell word splitting to
    split the words when calling nmcli. This is very fragile and breaks
    easily when there are any special characters in the connection path.

    This function was only introduced to get and cache the nmcli command
    which contains the "connection name".

    Actually, caching only the "connection path" is enough. Callers should
    just call get_nmcli_connection_apath_by_ifname to cache the path, and
    a new helper get_nmcli_field_by_conpath is introduced here to get a value
    from nmcli. This way the "connection path" can contain any character.

    Also get rid of another nmcli_cmd usage in
    get_nmcli_connection_apath_by_ifname which stores multiple params in a
    single bash variable separated by spaces.

    Signed-off-by: Kairui Song <kasong@redhat.com>
    Acked-by: Philipp Rudo <prudo@redhat.com>

Signed-off-by: Tao Liu <ltao@redhat.com>
2021-11-09 21:53:00 +08:00
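
The fragility described in the commit above can be reproduced without nmcli. In this sketch `show_args` is a hypothetical stand-in for nmcli that prints each argument it receives in angle brackets; the connection name is made up.

```shell
# Stand-in for nmcli: show each received argument in angle brackets.
show_args() { printf '<%s>' "$@"; echo; }

con_name="Wired connection 1"

# Fragile: several parameters packed into one string, split by the shell.
cmd_str="connection show $con_name"
# shellcheck disable=SC2086
show_args $cmd_str

# Robust: pass each parameter as its own quoted word (here via an array).
cmd_arr=(connection show "$con_name")
show_args "${cmd_arr[@]}"
```

The first call splits the connection name into three separate words, while the second keeps it as one argument.
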
Tao Liu
039c1d4dc8 dracut-kdump.sh: Use stat instead of ls to get vmcore size
upstream: fedora
resolves: bz2003832
conflict: none

commit b1c794a2cf
Author: Kairui Song <kasong@redhat.com>
Date:   Tue Sep 14 03:00:48 2021 +0800

    dracut-kdump.sh: Use stat instead of ls to get vmcore size

    ls output is fragile, so use stat instead.

    Signed-off-by: Kairui Song <kasong@redhat.com>
    Acked-by: Philipp Rudo <prudo@redhat.com>

Signed-off-by: Tao Liu <ltao@redhat.com>
2021-11-09 21:46:04 +08:00
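
A minimal sketch of reading a file size with stat, as the commit above suggests. It assumes GNU coreutils `stat` (the `-c %s` format); the helper name `file_size` and the temporary file are illustrative only.

```shell
# Print the size of a file in bytes using GNU stat.
file_size() {
    stat -c %s "$1"
}

tmp=$(mktemp)
printf 'vmcore' > "$tmp"   # six bytes of sample data
file_size "$tmp"
rm -f "$tmp"
```

Unlike parsing `ls -l`, this yields a single number regardless of the owner, date or file name.
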
Tao Liu
13a24c49ab Merge kdump-error-handler.sh into kdump.sh
upstream: fedora
resolves: bz2003832
conflict: none

commit e7118d1de8
Author: Kairui Song <kasong@redhat.com>
Date:   Mon Aug 2 00:50:22 2021 +0800

    Merge kdump-error-handler.sh into kdump.sh

    kdump-error-handler.sh does nothing except calling three functions,
    it can be easily merged into kdump.sh by using a parameter to run the
    error handling routine.

    kdump-lib-initramfs.sh was created to hold the three shared functions
    and related code, so by merging these two files, kdump-lib-initramfs.sh
    can be simplified by a lot.

    Follow-up commits will clean up kdump-lib-initramfs.sh.

    Signed-off-by: Kairui Song <kasong@redhat.com>
    Acked-by: Philipp Rudo <prudo@redhat.com>

Signed-off-by: Tao Liu <ltao@redhat.com>
2021-11-09 21:45:31 +08:00
Tao Liu
35519c3eca kdump-lib-initramfs.sh: prepare to be a POSIX compatible lib
upstream: fedora
resolves: bz2003832
conflict: none

commit a5faa052d4
Author: Kairui Song <kasong@redhat.com>
Date:   Tue Sep 14 03:25:46 2021 +0800

    kdump-lib-initramfs.sh: prepare to be a POSIX compatible lib

    Move all functions needed in the second kernel from kdump-lib.sh
    to kdump-lib-initramfs.sh, and update shebang headers.

    Now, kdump-lib-initramfs.sh is an independent lib script that no longer
    depends on kdump-lib.sh, and kdump-lib.sh is no longer needed for
    the second kernel.

    In later commits, functions in kdump-lib-initramfs.sh will be reworked
    to be POSIX compatible, and kdump-lib.sh will contain bash-only functions.

    POSIX shell has very limited features, e.g. the `local` keyword doesn't
    exist in POSIX but we rely on it heavily. So kdump-lib.sh will
    use bash syntax and contain the most complex helpers and code.

    kdump-lib-initramfs.sh will contain the minimum set of helpers,
    and be shared by both the first and second kernel.

    Signed-off-by: Kairui Song <kasong@redhat.com>
    Acked-by: Philipp Rudo <prudo@redhat.com>

Signed-off-by: Tao Liu <ltao@redhat.com>
2021-11-09 21:45:15 +08:00
Tao Liu
b494b7f193 bash scripts: reformat with shfmt
upstream: fedora
resolves: bz2003832
conflict:
    function load_kdump_kernel_key() does not exist in rhel9,
    so the related patch hunk is removed.

commit 0e4b66b1ab
Author: Kairui Song <kasong@redhat.com>
Date:   Tue Sep 14 02:25:40 2021 +0800

    bash scripts: reformat with shfmt

    This is a batch update done with:
    shfmt -s -w mkfadumprd mkdumprd kdumpctl *-module-setup.sh

    Clean up code style and reduce code base size, no behaviour change.

    Signed-off-by: Kairui Song <kasong@redhat.com>
    Acked-by: Philipp Rudo <prudo@redhat.com>

Signed-off-by: Tao Liu <ltao@redhat.com>
2021-11-09 21:42:45 +08:00
Tao Liu
63308480fc bash scripts: declare and assign separately
upstream: fedora
resolves: bz2003832
conflict: none

commit 4f75e16700
Author: Kairui Song <kasong@redhat.com>
Date:   Wed Aug 18 02:04:45 2021 +0800

    bash scripts: declare and assign separately

    Declare and assign separately to avoid masking return values:
    https://github.com/koalaman/shellcheck/wiki/SC2155

    Signed-off-by: Kairui Song <kasong@redhat.com>
    Acked-by: Philipp Rudo <prudo@redhat.com>

Signed-off-by: Tao Liu <ltao@redhat.com>
2021-11-09 21:37:15 +08:00
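
SC2155, referenced in the commit above, can be demonstrated with two small functions. The function names are illustrative; `false` stands in for any command that fails.

```shell
masked() {
    # shellcheck disable=SC2155
    local out=$(false)   # the status seen afterwards is that of `local`
    echo "$?"
}

separate() {
    local out
    out=$(false)         # the status is now that of the failing command
    echo "$?"
}

masked     # prints 0
separate   # prints 1
```
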
Tao Liu
f6d6b60a6a bash scripts: fix redundant exit code check
upstream: fedora
resolves: bz2003832
conflict: none

commit a4648fc851
Author: Kairui Song <kasong@redhat.com>
Date:   Wed Sep 8 17:23:16 2021 +0800

    bash scripts: fix redundant exit code check

    As suggested by:
    https://github.com/koalaman/shellcheck/wiki/SC2181

    Signed-off-by: Kairui Song <kasong@redhat.com>
    Acked-by: Philipp Rudo <prudo@redhat.com>

Signed-off-by: Tao Liu <ltao@redhat.com>
2021-11-09 21:35:45 +08:00
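
The SC2181 pattern from the commit above, sketched with hypothetical helper names. Both functions behave the same; the second tests the command directly instead of inspecting `$?` afterwards.

```shell
# Indirect style flagged by shellcheck: run the command, then test $?.
has_word_indirect() {
    grep -q "$1" <<< "$2"
    # shellcheck disable=SC2181
    if [[ $? -eq 0 ]]; then echo yes; else echo no; fi
}

# Direct style: the command itself is the condition.
has_word_direct() {
    if grep -q "$1" <<< "$2"; then echo yes; else echo no; fi
}

has_word_indirect dump "kdump target"
has_word_direct dump "kdump target"
```

The direct form cannot be broken by a later edit that inserts another command between the call and the check.
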
Tao Liu
bf4667b866 bash scripts: fix variable quoting issue
upstream: fedora
resolves: bz2003832
conflict:
    function remove_kdump_kernel_key() is not present in rhel9,
    so the related patch hunks are removed.

commit 86538ca6e2
Author: Kairui Song <kasong@redhat.com>
Date:   Wed Sep 8 17:21:41 2021 +0800

    bash scripts: fix variable quoting issue

    Fixed quoting issues found by shellcheck, no feature
    change. This should fix many errors when there are spaces
    in any shell variables, e.g. dump target's name/path/id.

    False positives are marked with "# shellcheck disable=SCXXXX", for
    example, where args are expected to split and so should not be quoted.

    Also replaced some `cut -d ' ' -fX` with `awk '{print $X}'` since cut
    is fragile and doesn't work well with strings that contain
    redundant spaces.

    Following quoting related issues are fixed (check the link
    for example code and what could go wrong):

    https://github.com/koalaman/shellcheck/wiki/SC2046
    https://github.com/koalaman/shellcheck/wiki/SC2053
    https://github.com/koalaman/shellcheck/wiki/SC2068
    https://github.com/koalaman/shellcheck/wiki/SC2086
    https://github.com/koalaman/shellcheck/wiki/SC2206

    Signed-off-by: Kairui Song <kasong@redhat.com>
    Acked-by: Philipp Rudo <prudo@redhat.com>

Signed-off-by: Tao Liu <ltao@redhat.com>
2021-11-09 21:27:55 +08:00
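
The cut-versus-awk remark in the commit above can be seen with a line that contains repeated spaces. The sample line is made up for illustration.

```shell
line="eth0   up    52:54:00:12:34:56"

# cut treats every single space as a delimiter, so field 2 is empty here.
second_cut=$(cut -d ' ' -f2 <<< "$line")

# awk collapses runs of whitespace, so field 2 is the second word.
second_awk=$(awk '{print $2}' <<< "$line")

printf 'cut: "%s"  awk: "%s"\n' "$second_cut" "$second_awk"
```
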
Tao Liu
c373a2c582 Don't use die in dracut-module-setup.sh
upstream: fedora
resolves: bz2003832
conflict: none

commit 8b4b7bf808
Author: Coiby Xu <coxu@redhat.com>
Date:   Fri Mar 26 10:22:09 2021 +0800

    Don't use die in dracut-module-setup.sh

    die (in dracut-lib.sh) is supposed to be used in the initramfs environment.

    Signed-off-by: Coiby Xu <coxu@redhat.com>
    Acked-by: Kairui Song <kasong@redhat.com>

Signed-off-by: Tao Liu <ltao@redhat.com>
2021-11-09 21:22:55 +08:00
Tao Liu
dcb59c30d5 bash scripts: replace '[ ]' with '[[ ]]' for bash scripts
upstream: fedora
resolves: bz2003832
conflict:
    function load_kdump_kernel_key() is not present in rhel9,
    so the related patch hunks are removed.

commit 70978c00e5
Author: Kairui Song <kasong@redhat.com>
Date:   Wed Sep 8 17:20:51 2021 +0800

    bash scripts: replace '[ ]' with '[[ ]]' for bash scripts

    kdumpctl, mkdumprd, *-module-setup.sh only target bash, since they
    only run in first kernel and depend on dracut, and dracut depends
    on bash. So use '[[ ]]' to replace '[ ]'.

    This is a batch update done with following command:
    `sed -i -e 's/\(\s\)\[\s\([^]]*\)\s\]/\1\[\[\ \2 \]\]/g' kdumpctl, mkdumprd, *-module-setup.sh`
    and replaced [ ... -a ... ] with [[ ... ]] && [[ ... ]] manually.

    See https://tldp.org/LDP/abs/html/testconstructs.html for more details
    on '[[ ]]', it's more versatile, safer, and slightly faster than '[ ]'.

    This will also help shfmt to clean up the code in later commits.

    Signed-off-by: Kairui Song <kasong@redhat.com>
    Acked-by: Philipp Rudo <prudo@redhat.com>

Signed-off-by: Tao Liu <ltao@redhat.com>
2021-11-09 21:13:59 +08:00
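
A short sketch of why '[[ ]]' is the safer construct in bash scripts, following the commit above. The variable names are illustrative.

```shell
val="two words"

# Unquoted $val is word-split inside [ ], so the test itself errors out.
# shellcheck disable=SC2086
if [ $val = "two words" ] 2> /dev/null; then
    echo "[ ]: match"
else
    echo "[ ]: error or no match"
fi

# [[ ]] does not word-split variables, so the comparison works as intended.
if [[ $val == "two words" ]]; then
    echo "[[ ]]: match"
fi

# [ a -a b ] is written as [[ a ]] && [[ b ]].
empty=""
if [[ -n $val ]] && [[ -z $empty ]]; then
    echo "both conditions hold"
fi
```
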
Tao Liu
98e1935293 Don't iterate the whole /sys/devices just to find drm device
upstream: fedora
resolves: bz2003832
conflict: none

commit c6021648f1
Author: Kairui Song <kasong@redhat.com>
Date:   Fri Mar 19 18:21:11 2021 +0800

    Don't iterate the whole /sys/devices just to find drm device

    On some large systems, /sys/devices is huge and it's not a wise idea to
    iterate over it. `find` may cause tremendous contention on the
    kernfs_mutex when there is already stress on /sys, and it will perform
    very poorly.

    Simply checking whether the drm class is present should be good enough.

    Signed-off-by: Kairui Song <kasong@redhat.com>
    Acked-by: Pingfan Liu <piliu@redhat.com>

Signed-off-by: Tao Liu <ltao@redhat.com>
2021-11-09 20:48:26 +08:00
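
One way to check for the drm class without walking /sys/devices, as the commit above proposes, is a single directory test. This is a sketch only: the function name `has_drm_class` and the optional sysfs root argument are assumptions, and the actual check in kexec-tools may differ.

```shell
# Return success if the drm class directory exists under the given sysfs
# root (defaults to /sys). No recursive traversal is involved.
has_drm_class() {
    local sysfs_root=${1:-/sys}
    [[ -d $sysfs_root/class/drm ]]
}

if has_drm_class; then
    echo "drm class present"
else
    echo "drm class absent"
fi
```
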
Tao Liu
16c2821171 bash scripts: use $(...) notation instead of legacy `...`
upstream: fedora
resolves: bz2003832
conflict: none

commit 54cc5c44be
Author: Kairui Song <kasong@redhat.com>
Date:   Wed Sep 8 01:48:52 2021 +0800

    bash scripts: use $(...) notation instead of legacy `...`

    This is a batch update done with following command:

    `sed -i -e 's/`\([^`]*\)`/\$(\1)/g' mkfadumprd mkdumprd \
     kdumpctl dracut-module-setup.sh dracut-fadump-module-setup.sh \
     dracut-early-kdump-module-setup.sh`

    Some corner cases were converted manually. This fixes
    all related issues detected by shellcheck and
    makes it easier to clean up in later commits.

    Check following link for reasons to switch to the new syntax:
    https://github.com/koalaman/shellcheck/wiki/SC2006

    Signed-off-by: Kairui Song <kasong@redhat.com>
    Acked-by: Philipp Rudo <prudo@redhat.com>

Signed-off-by: Tao Liu <ltao@redhat.com>
2021-11-09 20:39:50 +08:00
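
Nesting is where the difference in the commit above shows most clearly. A minimal sketch with an arbitrary example path:

```shell
# Legacy backticks: the inner pair has to be escaped.
# shellcheck disable=SC2006
legacy=`basename \`dirname /usr/lib/dracut/modules.d\``

# $(...) nests without escaping and can be quoted at each level.
modern=$(basename "$(dirname /usr/lib/dracut/modules.d)")

echo "$legacy $modern"
```

Both variables end up holding `dracut`; only the readability differs.
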
Tao Liu
67611bba2a bash scripts: always use "read -r"
upstream: fedora
resolves: bz2003832
conflict: none

commit a416930706
Author: Kairui Song <kasong@redhat.com>
Date:   Wed Aug 4 15:50:30 2021 +0800

    bash scripts: always use "read -r"

    This helps to strip spaces and avoid mangling backslashes:

    https://github.com/koalaman/shellcheck/wiki/SC2162

    Signed-off-by: Kairui Song <kasong@redhat.com>
    Acked-by: Philipp Rudo <prudo@redhat.com>

Signed-off-by: Tao Liu <ltao@redhat.com>
2021-11-09 20:39:17 +08:00
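
The backslash handling mentioned in the commit above can be observed directly. The sample input is made up.

```shell
input='C:\new\path'

# Without -r, read treats each backslash as an escape character and drops it.
# shellcheck disable=SC2162
read line_plain <<< "$input"

# With -r, the line is stored as-is.
read -r line_raw <<< "$input"

printf '%s\n' "$line_plain"
printf '%s\n' "$line_raw"
```

The first line printed is `C:newpath`, the second is the unmodified `C:\new\path`.
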
Tao Liu
d07b20d718 bash scripts: get rid of unnecessary sed calls
upstream: fedora
resolves: bz2003832
conflict: none

commit fdfad3102e
Author: Kairui Song <kasong@redhat.com>
Date:   Wed Aug 4 15:46:27 2021 +0800

    bash scripts: get rid of unnecessary sed calls

    Use bash builtin string substitution instead, as suggested by:
    https://github.com/koalaman/shellcheck/wiki/SC2001

    Signed-off-by: Kairui Song <kasong@redhat.com>
    Acked-by: Philipp Rudo <prudo@redhat.com>

Signed-off-by: Tao Liu <ltao@redhat.com>
2021-11-09 20:39:09 +08:00
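
For simple substitutions like those covered by SC2001 in the commit above, bash parameter expansion gives the same result as sed without starting another process. The sample value is illustrative.

```shell
path="a/b/c"

# External process: pipe through sed.
with_sed=$(echo "$path" | sed 's/\//_/g')

# Builtin: ${var//pattern/replacement} replaces every match.
with_bash=${path//\//_}

echo "$with_sed $with_bash"
```
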
Tao Liu
480de7c63d bash scripts: get rid of expr and let
upstream: fedora
resolves: bz2003832
conflict: none

commit c4d85142be
Author: Kairui Song <kasong@redhat.com>
Date:   Wed Aug 4 15:18:59 2021 +0800

    bash scripts: get rid of expr and let

    As suggested by:
    https://github.com/koalaman/shellcheck/wiki/SC2219

    Signed-off-by: Kairui Song <kasong@redhat.com>
    Acked-by: Philipp Rudo <prudo@redhat.com>

Signed-off-by: Tao Liu <ltao@redhat.com>
2021-11-09 20:39:02 +08:00
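
The three arithmetic styles touched by the commit above, side by side. Variable names are illustrative.

```shell
count=5

with_expr=$(expr "$count" + 1)   # external process
# shellcheck disable=SC2219
let "with_let = count + 1"       # builtin, flagged by shellcheck
with_arith=$((count + 1))        # arithmetic expansion, the preferred form

echo "$with_expr $with_let $with_arith"
```

All three yield 6; arithmetic expansion simply avoids the extra process and the quoting pitfalls of `let`.
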
Tao Liu
e07098aa14 bash scripts: remove useless cat
upstream: fedora
resolves: bz2003832
conflict:
    load_kdump_kernel_key() is not present in rhel9,
    so the patch hunk for it is removed.

commit 6d45257cc1
Author: Kairui Song <kasong@redhat.com>
Date:   Wed Aug 4 15:14:00 2021 +0800

    bash scripts: remove useless cat

    Some `cat` calls are useless, remove them to make it cleaner.
    See: https://github.com/koalaman/shellcheck/wiki/SC2002

    Signed-off-by: Kairui Song <kasong@redhat.com>
    Acked-by: Philipp Rudo <prudo@redhat.com>

Signed-off-by: Tao Liu <ltao@redhat.com>
2021-11-09 20:38:55 +08:00
Tao Liu
6d930905d5 dracut-module-setup.sh: remove surrounding $() for subshell
upstream: fedora
resolves: bz2003832
conflict: none

commit 3b0157197b
Author: Kairui Song <kasong@redhat.com>
Date:   Wed Sep 8 15:15:44 2021 +0800

    dracut-module-setup.sh: remove surrounding $() for subshell

    Some functions are executed in subshell to avoid variable environment
    pollution. But the surrounding $() is not needed, and it may lead to
    executing output which is unexpected here.

    See: https://github.com/koalaman/shellcheck/wiki/SC2091

    Signed-off-by: Kairui Song <kasong@redhat.com>
    Suggested-by: Coiby Xu <coxu@redhat.com>

Signed-off-by: Tao Liu <ltao@redhat.com>
2021-11-03 16:15:26 +08:00
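
The "executing output" risk described in the commit above can be shown with a function that happens to print something that looks like a command. `emit` is a hypothetical example function.

```shell
emit() { echo "echo surprise"; }

# $(...) captures the output and then runs it as a command.
# shellcheck disable=SC2091
$(emit)

# A plain subshell isolates variables without executing the output.
(emit)
```

The first call prints `surprise`, the second prints `echo surprise`.
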
Tao Liu
9abf44a082 dracut-module-setup.sh: make iscsi check fail early if cd failed
upstream: fedora
resolves: bz2003832
conflict: none

commit 67e559a6b9
Author: Kairui Song <kasong@redhat.com>
Date:   Wed Aug 4 16:29:55 2021 +0800

    dracut-module-setup.sh: make iscsi check fail early if cd failed

    As suggested by:
    https://github.com/koalaman/shellcheck/wiki/SC2164

    Signed-off-by: Kairui Song <kasong@redhat.com>
    Acked-by: Philipp Rudo <prudo@redhat.com>

Signed-off-by: Tao Liu <ltao@redhat.com>
2021-11-03 16:15:06 +08:00
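
A minimal sketch of the SC2164 pattern from the commit above: stop as soon as `cd` fails so that later commands never run in the wrong directory. The function name `print_dir` and the paths are illustrative.

```shell
print_dir() {
    # Run in a subshell so the caller's working directory is unchanged.
    (
        cd "$1" 2> /dev/null || exit 1
        pwd
    )
}

print_dir /
print_dir /no/such/dir || echo "cd failed, nothing else was run"
```
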
Tao Liu
72c3befcb8 dracut-module-setup.sh: fix a loop over ls issue
upstream: fedora
resolves: bz2003832
conflict: none

commit 3b2fa982bb
Author: Kairui Song <kasong@redhat.com>
Date:   Wed Aug 4 16:16:44 2021 +0800

    dracut-module-setup.sh: fix a loop over ls issue

    Iterating over ls output is fragile:
    https://github.com/koalaman/shellcheck/wiki/SC2045

    Signed-off-by: Kairui Song <kasong@redhat.com>
    Acked-by: Philipp Rudo <prudo@redhat.com>

Signed-off-by: Tao Liu <ltao@redhat.com>
2021-11-03 16:14:44 +08:00
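
Why looping over `ls` output is fragile, as the commit above notes, shows up as soon as a file name contains a space. The temporary directory and file names are made up.

```shell
dir=$(mktemp -d)
touch "$dir/a b" "$dir/c"

# Fragile: the ls output is word-split, so "a b" counts as two entries.
count_ls=0
# shellcheck disable=SC2045
for f in $(ls "$dir"); do
    count_ls=$((count_ls + 1))
done

# Robust: a glob yields exactly one word per file.
count_glob=0
for f in "$dir"/*; do
    [[ -e $f ]] || continue   # skip the literal pattern if nothing matched
    count_glob=$((count_glob + 1))
done

echo "$count_ls $count_glob"
rm -rf "$dir"
```

The ls-based loop counts 3 entries for 2 files, while the glob counts 2.
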
Tao Liu
057e505536 dracut-module-setup.sh: fix an ambiguous variable reference
upstream: fedora
resolves: bz2003832
conflict: none

commit dfe7555323
Author: Kairui Song <kasong@redhat.com>
Date:   Wed Aug 4 15:51:34 2021 +0800

    dracut-module-setup.sh: fix an ambiguous variable reference

    Wrap the variable with {...}, else it may get interpreted as array due
    to the '[' char next to it.

    Signed-off-by: Kairui Song <kasong@redhat.com>
    Acked-by: Philipp Rudo <prudo@redhat.com>

Signed-off-by: Tao Liu <ltao@redhat.com>
2021-11-03 16:14:24 +08:00
Tao Liu
c8faddc4f8 dracut-module-setup.sh: use "*" to expand array as string
upstream: fedora
resolves: bz2003832
conflict: none

commit da3ad9cbda
Author: Kairui Song <kasong@redhat.com>
Date:   Wed Aug 4 15:47:43 2021 +0800

    dracut-module-setup.sh: use "*" to expand array as string

    As suggested by:
    https://github.com/koalaman/shellcheck/wiki/SC2199
    The array is not quoted here but implicit concatenation still happens;
    it could be harmless, but shellcheck complains about it, so fix it.

    Signed-off-by: Kairui Song <kasong@redhat.com>
    Acked-by: Philipp Rudo <prudo@redhat.com>

Signed-off-by: Tao Liu <ltao@redhat.com>
2021-11-03 16:14:08 +08:00
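
A small sketch of the explicit "*" expansion recommended by SC2199 in the commit above. The array contents are illustrative.

```shell
arr=(eth0 bond0 team0)

# "${arr[*]}" joins the elements into one string using the first character
# of IFS (a space by default), which is what a string comparison needs.
joined="${arr[*]}"
echo "$joined"

# Padding with spaces allows a whole-word membership check.
if [[ " ${arr[*]} " == *" bond0 "* ]]; then
    echo "bond0 is in the list"
fi
```
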
Tao Liu
c0cbd45726 dracut-module-setup.sh: fix _bondoptions wrong references
upstream: fedora
resolves: bz2003832
conflict: none

commit 49dd4fcdbb
Author: Kairui Song <kasong@redhat.com>
Date:   Wed Aug 4 15:41:10 2021 +0800

    dracut-module-setup.sh: fix _bondoptions wrong references

    Signed-off-by: Kairui Song <kasong@redhat.com>
    Acked-by: Philipp Rudo <prudo@redhat.com>

Signed-off-by: Tao Liu <ltao@redhat.com>
2021-11-03 16:13:55 +08:00