Commit Graph

319 Commits

Author SHA1 Message Date
Tao Liu c04910eebd Release 2.0.26-3
Resolves: bz2173815
Resolves: bz2078176

Signed-off-by: Tao Liu <ltao@redhat.com>
2023-05-09 18:39:00 +08:00
Tao Liu 3762c208aa Rebase makedumpfile to v1.7.3
Resolves: bz2173815

Signed-off-by: Tao Liu <ltao@redhat.com>
2023-05-09 18:34:18 +08:00
Lichen Liu 3a3c3a924a kdumpctl: lower the log level in reset_crashkernel_for_installed_kernel
Resolves: bz2078176
Upstream: Fedora
Conflict: None

commit d619b6dabe354100e52b3280c3d36ace88217fd4
Author: Lichen Liu <lichliu@redhat.com>
Date:   Tue Apr 4 14:13:14 2023 +0800

    kdumpctl: lower the log level in reset_crashkernel_for_installed_kernel

    Although upgrading the kernel with `rpm -Uvh` is not recommended, the
    kexec-tools plugin prints confusing error logs when a customer upgrades the
    kernel through it.

    ```
    kdump: kernel 5.14.0-80.el9.x86_64 doesn't exist
    kdump: Couldn't find current running kernel
    ```

    Not finding the currently running kernel will only make kdump unable to copy the
    grub entry parameters to the newly installed kernel, so lower the log level.

    Signed-off-by: Lichen Liu <lichliu@redhat.com>
    Reviewed-by: Coiby Xu <coxu@redhat.com>

Signed-off-by: Lichen Liu <lichliu@redhat.com>
2023-05-06 11:19:16 +08:00
Tao Liu fa20bd98e5 Release 2.0.26-2
Resolves: bz2173815
Resolves: bz2151504

Signed-off-by: Tao Liu <ltao@redhat.com>
2023-04-21 16:14:51 +08:00
Tao Liu 2ba6f6fb2f Rebase makedumpfile to upstream latest(8e8b8814be1)
Resolves: bz2173815

Signed-off-by: Tao Liu <ltao@redhat.com>
2023-04-21 16:03:34 +08:00
Coiby Xu a0f7f2ecdf Show how much time kdump has waited for the network to be ready
Related: bz2151504
Upstream: Fedora
Conflict: None

commit 12d9eff9dcd3bcd5890821c8d8a219b94412aca8
Author: Coiby Xu <coxu@redhat.com>
Date:   Tue Mar 28 16:33:34 2023 +0800

    Show how much time kdump has waited for the network to be ready

    Relates: https://bugzilla.redhat.com/show_bug.cgi?id=2151504

    Currently, when the network isn't ready, kdump would repeatedly print
    the same info,

        [   29.537230] kdump[671]: Bad kdump network destination: 192.123.1.21
        [   30.559418] kdump[679]: Bad kdump network destination: 192.123.1.21
        [   31.580189] kdump[687]: Bad kdump network destination: 192.123.1.21

    This is not user-friendly and users may think kdump has got stuck. So
    also show how much time it has waited for the network to be ready,

        [   29.546258] kdump[673]: Waiting for network to be ready (50s / 10min)
        ...
        [   32.608967] kdump[697]: Waiting for network to be ready (56s / 10min)

    Note kdump_get_ip_route no longer prints an error message and it's up to
    the caller to determine the log level and print relevant messages. And
    kdump_collect_netif_usage aborts when kdump_get_ip_route fails.

    Reported-by: Martin Pitt <mpitt@redhat.com>
    Signed-off-by: Coiby Xu <coxu@redhat.com>
    Reviewed-by: Philipp Rudo <prudo@redhat.com>

Signed-off-by: Coiby Xu <coxu@redhat.com>
2023-04-18 15:26:17 +08:00
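
The progress message described in the commit above can be illustrated with a small retry loop. This is only a sketch under stated assumptions: `format_wait_msg` and `wait_for_network` are hypothetical names, and the probe command passed in stands for whatever check the real code performs (the commit mentions `kdump_get_ip_route`, whose implementation is not shown here).

```shell
# Hypothetical helper: format a progress line like the one quoted in the commit.
# $1 = seconds waited so far, $2 = timeout in seconds.
format_wait_msg() {
    printf 'Waiting for network to be ready (%ss / %smin)\n' "$1" "$(( $2 / 60 ))"
}

# Hypothetical retry loop.
# $1 = timeout in seconds, $2 = poll interval in seconds, rest = probe command.
# Returns 0 as soon as the probe succeeds, 1 once the timeout is reached.
wait_for_network() {
    timeout=$1 interval=$2
    shift 2
    waited=0
    until "$@"; do
        if [ "$waited" -ge "$timeout" ]; then
            return 1
        fi
        # Report elapsed time instead of repeating the same error line.
        format_wait_msg "$waited" "$timeout" >&2
        sleep "$interval"
        waited=$(( waited + interval ))
    done
    return 0
}
```

A caller would pass its own probe, for example `wait_for_network 600 1 some_probe_command`, and decide on the log level itself when the function returns non-zero.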
Coiby Xu c28d6fa950 Tell nmcli to not escape colon when getting the path of connection profile
Resolves: bz2151504
Upstream: Fedora
Conflict: None

commit df6f25ff20a660ce8c300eba95e21e2fed6ed99f
Author: Coiby Xu <coxu@redhat.com>
Date:   Mon Mar 27 13:17:32 2023 +0800

    Tell nmcli to not escape colon when getting the path of connection profile

    Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=2151504

    When a NetworkManager connection profile contains a colon in the name,
    "nmcli --get-values UUID,FILENAME" by default would escape the colon
    because a colon is also used for separating the values. In this case,
    99kdumpbase fails to get the correct connection profile path,
            kdumpctl[5439]: cp: cannot stat '/run/NetworkManager/system-connections/static-52\\\:54\\\:01.nmconnection': No such file or directory
            kdumpctl[5440]: sed: can't read /tmp/1977-DRACUT_KDUMP_NM/ifcfg-static-52-54-01: No such file or directory
            kdumpctl[5449]: dracut-install: ERROR: installing '/tmp/1977-DRACUT_KDUMP_NM/ifcfg-static-52-54-01' to '/etc/NetworkManager/system-connections/ifcfg-static-52-54-01'

    As a result, dumping vmcore to a remote nfs would fail.

    In our case of getting connection profile path, there is no need to escape the
    colon so pass "-escape no" to nmcli,

            [root@localhost ~]# nmcli --get-values UUID,FILENAME c show
            659e09c1-a6bd-3549-9be4-a07a1a9a8ffd:/etc/NetworkManager/system-connections/aa\:bb.nmconnection

            [root@localhost ~]# nmcli -escape no --get-values UUID,FILENAME c show
            659e09c1-a6bd-3549-9be4-a07a1a9a8ffd:/etc/NetworkManager/system-connections/aa:bb.nmconnection

    Suggested-by: Beniamino Galvani <bgalvani@redhat.com>
    Reported-by: Martin Pitt <mpitt@redhat.com>
    Signed-off-by: Coiby Xu <coxu@redhat.com>
    Reviewed-by: Philipp Rudo <prudo@redhat.com>

Signed-off-by: Coiby Xu <coxu@redhat.com>
2023-04-18 15:25:48 +08:00
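
To see why unescaped output is still safe to parse, note that a UUID never contains a colon, so the first colon on a `UUID:FILENAME` line is always the separator. The sketch below splits such a line in plain shell; `split_uuid_filename` is a hypothetical name and not necessarily how 99kdumpbase does it.

```shell
# Split one "UUID:FILENAME" line (as printed with escaping disabled) at the
# first colon only. Everything after that colon is the path, even if the path
# itself contains further colons.
# $1 = one output line; prints the UUID and the path on separate lines.
split_uuid_filename() {
    line=$1
    uuid=${line%%:*}   # strip the longest suffix starting at a colon
    path=${line#*:}    # strip the shortest prefix ending at a colon
    printf '%s\n%s\n' "$uuid" "$path"
}
```

With the second sample line from the commit message, this yields the UUID and `/etc/NetworkManager/system-connections/aa:bb.nmconnection` unchanged.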
Tao Liu f698814882 Rebase kexec-tools to v2.0.26
Resolves: bz2173814

Signed-off-by: Tao Liu <ltao@redhat.com>
2023-04-07 16:07:26 +08:00
Tao Liu b9a8a181ac Release 2.0.25-14
Resolves: bz2140721
Resolves: bz2177574
Resolves: bz2177674

Signed-off-by: Tao Liu <ltao@redhat.com>
2023-03-21 16:09:11 +08:00
Coiby Xu 5f9fa02614 Install nfsv4-related drivers when users specify nfs dumping via dracut_args
Resolves: bz2140721
Upstream: Fedora
Conflict: None

commit 70c7598ef03a1f611b3a00d8f2254fae1da5b0eb
Author: Coiby Xu <coxu@redhat.com>
Date:   Fri Dec 23 16:03:38 2022 +0800

    Install nfsv4-related drivers when users specify nfs dumping via dracut_args

    Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=2140721

    Currently, if users specify dumping to nfsv4 target via
      dracut_args --mount "<NFS-server-ip>:/var/crash /mnt nfs defaults"
    it fails with the following errors,
        [    5.159760] mount[446]: mount.nfs: Protocol not supported
        [    5.164502] systemd[1]: mnt.mount: Mount process exited, code=exited, status=32/n/a
        [    5.167616] systemd[1]: mnt.mount: Failed with result 'exit-code'.
        [FAILED] Failed to mount /mnt.

    This is because nfsv4-related drivers are not installed to kdump initrd.
    mkdumprd calls dracut with "--hostonly-mode strict". If nfsv4-related
    drivers aren't loaded before calling dracut, they won't be installed.
    When users specify nfs dumping via dracut_args, kexec-tools won't mount
    the nfs fs beforehand hence nfsv4-related drivers won't be installed.
    Note dracut only installs the nfs driver i.e. nfsv3 driver for "--mount
    ... nfs". So also install nfsv4-related drivers when users specify nfs
    dumping via dracut_args. Since nfs_layout_nfsv41_files depends on nfsv4,
    the nfsv4 driver will be installed automatically.

    As for the reason why we support nfs dumping via dracut_args instead of
    asking users to use the nfs directive, please refer to commit 74c6f464
    ("Support special mount information via 'dracut_args'").

    Fixes: 4eedcae5 ("dracut-module-setup.sh: don't include multipath-hostonly")
    Reported-by: rcheerla@redhat.com
    Signed-off-by: Coiby Xu <coxu@redhat.com>
    Reviewed-by: Philipp Rudo <prudo@redhat.com>

Signed-off-by: Coiby Xu <coxu@redhat.com>
2023-03-21 16:01:22 +08:00
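
The detection step implied by the commit above (does a `--mount` specification given through dracut_args name an NFS file system?) could look roughly like this. It is a sketch only: `mount_spec_is_nfs` is a hypothetical name, treating `nfs4` as an NFS type is an assumption, and the step that actually installs the drivers is left out.

```shell
# $1 = the value passed to dracut's --mount option, in the form
#      "<device> <mountpoint> <fstype> <options>"
# Returns 0 if the file system type is nfs (or nfs4, an assumption here).
mount_spec_is_nfs() {
    # Word-split the spec on purpose to pick out the third field.
    set -- $1
    case "$3" in
        nfs|nfs4) return 0 ;;
        *) return 1 ;;
    esac
}
```

When the check succeeds, the commit says the nfsv4-related drivers are installed as well, with `nfs_layout_nfsv41_files` pulling in `nfsv4` through its dependency.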
Pingfan Liu 2b2b6b84c0 Revert "ppc64: tackle SRCU hang issue"
Resolves: bz2177574
Upstream: RHEL-only

This reverts commit 870ec2ec93.

Now the real fix has gone into the RHEL-9 kernel [1], the temporary
workaround can be removed.

[1]: https://bugzilla.redhat.com/show_bug.cgi?id=2129726

Signed-off-by: Pingfan Liu <piliu@redhat.com>
2023-03-21 07:50:06 +00:00
Philipp Rudo 2f5889df5e sysconfig: add zfcp.allow_lun_scan to KDUMP_COMMANDLINE_REMOVE on s390
Resolves: bz2177674
Upstream: Fedora
Conflict: Move to kdump.sysconfig.s390 due to missing
          677da8a ("sysconfig: use a simple generator script to maintain")

Author: Philipp Rudo <prudo@redhat.com>
Date:   Tue Mar 7 14:45:35 2023 +0100

    sysconfig: add zfcp.allow_lun_scan to KDUMP_COMMANDLINE_REMOVE on s390

    Probing unnecessary I/O devices wastes memory and in extreme cases can
    cause the crashkernel to run OOM. That's why the s390-tools maintain
    their own module, 95zdev-kdump [1], that disables auto LUN scanning and
    only configures zfcp devices that can be used as dump target. So remove
    zfcp.allow_lun_scan from the kernel command line to avoid accidentally
    overwriting the default set by the module.

    [1] https://github.com/ibm-s390-linux/s390-tools/blob/master/zdev/dracut/95zdev-kdump/module-setup.sh

    Signed-off-by: Philipp Rudo <prudo@redhat.com>
    Reviewed-by: Coiby Xu <coxu@redhat.com>

Signed-off-by: Philipp Rudo <prudo@redhat.com>
2023-03-13 13:41:05 +01:00
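
The effect of listing a parameter in KDUMP_COMMANDLINE_REMOVE can be pictured as filtering it out of the inherited kernel command line. The function below is a minimal illustration with a hypothetical name (`drop_cmdline_param`); it is not the code kexec-tools uses, and it ignores quoted values containing spaces.

```shell
# $1 = kernel command line, $2 = parameter name to drop
# Prints the command line without "name" and "name=value" occurrences.
drop_cmdline_param() {
    out=
    for arg in $1; do
        case "$arg" in
            "$2"|"$2"=*) ;;                  # skip the unwanted parameter
            *) out="${out:+$out }$arg" ;;    # keep everything else
        esac
    done
    printf '%s\n' "$out"
}
```

Once `zfcp.allow_lun_scan` is dropped this way, the crash kernel falls back to whatever default the 95zdev-kdump module sets.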
Tao Liu fe7198e928 Release 2.0.25-13
Resolves: bz2174836

Signed-off-by: Tao Liu <ltao@redhat.com>
2023-03-10 11:14:01 +08:00
Lichen Liu 67f450cc9f kdump-lib: Add the CoreOS kernel dir to the boot_dirlist
Resolves: bz2174836
Upstream: Fedora
Conflict: None

commit f9c32372d2c8d5e58024d2ddc0b70498c696b5d8
Author: Lichen Liu <lichliu@redhat.com>
Date:   Tue Jun 21 16:55:09 2022 +0800

    kdump-lib: Add the CoreOS kernel dir to the boot_dirlist

    The kernel of CoreOS is not in the standard locations, add
    /boot/ostree/* to the boot_dirlist to find the vmlinuz.

    Signed-off-by: Lichen Liu <lichliu@redhat.com>
    Acked-by: Coiby Xu <coxu@redhat.com>

Signed-off-by: Lichen Liu <lichliu@redhat.com>
2023-03-07 10:42:24 +08:00
Lichen Liu 1eb996d08f kdump-lib: attempt to fix BOOT_IMAGE detection
Resolves: bz2174836
Upstream: Fedora
Conflict: None

commit f9c32372d2c8d5e58024d2ddc0b70498c696b5d8
Author: Dusty Mabe <dusty@dustymabe.com>
Date:   Wed Jun 22 12:34:12 2022 -0400

    kdump-lib: attempt to fix BOOT_IMAGE detection

    Currently $boot_img can get bad data if running on a platform
    that doesn't set BOOT_IMAGE in the kernel command line. For
    example, currently:

    - s390x Fedora CoreOS machine:

    ```
    [root@cosa-devsh ~]# sed "s/^BOOT_IMAGE=\((\S*)\)\?\(\S*\) .*/\2/" /proc/cmdline
    mitigations=auto,nosmt ignition.platform.id=qemu ostree=/ostree/boot.0/fedora-coreos/2a72567ac8f7ed678c3ac89408f795e6ccd4e97b41e14af5f471b6a807e858b9/0 root=UUID=2a88436a-3b6b-4706-b33a-b8270bd87cde rw rootflags=prjquota boot=UUID=f4b2eaa5-9317-4798-85cf-308c477fee4c crashkernel=600M
    ```

    where on a platform that uses GRUB we get:

    - x86_64 Fedora CoreOS machine:

    ```
    [root@cosa-devsh ~]# sed "s/^BOOT_IMAGE=\((\S*)\)\?\(\S*\) .*/\2/" /proc/cmdline
    /ostree/fedora-coreos-af4f6cc7b9ff486cfa647680b180e989c72c8eed03a34a42e7328e49332bd20e/vmlinuz-5.18.5-200.fc36.x86_64
    ```

    We should change the setting of the boot_img variable such that it will
    be empty if BOOT_IMAGE doesn't exist.

    With this change on the s390x machine:

    ```
    [root@cosa-devsh ~]# grep -P -o '^BOOT_IMAGE=(\S+)' /proc/cmdline | sed "s/^BOOT_IMAGE=\((\S*)\)\?\(\S*\)/\2/"
    [root@cosa-devsh ~]#
    ```

    This change mattered much more before the change in c5bdd2d which changed
    the following line from [[ -n $boot_img ]] to [[ "$boot_img" == *"$kdump_kernelver" ]].
    Still I think this change has merit.

    Signed-off-by: Dusty Mabe <dusty@dustymabe.com>
    Acked-by: Coiby Xu <coxu@redhat.com>

Signed-off-by: Lichen Liu <lichliu@redhat.com>
2023-03-07 10:41:50 +08:00
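
The two-stage idea from the commit above (first isolate the `BOOT_IMAGE=` token, then strip the optional `(device)` prefix) can be tried without touching `/proc/cmdline`. The variant below reads the command line from stdin and uses POSIX character classes with `grep -E` and `sed -E` instead of `grep -P`; that rewrite and the name `extract_boot_img` are assumptions made for portability, not the committed code.

```shell
# Reads a kernel command line on stdin.
# Prints the BOOT_IMAGE path, or nothing if the command line does not start
# with BOOT_IMAGE= (e.g. on platforms that do not boot through GRUB).
extract_boot_img() {
    grep -o -E '^BOOT_IMAGE=[^[:space:]]+' |
        sed -E 's/^BOOT_IMAGE=(\([^[:space:]]*\))?([^[:space:]]*)/\2/'
}
```

It would be used as `boot_img=$(extract_boot_img < /proc/cmdline)`, which leaves `boot_img` empty when the first stage finds no match.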
Lichen Liu 0cecfa7d45 kdump-lib: change how ostree based systems are detected
Resolves: bz2174836
Upstream: Fedora
Conflict: None

commit a1ebf0b5654625cd7a80a3b368080d4f56088537
Author: Dusty Mabe <dusty@dustymabe.com>
Date:   Fri Jun 24 09:57:03 2022 -0400

    kdump-lib: change how ostree based systems are detected

    The current recommendation is to check for /run/ostree-booted.

    See https://bugzilla.redhat.com/show_bug.cgi?id=2092012#c0

    Signed-off-by: Dusty Mabe <dusty@dustymabe.com>
    Acked-by: Coiby Xu <coxu@redhat.com>

Signed-off-by: Lichen Liu <lichliu@redhat.com>
2023-03-07 10:41:26 +08:00
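
The recommended check from the commit above amounts to testing for a marker file. A minimal sketch, assuming a hypothetical function name and an optional root prefix that exists only so the check can be exercised against a scratch directory:

```shell
# Returns 0 if the system was booted via OSTree, judged by the marker file
# /run/ostree-booted.
# $1 = optional root prefix (hypothetical, for testing); defaults to "".
is_ostree_booted() {
    [ -e "${1:-}/run/ostree-booted" ]
}
```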
Lichen Liu e47ec659e9 kdump-lib: clear up references to Atomic/CoreOS
Resolves: bz2174836
Upstream: Fedora
Conflict: None

commit 980f10aa40852da41907dc0aeb59ad7d3e8f4c30
Author: Dusty Mabe <dusty@dustymabe.com>
Date:   Wed Jun 22 11:58:31 2022 -0400

    kdump-lib: clear up references to Atomic/CoreOS

    There are many variants on OSTree based systems these days so
    we should probably refer to the class of systems as "OSTree
    based systems". Also, Atomic Host is dead.

    Signed-off-by: Dusty Mabe <dusty@dustymabe.com>
    Acked-by: Coiby Xu <coxu@redhat.com>

Signed-off-by: Lichen Liu <lichliu@redhat.com>
2023-03-07 10:40:52 +08:00
Tao Liu 577dc4415a Release 2.0.25-12
Resolves: bz2168504
Related: bz2060319

Signed-off-by: Tao Liu <ltao@redhat.com>
2023-02-24 14:06:46 +08:00
Coiby Xu ae272e2df8 Reset crashkernel to default value if newly installed kernel has crashkernel=auto
Resolves: bz2168504
Upstream: RHEL-only

After leapp upgrade from 8.8 to 9.2 on Azure, RHEL9 kernel has
crashkernel=auto. This happens because kexec-tools's posttrans scriptlet
is executed before kernel's posttrans scriptlet (which in turn runs the
kernel-install hooks). One of the kernel-install hooks is responsible for
adding a new boot entry for the new kernel. So when kexec-tools's posttrans
scriptlet is running, RHEL9 kernel is yet to have a boot entry so
kexec-tools couldn't set up the crashkernel parameter. Later one
kernel-install hook makes RHEL9 kernel inherit crashkernel=auto.

Fix this issue by letting 92-crashkernel.install reset crashkernel=auto.

Reported-by: Yuxin Sun <yuxisun@redhat.com>
Signed-off-by: Coiby Xu <coxu@redhat.com>
2023-02-23 09:21:47 +08:00
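
The condition that the hook has to recognise is simply whether a boot entry's kernel arguments still carry `crashkernel=auto`. A sketch of that test, with a hypothetical function name; how 92-crashkernel.install reads and rewrites the boot entry is not shown:

```shell
# $1 = kernel arguments of a boot entry
# Returns 0 if they contain the literal token crashkernel=auto.
has_crashkernel_auto() {
    case " $1 " in
        *" crashkernel=auto "*) return 0 ;;
        *) return 1 ;;
    esac
}
```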
Coiby Xu ef81bb9f44 Use the correct command to get architecture
Related: bz2060319
Upstream: Fedora
Conflict: None

commit 12e6cd2b76a10bb6b52c0cc28ad0e8c8f57a319a
Author: Coiby Xu <coxu@redhat.com>
Date:   Mon Feb 20 17:33:08 2023 +0800

    Use the correct command to get architecture

    `uname -r` was used by mistake. As a result, kexec-tools failed to
    update crashkernel=auto during in-place upgrade from RHEL8 to RHEL9.

    `uname -m` should be used to get architecture instead.

    Fixes: 5951b5e2 ("Don't try to update crashkernel when bootloader is not installed")

    Signed-off-by: Coiby Xu <coxu@redhat.com>
    Reviewed-by: Lichen Liu <lichliu@redhat.com>

Signed-off-by: Coiby Xu <coxu@redhat.com>
2023-02-21 12:16:09 +08:00
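
The difference between the two `uname` options is easy to demonstrate. The wrapper names below are hypothetical and only label what each option prints:

```shell
# Machine hardware name, e.g. x86_64, s390x or ppc64le.
get_arch() {
    uname -m
}

# Running kernel release, e.g. 5.14.0-80.el9.x86_64. This is NOT an
# architecture, so comparing it against "s390x" or similar never matches.
get_kernel_release() {
    uname -r
}
```

An architecture test such as `[ "$(get_arch)" = "s390x" ]` therefore behaves as intended, while the same test on the kernel release is always false.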
Tao Liu a95e71e516 Release 2.0.25-11
Resolves: bz2158296

Signed-off-by: Tao Liu <ltao@redhat.com>
2023-01-11 17:17:15 +08:00
Pingfan Liu 870ec2ec93 ppc64: tackle SRCU hang issue
Resolves: bz2158296
Upstream: RHEL-only

On PowerPC platform, the following hang is witnessed:

Welcome to
Red Hat Enterprise Linux 9.2 Beta (Plow) dracut-057-13.git20220816.el9 (Initramfs)
!

[    1.631210] systemd[1]: Hostname set to <ibm-p9z-18-lp11.virt.pnr.lab.eng.rdu2.redhat.com>.
[-- MARK -- Mon Sep 26 01:45:00 2022]
[  243.681283] INFO: task systemd:1 blocked for more than 122 seconds.
[  243.681303]       Not tainted 5.14.0-167.el9.ppc64le #1
[  243.681315] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  243.681329] task:systemd         state:D stack:    0 pid:    1 ppid:     0 flags:0x00042000
[  243.681349] Call Trace:
[  243.681356] [c00000001a603640] [c00000004f990100] 0xc00000004f990100 (unreliable)
[  243.681378] [c00000001a603830] [c00000001001e9cc] __switch_to+0x12c/0x220
[  243.681400] [c00000001a603890] [c000000010ec5b40] __schedule+0x230/0x720
[  243.681418] [c00000001a603950] [c000000010ec6090] schedule+0x60/0x110
[  243.681435] [c00000001a603980] [c000000010ecd948] schedule_timeout+0x168/0x1c0
[  243.681454] [c00000001a603a60] [c000000010ec7214] __wait_for_common+0x134/0x360
[  243.681473] [c00000001a603b00] [c00000001017c98c] __flush_work.isra.0+0x1dc/0x3d0
[  243.681493] [c00000001a603ba0] [c0000000105cbd88] fsnotify_wait_marks_destroyed+0x28/0x40
[  243.681512] [c00000001a603bc0] [c0000000105cb800] fsnotify_destroy_group+0x60/0x150
[  243.681531] [c00000001a603c30] [c0000000105cf640] inotify_release+0x30/0xa0
[  243.681548] [c00000001a603ca0] [c00000001054fad8] __fput+0xc8/0x350
[  243.681565] [c00000001a603cf0] [c000000010183174] task_work_run+0xe4/0x160
[  243.681583] [c00000001a603d40] [c000000010021874] do_notify_resume+0x134/0x140
[  243.681602] [c00000001a603d70] [c000000010030168] interrupt_exit_user_prepare_main+0x198/0x270
[  243.681622] [c00000001a603de0] [c0000000100305ac] syscall_exit_prepare+0x6c/0x180
[  243.681641] [c00000001a603e10] [c00000001000bff4] system_call_vectored_common+0xf4/0x278
[  243.681661] --- interrupt: 3000 at 0x7fffb3015ba4
[  243.681673] NIP:  00007fffb3015ba4 LR: 0000000000000000 CTR: 0000000000000000
[  243.681687] REGS: c00000001a603e80 TRAP: 3000   Not tainted  (5.14.0-167.el9.ppc64le)
[  243.681703] MSR:  800000000000d033 <SF,EE,PR,ME,IR,DR,RI,LE>  CR: 42044440  XER: 00000000
[  243.681737] IRQMASK: 0
[  243.681737] GPR00: 0000000000000006 00007fffd24a31a0 00007fffb3127200 0000000000000000
[  243.681737] GPR04: 0000000000000002 000000000000000a 0000000000000000 0000000000000000
[  243.681737] GPR08: 0000010009ea2d40 0000000000000000 0000000000000000 0000000000000000
[  243.681737] GPR12: 0000000000000000 00007fffb3834bc0 0000000000000000 0000000000000000
[  243.681737] GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[  243.681737] GPR20: 000000012c74ddf0 000000000000000e 000000000017cd3f 0000000000000000
[  243.681737] GPR24: 00007fffd24a3570 0000000000000005 0000010009eb5490 0000010009ea24e0
[  243.681737] GPR28: 0000010009ea2900 0000010009eb4850 0000010009ea2d70 00007fffb382dd98
[  243.681896] NIP [00007fffb3015ba4] 0x7fffb3015ba4
[  243.681907] LR [0000000000000000] 0x0
[  243.681917] --- interrupt: 3000
[  243.681928] INFO: task kworker/u16:1:34 blocked for more than 122 seconds.
[  243.681941]       Not tainted 5.14.0-167.el9.ppc64le #1
[  243.681951] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  243.681964] task:kworker/u16:1   state:D stack:    0 pid:   34 ppid:     2 flags:0x00000800
[  243.681982] Workqueue: events_unbound fsnotify_mark_destroy_workfn
[  243.681998] Call Trace:
[  243.682005] [c00000001a9336d0] [c00000004f990100] 0xc00000004f990100 (unreliable)
[  243.682023] [c00000001a9338c0] [c00000001001e9cc] __switch_to+0x12c/0x220
[  243.682042] [c00000001a933920] [c000000010ec5b40] __schedule+0x230/0x720
[  243.682059] [c00000001a9339e0] [c000000010ec6090] schedule+0x60/0x110
[  243.682075] [c00000001a933a10] [c000000010ecd948] schedule_timeout+0x168/0x1c0
[  243.682094] [c00000001a933af0] [c000000010ec7214] __wait_for_common+0x134/0x360
[  243.682113] [c00000001a933b90] [c000000010213370] __synchronize_srcu.part.0+0xa0/0xe0
[  243.682132] [c00000001a933c00] [c0000000105cc154] fsnotify_mark_destroy_workfn+0xc4/0x1a0
[  243.682151] [c00000001a933c70] [c00000001017acb8] process_one_work+0x298/0x580
[  243.682169] [c00000001a933d10] [c00000001017b048] worker_thread+0xa8/0x630
[  243.682185] [c00000001a933da0] [c000000010188348] kthread+0x1b8/0x1c0
[  243.682203] [c00000001a933e10] [c00000001000cd64] ret_from_kernel_thread+0x5c/0x64
[  366.561279] INFO: task systemd:1 blocked for more than 245 seconds.

The right solution should be in the kernel, but since the patch [1] for SRCU
will not be merged into the mainline in the near future, it is better to
have a userspace workaround to overcome this test blocker.

The workaround method is to pass the kernel parameter "srcutree.big_cpu_lim=0", so
that the SRCU system will always use srcu_node array.

[1]: https://lore.kernel.org/rcu/20221026032716.78674-1-kernelfans@gmail.com/T/#m6534975507c2abca497a94d81c7abbfea1d0978d

Signed-off-by: Pingfan Liu <piliu@redhat.com>
2023-01-06 11:26:03 +08:00
Pingfan Liu 54d8965261 Release 2.0.25-10
Resolves: bz2151500
Resolves: bz2060319
Resolves: bz2151842
Resolves: bz2139000

Signed-off-by: Pingfan Liu <piliu@redhat.com>
2022-12-27 15:11:50 +08:00
Lichen Liu e44295c4f4 Update supported-kdump-targets.txt
Related: bz2080110
Related: bz2110127
Upstream: RHEL-only

Kexec-tools supports NVMe-FC storage as dump target now.

Signed-off-by: Lichen Liu <lichliu@redhat.com>
2022-12-27 05:31:11 +00:00
Lichen Liu 5e6d9d2679 dracut-module-setup.sh: skip installing driver for the loopback interface
Resolves: bz2151500
Upstream: Fedora
Conflict: None

commit 3b22cce1cb7dd12be07822018d08c9bb8f03add9
Author: Coiby Xu <coxu@redhat.com>
Date:   Wed Dec 14 10:12:17 2022 +0800

    dracut-module-setup.sh: skip installing driver for the loopback
    interface

    Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=2151500

    Currently, kdump initrd fails to be built when dumping vmcore to
    localhost via ssh or nfs,

      kdumpctl[3331]: Cannot get driver information: Operation not supported
      kdumpctl[1991]: dracut: Failed to get the driver of lo
      dracut[2020]: Failed to get the driver of lo
      kdumpctl[1775]: kdump: mkdumprd: failed to make kdump initrd
      kdumpctl[1775]: kdump: Starting kdump: [FAILED]
      systemd[1]: kdump.service: Main process exited, code=exited, status=1/FAILURE
      systemd[1]: kdump.service: Failed with result 'exit-code'.
      systemd[1]: Failed to start Crash recovery kernel arming.
      systemd[1]: kdump.service: Consumed 1.710s CPU time.

    This is because the loopback interface is used for transferring vmcore and
    ethtool can't get the driver of the loopback interface. In fact, once
    CONFIG_NET is enabled, the loopback device is enabled and there is no driver
    for the loopback device. So skip installing driver for the loopback device.
    The loopback interface is implemented in linux/drivers/net/loopback.c
    and always has the name "lo". So we can safely tell if a network
    interface is the loopback interface by its name.

    Fixes: a65dde2d ("Reduce kdump memory consumption by only installing needed NIC drivers")
    Reported-by: Martin Pitt <mpitt@redhat.com>
    Reported-by: Rich Megginson <rmeggins@redhat.com>
    Reviewed-by: Lichen Liu <lichliu@redhat.com>
    Reviewed-by: Philipp Rudo <prudo@redhat.com>
    Signed-off-by: Coiby Xu <coxu@redhat.com>

Signed-off-by: Lichen Liu <lichliu@redhat.com>
2022-12-27 05:31:11 +00:00
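
Because the loopback interface always has the name "lo", the skip described in the commit above reduces to a name comparison. The following is an illustrative filter with a hypothetical name, not the code in dracut-module-setup.sh:

```shell
# Prints every interface name given as an argument, one per line, except the
# loopback device "lo", which has no driver to install.
filter_out_loopback() {
    for netif in "$@"; do
        [ "$netif" = "lo" ] && continue
        printf '%s\n' "$netif"
    done
}
```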
Coiby Xu e120508100 Don't try to update crashkernel when bootloader is not installed
Resolves: bz2060319
Upstream: Fedora
Conflict: commit a3da46d6 ("Skip reset_crashkernel_after_update
          during package install") hasn't been backported. Note it's now
          no longer needed.

commit 5951b5e26823b6bedf3237bd169a781b03f25031
Author: Coiby Xu <coxu@redhat.com>
Date:   Tue Dec 20 13:59:18 2022 +0800

    Don't try to update crashkernel when bootloader is not installed

    Currently when using anaconda to install the OS, the following errors
    occur,

        INF packaging: Configuring (running scriptlet for): kernel-core-5.14.0-70.el9.x86_64 ...
        INF dnf.rpm: grep: /boot/grub2/grubenv: No such file or directory
        grep: /boot/grub2/grubenv: No such file or directory
        grep: /boot/grub2/grubenv: No such file or directory
        grep: /boot/grub2/grubenv: No such file or directory
        ...
        INF packaging: Configuring (running scriptlet for): kexec-tools-2.0.23-9.el9.x86_64 ...
        INF dnf.rpm: grep: /boot/grub2/grubenv: No such file or directory
        grep: /boot/grub2/grubenv: No such file or directory
        grep: /boot/grub2/grubenv: No such file or directory

    Or for s390, the following errors occur,

        INF packaging: Configuring (running scriptlet for): kernel-core-5.14.0-71.el9.s390x ...
        03:37:51,232 INF dnf.rpm: grep: /etc/zipl.conf: No such file or directory
        grep: /etc/zipl.conf: No such file or directory
        grep: /etc/zipl.conf: No such file or directory

        INF packaging: Configuring (running scriptlet for): kexec-tools-2.0.23-9_1.el9_0.s390x ...
        INF dnf.rpm: grep: /etc/zipl.conf: No such file or directory

    This is because when anaconda installs the packages, bootloader hasn't
    been installed and /boot/grub2/grubenv or /etc/zipl.conf doesn't exist.
    So don't try to update crashkernel when bootloader isn't ready to avoid
    the above errors.

    Note this is the second attempt to fix this issue. Previously a file
    /tmp/kexec_tools_package_install was created to avoid running the
    related code thus to avoid the above errors but unfortunately that
    approach has two issues a) somehow osbuild doesn't delete it for RHEL b)
    this file could still exist if users manually remove kexec-tools.

    Fixes: e218128 ("Only try to reset crashkernel for osbuild during package install")
    Reported-by: Jan Stodola <jstodola@redhat.com>
    Signed-off-by: Coiby Xu <coxu@redhat.com>
    Reviewed-by: Philipp Rudo <prudo@redhat.com>

Signed-off-by: Coiby Xu <coxu@redhat.com>
2022-12-27 03:11:43 +00:00
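
The guard described in the commit above can be sketched as a check for the bootloader's configuration file, using the two paths named in the message. The function name and the optional root prefix are hypothetical, and mapping s390x to zipl and every other architecture to GRUB is a simplification made for this example.

```shell
# $1 = architecture as printed by `uname -m`
# $2 = optional root prefix (hypothetical, for testing); defaults to "".
# Returns 0 if the bootloader configuration file exists.
bootloader_config_exists() {
    case "$1" in
        s390x) [ -e "${2:-}/etc/zipl.conf" ] ;;
        *)     [ -e "${2:-}/boot/grub2/grubenv" ] ;;
    esac
}
```

A scriptlet would call something like `bootloader_config_exists "$(uname -m)" || exit 0` before trying to update crashkernel, which avoids the `grep: ... No such file or directory` noise during installation.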
Coiby Xu 06ddf8d90d dracut-module-setup.sh: also install the driver of physical NIC for Hyper-V VM with accelerated networking
Resolves: bz2151842
Upstream: Fedora
Conflict: None

commit bc101086e2c32594c8e01b50f0353f50d71f87f5
Author: Coiby Xu <coxu@redhat.com>
Date:   Mon Dec 12 18:37:25 2022 +0800

    dracut-module-setup.sh: also install the driver of physical NIC for
    Hyper-V VM with accelerated networking

    Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=2151842

    Currently, vmcore dumping to remote fs fails on Azure Hyper-V VM with
    accelerated networking because it uses a physical NIC for accelerated
    networking [1]. In this case, the driver for this physical NIC should be
    installed as well.

    [1] https://learn.microsoft.com/en-us/azure/virtual-network/accelerated-networking-overview

    Fixes: a65dde2d ("Reduce kdump memory consumption by only installing needed NIC drivers")

    Reported-by: Xiaoqiang Xiong <xxiong@redhat.com>
    Signed-off-by: Coiby Xu <coxu@redhat.com>
    Reviewed-by: Philipp Rudo <prudo@redhat.com>

Signed-off-by: Coiby Xu <coxu@redhat.com>
2022-12-27 02:45:59 +00:00
Lichen Liu 77ca80f75b fadump: use 'zstd' as the default compression method
Resolves: bz2139000
Upstream: Fedora
Conflict: None

commit f98bd5895e74043430b927828b0bd6944073e0cd
Author: Hari Bathini <hbathini@linux.ibm.com>
Date:   Fri Dec 2 18:46:49 2022 +0530

    fadump: use 'zstd' as the default compression method

    If available, use 'zstd' compression method to optimize the size of
    the initrd built with fadump support. Also, 'squash+zstd' is not
    preferred because more disk space is consumed with 'squash+zstd' due
    to the additional binaries needed for fadump with squash case.

    Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
    Acked-by: Tao Liu <ltao@redhat.com>
    Reviewed-by: Philipp Rudo <prudo@redhat.com>

Signed-off-by: Lichen Liu <lichliu@redhat.com>
2022-12-22 14:36:23 +08:00
Lichen Liu 73721c9a94 fadump: fix default initrd backup and restore logic
Resolves: bz2139000
Upstream: Fedora
Conflict: None

commit 25411da9660e732a53e675936813aad924ba65df
Author: Hari Bathini <hbathini@linux.ibm.com>
Date:   Fri Dec 2 18:46:50 2022 +0530

    fadump: fix default initrd backup and restore logic

    In case of fadump, default initrd is rebuilt with dump capturing
    capability, as the same initrd is used for booting production kernel
    as well as capture kernel.

    The original initrd file is backed up with a checksum, to restore
    it as the default initrd when fadump is disabled. As the checksum
    file is not kernel version specific, switching between different
    kernel versions and kdump/fadump dump mode breaks the default initrd
    backup/restore logic. Fix this by having a kernel version specific
    checksum file.

    Also, if backing up initrd fails, retaining the checksum file isn't
    useful. Remove it.

    Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
    Reviewed-by: Philipp Rudo <prudo@redhat.com>

Signed-off-by: Lichen Liu <lichliu@redhat.com>
2022-12-22 14:36:23 +08:00
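
The two points of the commit above (a checksum file per kernel version, and no stale checksum when the backup fails) can be sketched as follows. All names and the file layout are hypothetical, and `sha256sum` is an assumed choice of checksum tool; the real kdumpctl logic may differ.

```shell
# Hypothetical naming scheme: one checksum file per kernel version.
# $1 = directory holding the backups, $2 = kernel version
checksum_file_for() {
    printf '%s/.initrd-checksum-%s\n' "$1" "$2"
}

# $1 = default initrd, $2 = backup destination, $3 = checksum file
# Records the checksum, then copies the initrd. If the copy fails, the
# checksum file is removed because it would describe a missing backup.
backup_default_initrd() {
    sha256sum "$1" > "$3" || return 1
    if ! cp "$1" "$2"; then
        rm -f "$3"
        return 1
    fi
}
```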
Lichen Liu fb93b28df8 fadump: add a kernel install hook to clean up fadump initramfs
Resolves: bz2139000
Upstream: Fedora
Conflict: Upstream doesn't have Source37: supported-kdump-targets.txt,
so the number of SourceXX needs to be changed.

commit 4a2dcab26ac5ede266449515fc906687728f9ace
Author: Hari Bathini <hbathini@linux.ibm.com>
Date:   Fri Dec 2 18:46:51 2022 +0530

    fadump: add a kernel install hook to clean up fadump initramfs

    Kdump service will create fadump initramfs when needed, but it won't
    clean up the fadump initramfs on kernel uninstall. So create a kernel
    install hook to do the clean up job.

    Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
    Reviewed-by: Philipp Rudo <prudo@redhat.com>

Signed-off-by: Lichen Liu <lichliu@redhat.com>
2022-12-22 14:36:23 +08:00
Lichen Liu 5b2306b562 fadump: avoid status check while starting in fadump mode
Resolves: bz2139000
Upstream: Fedora
Conflict: None

commit a833624fe57a28a4ccb44f7f27f06a7e867e8755
Author: Hari Bathini <hbathini@linux.ibm.com>
Date:   Mon Nov 21 18:56:08 2022 +0530

    fadump: avoid status check while starting in fadump mode

    With kernel commit 607451ce0aa9b ("powerpc/fadump: register for fadump
    as early as possible"), 'kdumpctl start' prematurely returns with the
    below message:

        "Kdump already running: [WARNING]"

    instead of setting default initrd with dump capture capability as
    required for fadump. Skip status check in fadump mode to avoid this
    problem.

    Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
    Reviewed-by: Philipp Rudo <prudo@redhat.com>

Signed-off-by: Lichen Liu <lichliu@redhat.com>
2022-12-22 14:36:23 +08:00
Lichen Liu bfe235b413 spec: only install mkfadumprd for ppc
Resolves: bz2139000
Upstream: Fedora
Conflict: None

commit 748eb3a2a6b41bc74748f1f1845b91b77548e1d8
Author: Kairui Song <kasong@tencent.com>
Date:   Sun Jan 9 18:03:35 2022 +0800

    spec: only install mkfadumprd for ppc

    fadump is a ppc only feature, mkfadumprd is only needed for fadump, drop
    it for other arch.

    Reviewed-by: Philipp Rudo <prudo@redhat.com>
    Signed-off-by: Kairui Song <kasong@tencent.com>

Signed-off-by: Lichen Liu <lichliu@redhat.com>
2022-12-22 14:36:23 +08:00
Lichen Liu a74225f763 fadump: preserve file modification time to help with hardlinking
Resolves: bz2139000
Upstream: Fedora
Conflict: None

commit f33c99e34749e976b610ad939f507cc471b33980
Author: Hari Bathini <hbathini@linux.ibm.com>
Date:   Mon Oct 31 15:42:21 2022 +0530

    fadump: preserve file modification time to help with hardlinking

    With commit fa9201b2 ("fadump: isolate fadump initramfs image within
    the default one"), initramfs image gets to hold two images, one for
    production kernel boot purpose and the other for capture kernel boot.
    Most files are common among the two images. Retain file modification
    time to replace duplicate files with hardlinks and save space. Also,
    avoid unnecessarily compressing fadump image that is decompressed
    immediately anyway.

    Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
    Reviewed-by: Philipp Rudo <prudo@redhat.com>

Signed-off-by: Lichen Liu <lichliu@redhat.com>
2022-12-22 14:36:23 +08:00
Lichen Liu 878faf6ab8 fadump: do not use squash to reduce image size
Resolves: bz2139000
Upstream: Fedora
Conflict: None

commit 55b0dd03b36556b40e1f0d660b76bc5acb99b982
Author: Hari Bathini <hbathini@linux.ibm.com>
Date:   Mon Oct 31 15:42:20 2022 +0530

    fadump: do not use squash to reduce image size

    With commit fa9201b2 ("fadump: isolate fadump initramfs image within
    the default one"), initramfs image gets to hold two squash images, one
    for production kernel boot purpose and the other for capture kernel
    boot. Having separate images improved reliability for both production
    kernel and capture kernel boot scenarios, but the size of initramfs
    image became considerably larger.

    Instead of having squash images, compressing $initdir without using
    squash images reduced the size of initramfs image for fadump case by
    around 30%. So, avoid using squash for fadump case.

    Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
    Reviewed-by: Philipp Rudo <prudo@redhat.com>

Signed-off-by: Lichen Liu <lichliu@redhat.com>
2022-12-22 14:36:23 +08:00
Tao Liu dc26e4b45e Release 2.0.25-9
Related: bz2085347
Resolves: bz2151832

Signed-off-by: Tao Liu <ltao@redhat.com>
2022-12-19 12:23:58 +08:00
Tao Liu 241dadbf19 Add virtiofs to kdump supported-kdump-targets.txt
Related: bz2085347
Upstream: RHEL-only

Signed-off-by: Tao Liu <ltao@redhat.com>
2022-12-19 03:36:58 +00:00
Coiby Xu 0aaa053cc3 dracut-module-setup.sh: stop overwriting dracut's trap handler
Resolves: bz2151832
Upstream: Fedora
Conflict: None

commit b45896c62096f7fcfd65afffb4cd93a3ae5f8b1a
Author: Coiby Xu <coxu@redhat.com>
Date:   Tue Dec 6 18:18:32 2022 +0800

    dracut-module-setup.sh: stop overwriting dracut's trap handler

    Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=2149246

    Latest Workstation live x86_64 image has an excess increase of ~300 MB
    in size. This is because kdumpbase module's trap handler overwrites
    dracut's handler and DRACUT_TMPDIR which has three unpacked initramfs
    files fails to be cleaned up. This patch moves kdumpbase module's
    temporary folder under DRACUT_TMPDIR and lets dracut's trap handler do
    the cleanup instead.

    Fixes: d25b1ee3 ("Add functions to copy NetworkManage connection profiles to the initramfs")
    Signed-off-by: Coiby Xu <coxu@redhat.com>
    Reviewed-by: Philipp Rudo <prudo@redhat.com>

Signed-off-by: Coiby Xu <coxu@redhat.com>
2022-12-14 10:02:00 +08:00
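The failure mode described in commit b45896c6 comes from the fact that a later `trap ... EXIT` replaces an earlier one rather than adding to it. The following is a minimal sketch, not the actual dracut or kdumpbase code; the function names and the `module` sub-directory are made up for illustration.

```shell
# A later `trap ... EXIT` replaces the earlier handler instead of stacking.
overwritten_trap_demo() {
    (
        trap 'echo "framework cleanup"' EXIT   # stands in for dracut's handler
        trap 'echo "module cleanup"' EXIT      # replaces the handler above
    )
}

# Sketch of the fix: the module keeps its scratch directory inside the
# framework's temporary directory and installs no trap of its own.
shared_tmpdir_demo() {
    (
        framework_tmp=$(mktemp -d)             # stands in for DRACUT_TMPDIR
        trap 'rm -rf "$framework_tmp"; echo "framework cleanup"' EXIT
        module_tmp="$framework_tmp/module"     # hypothetical sub-directory name
        mkdir "$module_tmp"
    )
}

overwritten_trap_demo   # prints "module cleanup" only
shared_tmpdir_demo      # prints "framework cleanup"
```

In the first function the framework's cleanup never runs, which is how a temporary directory can be left behind.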
Tao Liu 243717f988 Release 2.0.25-8
Resolves: bz2145087
Resolves: bz2141536
Resolves: bz2078460

Signed-off-by: Tao Liu <ltao@redhat.com>
2022-12-07 16:45:25 +08:00
Coiby Xu d88a4a1402 kexec-tools: ppc64: remove rma_top limit
Resolves: bz2145087
Conflict: None

commit 6b6187f546f0ddad8ea84d22c3f7ad72133dcfe3
Author: Sourabh Jain <sourabhjain@linux.ibm.com>
Date:   Thu Sep 15 14:12:40 2022 +0530

    ppc64: remove rma_top limit

    Restricting the kexec tool to allocate holes for kexec segments below
    768MB may no longer be relevant, since the first memory block size can
    be 1024MB or more.

    Removing rma_top restriction will give more space to find holes for
    kexec segments and existing in-place checks make sure that kexec segment
    allocation doesn't cross the first memory block because every kexec segment
    has to be within first memory block for kdump kernel to boot properly.

    Signed-off-by: Sourabh Jain <sourabhjain@linux.ibm.com>
    Acked-by: Hari Bathini <hbathini@linux.ibm.com>
    Signed-off-by: Simon Horman <horms@kernel.org>

Signed-off-by: Coiby Xu <coxu@redhat.com>
2022-12-06 11:01:48 +08:00
Lichen Liu fd2521df50 kdumpctl: Optimize _find_kernel_path_by_release regex string
Resolves: bz2141536
Upstream: Fedora
Conflict: None

commit 5eb77ee3fa986f39bb74a5956910aed6f60b8008
Author: Lichen Liu <lichliu@redhat.com>
Date:   Thu Nov 24 09:15:25 2022 +0800

    kdumpctl: Optimize _find_kernel_path_by_release regex string

    Currently _find_kernel_path_by_release uses grubby and grep to
    find the kernel path. If both the normal kernel and its debug
    variant exist, grep returns more than one kernel string.

    ```
    kernel="/boot/vmlinuz-5.14.0-139.kpq0.el9.s390x+debug"
    kernel="/boot/vmlinuz-5.14.0-139.kpq0.el9.s390x"
    ```

    This will cause an error when installing debug kernel.

    ```
    The param "/boot/vmlinuz-5.14.0-139.kpq0.el9.s390x+debug
    /boot/vmlinuz-5.14.0-139.kpq0.el9.s390x" is incorrect
    ```

    Fixes: 945cbbd ("add helper functions to get kernel path by kernel release and the path of current running kernel")

    Signed-off-by: Lichen Liu <lichliu@redhat.com>
    Reviewed-by: Philipp Rudo <prudo@redhat.com>

Signed-off-by: Lichen Liu <lichliu@redhat.com>
2022-12-05 08:57:09 +00:00
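The double match described in commit 5eb77ee3 can be reproduced without grubby. This sketch feeds the two lines from the commit message to grep and compares a loose pattern with one anchored on the closing quote. It illustrates the idea only and is not the regex used in the actual patch; `escape_dots` is a hypothetical helper.

```shell
# Simulated `grubby --info ALL` output (values taken from the commit message).
grubby_info='kernel="/boot/vmlinuz-5.14.0-139.kpq0.el9.s390x+debug"
kernel="/boot/vmlinuz-5.14.0-139.kpq0.el9.s390x"'

release='5.14.0-139.kpq0.el9.s390x'

# Hypothetical helper: escape the dots so they match literally.
escape_dots() {
    printf '%s\n' "$1" | sed 's/\./\\./g'
}

# Unanchored: also matches the "+debug" variant, so two lines come back.
loose_matches=$(printf '%s\n' "$grubby_info" | grep -c "vmlinuz-$release")

# Anchored on the closing quote: only the exact release matches.
strict_match=$(printf '%s\n' "$grubby_info" | grep "vmlinuz-$(escape_dots "$release")\"\$")

echo "loose pattern matched $loose_matches lines"
echo "strict pattern matched: $strict_match"
```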
Pingfan Liu aa204a3b63 kdump.conf: use a simple generator script to maintain
Resolves: bz2078460
Upstream: Fedora

commit 787b041aabd90ff2bf8fe6cc4d225032efde24d0
Author: Pingfan Liu <piliu@redhat.com>
Date:   Tue Nov 15 12:00:09 2022 +0800

    kdump.conf: use a simple generator script to maintain

    This commit has the same motivation as the commit 677da8a "sysconfig:
    use a simple generator script to maintain".

    At present, only the kdump.conf generated for s390x has a slight
    difference from the other arches, where the core_collector asks the
    makedumpfile to use "-c" option to compress dump data by each page using
    zlib, which is more efficient than lzo on s390x.

    Signed-off-by: Pingfan Liu <piliu@redhat.com>
    Reviewed-by: Philipp Rudo <prudo@redhat.com>

Signed-off-by: Pingfan Liu <piliu@redhat.com>
2022-12-01 11:01:17 +08:00
Tao Liu 576b8fa374 Release 2.0.25-7
Resolves: bz2076416

Signed-off-by: Tao Liu <ltao@redhat.com>
2022-11-25 14:46:56 +08:00
Coiby Xu fa2f8fc244 Don't run kdump_check_setup_iscsi in a subshell in order to collect needed network interfaces
Resolves: bz2076416
Upstream: Fedora
Conflict: None

commit 523cda8f343b1a48799a5c25e1d76a2a475a6b1d
Author: Coiby Xu <coxu@redhat.com>
Date:   Fri Nov 25 12:07:25 2022 +0800

    Don't run kdump_check_setup_iscsi in a subshell in order to collect needed
    network interfaces

    Currently, dumping to an iSCSI target fails because the global array
    (unique_netifs) that stores the network interfaces needed by kdump is
    empty. The root cause is that changes to the array made in a subshell
    (a child process) are not visible to the parent process. So don't run
    kdump_check_setup_iscsi in a subshell.

    Fixes: 63c3805c ("Set up kdump network by directly copying NM connection profile to initrd")
    Signed-off-by: Coiby Xu <coxu@redhat.com>
    Reviewed-by: Pingfan Liu <piliu@redhat.com>

Signed-off-by: Coiby Xu <coxu@redhat.com>
2022-11-25 13:57:32 +08:00
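The root cause named in commit 523cda8f is easy to reproduce. In this sketch a plain string stands in for the bash array so that it also runs under a POSIX sh; `record_netif` is a hypothetical helper, not a kdump function.

```shell
unique_netifs=""                 # a plain string stands in for the bash array

# Hypothetical helper that records an interface in the global list.
record_netif() {
    unique_netifs="$unique_netifs $1"
}

( record_netif eth0 )            # subshell: the change dies with the child process
after_subshell=$unique_netifs

record_netif eth0                # current shell: the change is kept
after_direct=$unique_netifs

echo "after subshell: '$after_subshell'"
echo "after direct call: '$after_direct'"
```

Command substitution `$(...)` creates a subshell in the same way, so a function called through it cannot update the caller's globals either.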
Tao Liu 0459f68dcc Release 2.0.25-6
Resolves: bz2076416

Signed-off-by: Tao Liu <ltao@redhat.com>
2022-11-24 11:27:37 +08:00
Coiby Xu afbb32a83c Simplify setup_znet by copying connection profile to initrd
Resolves: bz2076416
Upstream: Fedora
Conflict: None

commit b5577c163aff88c458d638eb7954a000f6513ddb
Author: Coiby Xu <coxu@redhat.com>
Date:   Thu Sep 23 15:26:00 2021 +0800

    Simplify setup_znet by copying connection profile to initrd

    /usr/lib/udev/ccw_init [1] shipped by s390utils extracts the values of
    SUBCHANNELS, NETTYPE and LAYER2 from /etc/sysconfig/network-scripts/ifcfg-*
    or /etc/NetworkManager/system-connections/*.nmconnection to activate znet
    network device. If the connection profile is copied to initrd,
    there is no need to set up the "rd.znet" dracut cmdline parameter.

    There are two cases addressed by this commit,
     1. znet network interface is a slave of bonding/teaming/vlan/bridging
        network. The connection profile has been copied to initrd by
        kdump_copy_nmconnection_file and it contains the info needed by
        ccw_init.
     2. znet network interface is a slave of bonding/teaming/vlan/bridging
        network. The corresponding ifcfg-*/*.nmconnection file may not contain
        info like SUBCHANNELS [2]. In this case, copy the ifcfg-*/*.nmconnection
        file that has this info to the kdump initrd. Also to prevent the copied
        connection profile from being chosen by NM, set
        connection.autoconnect=false for this connection profile.

    With this implementation, there is also no need to check if znet is
    used beforehand.

    Note
    1. ccw_init doesn't care if SUBCHANNELS, NETTYPE and LAYER2 come from
       an active NM profile or not. If an inactive NM profile contains this
       info, it needs to be copied to the kdump initrd as well.
    2. "rd.znet_ifname=$_netdev:${SUBCHANNELS}" is no longer needed
       because now there is no renaming of s390x network interfaces when
       reusing NetworkManager profiles. rd.znet_ifname was introduced in
       commit ce0305d ("Add a new option 'rd.znet_ifname' in order to use it
       in udev rules") to address the special case of non-persistent
       MAC address by renaming a network interface by SUBCHANNELS.

    [1] https://src.fedoraproject.org/rpms/s390utils/blob/rawhide/f/ccw_init
    [2] https://bugzilla.redhat.com/show_bug.cgi?id=2064708

    Signed-off-by: Coiby Xu <coxu@redhat.com>
    Reviewed-by: Thomas Haller <thaller@redhat.com>
    Reviewed-by: Philipp Rudo <prudo@redhat.com>

Signed-off-by: Coiby Xu <coxu@redhat.com>
2022-11-23 09:42:33 +08:00
Coiby Xu 561952f12a Wait for the network to be truly ready before dumping vmcore
Resolves: bz2076416
Upstream: Fedora
Conflict: None

commit 9792994f2f13dd7d6561140fe4b590114a8571c1
Author: Coiby Xu <coxu@redhat.com>
Date:   Thu Sep 22 22:31:47 2022 +0800

    Wait for the network to be truly ready before dumping vmcore

    nm-wait-online-initrd.service installed by dracut's 35-networkmanager
    module calls nm-online with "-s" which means it returns immediately when
    NetworkManager logs "startup complete". Thus it doesn't truly wait for
    network connectivity to be established [1]. Wait for the network to be
    truly ready before dumping vmcore. There are two benefits brought by
    this approach,
      - ssh/nfs dumping won't fail because the network is not
        ready, e.g. [2][3]
      - users don't need to use workarounds like rd.net.carrier.timeout to
        make sure the network is ready

    [1] https://bugzilla.redhat.com/show_bug.cgi?id=1485712
    [2] https://bugzilla.redhat.com/show_bug.cgi?id=1909014
    [3] https://bugzilla.redhat.com/show_bug.cgi?id=2035451

    Signed-off-by: Coiby Xu <coxu@redhat.com>
    Reviewed-by: Thomas Haller <thaller@redhat.com>
    Reviewed-by: Philipp Rudo <prudo@redhat.com>

Signed-off-by: Coiby Xu <coxu@redhat.com>
2022-11-23 09:42:33 +08:00
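The idea in commit 9792994f of waiting until the network is actually usable, rather than until NetworkManager reports "startup complete", can be sketched as a retry loop with a timeout. `wait_until` is a hypothetical helper and the readiness check is left to the caller; this is not the check the patch implements.

```shell
# Hypothetical helper: retry a command once per second until it succeeds
# or the timeout (in seconds) runs out.
wait_until() {
    wu_timeout=$1
    shift
    wu_waited=0
    until "$@"; do
        [ "$wu_waited" -ge "$wu_timeout" ] && return 1
        sleep 1
        wu_waited=$((wu_waited + 1))
    done
    echo "ready after ${wu_waited}s"
}

wait_until 2 true    # succeeds at once and prints "ready after 0s"
```

A caller could pass any probe, for instance `wait_until 60 ping -c 1 "$server"`, where `$server` is a placeholder for the dump target.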
Coiby Xu d22786bb5a Address the cases where a NIC has a different name in kdump kernel
Resolves: bz2076416
Upstream: Fedora
Conflict: None

commit 568623e69a32266a3b00225b9de6a93435c44474
Author: Coiby Xu <coxu@redhat.com>
Date:   Thu Sep 23 14:25:01 2021 +0800

    Address the cases where a NIC has a different name in kdump kernel

    A NIC may get a different name in the kdump kernel than in the 1st
    kernel in cases like,
     - kernel assigned network interface names are not persistent e.g. [1]
     - there is a udev rule to rename the NIC in the 1st kernel but the
       kdump initrd may not have that rule e.g. [2]

    If NM tries to match a NIC with a connection profile based on NIC name
    i.e. connection.interface-name, it will fail in the above cases. A simple
    solution is to ask NM to match a connection profile by MAC address.
    Note we don't need to do this for user-created NICs like vlan, bridge and
    bond.

    A remaining issue is passing the name of a NIC via the kdumpnic dracut
    command line parameter, which requires passing ifname=<interface>:<MAC> to
    have a fixed NIC name. But we can simply drop this requirement. kdumpnic
    is needed because kdump needs to get the IP by NIC name and use the IP
    to create a dumping folder named "{IP}-{DATE}". We can simply pass the
    IP to the kdump kernel directly via a new dracut command line parameter
    kdumpip instead. In addition to the benefit of simplifying the code,
    there are three other benefits brought by this approach,
      - make use of whatever network is available to transfer the vmcore.
        As long as we have a working network, we don't care which NIC is
        active.
      - if the IP obtained in the kdump kernel is different from the one in
        the 1st kernel, "{IP}-{DATE}" would better tell where the dumped
        vmcore comes from.
      - without passing ifname=<interface>:<MAC> to kdump initrd, the
        issue of two interfaces having the same MAC address for
        Azure Hyper-V NIC SR-IOV [3] is resolved automatically.

    [1] https://bugzilla.redhat.com/show_bug.cgi?id=1121778
    [2] https://bugzilla.redhat.com/show_bug.cgi?id=810107
    [3] https://bugzilla.redhat.com/show_bug.cgi?id=1962421

    Signed-off-by: Coiby Xu <coxu@redhat.com>
    Reviewed-by: Thomas Haller <thaller@redhat.com>
    Reviewed-by: Philipp Rudo <prudo@redhat.com>

Signed-off-by: Coiby Xu <coxu@redhat.com>
2022-11-23 09:42:18 +08:00
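The kdumpip approach from commit 568623e6 amounts to reading one value from the kernel command line and using it in the folder name. A rough sketch follows; `get_kdumpip` and `dump_dir_name` are hypothetical helpers (dracut has its own command line parsing), and the command line, address and date format are made-up examples.

```shell
# Hypothetical helper: print the value of kdumpip= from a kernel command line.
get_kdumpip() {
    for gk_arg in $1; do
        case $gk_arg in
            kdumpip=*) printf '%s\n' "${gk_arg#kdumpip=}"; return 0 ;;
        esac
    done
    return 1
}

# Hypothetical helper: build the "{IP}-{DATE}" folder name.
dump_dir_name() {
    printf '%s-%s\n' "$1" "$2"
}

cmdline='ro crashkernel=256M kdumpip=192.0.2.10 quiet'   # made-up example
ip=$(get_kdumpip "$cmdline")
dump_dir_name "$ip" "$(date +%Y-%m-%d-%T)"               # assumed date format
```

Because the address arrives as a parameter, nothing in this path depends on the interface name seen by the kdump kernel.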
Coiby Xu 81b414d100 Reduce kdump memory consumption by only installing needed NIC drivers
Resolves: bz2076416
Upstream: Fedora
Conflict: None

commit a65dde2d1083a57824aecd1840dea417c98c553d
Author: Coiby Xu <coxu@redhat.com>
Date:   Thu May 19 11:39:25 2022 +0800

    Reduce kdump memory consumption by only installing needed NIC drivers

    Even after having asked NM to stop managing an unneeded NIC, a NIC driver
    may still waste memory. For example, mlx5_core uses a substantial amount
    of memory during driver initialization,

    ======== Report format module_summary: ========
    Module mlx5_core using 350.2MB (89650 pages), peak allocation 367.4MB (94056 pages)
    Module squashfs using 13.1MB (3360 pages), peak allocation 13.1MB (3360 pages)
    Module overlay using 2.1MB (550 pages), peak allocation 2.2MB (555 pages)
    Module dns_resolver using 0.9MB (219 pages), peak allocation 5.2MB (1338 pages)
    Module mlxfw using 0.7MB (172 pages), peak allocation 5.3MB (1349 pages)
    ======== Report format module_summary END ========

    ======== Report format module_top: ========
    Top stack usage of module mlx5_core:
      (null) Pages: 89650 (peak: 94056)
        ret_from_fork (0xffffda088b4165f8) Pages: 60007 (peak: 60007)
          kthread (0xffffda088b4bd7e4) Pages: 60007 (peak: 60007)
            worker_thread (0xffffda088b4b48d0) Pages: 60007 (peak: 60007)
              process_one_work (0xffffda088b4b3f40) Pages: 60007 (peak: 60007)
                work_for_cpu_fn (0xffffda088b4aef00) Pages: 53906 (peak: 53906)
                  local_pci_probe (0xffffda088b9e1e44) Pages: 53906 (peak: 53906)
                    probe_one mlx5_core (0xffffda084f899cc8) Pages: 53518 (peak: 53518)
                      mlx5_init_one mlx5_core (0xffffda084f8994ac) Pages: 49756 (peak: 49756)
                        mlx5_function_setup.constprop.0 mlx5_core (0xffffda084f899100) Pages: 44434 (peak: 44434)
                          mlx5_satisfy_startup_pages mlx5_core (0xffffda084f8a4f24) Pages: 44434 (peak: 44434)
                        mlx5_function_setup.constprop.0 mlx5_core (0xffffda084f899078) Pages: 5285 (peak: 5285)
                          mlx5_cmd_init mlx5_core (0xffffda084f89e414) Pages: 4818 (peak: 4818)
                            mlx5_alloc_cmd_msg mlx5_core (0xffffda084f89aaa0) Pages: 4403 (peak: 4403)

    This memory consumption is completely unnecessary when kdump doesn't need
    this NIC. Only install needed NIC drivers to prevent this kind of waste.

    Note
    1. this patch depends on [1] to ask dracut to not install NIC drivers.
    2. "ethtool -i" somehow fails to get the vlan driver
    3. team.ko doesn't depend on the team mode drivers so we need to install
       the team mode drivers manually.

    [1] https://github.com/dracutdevs/dracut/pull/1789

    Signed-off-by: Coiby Xu <coxu@redhat.com>
    Reviewed-by: Thomas Haller <thaller@redhat.com>
    Reviewed-by: Philipp Rudo <prudo@redhat.com>

Signed-off-by: Coiby Xu <coxu@redhat.com>
2022-11-23 09:42:18 +08:00
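Installing only the needed NIC drivers (commit a65dde2d) first requires knowing which driver an interface uses. The commit's notes mention `ethtool -i`; the sketch below instead resolves the driver link in sysfs, which is an alternative chosen here for illustration and not the patch's method. `nic_driver` and the fake tree are hypothetical.

```shell
# Hypothetical helper: print the kernel driver bound to a network interface
# by resolving <sysfs>/class/net/<iface>/device/driver. The sysfs root is a
# parameter so the function can be tried against a fake tree.
nic_driver() {
    nd_link="$1/class/net/$2/device/driver"
    [ -L "$nd_link" ] || return 1      # no driver link for this interface
    basename "$(readlink "$nd_link")"
}

# Fake sysfs tree for demonstration; on a real system pass /sys instead.
fake_sys=$(mktemp -d)
mkdir -p "$fake_sys/class/net/eth0/device"
ln -s ../../../../bus/pci/drivers/mlx5_core "$fake_sys/class/net/eth0/device/driver"

nic_driver "$fake_sys" eth0    # prints "mlx5_core"
```

Interfaces without such a link need separate handling, which matches the commit's note that some drivers have to be installed manually.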
Coiby Xu 95a39f602b Reduce kdump memory consumption by not letting NetworkManager manage unneeded network interfaces
Resolves: bz2076416
Upstream: Fedora
Conflict: None

commit 586fe410aa1b0093fb208c869b70ffeb7f085a55
Author: Coiby Xu <coxu@redhat.com>
Date:   Thu Sep 9 11:50:00 2021 +0800

    Reduce kdump memory consumption by not letting NetworkManager manage unneeded network interfaces

    By default, NetworkManager will manage all the network interfaces and
    try to set interface IFF_UP to get carrier state. Regardless of whether
    the network interface is connected to a cable or not, the NIC driver
    will allocate memory resources for e.g. ring buffers when setting IFF_UP.
    This could be a waste of memory. For example, i40e was found to consume
    ~15GB on a power machine. On this machine, i40e manages four interfaces
    but only one interface is valid. This patch uses "managed=false" to tell
    NetworkManager to not manage network interfaces that are not needed by
    kdump by putting 10-kdump-netif_allowlist.conf in the initramfs.

    Signed-off-by: Coiby Xu <coxu@redhat.com>
    Reviewed-by: Thomas Haller <thaller@redhat.com>
    Reviewed-by: Philipp Rudo <prudo@redhat.com>

Signed-off-by: Coiby Xu <coxu@redhat.com>
2022-11-23 09:42:18 +08:00
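A drop-in of the kind commit 586fe410 describes could look roughly like the output of the sketch below. It relies on NetworkManager's `match-device` and `managed` keys in a `[device-*]` section and on `except:` device specs. The section name, the helper `print_netif_allowlist` and the exact layout are assumptions and may differ from the file the patch generates.

```shell
# Hypothetical helper: print a NetworkManager drop-in that marks every
# interface except the listed ones as unmanaged.
print_netif_allowlist() {
    pa_specs=""
    for pa_netif in "$@"; do
        pa_specs="${pa_specs}except:interface-name:${pa_netif};"
    done
    printf '[device-others]\n'
    printf 'match-device=%s\n' "$pa_specs"
    printf 'managed=false\n'
}

print_netif_allowlist eth0
```

With only negative `except:` specs, every other device matches the section, so NetworkManager leaves those interfaces alone and their drivers are not asked to bring them up.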
Coiby Xu 420f55c096 Set up kdump network by directly copying NM connection profile to initrd
Resolves: bz2076416
Upstream: Fedora
Conflict: None

commit 63c3805c486adf700bafb5ad78cc9b0f55fcb345
Author: Coiby Xu <coxu@redhat.com>
Date:   Fri Sep 17 13:02:07 2021 +0800

    Set up kdump network by directly copying NM connection profile to initrd

    This patch sets up the kdump network by directly copying NM connection
    profile(s) for different network setups including bond, bridge, vlan, and
    team. For vlan network, rename phydev to parent_netif to improve code
    readability.

    With the new approach, the related code to build up dracut cmdline
    parameters such as rd.route, ip, etc. can be cleaned up. And there is no
    need to set up DNS when copying .nmconnection directly to initrd
    either. Note the bootdev dracut command line parameter is only used by
    dracut's 35network-legacy and network-manager doesn't use it, so remove
    the related code as well.

    Note
    1. kdump_setup_vlan/bond/... are no longer called in subshells in order
       to modify global variables like unique_netifs
    2. The original kdump_install_net is renamed to better reflect its
       current function

    Signed-off-by: Coiby Xu <coxu@redhat.com>
    Reviewed-by: Thomas Haller <thaller@redhat.com>
    Reviewed-by: Philipp Rudo <prudo@redhat.com>

Signed-off-by: Coiby Xu <coxu@redhat.com>
2022-11-23 09:42:18 +08:00