sysconfig: add pcie_ports compat to KDUMP_COMMANDLINE_APPEND on x86_64

Upstream: fedora
Resolves: RHEL-3929
Conflict: Yes. Fedora has no kdump.sysconfig.x86_64, only
          gen-kdump-sysconfig.sh, so for this backport the
          modification is made to kdump.sysconfig.x86_64.

commit ada6f5edf1ae06fc88759aa2f94d09e2a98d21ef
Author: Tao Liu <ltao@redhat.com>
Date:   Wed May 1 16:53:19 2024 +0800

    sysconfig: add pcie_ports compat to KDUMP_COMMANDLINE_APPEND on x86_64

    Kdump has been failing in the 2nd kernel on some machines. The 2nd
    kernel usually runs with only one CPU enabled ("nr_cpus=1"), and a
    large number of devices can easily exceed the IRQ resources that a
    single CPU can handle. As a result, the 2nd kernel hangs and kdump
    fails. This issue is often observed on machines with many CPUs and
    many devices.

    On those systems, pcieports consume a considerable proportion of the
    IRQ resources, and many messages like the following can be seen in
    the dmesg log:

       pcieport 0000:18:01.0: PME: Signaling with IRQ 109

    According to the kernel doc[1], "pcie_ports=compat" disables native
    PCIe services (PME, AER, DPC, PCIe hotplug). These services handle
    power management events, error reporting, downstream port
    containment, and hotplug, none of which are must-have functions for
    kdump. In addition, testing has shown no side effects, such as being
    unable to write the vmcore to sdx, nvme, etc.

    This patch disables native PCIe services for the 2nd kernel, to save
    scarce IRQ resources and improve the kdump success rate.

    Prarit's comments:

    This makes sense to me. The only concern anyone should have is that a PCIE
    error could have been responsible for taking down the kernel in the first
    place, and booting into the second kernel could then also have a fatal
    problem. I'm not sure we can ever fix that type of cascade of panics :)
    so it makes sense to disable these features.

    [1]: https://www.kernel.org/doc/html/v6.9-rc1/admin-guide/kernel-parameters.html

    Signed-off-by: Tao Liu <ltao@redhat.com>
    Acked-by: Prarit Bhargava <prarit@redhat.com>
    Acked-by: Dave Young <dyoung@redhat.com>

Signed-off-by: Tao Liu <ltao@redhat.com>
Tao Liu 2024-05-31 13:21:20 +08:00
parent d11491330e
commit 810b726b82

@@ -21,7 +21,7 @@ KDUMP_COMMANDLINE_REMOVE="hugepages hugepagesz slub_debug quiet log_buf_len swio
 # This variable lets us append arguments to the current kdump commandline
 # after processed by KDUMP_COMMANDLINE_REMOVE
-KDUMP_COMMANDLINE_APPEND="irqpoll nr_cpus=1 reset_devices cgroup_disable=memory mce=off numa=off udev.children-max=2 panic=10 acpi_no_memhotplug transparent_hugepage=never nokaslr hest_disable novmcoredd cma=0 hugetlb_cma=0"
+KDUMP_COMMANDLINE_APPEND="irqpoll nr_cpus=1 reset_devices cgroup_disable=memory mce=off numa=off udev.children-max=2 panic=10 acpi_no_memhotplug transparent_hugepage=never nokaslr hest_disable novmcoredd cma=0 hugetlb_cma=0 pcie_ports=compat"
 # Any additional kexec arguments required. In most situations, this should
 # be left empty
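
The change above only alters the packaged default, so a system with an already customized kdump sysconfig file may need the argument added by hand. A minimal sketch of an idempotent edit follows. The helper name append_kdump_arg is hypothetical and not part of kexec-tools, and the sketch assumes GNU sed and a single-line, double-quoted KDUMP_COMMANDLINE_APPEND assignment like the one in the diff.

```shell
#!/bin/sh
# Hypothetical helper (not part of kexec-tools): append one kernel argument
# to KDUMP_COMMANDLINE_APPEND in a kdump sysconfig file, unless it is
# already present. Assumes GNU sed (-i) and a single-line quoted value.
append_kdump_arg() {
    conf=$1   # path to the sysconfig file
    arg=$2    # argument to add, e.g. pcie_ports=compat

    # Current value of the variable, without the surrounding quotes.
    current=$(sed -n 's/^KDUMP_COMMANDLINE_APPEND="\(.*\)"$/\1/p' "$conf")

    # Nothing to do if the argument is already present as a whole word.
    case " $current " in
        *" $arg "*) return 0 ;;
    esac

    # Insert the argument just before the closing quote.
    sed -i "s|^\(KDUMP_COMMANDLINE_APPEND=\".*\)\"\$|\1 $arg\"|" "$conf"
}
```

Assuming the config lives at /etc/sysconfig/kdump, it could be called as `append_kdump_arg /etc/sysconfig/kdump pcie_ports=compat`, followed by `kdumpctl restart` so the new command line is picked up. Running it a second time leaves the file unchanged.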