kexec-tools/0006-sysconfig-add-pcie_ports-compat-to-KDUMP_COMMANDLINE.patch
Lichen Liu 426b9269b8
Re apply upstream patches
We should use PatchXX to apply upstream kdump-utils
patches instead of directly merge them into git.

Resolves: RHEL-36415
Resolves: RHEL-37670
Upstream: https://github.com/rhkdump/kdump-utils/
Conflict: None

Signed-off-by: Lichen Liu <lichliu@redhat.com>
2024-06-11 11:05:24 +08:00

62 lines
2.7 KiB
Diff

From ada6f5edf1ae06fc88759aa2f94d09e2a98d21ef Mon Sep 17 00:00:00 2001
From: Tao Liu <ltao@redhat.com>
Date: Wed, 1 May 2024 16:53:19 +0800
Subject: [PATCH 6/7] sysconfig: add pcie_ports compat to
KDUMP_COMMANDLINE_APPEND on x86_64
There have been some of failing cases of kdump in 2nd kernel, where
ususally only one cpu is enabled by "nr_cpus=1", but with a large
number of devices, which may easily exceed the maximum IRQ resources of
one cpu can handle. As a result, the 2nd kernel will hang and kdump
fails. This issue is often observed on machines with many cpus and many
devices.
On those systems, pcieports consume quite proportion of IRQ resources,
many following message can be seen in dmesg log:
pcieport 0000:18:01.0: PME: Signaling with IRQ 109
According to kernel doc[1], when "pcie_ports=compat" applied, it will disable
native PCIe services (PME, AER, DPC, PCIe hotplug). Those functions are
power management events, error reporting, performance, hotplug related,
which are not the must-have functions for kdump. In addition, after
testing, no side effects such as cannot writing vmcore into sdx, nvme
etc been noticed.
This patch will disable native PCIe services for 2nd kernel, to saving the
scarce IRQ resources and increase the kdump success.
Attach Prarit's comments:
This makes sense to me. The only concern anyone should have is that a PCIE
error could have been responsible for taking down the kernel in the first
place, and booting into the second kernel could then also have a fatal
problem. I'm not sure we can ever fix that type of cascade of panics :)
so it makes sense to disable these features.
[1]: https://www.kernel.org/doc/html/v6.9-rc1/admin-guide/kernel-parameters.html
Signed-off-by: Tao Liu <ltao@redhat.com>
Acked-by: Prarit Bhargava <prarit@redhat.com>
Acked-by: Dave Young <dyoung@redhat.com>
---
gen-kdump-sysconfig.sh | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/gen-kdump-sysconfig.sh b/gen-kdump-sysconfig.sh
index 78b0bb7..1a2cd92 100755
--- a/gen-kdump-sysconfig.sh
+++ b/gen-kdump-sysconfig.sh
@@ -104,7 +104,7 @@ s390x)
x86_64)
update_param KEXEC_ARGS "-s"
update_param KDUMP_COMMANDLINE_APPEND \
- "irqpoll nr_cpus=1 reset_devices cgroup_disable=memory mce=off numa=off udev.children-max=2 panic=10 acpi_no_memhotplug transparent_hugepage=never nokaslr hest_disable novmcoredd cma=0 hugetlb_cma=0"
+ "irqpoll nr_cpus=1 reset_devices cgroup_disable=memory mce=off numa=off udev.children-max=2 panic=10 acpi_no_memhotplug transparent_hugepage=never nokaslr hest_disable novmcoredd cma=0 hugetlb_cma=0 pcie_ports=compat"
;;
*)
echo "Warning: Unknown architecture '$1', using default sysconfig template." >&2
--
2.44.0