kexec-tools/kdump.sysconfig.x86_64

# Kernel Version string for the -kdump kernel, such as 2.6.13-1544.FC5kdump
# If no version is specified, then the init script will try to find a
# kdump kernel with the same version number as the running kernel.
KDUMP_KERNELVER=""
# The kdump commandline is the command line that needs to be passed off to
# the kdump kernel. This will likely match the contents of the grub kernel
# line. For example:
# KDUMP_COMMANDLINE="ro root=LABEL=/"
# Dracut depends on proper root= options, so please make sure that appropriate
# root= options are copied from /proc/cmdline. In general it is best to append
# command line options using "KDUMP_COMMANDLINE_APPEND=".
# If a command line is not specified, the default will be taken from
# /proc/cmdline
KDUMP_COMMANDLINE=""
# This variable lets us remove arguments from the current kdump commandline
# as taken from either KDUMP_COMMANDLINE above, or from /proc/cmdline
# NOTE: some arguments such as crashkernel will always be removed
KDUMP_COMMANDLINE_REMOVE="hugepages hugepagesz slub_debug quiet log_buf_len swiotlb cma hugetlb_cma ignition.firstboot"
# This variable lets us append arguments to the current kdump commandline
# after it has been processed by KDUMP_COMMANDLINE_REMOVE
#
# Commit note (2024-05-31 05:21:20 +00:00):
# sysconfig: add pcie_ports compat to KDUMP_COMMANDLINE_APPEND on x86_64
#
# Upstream: fedora
# Resolves: RHEL-3929
# Conflict: Yes. Fedora has no kdump.sysconfig.x86_64, only
# gen-kdump-sysconfig.sh, so for the backport the modification is made to
# kdump.sysconfig.x86_64.
#
# commit ada6f5edf1ae06fc88759aa2f94d09e2a98d21ef
# Author: Tao Liu <ltao@redhat.com>
# Date: Wed May 1 16:53:19 2024 +0800
#
# kdump has failed in the 2nd kernel in a number of cases where only one CPU
# is enabled by "nr_cpus=1" but the machine has a large number of devices,
# which can easily exceed the IRQ resources that one CPU can handle. As a
# result, the 2nd kernel hangs and kdump fails. This issue is often observed
# on machines with many CPUs and many devices. On those systems, pcieports
# consume a sizable proportion of the IRQ resources, and many messages like
# the following can be seen in the dmesg log:
#
#   pcieport 0000:18:01.0: PME: Signaling with IRQ 109
#
# According to the kernel documentation [1], "pcie_ports=compat" disables
# native PCIe services (PME, AER, DPC, PCIe hotplug). These services cover
# power management events, error reporting, downstream port containment and
# hotplug, none of which are must-have functions for kdump. In addition,
# testing showed no side effects such as failing to write the vmcore to
# sdx, nvme, etc.
#
# This patch disables native PCIe services for the 2nd kernel in order to
# save scarce IRQ resources and increase the kdump success rate.
#
# Prarit's comments: This makes sense to me. The only concern anyone should
# have is that a PCIe error could have been responsible for taking down the
# kernel in the first place, and booting into the second kernel could then
# also have a fatal problem. I'm not sure we can ever fix that type of
# cascade of panics :) so it makes sense to disable these features.
#
# [1]: https://www.kernel.org/doc/html/v6.9-rc1/admin-guide/kernel-parameters.html
#
# Signed-off-by: Tao Liu <ltao@redhat.com>
# Acked-by: Prarit Bhargava <prarit@redhat.com>
# Acked-by: Dave Young <dyoung@redhat.com>
#
KDUMP_COMMANDLINE_APPEND="irqpoll nr_cpus=1 reset_devices cgroup_disable=memory mce=off numa=off udev.children-max=2 panic=10 acpi_no_memhotplug transparent_hugepage=never nokaslr hest_disable novmcoredd cma=0 hugetlb_cma=0 pcie_ports=compat"
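# The comments above describe a three-step construction of the kdump command
# line: start from KDUMP_COMMANDLINE (or /proc/cmdline), drop the arguments
# named in KDUMP_COMMANDLINE_REMOVE, then append KDUMP_COMMANDLINE_APPEND.
# The following is a minimal sketch of that flow, not the code kdumpctl
# actually runs. The function names remove_cmdline_args and
# build_kdump_cmdline are hypothetical, and the sketch assumes arguments are
# separated by plain whitespace with no quoted values or glob characters.

```shell
# Hypothetical helper: print command line $1 without any argument whose
# name (the part before the first "=") appears in the space-separated list $2.
remove_cmdline_args() {
    local cmdline="$1" remove="$2" arg name keep out=""
    for arg in $cmdline; do
        keep=1
        for name in $remove; do
            # "${arg%%=*}" is the bare name for both "quiet" and "cma=0".
            if [ "${arg%%=*}" = "$name" ]; then
                keep=0
                break
            fi
        done
        if [ "$keep" -eq 1 ]; then
            out="${out:+$out }$arg"
        fi
    done
    printf '%s\n' "$out"
}

# Hypothetical helper: base command line $1, names to remove $2,
# arguments to append $3. "crashkernel" is always removed, as the NOTE
# on KDUMP_COMMANDLINE_REMOVE says.
build_kdump_cmdline() {
    local base="$1" remove="$2" append="$3" out
    out=$(remove_cmdline_args "$base" "crashkernel $remove")
    if [ -n "$append" ]; then
        out="${out:+$out }$append"
    fi
    printf '%s\n' "$out"
}

# Possible usage with the variables from this file:
#   build_kdump_cmdline "$(cat /proc/cmdline)" \
#       "$KDUMP_COMMANDLINE_REMOVE" "$KDUMP_COMMANDLINE_APPEND"
```

# Removing before appending matters here: cma and hugetlb_cma appear in both
# lists, so the inherited values are dropped first and "cma=0 hugetlb_cma=0"
# from the append list are the ones that remain.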
# Any additional kexec arguments required. In most situations, this should
# be left empty
#
# Example:
# KEXEC_ARGS="--elf32-core-headers"
KEXEC_ARGS="-s"
# Where to find the boot image
#KDUMP_BOOTDIR="/boot"
# The image type used for kdump
KDUMP_IMG="vmlinuz"
# The image's extension. Relocatable kernels don't have one
KDUMP_IMG_EXT=""
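# KDUMP_KERNELVER, KDUMP_BOOTDIR, KDUMP_IMG and KDUMP_IMG_EXT together
# identify the kernel image to load. The sketch below shows one plausible
# way to combine them. The function name kdump_kernel_path and the exact
# path layout (bootdir/image-version+extension) are assumptions made for
# illustration, as is the "/boot" fallback taken from the commented-out
# default above.

```shell
# Hypothetical helper: print the kdump kernel image path implied by the
# variables in this file. Falls back to the running kernel's version when
# KDUMP_KERNELVER is empty, and to /boot when KDUMP_BOOTDIR is unset.
kdump_kernel_path() {
    local kver="${KDUMP_KERNELVER:-$(uname -r)}"
    local bootdir="${KDUMP_BOOTDIR:-/boot}"
    printf '%s\n' "${bootdir}/${KDUMP_IMG}-${kver}${KDUMP_IMG_EXT}"
}

# Possible usage after sourcing this file:
#   kdump_kernel_path
```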
#
# Commit note (2024-10-08 01:48:04 +00:00):
# Introduce vmcore creation notification to kdump
#
# Upstream: fedora
# Resolves: RHEL-32060
# Conflict: Yes, there are several conflicts.
# 1) Upstream has moved dracut-kdump.sh to
#    kdump-utils/dracut/99kdumpbase/kdump.sh, so the target files changed.
# 2) Several patchsets ([1] [2]) have not been backported to rhel9, so some
#    formatting conflicts were encountered. The backport makes no functional
#    change to the patch.
#
# [1]: https://github.com/rhkdump/kdump-utils/pull/18/commits
# [2]: https://github.com/rhkdump/kdump-utils/pull/33/commits
#
# commit 88525ebf5e43cc86aea66dc75ec83db58233883b
# Author: Tao Liu <ltao@redhat.com>
# Date: Thu Sep 5 15:49:07 2024 +1200
#
# Motivation
# ==========
#
# People may forget to recheck that kdump works, with the result that no
# vmcore is generated after a real system crash. It is highly recommended to
# recheck kdump after any system modification, such as:
#
# a. after kernel patching or a whole yum update, as it might break something
#    kdump depends on, for example by introducing a new bug.
# b. after any change at the hardware level, such as storage, networking or
#    a firmware upgrade.
# c. after deploying any new application, for example one that involves 3rd
#    party modules.
#
# Although these cases are outside the scope of kdump, a simple vmcore
# creation status notification is good to have for now.
#
# Design
# ======
#
# When kdump is (re)started, it checks whether any related files, filesystems
# or drivers were modified before deciding whether the initrd should be
# rebuilt. A rebuild indicates such a modification, which means kdump needs
# to be rechecked, so the rebuild clears the vmcore creation status recorded
# in $VMCORE_CREATION_STATUS.
#
# The vmcore creation check happens at "kdumpctl (re)start/status" and
# reports the success/fail status to the user. A "success" status indicates
# that a vmcore was previously generated successfully in the current
# environment, so a vmcore is more likely to be generated when a real crash
# happens. A "fail" status indicates that no vmcore was generated previously,
# or that a vmcore creation failed in the current environment. The user
# should check the 2nd kernel log or kexec-dmesg.log for the reason.
#
# $VMCORE_CREATION_STATUS records the vmcore creation status of the current
# environment. The format is:
#
#   success 1718682002
#
# which means that a vmcore was generated successfully at this timestamp in
# the current environment.
#
# Usage
# =====
#
#   [root@localhost ~]# kdumpctl restart
#   kdump: kexec: unloaded kdump kernel
#   kdump: Stopping kdump: [OK]
#   kdump: kexec: loaded kdump kernel
#   kdump: Starting kdump: [OK]
#   kdump: Notice: No vmcore creation test performed!
#
#   [root@localhost ~]# kdumpctl test
#
#   [root@localhost ~]# kdumpctl status
#   kdump: Kdump is operational
#   kdump: Notice: Last successful vmcore creation on Tue Jun 18 16:39:10 CST 2024
#
#   [root@localhost ~]# kdumpctl restart
#   kdump: kexec: unloaded kdump kernel
#   kdump: Stopping kdump: [OK]
#   kdump: kexec: loaded kdump kernel
#   kdump: Starting kdump: [OK]
#   kdump: Notice: Last successful vmcore creation on Tue Jun 18 16:39:10 CST 2024
#
# The notification for kdumpctl (re)start/status can be disabled by setting
# VMCORE_CREATION_NOTIFICATION in /etc/sysconfig/kdump.
#
# Signed-off-by: Tao Liu <ltao@redhat.com>
#
# Enable vmcore creation notification by default, disable by setting
# VMCORE_CREATION_NOTIFICATION=""
VMCORE_CREATION_NOTIFICATION="yes"
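# The commit message for this option gives the status record format as
# "success 1718682002" (a status word followed by a Unix timestamp). The
# sketch below splits and validates such a record. The function name
# parse_vmcore_status is hypothetical, and treating a "fail" record as
# having the same two-field layout is an assumption; the source only shows
# the "success" form.

```shell
# Hypothetical helper: validate a "<status> <timestamp>" record and print
# it as "status=<status> timestamp=<timestamp>". Returns 1 on any record
# that does not match the assumed layout.
parse_vmcore_status() {
    local line="$1" status ts
    status=${line%% *}   # text before the first space
    ts=${line#* }        # text after the first space
    case "$status" in
        success|fail) ;;
        *) return 1 ;;
    esac
    case "$ts" in
        ''|*[!0-9]*) return 1 ;;   # empty or contains a non-digit
    esac
    printf 'status=%s timestamp=%s\n' "$status" "$ts"
}

# Possible usage:
#   parse_vmcore_status "success 1718682002"
```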
# Logging is controlled by the following variables in the first kernel:
# - @var KDUMP_STDLOGLVL - logging level to standard error (console output)
# - @var KDUMP_SYSLOGLVL - logging level to syslog (by logger command)
# - @var KDUMP_KMSGLOGLVL - logging level to /dev/kmsg (only for boot-time)
#
# In the second kernel, kdump will use the rd.kdumploglvl option to set the
# log level in the above KDUMP_COMMANDLINE_APPEND.
# - @var rd.kdumploglvl - logging level to syslog (by logger command)
# - for example: add the rd.kdumploglvl=3 option to KDUMP_COMMANDLINE_APPEND
#
# Logging levels: no logging(0), error(1), warn(2), info(3), debug(4)
#
# KDUMP_STDLOGLVL=3
# KDUMP_SYSLOGLVL=0
# KDUMP_KMSGLOGLVL=0
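# The level numbers listed above can be read as thresholds: a message is
# emitted to a given target when its own level does not exceed the level
# configured for that target. The sketch below illustrates that reading.
# The function names log_level_name and should_log are hypothetical, and
# the threshold interpretation is an assumption rather than a description
# of the actual kdump logger.

```shell
# Hypothetical helper: map a numeric level from this file to its name.
log_level_name() {
    case "$1" in
        0) echo "none" ;;
        1) echo "error" ;;
        2) echo "warn" ;;
        3) echo "info" ;;
        4) echo "debug" ;;
        *) return 1 ;;
    esac
}

# Hypothetical helper: succeed if a message of level $1 should be emitted
# to a target whose configured level is $2. With a configured level of 0,
# no message (levels 1 to 4) passes.
should_log() {
    [ "$1" -le "$2" ]
}

# Possible usage with the commented default KDUMP_STDLOGLVL=3:
#   should_log 2 3 && echo "warn messages reach the console"
```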