kexec-tools

Author	SHA1	Message	Date
Pingfan Liu	340188939b	fadump: pass additional parameters for capture kernel Resolves: https://issues.redhat.com/browse/RHEL-52925 Upstream: kdump-utils Conflict: apply the changes in kdump.sysconfig.ppc64le manually commit 77b80ce5e369c7b5cf8321e4cdc20c139910f92c Author: Hari Bathini <hbathini@linux.ibm.com> Date: Sat Oct 19 00:28:34 2024 +0530 fadump: pass additional parameters for capture kernel Since kernel commit 3416c9daa6b13 ("powerpc/fadump: pass additional parameters when fadump is active"), fadump supports passing additional parameters to the dump capture kernel. Leverage that support here to pass additional parameters to the dump capture kernel. Also, update fadump-howto.txt to make clear which options in /etc/sysconfig/kdump are not relevant for fadump. The default bootargs to append for the fadump capture kernel boot are chosen with the intent to optimize resources and reduce the memory footprint in the dump capture environment. Signed-off-by: Hari Bathini <hbathini@linux.ibm.com> Signed-off-by: Pingfan Liu <piliu@redhat.com>	2024-12-09 21:45:00 +08:00
Tao Liu	fc66e25f7b	Re-introduce vmcore creation notification to kdump Upstream: fedora Resolves: RHEL-70214 Conflict: Yes, the conflict is the same as the original c9s commit `c5aa4609` ("Introduce vmcore creation notification to kdump") `9ec61f6c` ("Return the correct exit code of rebuild initrd") Also, this patch cherry-picks the ipv6 fix in [1]. [1]: https://github.com/rhkdump/kdump-utils/pull/60/files commit 24e76222c740def1d03a506652400fe55959e024 Author: Tao Liu <ltao@redhat.com> Date: Fri Nov 29 16:15:18 2024 +1300 Re-introduce vmcore creation notification to kdump Motivation ========== People may forget to recheck that kdump works, and as a result no vmcore may be generated after a real system crash. That is unexpected for kdump. It is highly recommended that people test kdump after any system modification, such as: a. after kernel patching or a whole yum update, as it might break something on which kdump depends, for example by introducing a new bug. b. after any change at the hardware level, such as storage, networking, or a firmware upgrade. c. after deploying any new application, for example one that involves 3rd party modules. Though these are beyond the scope of kdump, a simple vmcore creation status notification is good to have for now. Design ====== Kdump currently checks whether any related files/fs/drivers have been modified before determining whether the initrd should be rebuilt on (re)start. A rebuild is an indicator of such modification, and kdump needs to be tested. This will clear the vmcore creation status specified in $VMCORE_CREATION_STATUS, and as a result, a notification of vmcore creation test will be output. To test kdump, there is an entry point for doing that: "kdumpctl test". It will generate a timestamp string as the ID of the current test, along with a "pending" status in $VMCORE_CREATION_STATUS, then a real crash & dump process will be triggered. 
After the system reboots back to normal, a vmcore creation check will start at "kdumpctl (re)start/status", and will report the result as a success/fail/manual status to users. To achieve that, the program will first check the status in $VMCORE_CREATION_STATUS. If a "pending" status is found, the test result is undetermined and needs to be retrieved from the remote/local dump folder. If the test id is found in the dump folder and the vmcore is complete, then "pending" is overwritten by "success", which indicates a successful kdump test. If the test id is found in the dump folder but the vmcore is incomplete, then it is a "fail" kdump test. If no test id is found, then it is a "manual" status, which indicates users should check the test results manually. If $VMCORE_CREATION_STATUS already holds a success/fail/manual status, the test result has already been determined, so the program will not access the remote/local dump folder again. This limits unnecessary access to the dump target and shortens the time consumed. Users should check for the root cause of a fail/manual status when it is reported. $VMCORE_CREATION_STATUS is used for recording the vmcore creation status of the current env. The format is like: <status> kdump_test_id=<timestamp sec>-<timestamp nanosec> e.g: success kdump_test_id=1729823462-938751820 Which means, there has been a successful kdump test at $(date -d "@1729823462") timestamp for the current env. Timestamp nanosec is only meaningful for making the id string unique. Difference ========== Previously, commit 88525ebf ("Introduce vmcore creation notification to kdump") was merged and addressed the same issue, but was implemented differently: The prev one: Save the $VMCORE_CREATION_STATUS to local drive during the 2nd kernel dumping. If the vmcore dumping target is different from $VMCORE_CREATION_STATUS's drive, then the latter one needs to be mounted in the 2nd kernel. 
This one: Save the $VMCORE_CREATION_STATUS to local drive only in the 1st kernel, that is, the test result is retrieved after the 2nd kernel dumping. So it doesn't load or mount any other drive in the 2nd kernel. The advantage: Extra mounting in the 2nd kernel introduces a higher risk of failure and, as a result, lowers the success rate of vmcore dumping, which is unacceptable. So keeping the code for the 2nd kernel as simple as possible is preferred. Usage ===== [root@localhost ~]# kdumpctl restart kdump: kexec: unloaded kdump kernel kdump: Stopping kdump: [OK] kdump: kexec: loaded kdump kernel kdump: Starting kdump: [OK] kdump: Notice: No vmcore creation test performed! [root@localhost ~]# kdumpctl status kdump: Kdump is operational kdump: Notice: No vmcore creation test performed! [root@localhost ~]# kdumpctl test [root@localhost ~]# cat /var/lib/kdump/vmcore-creation.status pending kdump_test_id=1729823462-938751820 [root@localhost ~]# kdumpctl status kdump: Kdump is operational kdump: Notice: Last successful vmcore creation on Fri Oct 25 02:31:02 AM UTC 2024 [root@localhost ~]# cat /var/lib/kdump/vmcore-creation.status success kdump_test_id=1729823462-938751820 [root@localhost ~]# kdumpctl restart kdump: kexec: unloaded kdump kernel kdump: Stopping kdump: [OK] kdump: kexec: loaded kdump kernel kdump: Starting kdump: [OK] kdump: Notice: Last successful vmcore creation on Fri Oct 25 02:31:02 AM UTC 2024 Note: the notification for kdumpctl (re)start/status can be disabled by setting VMCORE_CREATION_NOTIFICATION in /etc/sysconfig/kdump. fadump is NOT supported by this feature. Signed-off-by: Tao Liu <ltao@redhat.com> Signed-off-by: Tao Liu <ltao@redhat.com>	2024-12-06 15:27:20 +13:00
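The status handling described in commit fc66e25f7b can be sketched in bash roughly as follows. This is only an illustration under stated assumptions, not the code from kdumpctl: the function names `parse_vmcore_status` and `resolve_vmcore_status` and their yes/no arguments are hypothetical. Only the line format `<status> kdump_test_id=<sec>-<nanosec>` and the pending/success/fail/manual rules come from the commit message.

```shell
#!/bin/bash
# Hypothetical helpers; the names are illustrative and not taken from kdumpctl.

# Split a status line such as "success kdump_test_id=1729823462-938751820"
# into the status word and the seconds part of the test id.
parse_vmcore_status() {
    local line=$1 status id
    status=${line%% *}              # everything before the first space
    id=${line#*kdump_test_id=}      # everything after "kdump_test_id="
    # If the pattern did not match, the expansion returns the line unchanged.
    [ "$id" != "$line" ] || return 1
    printf '%s %s\n' "$status" "${id%%-*}"
}

# Decide the final status.
# $1: status read from the status file
# $2: "yes" if the test id was found in the dump folder
# $3: "yes" if the vmcore found there is complete
resolve_vmcore_status() {
    local status=$1 id_found=$2 complete=$3
    if [ "$status" != "pending" ]; then
        # Already determined: the dump target is not accessed again.
        echo "$status"
    elif [ "$id_found" != "yes" ]; then
        echo "manual"
    elif [ "$complete" = "yes" ]; then
        echo "success"
    else
        echo "fail"
    fi
}

parse_vmcore_status "success kdump_test_id=1729823462-938751820"
resolve_vmcore_status pending yes no
```

The seconds field returned by the first helper can be passed to `date -d "@<sec>"`, as the commit message does, to print when the test ran.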
Tao Liu	79aec45f8c	Revert "Introduce vmcore creation notification to kdump" Resolves: RHEL-70214 Upstream: fedora Conflict: Yes, the conflict is the same as the original c9s commit `c5aa4609` ("Introduce vmcore creation notification to kdump") `9ec61f6c` ("Return the correct exit code of rebuild initrd") commit 96956928a66d9256cdf8bfed6a8963ddea35aac9 Author: Tao Liu <ltao@redhat.com> Date: Fri Nov 29 14:42:01 2024 +1300 Revert "Introduce vmcore creation notification to kdump" This patch will revert the following 2 patches: 88525ebf ("Introduce vmcore creation notification to kdump") 35449537 ("Return the correct exit code of rebuild initrd") For the preparation of reimplementation of vmcore creation notification. Signed-off-by: Tao Liu <ltao@redhat.com> Signed-off-by: Tao Liu <ltao@redhat.com>	2024-12-06 11:25:25 +13:00
Tao Liu	c5aa460992	Introduce vmcore creation notification to kdump Upstream: fedora Resolves: RHEL-32060 Conflict: Yes, there are several conflicts. 1) Upstream has moved dracut-kdump.sh into kdump-utils/dracut/99kdumpbase/kdump.sh, so the target files have changed. 2) There are several patchsets ([1] [2]) which are not backported to rhel9, so some formatting conflicts were encountered. But no functional change has been made for the patch backporting. [1]: https://github.com/rhkdump/kdump-utils/pull/18/commits [2]: https://github.com/rhkdump/kdump-utils/pull/33/commits commit 88525ebf5e43cc86aea66dc75ec83db58233883b Author: Tao Liu <ltao@redhat.com> Date: Thu Sep 5 15:49:07 2024 +1200 Introduce vmcore creation notification to kdump Motivation ========== People may forget to recheck that kdump works, and as a result no vmcore may be generated after a real system crash. That is unexpected for kdump. It is highly recommended that people recheck kdump after any system modification, such as: a. after kernel patching or a whole yum update, as it might break something on which kdump depends, for example by introducing a new bug. b. after any change at the hardware level, such as storage, networking, or a firmware upgrade. c. after deploying any new application, for example one that involves 3rd party modules. Though these are beyond the scope of kdump, a simple vmcore creation status notification is good to have for now. Design ====== Kdump currently checks whether any related files/fs/drivers have been modified before determining whether the initrd should be rebuilt on (re)start. A rebuild is an indicator of such modification, and kdump needs to be rechecked. This will clear the vmcore creation status specified in $VMCORE_CREATION_STATUS. The vmcore creation check will happen at "kdumpctl (re)start/status", and will report the creation success/fail status to users. 
A "success" status indicates that a vmcore has previously been generated successfully based on the current env, so it is more likely that a vmcore will be generated later when a real crash happens; A "fail" status indicates that previously either no vmcore was generated, or a vmcore creation failed, based on the current env. Users should check the 2nd kernel log or the kexec-dmesg.log for the reason of the failure. $VMCORE_CREATION_STATUS is used for recording the vmcore creation status of the current env. The format will be like: success 1718682002 Which means, there has been a vmcore generated successfully at this timestamp for the current env. Usage ===== [root@localhost ~]# kdumpctl restart kdump: kexec: unloaded kdump kernel kdump: Stopping kdump: [OK] kdump: kexec: loaded kdump kernel kdump: Starting kdump: [OK] kdump: Notice: No vmcore creation test performed! [root@localhost ~]# kdumpctl test [root@localhost ~]# kdumpctl status kdump: Kdump is operational kdump: Notice: Last successful vmcore creation on Tue Jun 18 16:39:10 CST 2024 [root@localhost ~]# kdumpctl restart kdump: kexec: unloaded kdump kernel kdump: Stopping kdump: [OK] kdump: kexec: loaded kdump kernel kdump: Starting kdump: [OK] kdump: Notice: Last successful vmcore creation on Tue Jun 18 16:39:10 CST 2024 The notification for kdumpctl (re)start/status can be disabled by setting VMCORE_CREATION_NOTIFICATION in /etc/sysconfig/kdump. Signed-off-by: Tao Liu <ltao@redhat.com> Signed-off-by: Tao Liu <ltao@redhat.com>	2024-10-08 18:23:12 +13:00
Pingfan Liu	d9904e1794	ppc64le: replace kernel cmdline maxcpu=1 with nr_cpus=1 Resolves: https://issues.redhat.com/browse/RHEL-43581 Upstream: Fedora Conflict: Applied manually commit 44a1b7da908a52c15a2b7ed286b59cfe7319b4c9 Author: Sourabh Jain <sourabhjain@linux.ibm.com> Date: Wed Feb 28 22:51:15 2024 +0530 ppc64le: replace kernel cmdline maxcpu=1 with nr_cpus=1 With patch series [1], PowerPC supports nr_cpus=1, so use nr_cpus=1 instead of maxcpu=1 in the kdump environment. Note that this change depends on kernel changes [1]. [1] https://lore.kernel.org/all/170800202447.601034.7290612623478478380.b4-ty@ellerman.id.au/#t Signed-off-by: Sourabh Jain <sourabhjain@linux.ibm.com> Cc: Hari Bathini <hbathini@linux.ibm.com> Cc: Mahesh Salgaonkar <mahesh@linux.ibm.com> Acked-by: Pingfan Liu <piliu@redhat.com> Signed-off-by: Pingfan Liu <piliu@redhat.com>	2024-06-25 15:43:20 +08:00
Pingfan Liu	2b2b6b84c0	Revert "ppc64: tackle SRCU hang issue" Resolves: bz2177574 Upstream: RHEL-only This reverts commit `870ec2ec93`. Now the real fix has gone into the RHEL-9 kernel [1], the temporary workaround can be removed. [1]: https://bugzilla.redhat.com/show_bug.cgi?id=2129726 Signed-off-by: Pingfan Liu <piliu@redhat.com>	2023-03-21 07:50:06 +00:00
Pingfan Liu	870ec2ec93	ppc64: tackle SRCU hang issue Resolves: bz2158296 Upstream: RHEL-only On PowerPC platform, the following hang is witnessed: Welcome to Red Hat Enterprise Linux 9.2 Beta (Plow) dracut-057-13.git20220816.el9 (Initramfs) ! [ 1.631210] systemd[1]: Hostname set to <ibm-p9z-18-lp11.virt.pnr.lab.eng.rdu2.redhat.com>. [-- MARK -- Mon Sep 26 01:45:00 2022] [ 243.681283] INFO: task systemd:1 blocked for more than 122 seconds. [ 243.681303] Not tainted 5.14.0-167.el9.ppc64le #1 [ 243.681315] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 243.681329] task:systemd state:D stack: 0 pid: 1 ppid: 0 flags:0x00042000 [ 243.681349] Call Trace: [ 243.681356] [c00000001a603640] [c00000004f990100] 0xc00000004f990100 (unreliable) [ 243.681378] [c00000001a603830] [c00000001001e9cc] __switch_to+0x12c/0x220 [ 243.681400] [c00000001a603890] [c000000010ec5b40] __schedule+0x230/0x720 [ 243.681418] [c00000001a603950] [c000000010ec6090] schedule+0x60/0x110 [ 243.681435] [c00000001a603980] [c000000010ecd948] schedule_timeout+0x168/0x1c0 [ 243.681454] [c00000001a603a60] [c000000010ec7214] __wait_for_common+0x134/0x360 [ 243.681473] [c00000001a603b00] [c00000001017c98c] __flush_work.isra.0+0x1dc/0x3d0 [ 243.681493] [c00000001a603ba0] [c0000000105cbd88] fsnotify_wait_marks_destroyed+0x28/0x40 [ 243.681512] [c00000001a603bc0] [c0000000105cb800] fsnotify_destroy_group+0x60/0x150 [ 243.681531] [c00000001a603c30] [c0000000105cf640] inotify_release+0x30/0xa0 [ 243.681548] [c00000001a603ca0] [c00000001054fad8] __fput+0xc8/0x350 [ 243.681565] [c00000001a603cf0] [c000000010183174] task_work_run+0xe4/0x160 [ 243.681583] [c00000001a603d40] [c000000010021874] do_notify_resume+0x134/0x140 [ 243.681602] [c00000001a603d70] [c000000010030168] interrupt_exit_user_prepare_main+0x198/0x270 [ 243.681622] [c00000001a603de0] [c0000000100305ac] syscall_exit_prepare+0x6c/0x180 [ 243.681641] [c00000001a603e10] [c00000001000bff4] system_call_vectored_common+0xf4/0x278 [ 
243.681661] --- interrupt: 3000 at 0x7fffb3015ba4 [ 243.681673] NIP: 00007fffb3015ba4 LR: 0000000000000000 CTR: 0000000000000000 [ 243.681687] REGS: c00000001a603e80 TRAP: 3000 Not tainted (5.14.0-167.el9.ppc64le) [ 243.681703] MSR: 800000000000d033 <SF,EE,PR,ME,IR,DR,RI,LE> CR: 42044440 XER: 00000000 [ 243.681737] IRQMASK: 0 [ 243.681737] GPR00: 0000000000000006 00007fffd24a31a0 00007fffb3127200 0000000000000000 [ 243.681737] GPR04: 0000000000000002 000000000000000a 0000000000000000 0000000000000000 [ 243.681737] GPR08: 0000010009ea2d40 0000000000000000 0000000000000000 0000000000000000 [ 243.681737] GPR12: 0000000000000000 00007fffb3834bc0 0000000000000000 0000000000000000 [ 243.681737] GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 [ 243.681737] GPR20: 000000012c74ddf0 000000000000000e 000000000017cd3f 0000000000000000 [ 243.681737] GPR24: 00007fffd24a3570 0000000000000005 0000010009eb5490 0000010009ea24e0 [ 243.681737] GPR28: 0000010009ea2900 0000010009eb4850 0000010009ea2d70 00007fffb382dd98 [ 243.681896] NIP [00007fffb3015ba4] 0x7fffb3015ba4 [ 243.681907] LR [0000000000000000] 0x0 [ 243.681917] --- interrupt: 3000 [ 243.681928] INFO: task kworker/u16:1:34 blocked for more than 122 seconds. [ 243.681941] Not tainted 5.14.0-167.el9.ppc64le #1 [ 243.681951] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. 
[ 243.681964] task:kworker/u16:1 state:D stack: 0 pid: 34 ppid: 2 flags:0x00000800 [ 243.681982] Workqueue: events_unbound fsnotify_mark_destroy_workfn [ 243.681998] Call Trace: [ 243.682005] [c00000001a9336d0] [c00000004f990100] 0xc00000004f990100 (unreliable) [ 243.682023] [c00000001a9338c0] [c00000001001e9cc] __switch_to+0x12c/0x220 [ 243.682042] [c00000001a933920] [c000000010ec5b40] __schedule+0x230/0x720 [ 243.682059] [c00000001a9339e0] [c000000010ec6090] schedule+0x60/0x110 [ 243.682075] [c00000001a933a10] [c000000010ecd948] schedule_timeout+0x168/0x1c0 [ 243.682094] [c00000001a933af0] [c000000010ec7214] __wait_for_common+0x134/0x360 [ 243.682113] [c00000001a933b90] [c000000010213370] __synchronize_srcu.part.0+0xa0/0xe0 [ 243.682132] [c00000001a933c00] [c0000000105cc154] fsnotify_mark_destroy_workfn+0xc4/0x1a0 [ 243.682151] [c00000001a933c70] [c00000001017acb8] process_one_work+0x298/0x580 [ 243.682169] [c00000001a933d10] [c00000001017b048] worker_thread+0xa8/0x630 [ 243.682185] [c00000001a933da0] [c000000010188348] kthread+0x1b8/0x1c0 [ 243.682203] [c00000001a933e10] [c00000001000cd64] ret_from_kernel_thread+0x5c/0x64 [ 366.561279] INFO: task systemd:1 blocked for more than 245 seconds. The right solution should be in kernel, but since the patch [1] for SRCU will not be merged into the mainline in near future, it had better to have a userspace workaround to overcome this test blocker. The workaround method is to pass the kernel parameter "srcutree.big_cpu_lim=0", so that the SRCU system will always use srcu_node array. [1]: https://lore.kernel.org/rcu/20221026032716.78674-1-kernelfans@gmail.com/T/#m6534975507c2abca497a94d81c7abbfea1d0978d Signed-off-by: Pingfan Liu <piliu@redhat.com>	2023-01-06 11:26:03 +08:00
Lichen Liu	fcca486525	kdump.sysconfig: add ignition.firstboot to KDUMP_COMMANDLINE_REMOVE Resolves: bz2090533 Upstream: Fedora Conflict: None commit `218d9917c0` Author: Dusty Mabe <dusty@dustymabe.com> Date: Mon May 16 14:04:12 2022 -0400 kdump.sysconfig: add ignition.firstboot to KDUMP_COMMANDLINE_REMOVE For CoreOS based systems we use Ignition for provisioning machines in the initramfs on first boot. We trigger Ignition right now by the presence of `ignition.firstboot` in the kernel command line. The kernel argument is only present on first boot so after a reboot it no longer is in the kernel command line. If a kernel crash happens before the first reboot of a machine we want the `ignition.firstboot` kernel argument to be removed and not passed on to the crash kernel. Signed-off-by: Lichen Liu <lichliu@redhat.com>	2022-05-27 10:08:59 +08:00
Pingfan Liu	888c24c90b	kdump.sysconfig: make kexec_file_load as default option on ppc64le Resolves: bz1881876 Upstream: Fedora Conflict: None commit a239a939237ced11c35d52d722a7eecb84091de6 Author: Pingfan Liu <piliu@redhat.com> Date: Thu Oct 21 10:13:10 2021 +0800 sysconfig: make kexec_file_load as default option on ppc64le Signed-off-by: Pingfan Liu <piliu@redhat.com> Signed-off-by: Pingfan Liu <piliu@redhat.com>	2021-11-12 09:47:42 +08:00
Tao Liu	2b7d3aa34d	Disable CMA in kdump 2nd kernel Resolves: bz1950885 Upstream: fedora Conflict: none commit `d5fe96cd7a` Author: Tao Liu <ltao@redhat.com> Date: Tue Apr 27 17:58:40 2021 +0800 Disable CMA in kdump 2nd kernel kexec-tools needs to disable CMA for kdump kernel cmdline, otherwise kdump kernel may run out of memory. This patch strips the inherited cma=, hugetlb_cma= cmd line from 1st kernel, and sets to be 0 for 2nd kernel. Signed-off-by: Tao Liu <ltao@redhat.com> Acked-by: Kairui Song <kasong@redhat.com> Signed-off-by: Tao Liu <ltao@redhat.com>	2021-05-14 14:27:03 +08:00
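The command-line rewrite described in commit 2b7d3aa34d (strip the inherited cma= and hugetlb_cma= arguments, then set both to 0 for the 2nd kernel) could look roughly like the bash sketch below. The function name `disable_cma` is hypothetical and this is not the code from kexec-tools. It also assumes arguments are separated by plain whitespace, so quoted values containing spaces are not handled.

```shell
#!/bin/bash
# Hypothetical helper, not taken from kexec-tools: drop cma=/hugetlb_cma=
# inherited from the 1st kernel and force both to 0 for the 2nd kernel.
# Assumes whitespace-separated arguments without quoted spaces.
disable_cma() {
    local -a args
    local arg out=""
    # read -ra splits on whitespace without performing glob expansion.
    read -ra args <<< "$1"
    for arg in "${args[@]}"; do
        case $arg in
            cma=*|hugetlb_cma=*) ;;              # drop the inherited value
            *) out+="${out:+ }$arg" ;;           # keep everything else
        esac
    done
    echo "${out:+$out }cma=0 hugetlb_cma=0"
}

disable_cma "ro root=/dev/sda1 cma=64M hugetlb_cma=1G quiet"
```

Appending the zeroed values after stripping, rather than editing the values in place, keeps the result the same whether or not the 1st kernel had those arguments at all.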
DistroBaker	17a51515f0	Merged update from upstream sources This is an automated DistroBaker update from upstream sources. If you do not know what this is about or would like to opt out, contact the OSCI team. Source: https://src.fedoraproject.org/rpms/kexec-tools.git#4f492cf73ea11ff74f5b062e18fcea45cb5e7eeb	2020-11-20 12:35:49 +00:00
DistroBaker	5cac7c3f96	Merged update from upstream sources This is an automated DistroBaker update from upstream sources. If you do not know what this is about or would like to opt out, contact the OSCI team. Source: https://src.fedoraproject.org/rpms/kexec-tools.git#bfd06661e81465d077bac435c90b4082134adf19	2020-11-05 05:34:29 +00:00
Petr Šabata	f5bf4978d8	RHEL 9.0.0 Alpha bootstrap The content of this branch was automatically imported from Fedora ELN with the following as its source: https://src.fedoraproject.org/rpms/kexec-tools#041ba89902961b5490a7143d9596dc00d732cba0	2020-10-15 14:45:57 +02:00

13 Commits