10 Commits

Author SHA1 Message Date
Tao Liu
c5aa460992 Introduce vmcore creation notification to kdump
Upstream: fedora
Resolves: RHEL-32060
Conflict: Yes, there are several conflicts. 1) Upstream has moved
          dracut-kdump.sh into kdump-utils/dracut/99kdumpbase/kdump.sh,
          so the target files have changed. 2) Several patchsets
          ([1] [2]) have not been backported to rhel9, so some
          formatting conflicts were encountered. No functional
          change was made while backporting the patch.

[1]: https://github.com/rhkdump/kdump-utils/pull/18/commits
[2]: https://github.com/rhkdump/kdump-utils/pull/33/commits

commit 88525ebf5e43cc86aea66dc75ec83db58233883b
Author: Tao Liu <ltao@redhat.com>
Date:   Thu Sep 5 15:49:07 2024 +1200

    Introduce vmcore creation notification to kdump

    Motivation
    ==========

    People may forget to recheck that kdump still works, with the result
    that no vmcore is generated after a real system crash. That is
    unexpected for kdump.

    It is highly recommended that people recheck kdump after any system
    modification, such as:

    a. after kernel patching or a full yum update, as it might break something
       on which kdump depends, for example by introducing a new bug.
    b. after any change at the hardware level, such as storage, networking,
       or a firmware upgrade.
    c. after deploying any new application, such as one that involves 3rd party
       modules.

    Although these cases are beyond the scope of kdump, a simple vmcore
    creation status notification is good to have for now.

    Design
    ======

    On (re)start, kdump currently checks whether any related files/fs/drivers
    have been modified before determining whether the initrd should be rebuilt.
    A rebuild is an indicator of such a modification, and kdump needs to be
    rechecked. A rebuild therefore clears the vmcore creation status recorded
    in $VMCORE_CREATION_STATUS.

    The vmcore creation check happens at "kdumpctl (re)start/status" and
    reports the creation success/fail status to users. A "success" status
    indicates that a vmcore has previously been generated successfully in the
    current env, so it is more likely that a vmcore will be generated later
    when a real crash happens. A "fail" status indicates that no vmcore has
    been generated previously, or that a vmcore creation failed in the current
    env. Users should check the 2nd kernel log or the kexec-dmesg.log for the
    reason for the failure.

    $VMCORE_CREATION_STATUS is used for recording the vmcore creation status of
    the current env. The format will be like:

       success 1718682002

    This means that a vmcore was generated successfully at this
    timestamp for the current env.
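
The record format above can be read back with a few lines of shell. This is only a minimal sketch: the function name `describe_vmcore_status` is hypothetical and is not part of kdumpctl, and it assumes that a "fail" record uses the same "<status> <epoch>" layout as the "success" example, which the commit message does not spell out.

```shell
# Hypothetical helper, not the actual kdumpctl code: interpret one
# "<status> <epoch>" record as described for $VMCORE_CREATION_STATUS.
# The body runs in a subshell so its variables do not leak to the caller.
describe_vmcore_status() (
    record=$1
    # An empty record stands for "no test has been performed yet".
    if [ -z "$record" ]; then
        echo "No vmcore creation test performed"
        exit 0
    fi
    status=${record%% *}     # text before the first space
    timestamp=${record#* }   # text after the first space
    case "$status" in
        success) echo "Last successful vmcore creation at epoch $timestamp" ;;
        fail)    echo "Last vmcore creation failed at epoch $timestamp" ;;
        *)       echo "Unrecognized status record: $record"; exit 1 ;;
    esac
)

describe_vmcore_status "success 1718682002"
# prints: Last successful vmcore creation at epoch 1718682002
```

The sketch takes the record as an argument rather than reading a file, so it makes no assumption about where the status file lives.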

    Usage
    =====

    [root@localhost ~]# kdumpctl restart
    kdump: kexec: unloaded kdump kernel
    kdump: Stopping kdump: [OK]
    kdump: kexec: loaded kdump kernel
    kdump: Starting kdump: [OK]
    kdump: Notice: No vmcore creation test performed!

    [root@localhost ~]# kdumpctl test

    [root@localhost ~]# kdumpctl status
    kdump: Kdump is operational
    kdump: Notice: Last successful vmcore creation on Tue Jun 18 16:39:10 CST 2024

    [root@localhost ~]# kdumpctl restart
    kdump: kexec: unloaded kdump kernel
    kdump: Stopping kdump: [OK]
    kdump: kexec: loaded kdump kernel
    kdump: Starting kdump: [OK]
    kdump: Notice: Last successful vmcore creation on Tue Jun 18 16:39:10 CST 2024

    The notification for kdumpctl (re)start/status can be disabled by
    setting VMCORE_CREATION_NOTIFICATION in /etc/sysconfig/kdump.

    Signed-off-by: Tao Liu <ltao@redhat.com>

Signed-off-by: Tao Liu <ltao@redhat.com>
2024-10-08 18:23:12 +13:00
Pingfan Liu
d9904e1794 ppc64le: replace kernel cmdline maxcpu=1 with nr_cpus=1
Resolves: https://issues.redhat.com/browse/RHEL-43581
Upstream: Fedora
Conflict: Applied manually

commit 44a1b7da908a52c15a2b7ed286b59cfe7319b4c9
Author: Sourabh Jain <sourabhjain@linux.ibm.com>
Date:   Wed Feb 28 22:51:15 2024 +0530

    ppc64le: replace kernel cmdline maxcpu=1 with nr_cpus=1

    With patch series [1], PowerPC supports nr_cpus=1,
    so use nr_cpus=1 instead of maxcpu=1 in the kdump environment.

    Note that this change depends on kernel changes [1].

    [1] https://lore.kernel.org/all/170800202447.601034.7290612623478478380.b4-ty@ellerman.id.au/#t

    Signed-off-by: Sourabh Jain <sourabhjain@linux.ibm.com>
    Cc: Hari Bathini <hbathini@linux.ibm.com>
    Cc: Mahesh Salgaonkar <mahesh@linux.ibm.com>
    Acked-by: Pingfan Liu <piliu@redhat.com>

Signed-off-by: Pingfan Liu <piliu@redhat.com>
2024-06-25 15:43:20 +08:00
Pingfan Liu
2b2b6b84c0 Revert "ppc64: tackle SRCU hang issue"
Resolves: bz2177574
Upstream: RHEL-only

This reverts commit 870ec2ec93.

Now that the real fix has gone into the RHEL-9 kernel [1], the temporary
workaround can be removed.

[1]: https://bugzilla.redhat.com/show_bug.cgi?id=2129726

Signed-off-by: Pingfan Liu <piliu@redhat.com>
2023-03-21 07:50:06 +00:00
Pingfan Liu
870ec2ec93 ppc64: tackle SRCU hang issue
Resolves: bz2158296
Upstream: RHEL-only

On the PowerPC platform, the following hang is observed:

Welcome to Red Hat Enterprise Linux 9.2 Beta (Plow) dracut-057-13.git20220816.el9 (Initramfs)!

[    1.631210] systemd[1]: Hostname set to <ibm-p9z-18-lp11.virt.pnr.lab.eng.rdu2.redhat.com>.
[-- MARK -- Mon Sep 26 01:45:00 2022]
[  243.681283] INFO: task systemd:1 blocked for more than 122 seconds.
[  243.681303]       Not tainted 5.14.0-167.el9.ppc64le #1
[  243.681315] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  243.681329] task:systemd         state:D stack:    0 pid:    1 ppid:     0 flags:0x00042000
[  243.681349] Call Trace:
[  243.681356] [c00000001a603640] [c00000004f990100] 0xc00000004f990100 (unreliable)
[  243.681378] [c00000001a603830] [c00000001001e9cc] __switch_to+0x12c/0x220
[  243.681400] [c00000001a603890] [c000000010ec5b40] __schedule+0x230/0x720
[  243.681418] [c00000001a603950] [c000000010ec6090] schedule+0x60/0x110
[  243.681435] [c00000001a603980] [c000000010ecd948] schedule_timeout+0x168/0x1c0
[  243.681454] [c00000001a603a60] [c000000010ec7214] __wait_for_common+0x134/0x360
[  243.681473] [c00000001a603b00] [c00000001017c98c] __flush_work.isra.0+0x1dc/0x3d0
[  243.681493] [c00000001a603ba0] [c0000000105cbd88] fsnotify_wait_marks_destroyed+0x28/0x40
[  243.681512] [c00000001a603bc0] [c0000000105cb800] fsnotify_destroy_group+0x60/0x150
[  243.681531] [c00000001a603c30] [c0000000105cf640] inotify_release+0x30/0xa0
[  243.681548] [c00000001a603ca0] [c00000001054fad8] __fput+0xc8/0x350
[  243.681565] [c00000001a603cf0] [c000000010183174] task_work_run+0xe4/0x160
[  243.681583] [c00000001a603d40] [c000000010021874] do_notify_resume+0x134/0x140
[  243.681602] [c00000001a603d70] [c000000010030168] interrupt_exit_user_prepare_main+0x198/0x270
[  243.681622] [c00000001a603de0] [c0000000100305ac] syscall_exit_prepare+0x6c/0x180
[  243.681641] [c00000001a603e10] [c00000001000bff4] system_call_vectored_common+0xf4/0x278
[  243.681661] --- interrupt: 3000 at 0x7fffb3015ba4
[  243.681673] NIP:  00007fffb3015ba4 LR: 0000000000000000 CTR: 0000000000000000
[  243.681687] REGS: c00000001a603e80 TRAP: 3000   Not tainted  (5.14.0-167.el9.ppc64le)
[  243.681703] MSR:  800000000000d033 <SF,EE,PR,ME,IR,DR,RI,LE>  CR: 42044440  XER: 00000000
[  243.681737] IRQMASK: 0
[  243.681737] GPR00: 0000000000000006 00007fffd24a31a0 00007fffb3127200 0000000000000000
[  243.681737] GPR04: 0000000000000002 000000000000000a 0000000000000000 0000000000000000
[  243.681737] GPR08: 0000010009ea2d40 0000000000000000 0000000000000000 0000000000000000
[  243.681737] GPR12: 0000000000000000 00007fffb3834bc0 0000000000000000 0000000000000000
[  243.681737] GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[  243.681737] GPR20: 000000012c74ddf0 000000000000000e 000000000017cd3f 0000000000000000
[  243.681737] GPR24: 00007fffd24a3570 0000000000000005 0000010009eb5490 0000010009ea24e0
[  243.681737] GPR28: 0000010009ea2900 0000010009eb4850 0000010009ea2d70 00007fffb382dd98
[  243.681896] NIP [00007fffb3015ba4] 0x7fffb3015ba4
[  243.681907] LR [0000000000000000] 0x0
[  243.681917] --- interrupt: 3000
[  243.681928] INFO: task kworker/u16:1:34 blocked for more than 122 seconds.
[  243.681941]       Not tainted 5.14.0-167.el9.ppc64le #1
[  243.681951] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  243.681964] task:kworker/u16:1   state:D stack:    0 pid:   34 ppid:     2 flags:0x00000800
[  243.681982] Workqueue: events_unbound fsnotify_mark_destroy_workfn
[  243.681998] Call Trace:
[  243.682005] [c00000001a9336d0] [c00000004f990100] 0xc00000004f990100 (unreliable)
[  243.682023] [c00000001a9338c0] [c00000001001e9cc] __switch_to+0x12c/0x220
[  243.682042] [c00000001a933920] [c000000010ec5b40] __schedule+0x230/0x720
[  243.682059] [c00000001a9339e0] [c000000010ec6090] schedule+0x60/0x110
[  243.682075] [c00000001a933a10] [c000000010ecd948] schedule_timeout+0x168/0x1c0
[  243.682094] [c00000001a933af0] [c000000010ec7214] __wait_for_common+0x134/0x360
[  243.682113] [c00000001a933b90] [c000000010213370] __synchronize_srcu.part.0+0xa0/0xe0
[  243.682132] [c00000001a933c00] [c0000000105cc154] fsnotify_mark_destroy_workfn+0xc4/0x1a0
[  243.682151] [c00000001a933c70] [c00000001017acb8] process_one_work+0x298/0x580
[  243.682169] [c00000001a933d10] [c00000001017b048] worker_thread+0xa8/0x630
[  243.682185] [c00000001a933da0] [c000000010188348] kthread+0x1b8/0x1c0
[  243.682203] [c00000001a933e10] [c00000001000cd64] ret_from_kernel_thread+0x5c/0x64
[  366.561279] INFO: task systemd:1 blocked for more than 245 seconds.

The right solution belongs in the kernel, but since the patch [1] for SRCU
will not be merged into the mainline in the near future, a userspace
workaround is needed to overcome this test blocker.

The workaround is to pass the kernel parameter "srcutree.big_cpu_lim=0", so
that the SRCU system always uses the srcu_node array.

[1]: https://lore.kernel.org/rcu/20221026032716.78674-1-kernelfans@gmail.com/T/#m6534975507c2abca497a94d81c7abbfea1d0978d

Signed-off-by: Pingfan Liu <piliu@redhat.com>
2023-01-06 11:26:03 +08:00
Lichen Liu
fcca486525 kdump.sysconfig*: add ignition.firstboot to KDUMP_COMMANDLINE_REMOVE
Resolves: bz2090533
Upstream: Fedora
Conflict: None

commit 218d9917c0
Author: Dusty Mabe <dusty@dustymabe.com>
Date:   Mon May 16 14:04:12 2022 -0400

    kdump.sysconfig*: add ignition.firstboot to KDUMP_COMMANDLINE_REMOVE

    For CoreOS based systems we use Ignition for provisioning machines
    in the initramfs on first boot. We trigger Ignition right now by
    the presence of `ignition.firstboot` in the kernel command line. The
    kernel argument is only present on first boot so after a reboot it
    no longer is in the kernel command line.

    If a kernel crash happens before the first reboot of a machine we
    want the `ignition.firstboot` kernel argument to be removed and not
    passed on to the crash kernel.
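
The effect of listing an argument in KDUMP_COMMANDLINE_REMOVE can be illustrated with a small filter over a command line string. This is a sketch only: `remove_cmdline_param` is a hypothetical name, not the function kexec-tools uses, and it assumes arguments are separated by whitespace and contain no quoted values with spaces.

```shell
# Hypothetical helper: print CMDLINE ($1) without any argument that is
# exactly NAME ($2) or starts with "NAME=".
# The body runs in a subshell, so "set -f" and the variables stay local.
remove_cmdline_param() (
    name=$2
    out=""
    set -f                       # no filename globbing while word-splitting
    for arg in $1; do
        case "$arg" in
            "$name"|"$name"=*) ;;                # drop the matching argument
            *) out="${out:+$out }$arg" ;;        # keep everything else
        esac
    done
    printf '%s\n' "$out"
)

remove_cmdline_param "BOOT_IMAGE=/vmlinuz ro ignition.firstboot rd.neednet=1" ignition.firstboot
# prints: BOOT_IMAGE=/vmlinuz ro rd.neednet=1
```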

Signed-off-by: Lichen Liu <lichliu@redhat.com>
2022-05-27 10:08:59 +08:00
Pingfan Liu
888c24c90b kdump.sysconfig: make kexec_file_load as default option on ppc64le
Resolves: bz1881876
Upstream: Fedora
Conflict: None

commit a239a939237ced11c35d52d722a7eecb84091de6
Author: Pingfan Liu <piliu@redhat.com>
Date:   Thu Oct 21 10:13:10 2021 +0800

    sysconfig: make kexec_file_load as default option on ppc64le

    Signed-off-by: Pingfan Liu <piliu@redhat.com>

Signed-off-by: Pingfan Liu <piliu@redhat.com>
2021-11-12 09:47:42 +08:00
Tao Liu
2b7d3aa34d Disable CMA in kdump 2nd kernel
Resolves: bz1950885
Upstream: fedora
Conflict: none

commit d5fe96cd7a
Author: Tao Liu <ltao@redhat.com>
Date:   Tue Apr 27 17:58:40 2021 +0800

    Disable CMA in kdump 2nd kernel

    kexec-tools needs to disable CMA on the kdump kernel cmdline,
    otherwise the kdump kernel may run out of memory.

    This patch strips the cma= and hugetlb_cma= parameters inherited
    from the 1st kernel's cmdline and sets both to 0 for the 2nd kernel.
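
The strip-then-reset step described above could look roughly like the following. It is a sketch under stated assumptions, not the patch itself: `disable_cma` is a hypothetical name, and the command line is assumed to be whitespace-separated with no quoted values containing spaces.

```shell
# Hypothetical helper: drop any inherited cma= / hugetlb_cma= arguments
# from CMDLINE ($1), then append explicit zero values for the 2nd kernel.
# The body runs in a subshell, so "set -f" and the variables stay local.
disable_cma() (
    out=""
    set -f                       # no filename globbing while word-splitting
    for arg in $1; do
        case "$arg" in
            cma=*|hugetlb_cma=*) ;;              # strip inherited values
            *) out="${out:+$out }$arg" ;;        # keep everything else
        esac
    done
    printf '%s\n' "${out:+$out }cma=0 hugetlb_cma=0"
)

disable_cma "root=/dev/sda1 ro cma=64M hugetlb_cma=1G quiet"
# prints: root=/dev/sda1 ro quiet cma=0 hugetlb_cma=0
```

Stripping before appending avoids leaving two conflicting values for the same parameter on the resulting command line.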

    Signed-off-by: Tao Liu <ltao@redhat.com>
    Acked-by: Kairui Song <kasong@redhat.com>

Signed-off-by: Tao Liu <ltao@redhat.com>
2021-05-14 14:27:03 +08:00
DistroBaker
17a51515f0 Merged update from upstream sources
This is an automated DistroBaker update from upstream sources.
If you do not know what this is about or would like to opt out,
contact the OSCI team.

Source: https://src.fedoraproject.org/rpms/kexec-tools.git#4f492cf73ea11ff74f5b062e18fcea45cb5e7eeb
2020-11-20 12:35:49 +00:00
DistroBaker
5cac7c3f96 Merged update from upstream sources
This is an automated DistroBaker update from upstream sources.
If you do not know what this is about or would like to opt out,
contact the OSCI team.

Source: https://src.fedoraproject.org/rpms/kexec-tools.git#bfd06661e81465d077bac435c90b4082134adf19
2020-11-05 05:34:29 +00:00
Petr Šabata
f5bf4978d8 RHEL 9.0.0 Alpha bootstrap
The content of this branch was automatically imported from Fedora ELN
with the following as its source:
https://src.fedoraproject.org/rpms/kexec-tools#041ba89902961b5490a7143d9596dc00d732cba0
2020-10-15 14:45:57 +02:00