13 Commits

Author SHA1 Message Date
Pingfan Liu
340188939b fadump: pass additional parameters for capture kernel
Resolves: https://issues.redhat.com/browse/RHEL-52925
Upstream: kdump-utils
Conflict: the changes to kdump.sysconfig.ppc64le were applied manually

commit 77b80ce5e369c7b5cf8321e4cdc20c139910f92c
Author: Hari Bathini <hbathini@linux.ibm.com>
Date:   Sat Oct 19 00:28:34 2024 +0530

    fadump: pass additional parameters for capture kernel

    Since kernel commit 3416c9daa6b13 ("powerpc/fadump: pass additional
    parameters when fadump is active"), fadump supports passing additional
    parameters to dump capture kernel. Leverage that support here to pass
    additional parameters to dump capture kernel.

    Also, update fadump-howto.txt to make clear which options in
    /etc/sysconfig/kdump are not relevant for fadump.

    The default bootargs to append for fadump capture kernel boot are
    chosen with the intent to optimize resources and reduce memory
    footprint in dump capture environment.

    Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>

Signed-off-by: Pingfan Liu <piliu@redhat.com>
2024-12-09 21:45:00 +08:00
Tao Liu
fc66e25f7b Re-introduce vmcore creation notification to kdump
Upstream: fedora
Resolves: RHEL-70214
Conflict: Yes, the conflict is the same as the original c9s commit
          c5aa4609 ("Introduce vmcore creation notification to kdump")
          9ec61f6c ("Return the correct exit code of rebuild initrd")
          This patch also cherry-picks the IPv6 fix from [1].

[1]: https://github.com/rhkdump/kdump-utils/pull/60/files

commit 24e76222c740def1d03a506652400fe55959e024
Author: Tao Liu <ltao@redhat.com>
Date:   Fri Nov 29 16:15:18 2024 +1300

    Re-introduce vmcore creation notification to kdump

    Motivation
    ==========

    People may forget to recheck that kdump still works, and as a result no
    vmcore may be generated after a real system crash. That is unexpected
    for kdump.

    It is highly recommended to test kdump after any system modification,
    such as:

    a. after kernel patching or a full yum update, as it might break something
       on which kdump depends, for example by introducing a new bug.
    b. after any change at the hardware level, such as storage, networking,
       or a firmware upgrade.
    c. after deploying any new application, for example one that involves
       3rd party modules.

    Though these are beyond the scope of kdump, a simple vmcore creation
    status notification is good to have for now.

    Design
    ======

    Kdump currently checks whether any related files/fs/drivers have been
    modified to determine if the initrd should be rebuilt on (re)start. A
    rebuild is an indicator of such a modification, and means kdump needs to
    be tested. The rebuild clears the vmcore creation status recorded in
    $VMCORE_CREATION_STATUS, and as a result a notification asking for a
    vmcore creation test is output.

    To test kdump, use "kdumpctl test". It generates a timestamp string as
    the ID of the current test, records it with a "pending" status in
    $VMCORE_CREATION_STATUS, and then triggers a real crash & dump process.

    After the system reboots back to normal, a vmcore creation check runs at
    "kdumpctl (re)start/status" and reports the result to users as a
    success/fail/manual status.

    To achieve that, the program first checks the status in
    $VMCORE_CREATION_STATUS. If a "pending" status is found, the test result
    is undetermined and needs to be retrieved from the remote/local dump
    folder. If the test id is found in the dump folder and the vmcore is
    complete, "pending" is overwritten by "success", which indicates a
    successful kdump test. If the test id is found in the dump folder but the
    vmcore is incomplete, it is a "fail" kdump test. If no test id is found,
    it is a "manual" status, which indicates users should check the test
    results manually.

    If $VMCORE_CREATION_STATUS already holds a success/fail/manual status, the
    test result has already been determined, so the program will not access
    the remote/local dump folder again. This avoids unnecessary access to the
    dump target and shortens the time consumed.
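
The decision rules described above can be sketched as a small shell function. This is only an illustration: resolve_status and its three arguments are hypothetical names, and the real kdumpctl code inspects the local or remote dump target itself rather than taking yes/no flags.

```shell
#!/bin/bash
# Sketch only: resolve a recorded status to success/fail/manual.
# resolve_status is a hypothetical helper, not a kdumpctl function.
#   $1: status currently recorded in $VMCORE_CREATION_STATUS
#   $2: "yes" if the test id was found in the dump folder, else "no"
#   $3: "yes" if the vmcore found there is complete, else "no"
resolve_status() {
    local current=$1 id_found=$2 complete=$3
    if [ "$current" != "pending" ]; then
        # Already determined: do not touch the dump target again.
        echo "$current"
    elif [ "$id_found" != "yes" ]; then
        echo "manual"
    elif [ "$complete" = "yes" ]; then
        echo "success"
    else
        echo "fail"
    fi
}

resolve_status pending yes yes   # prints "success"
resolve_status pending yes no    # prints "fail"
resolve_status pending no  no    # prints "manual"
resolve_status fail    yes yes   # prints "fail"
```

The first branch is the shortcut that keeps an already determined result from triggering another access to the dump target.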

    Users should check for the root cause when a fail/manual status is
    reported.

    $VMCORE_CREATION_STATUS records the vmcore creation status of the current
    env. The format is:

       <status> kdump_test_id=<timestamp sec>-<timestamp nanosec>
    e.g.:
       success kdump_test_id=1729823462-938751820

    This means there has been a successful kdump test at the
    $(date -d "@1729823462") timestamp for the current env. The nanosecond
    part only serves to make the id string unique.
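
As a minimal sketch of how such a line could be split into its fields, assuming the single-line format shown above: parse_vmcore_status is a hypothetical helper written for illustration and is not part of kdumpctl.

```shell
#!/bin/bash
# Sketch only: parse one line of the vmcore creation status file.
# parse_vmcore_status is a hypothetical helper, not a kdumpctl function.
parse_vmcore_status() {
    local line=$1 status id sec nsec
    status=${line%% *}                # text before the first space
    id=${line#*kdump_test_id=}        # text after "kdump_test_id="
    sec=${id%%-*}                     # seconds part of the timestamp
    nsec=${id#*-}                     # nanoseconds, used only for uniqueness
    printf '%s %s %s\n' "$status" "$sec" "$nsec"
}

parse_vmcore_status "success kdump_test_id=1729823462-938751820"
# prints: success 1729823462 938751820
```

The seconds field can then be passed to date -d "@<sec>" to obtain a human-readable time, as the commit message does.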

    Difference
    ==========
    Previously, commit 88525ebf ("Introduce vmcore creation notification to
    kdump") was merged to address the same issue, but it was implemented
    differently:

    The previous one:
    Saves $VMCORE_CREATION_STATUS to the local drive during dumping in the 2nd
    kernel. If the vmcore dump target is different from the drive holding
    $VMCORE_CREATION_STATUS, the latter needs to be mounted in the 2nd kernel.

    This one:
    Saves $VMCORE_CREATION_STATUS to the local drive only in the 1st kernel,
    that is, the test result is retrieved after the 2nd kernel has dumped. So
    it doesn't load or mount any other drive in the 2nd kernel.

    The advantage:
    Extra mounting in the 2nd kernel introduces a higher risk of failure and
    so lowers the success rate of vmcore dumping, which is unacceptable.
    Keeping the code for the 2nd kernel as simple as possible is preferred.

    Usage
    =====
    [root@localhost ~]# kdumpctl restart
    kdump: kexec: unloaded kdump kernel
    kdump: Stopping kdump: [OK]
    kdump: kexec: loaded kdump kernel
    kdump: Starting kdump: [OK]
    kdump: Notice: No vmcore creation test performed!

    [root@localhost ~]# kdumpctl status
    kdump: Kdump is operational
    kdump: Notice: No vmcore creation test performed!

    [root@localhost ~]# kdumpctl test

    [root@localhost ~]# cat /var/lib/kdump/vmcore-creation.status
    pending kdump_test_id=1729823462-938751820

    [root@localhost ~]# kdumpctl status
    kdump: Kdump is operational
    kdump: Notice: Last successful vmcore creation on Fri Oct 25 02:31:02 AM UTC 2024

    [root@localhost ~]# cat /var/lib/kdump/vmcore-creation.status
    success kdump_test_id=1729823462-938751820

    [root@localhost ~]# kdumpctl restart
    kdump: kexec: unloaded kdump kernel
    kdump: Stopping kdump: [OK]
    kdump: kexec: loaded kdump kernel
    kdump: Starting kdump: [OK]
    kdump: Notice: Last successful vmcore creation on Fri Oct 25 02:31:02 AM UTC 2024

    Note: the notification for kdumpctl (re)start/status can be disabled by
    setting VMCORE_CREATION_NOTIFICATION in /etc/sysconfig/kdump. This
    feature does NOT support fadump.

    Signed-off-by: Tao Liu <ltao@redhat.com>

Signed-off-by: Tao Liu <ltao@redhat.com>
2024-12-06 15:27:20 +13:00
Tao Liu
79aec45f8c Revert "Introduce vmcore creation notification to kdump"
Resolves: RHEL-70214
Upstream: fedora
Conflict: Yes, the conflict is the same as the original c9s commit
          c5aa4609 ("Introduce vmcore creation notification to kdump")
          9ec61f6c ("Return the correct exit code of rebuild initrd")

commit 96956928a66d9256cdf8bfed6a8963ddea35aac9
Author: Tao Liu <ltao@redhat.com>
Date:   Fri Nov 29 14:42:01 2024 +1300

    Revert "Introduce vmcore creation notification to kdump"

    This patch reverts the following 2 patches:

        88525ebf ("Introduce vmcore creation notification to kdump")
        35449537 ("Return the correct exit code of rebuild initrd")

    in preparation for reimplementing vmcore creation notification.

    Signed-off-by: Tao Liu <ltao@redhat.com>

Signed-off-by: Tao Liu <ltao@redhat.com>
2024-12-06 11:25:25 +13:00
Tao Liu
c5aa460992 Introduce vmcore creation notification to kdump
Upstream: fedora
Resolves: RHEL-32060
Conflict: Yes, there are several conflicts. 1) Upstream has moved
          dracut-kdump.sh into kdump-utils/dracut/99kdumpbase/kdump.sh,
          so the target files are changed. 2) Several patchsets ([1] [2])
          have not been backported to rhel9, so some formatting conflicts
          were encountered. No functional change was made while
          backporting the patch.

[1]: https://github.com/rhkdump/kdump-utils/pull/18/commits
[2]: https://github.com/rhkdump/kdump-utils/pull/33/commits

commit 88525ebf5e43cc86aea66dc75ec83db58233883b
Author: Tao Liu <ltao@redhat.com>
Date:   Thu Sep 5 15:49:07 2024 +1200

    Introduce vmcore creation notification to kdump

    Motivation
    ==========

    People may forget to recheck that kdump still works, and as a result no
    vmcore may be generated after a real system crash. That is unexpected
    for kdump.

    It is highly recommended to recheck kdump after any system
    modification, such as:

    a. after kernel patching or a full yum update, as it might break something
       on which kdump depends, for example by introducing a new bug.
    b. after any change at the hardware level, such as storage, networking,
       or a firmware upgrade.
    c. after deploying any new application, for example one that involves
       3rd party modules.

    Though these are beyond the scope of kdump, a simple vmcore creation
    status notification is good to have for now.

    Design
    ======

    Kdump currently checks whether any related files/fs/drivers have been
    modified to determine if the initrd should be rebuilt on (re)start. A
    rebuild is an indicator of such a modification, and means kdump needs to
    be rechecked. The rebuild clears the vmcore creation status recorded in
    $VMCORE_CREATION_STATUS.

    The vmcore creation check happens at "kdumpctl (re)start/status" and
    reports the creation success/fail status to users. A "success" status
    indicates that a vmcore has previously been generated successfully in the
    current env, so it is more likely that a vmcore will be generated later
    when a real crash happens. A "fail" status indicates that no vmcore has
    been generated, or that a vmcore creation has failed, in the current env.
    Users should check the 2nd kernel log or kexec-dmesg.log for the reason
    of the failure.

    $VMCORE_CREATION_STATUS records the vmcore creation status of the current
    env. The format is:

       success 1718682002

    This means a vmcore was generated successfully at this timestamp for the
    current env.

    Usage
    =====

    [root@localhost ~]# kdumpctl restart
    kdump: kexec: unloaded kdump kernel
    kdump: Stopping kdump: [OK]
    kdump: kexec: loaded kdump kernel
    kdump: Starting kdump: [OK]
    kdump: Notice: No vmcore creation test performed!

    [root@localhost ~]# kdumpctl test

    [root@localhost ~]# kdumpctl status
    kdump: Kdump is operational
    kdump: Notice: Last successful vmcore creation on Tue Jun 18 16:39:10 CST 2024

    [root@localhost ~]# kdumpctl restart
    kdump: kexec: unloaded kdump kernel
    kdump: Stopping kdump: [OK]
    kdump: kexec: loaded kdump kernel
    kdump: Starting kdump: [OK]
    kdump: Notice: Last successful vmcore creation on Tue Jun 18 16:39:10 CST 2024

    The notification for kdumpctl (re)start/status can be disabled by
    setting VMCORE_CREATION_NOTIFICATION in /etc/sysconfig/kdump.

    Signed-off-by: Tao Liu <ltao@redhat.com>

Signed-off-by: Tao Liu <ltao@redhat.com>
2024-10-08 18:23:12 +13:00
Pingfan Liu
d9904e1794 ppc64le: replace kernel cmdline maxcpu=1 with nr_cpus=1
Resolves: https://issues.redhat.com/browse/RHEL-43581
Upstream: Fedora
Conflict: Applied manually

commit 44a1b7da908a52c15a2b7ed286b59cfe7319b4c9
Author: Sourabh Jain <sourabhjain@linux.ibm.com>
Date:   Wed Feb 28 22:51:15 2024 +0530

    ppc64le: replace kernel cmdline maxcpu=1 with nr_cpus=1

    With patch series [1], PowerPC supports nr_cpus=1,
    so use nr_cpus=1 instead of maxcpu=1 in the kdump environment.

    Note that this change depends on the kernel changes in [1].

    [1] https://lore.kernel.org/all/170800202447.601034.7290612623478478380.b4-ty@ellerman.id.au/#t

    Signed-off-by: Sourabh Jain <sourabhjain@linux.ibm.com>
    Cc: Hari Bathini <hbathini@linux.ibm.com>
    Cc: Mahesh Salgaonkar <mahesh@linux.ibm.com>
    Acked-by: Pingfan Liu <piliu@redhat.com>

Signed-off-by: Pingfan Liu <piliu@redhat.com>
2024-06-25 15:43:20 +08:00
Pingfan Liu
2b2b6b84c0 Revert "ppc64: tackle SRCU hang issue"
Resolves: bz2177574
Upstream: RHEL-only

This reverts commit 870ec2ec93.

Now that the real fix has gone into the RHEL-9 kernel [1], the temporary
workaround can be removed.

[1]: https://bugzilla.redhat.com/show_bug.cgi?id=2129726

Signed-off-by: Pingfan Liu <piliu@redhat.com>
2023-03-21 07:50:06 +00:00
Pingfan Liu
870ec2ec93 ppc64: tackle SRCU hang issue
Resolves: bz2158296
Upstream: RHEL-only

On the PowerPC platform, the following hang is witnessed:

Welcome to Red Hat Enterprise Linux 9.2 Beta (Plow) dracut-057-13.git20220816.el9 (Initramfs)!

[    1.631210] systemd[1]: Hostname set to <ibm-p9z-18-lp11.virt.pnr.lab.eng.rdu2.redhat.com>.
[-- MARK -- Mon Sep 26 01:45:00 2022]
[  243.681283] INFO: task systemd:1 blocked for more than 122 seconds.
[  243.681303]       Not tainted 5.14.0-167.el9.ppc64le #1
[  243.681315] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  243.681329] task:systemd         state:D stack:    0 pid:    1 ppid:     0 flags:0x00042000
[  243.681349] Call Trace:
[  243.681356] [c00000001a603640] [c00000004f990100] 0xc00000004f990100 (unreliable)
[  243.681378] [c00000001a603830] [c00000001001e9cc] __switch_to+0x12c/0x220
[  243.681400] [c00000001a603890] [c000000010ec5b40] __schedule+0x230/0x720
[  243.681418] [c00000001a603950] [c000000010ec6090] schedule+0x60/0x110
[  243.681435] [c00000001a603980] [c000000010ecd948] schedule_timeout+0x168/0x1c0
[  243.681454] [c00000001a603a60] [c000000010ec7214] __wait_for_common+0x134/0x360
[  243.681473] [c00000001a603b00] [c00000001017c98c] __flush_work.isra.0+0x1dc/0x3d0
[  243.681493] [c00000001a603ba0] [c0000000105cbd88] fsnotify_wait_marks_destroyed+0x28/0x40
[  243.681512] [c00000001a603bc0] [c0000000105cb800] fsnotify_destroy_group+0x60/0x150
[  243.681531] [c00000001a603c30] [c0000000105cf640] inotify_release+0x30/0xa0
[  243.681548] [c00000001a603ca0] [c00000001054fad8] __fput+0xc8/0x350
[  243.681565] [c00000001a603cf0] [c000000010183174] task_work_run+0xe4/0x160
[  243.681583] [c00000001a603d40] [c000000010021874] do_notify_resume+0x134/0x140
[  243.681602] [c00000001a603d70] [c000000010030168] interrupt_exit_user_prepare_main+0x198/0x270
[  243.681622] [c00000001a603de0] [c0000000100305ac] syscall_exit_prepare+0x6c/0x180
[  243.681641] [c00000001a603e10] [c00000001000bff4] system_call_vectored_common+0xf4/0x278
[  243.681661] --- interrupt: 3000 at 0x7fffb3015ba4
[  243.681673] NIP:  00007fffb3015ba4 LR: 0000000000000000 CTR: 0000000000000000
[  243.681687] REGS: c00000001a603e80 TRAP: 3000   Not tainted  (5.14.0-167.el9.ppc64le)
[  243.681703] MSR:  800000000000d033 <SF,EE,PR,ME,IR,DR,RI,LE>  CR: 42044440  XER: 00000000
[  243.681737] IRQMASK: 0
[  243.681737] GPR00: 0000000000000006 00007fffd24a31a0 00007fffb3127200 0000000000000000
[  243.681737] GPR04: 0000000000000002 000000000000000a 0000000000000000 0000000000000000
[  243.681737] GPR08: 0000010009ea2d40 0000000000000000 0000000000000000 0000000000000000
[  243.681737] GPR12: 0000000000000000 00007fffb3834bc0 0000000000000000 0000000000000000
[  243.681737] GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[  243.681737] GPR20: 000000012c74ddf0 000000000000000e 000000000017cd3f 0000000000000000
[  243.681737] GPR24: 00007fffd24a3570 0000000000000005 0000010009eb5490 0000010009ea24e0
[  243.681737] GPR28: 0000010009ea2900 0000010009eb4850 0000010009ea2d70 00007fffb382dd98
[  243.681896] NIP [00007fffb3015ba4] 0x7fffb3015ba4
[  243.681907] LR [0000000000000000] 0x0
[  243.681917] --- interrupt: 3000
[  243.681928] INFO: task kworker/u16:1:34 blocked for more than 122 seconds.
[  243.681941]       Not tainted 5.14.0-167.el9.ppc64le #1
[  243.681951] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  243.681964] task:kworker/u16:1   state:D stack:    0 pid:   34 ppid:     2 flags:0x00000800
[  243.681982] Workqueue: events_unbound fsnotify_mark_destroy_workfn
[  243.681998] Call Trace:
[  243.682005] [c00000001a9336d0] [c00000004f990100] 0xc00000004f990100 (unreliable)
[  243.682023] [c00000001a9338c0] [c00000001001e9cc] __switch_to+0x12c/0x220
[  243.682042] [c00000001a933920] [c000000010ec5b40] __schedule+0x230/0x720
[  243.682059] [c00000001a9339e0] [c000000010ec6090] schedule+0x60/0x110
[  243.682075] [c00000001a933a10] [c000000010ecd948] schedule_timeout+0x168/0x1c0
[  243.682094] [c00000001a933af0] [c000000010ec7214] __wait_for_common+0x134/0x360
[  243.682113] [c00000001a933b90] [c000000010213370] __synchronize_srcu.part.0+0xa0/0xe0
[  243.682132] [c00000001a933c00] [c0000000105cc154] fsnotify_mark_destroy_workfn+0xc4/0x1a0
[  243.682151] [c00000001a933c70] [c00000001017acb8] process_one_work+0x298/0x580
[  243.682169] [c00000001a933d10] [c00000001017b048] worker_thread+0xa8/0x630
[  243.682185] [c00000001a933da0] [c000000010188348] kthread+0x1b8/0x1c0
[  243.682203] [c00000001a933e10] [c00000001000cd64] ret_from_kernel_thread+0x5c/0x64
[  366.561279] INFO: task systemd:1 blocked for more than 245 seconds.

The right solution belongs in the kernel, but since the patch [1] for SRCU
will not be merged into the mainline in the near future, it is better to
have a userspace workaround to overcome this test blocker.

The workaround is to pass the kernel parameter "srcutree.big_cpu_lim=0", so
that the SRCU system always uses the srcu_node array.

[1]: https://lore.kernel.org/rcu/20221026032716.78674-1-kernelfans@gmail.com/T/#m6534975507c2abca497a94d81c7abbfea1d0978d

Signed-off-by: Pingfan Liu <piliu@redhat.com>
2023-01-06 11:26:03 +08:00
Lichen Liu
fcca486525 kdump.sysconfig*: add ignition.firstboot to KDUMP_COMMANDLINE_REMOVE
Resolves: bz2090533
Upstream: Fedora
Conflict: None

commit 218d9917c0
Author: Dusty Mabe <dusty@dustymabe.com>
Date:   Mon May 16 14:04:12 2022 -0400

    kdump.sysconfig*: add ignition.firstboot to KDUMP_COMMANDLINE_REMOVE

    For CoreOS based systems we use Ignition for provisioning machines
    in the initramfs on first boot. We trigger Ignition right now by
    the presence of `ignition.firstboot` in the kernel command line. The
    kernel argument is only present on first boot so after a reboot it
    no longer is in the kernel command line.

    If a kernel crash happens before the first reboot of a machine we
    want the `ignition.firstboot` kernel argument to be removed and not
    passed on to the crash kernel.
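
To illustrate the effect of a remove list such as KDUMP_COMMANDLINE_REMOVE, here is a minimal sketch that drops every listed argument from a command line. remove_cmdline_args is a hypothetical helper, and the list passed in the example is illustrative rather than the shipped default; the actual filtering is done by kdumpctl.

```shell
#!/bin/bash
# Sketch only: remove every argument named in a remove list from a command line.
# remove_cmdline_args is a hypothetical helper, not the kdumpctl implementation.
# Arguments containing quoted spaces are not handled.
remove_cmdline_args() {
    local cmdline=$1 remove_list=$2 arg name out=""
    for arg in $cmdline; do           # unquoted on purpose: split on whitespace
        name=${arg%%=*}               # "foo=bar" -> "foo", "foo" -> "foo"
        case " $remove_list " in
            *" $name "*) ;;           # listed: drop it
            *) out="$out $arg" ;;
        esac
    done
    echo "${out# }"                   # trim the leading space
}

remove_cmdline_args "ro ignition.firstboot quiet console=ttyS0" "quiet ignition.firstboot"
# prints: ro console=ttyS0
```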

Signed-off-by: Lichen Liu <lichliu@redhat.com>
2022-05-27 10:08:59 +08:00
Pingfan Liu
888c24c90b kdump.sysconfig: make kexec_file_load as default option on ppc64le
Resolves: bz1881876
Upstream: Fedora
Conflict: None

commit a239a939237ced11c35d52d722a7eecb84091de6
Author: Pingfan Liu <piliu@redhat.com>
Date:   Thu Oct 21 10:13:10 2021 +0800

    sysconfig: make kexec_file_load as default option on ppc64le

    Signed-off-by: Pingfan Liu <piliu@redhat.com>

Signed-off-by: Pingfan Liu <piliu@redhat.com>
2021-11-12 09:47:42 +08:00
Tao Liu
2b7d3aa34d Disable CMA in kdump 2nd kernel
Resolves: bz1950885
Upstream: fedora
Conflict: none

commit d5fe96cd7a
Author: Tao Liu <ltao@redhat.com>
Date:   Tue Apr 27 17:58:40 2021 +0800

    Disable CMA in kdump 2nd kernel

    kexec-tools needs to disable CMA on the kdump kernel cmdline,
    otherwise the kdump kernel may run out of memory.

    This patch strips the cma= and hugetlb_cma= arguments inherited from
    the 1st kernel's cmdline and sets both to 0 for the 2nd kernel.
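
The strip-then-reset behaviour described above can be sketched as follows. disable_cma is a hypothetical name chosen for this example, and the code is an assumption about the general approach, not the function actually used by kexec-tools.

```shell
#!/bin/bash
# Sketch only: drop inherited cma=/hugetlb_cma= arguments and force both to 0.
# disable_cma is a hypothetical helper, not the function used by kexec-tools.
disable_cma() {
    local arg out=""
    for arg in $1; do                 # unquoted on purpose: split on whitespace
        case $arg in
            cma=*|hugetlb_cma=*) ;;   # skip values inherited from the 1st kernel
            *) out="$out $arg" ;;
        esac
    done
    out="$out cma=0 hugetlb_cma=0"
    echo "${out# }"                   # trim the leading space
}

disable_cma "root=/dev/sda1 cma=64M quiet hugetlb_cma=1G"
# prints: root=/dev/sda1 quiet cma=0 hugetlb_cma=0
```

Removing the inherited values before appending the zeros avoids leaving two conflicting settings of the same parameter on the 2nd kernel's command line.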

    Signed-off-by: Tao Liu <ltao@redhat.com>
    Acked-by: Kairui Song <kasong@redhat.com>

Signed-off-by: Tao Liu <ltao@redhat.com>
2021-05-14 14:27:03 +08:00
DistroBaker
17a51515f0 Merged update from upstream sources
This is an automated DistroBaker update from upstream sources.
If you do not know what this is about or would like to opt out,
contact the OSCI team.

Source: https://src.fedoraproject.org/rpms/kexec-tools.git#4f492cf73ea11ff74f5b062e18fcea45cb5e7eeb
2020-11-20 12:35:49 +00:00
DistroBaker
5cac7c3f96 Merged update from upstream sources
This is an automated DistroBaker update from upstream sources.
If you do not know what this is about or would like to opt out,
contact the OSCI team.

Source: https://src.fedoraproject.org/rpms/kexec-tools.git#bfd06661e81465d077bac435c90b4082134adf19
2020-11-05 05:34:29 +00:00
Petr Šabata
f5bf4978d8 RHEL 9.0.0 Alpha bootstrap
The content of this branch was automatically imported from Fedora ELN
with the following as its source:
https://src.fedoraproject.org/rpms/kexec-tools#041ba89902961b5490a7143d9596dc00d732cba0
2020-10-15 14:45:57 +02:00