Upstream: fedora
Resolves: RHEL-70214
Conflict: Yes, the conflict is the same as the original c9s commit
c5aa4609 ("Introduce vmcore creation notification to kdump")
9ec61f6c ("Return the correct exit code of rebuild initrd")
This patch also cherry-picks the IPv6 fix in [1].
[1]: https://github.com/rhkdump/kdump-utils/pull/60/files
commit 24e76222c740def1d03a506652400fe55959e024
Author: Tao Liu <ltao@redhat.com>
Date: Fri Nov 29 16:15:18 2024 +1300
Re-introduce vmcore creation notification to kdump
Motivation
==========
People may forget to recheck that kdump still works, so a real system
crash may end without any vmcore being generated. That is not what users
expect from kdump.
It is highly recommended that people test kdump after any system
modification, such as:
a. after kernel patching or a full yum update, as it might break something
on which kdump depends, for example by introducing a new bug.
b. after any change at the hardware level, such as storage, networking or
firmware upgrades.
c. after deploying any new application, such as one that involves 3rd party
modules.
Although these are beyond the scope of kdump, a simple vmcore creation
status notification is good to have for now.
Design
======
Kdump currently checks whether any related files/fs/drivers have been
modified before determining whether the initrd should be rebuilt on
(re)start. A rebuild is an indicator of such a modification, so kdump
needs to be tested. The rebuild clears the vmcore creation status recorded
in $VMCORE_CREATION_STATUS and, as a result, a notification asking for a
vmcore creation test is printed.
Kdump can be tested with "kdumpctl test". It generates a timestamp string
as the ID of the current test and records it, along with a "pending"
status, in $VMCORE_CREATION_STATUS. Then a real crash & dump process is
triggered.
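A minimal sketch of how such a test ID and "pending" record could be
produced. It assumes GNU date (for %N) and writes to a temporary file
instead of the real status file; the variable names are illustrative only:

```shell
# Hypothetical stand-in for $VMCORE_CREATION_STATUS.
status_file=$(mktemp)

# <seconds>-<nanoseconds> since the epoch; %N is a GNU date extension.
kdump_test_id=$(date +%s-%N)

# Record the test as pending before the crash would be triggered.
echo "pending kdump_test_id=$kdump_test_id" > "$status_file"
record=$(cat "$status_file")
echo "$record"
rm -f "$status_file"
```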
After the system reboots back to normal, a vmcore creation check runs at
"kdumpctl (re)start/status" and reports the result to users as a
success/fail/manual status.
To achieve that, the program first checks the status in
$VMCORE_CREATION_STATUS. If a "pending" status is found, the test result is
undetermined and needs to be retrieved from the remote/local dump folder.
If the test id is found in the dump folder and the vmcore is complete,
"pending" is overwritten by "success", which indicates a successful kdump
test. If the test id is found in the dump folder but the vmcore is
incomplete, it is a "fail" kdump test. If no test id is found, it is a
"manual" status, which indicates that users should check the test results
manually.
If $VMCORE_CREATION_STATUS already holds a success/fail/manual status, the
test result has already been determined, so the program will not access
the remote/local dump folder again. This avoids unnecessary access to the
dump target and shortens the time consumed.
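The status transitions described above can be sketched as a small decision
function. This is only an illustration: resolve_status and its yes/no
arguments are hypothetical and stand in for the real lookup in the dump
folder:

```shell
# resolve_status <current status> <test id found: yes|no> <vmcore complete: yes|no>
# Hypothetical helper; prints the status that would be recorded next.
resolve_status() {
    case "$1" in
    pending)
        if [ "$2" != yes ]; then
            echo manual      # no test id found: users must check manually
        elif [ "$3" = yes ]; then
            echo success     # test id found and vmcore complete
        else
            echo fail        # test id found but vmcore incomplete
        fi
        ;;
    *)
        # Already success/fail/manual: keep it, no dump target access.
        echo "$1"
        ;;
    esac
}

resolve_status pending yes yes
resolve_status pending yes no
resolve_status pending no no
resolve_status success no no
```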
Users should look for the root cause when a fail/manual status is
reported.
$VMCORE_CREATION_STATUS is used for recording the vmcore creation status of
the current env. The format is like:
<status> kdump_test_id=<timestamp sec>-<timestamp nanosec>
e.g:
success kdump_test_id=1729823462-938751820
This means there has been a successful kdump test at
$(date -d "@1729823462") for the current env. The nanosecond part of the
timestamp only serves to make the id string unique.
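As an illustration of the format, a status line can be split with plain
parameter expansion. This is a sketch, not the code from the patch; the
variable names are made up and the final conversion assumes GNU date:

```shell
line="success kdump_test_id=1729823462-938751820"

status=${line%% *}                 # text before the first space
test_id=${line#*kdump_test_id=}    # text after "kdump_test_id="
sec=${test_id%%-*}                 # seconds part of the timestamp
nsec=${test_id#*-}                 # nanoseconds part, only for uniqueness

echo "status=$status sec=$sec nsec=$nsec"
# GNU date converts the seconds part into a readable time.
date -u -d "@$sec"
```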
Difference
==========
Previously, commit 88525ebf ("Introduce vmcore creation notification to
kdump") was merged to address the same issue, but it was implemented
differently:
The previous one:
Save the $VMCORE_CREATION_STATUS to the local drive during the 2nd kernel
dumping. If the vmcore dump target is different from
$VMCORE_CREATION_STATUS's drive, then the latter needs to be mounted in
the 2nd kernel.
This one:
Save the $VMCORE_CREATION_STATUS to the local drive only in the 1st kernel,
that is, the test result is retrieved after the 2nd kernel has finished
dumping. So it doesn't load or mount any other drive in the 2nd kernel.
The advantage:
Extra mounting in the 2nd kernel introduces a higher risk of failure and
therefore lowers the success rate of vmcore dumping, which is
unacceptable. So keeping the code for the 2nd kernel as simple as possible
is preferred.
Usage
=====
[root@localhost ~]# kdumpctl restart
kdump: kexec: unloaded kdump kernel
kdump: Stopping kdump: [OK]
kdump: kexec: loaded kdump kernel
kdump: Starting kdump: [OK]
kdump: Notice: No vmcore creation test performed!
[root@localhost ~]# kdumpctl status
kdump: Kdump is operational
kdump: Notice: No vmcore creation test performed!
[root@localhost ~]# kdumpctl test
[root@localhost ~]# cat /var/lib/kdump/vmcore-creation.status
pending kdump_test_id=1729823462-938751820
[root@localhost ~]# kdumpctl status
kdump: Kdump is operational
kdump: Notice: Last successful vmcore creation on Fri Oct 25 02:31:02 AM UTC 2024
[root@localhost ~]# cat /var/lib/kdump/vmcore-creation.status
success kdump_test_id=1729823462-938751820
[root@localhost ~]# kdumpctl restart
kdump: kexec: unloaded kdump kernel
kdump: Stopping kdump: [OK]
kdump: kexec: loaded kdump kernel
kdump: Starting kdump: [OK]
kdump: Notice: Last successful vmcore creation on Fri Oct 25 02:31:02 AM UTC 2024
Note: the notification for kdumpctl (re)start/status can be disabled by
setting VMCORE_CREATION_NOTIFICATION in /etc/sysconfig/kdump. fadump is
NOT supported by this feature.
Signed-off-by: Tao Liu <ltao@redhat.com>
Signed-off-by: Tao Liu <ltao@redhat.com>
Resolves: RHEL-70214
Upstream: fedora
Conflict: Yes, the conflict is the same as the original c9s commit
c5aa4609 ("Introduce vmcore creation notification to kdump")
9ec61f6c ("Return the correct exit code of rebuild initrd")
commit 96956928a66d9256cdf8bfed6a8963ddea35aac9
Author: Tao Liu <ltao@redhat.com>
Date: Fri Nov 29 14:42:01 2024 +1300
Revert "Introduce vmcore creation notification to kdump"
This patch will revert the following 2 patches:
88525ebf ("Introduce vmcore creation notification to kdump")
35449537 ("Return the correct exit code of rebuild initrd")
This is in preparation for the reimplementation of vmcore creation
notification.
Signed-off-by: Tao Liu <ltao@redhat.com>
Signed-off-by: Tao Liu <ltao@redhat.com>
Upstream: fedora
Resolves: RHEL-32060
Conflict: Yes, there are several conflicts. 1) Upstream has moved
dracut-kdump.sh into kdump-utils/dracut/99kdumpbase/kdump.sh,
so the target files are changed. 2) There are several
patchsets ([1] [2]) which are not backported to rhel9, so some
formatting conflicts were encountered. But no functional
change has been made in backporting the patch.
[1]: https://github.com/rhkdump/kdump-utils/pull/18/commits
[2]: https://github.com/rhkdump/kdump-utils/pull/33/commits
commit 88525ebf5e43cc86aea66dc75ec83db58233883b
Author: Tao Liu <ltao@redhat.com>
Date: Thu Sep 5 15:49:07 2024 +1200
Introduce vmcore creation notification to kdump
Motivation
==========
People may forget to recheck that kdump still works, so a real system
crash may end without any vmcore being generated. That is not what users
expect from kdump.
It is highly recommended that people recheck kdump after any system
modification, such as:
a. after kernel patching or a full yum update, as it might break something
on which kdump depends, for example by introducing a new bug.
b. after any change at the hardware level, such as storage, networking or
firmware upgrades.
c. after deploying any new application, such as one that involves 3rd party
modules.
Although these are beyond the scope of kdump, a simple vmcore creation
status notification is good to have for now.
Design
======
Kdump currently checks whether any related files/fs/drivers have been
modified before determining whether the initrd should be rebuilt on
(re)start. A rebuild is an indicator of such a modification, so kdump
needs to be rechecked. The rebuild clears the vmcore creation status
recorded in $VMCORE_CREATION_STATUS.
The vmcore creation check happens at "kdumpctl (re)start/status" and
reports the creation success/fail status to users. A "success" status
indicates that a vmcore has previously been generated successfully on the
current env, so it is more likely that a vmcore will be generated when a
real crash happens. A "fail" status indicates that no vmcore has been
generated, or that a vmcore creation has failed, on the current env. Users
should check the 2nd kernel log or the kexec-dmesg.log for the reason of
the failure.
$VMCORE_CREATION_STATUS is used for recording the vmcore creation status of
the current env. The format will be like:
success 1718682002
This means a vmcore was generated successfully at this timestamp for the
current env.
Usage
=====
[root@localhost ~]# kdumpctl restart
kdump: kexec: unloaded kdump kernel
kdump: Stopping kdump: [OK]
kdump: kexec: loaded kdump kernel
kdump: Starting kdump: [OK]
kdump: Notice: No vmcore creation test performed!
[root@localhost ~]# kdumpctl test
[root@localhost ~]# kdumpctl status
kdump: Kdump is operational
kdump: Notice: Last successful vmcore creation on Tue Jun 18 16:39:10 CST 2024
[root@localhost ~]# kdumpctl restart
kdump: kexec: unloaded kdump kernel
kdump: Stopping kdump: [OK]
kdump: kexec: loaded kdump kernel
kdump: Starting kdump: [OK]
kdump: Notice: Last successful vmcore creation on Tue Jun 18 16:39:10 CST 2024
The notification for kdumpctl (re)start/status can be disabled by
setting VMCORE_CREATION_NOTIFICATION in /etc/sysconfig/kdump
Signed-off-by: Tao Liu <ltao@redhat.com>
Signed-off-by: Tao Liu <ltao@redhat.com>
Related: bz2151504
Upstream: Fedora
Conflict: None
commit 12d9eff9dc
Author: Coiby Xu <coxu@redhat.com>
Date: Tue Mar 28 16:33:34 2023 +0800
Show how much time kdump has waited for the network to be ready
Relates: https://bugzilla.redhat.com/show_bug.cgi?id=2151504
Currently, when the network isn't ready, kdump would repeatedly print
the same info,
[ 29.537230] kdump[671]: Bad kdump network destination: 192.123.1.21
[ 30.559418] kdump[679]: Bad kdump network destination: 192.123.1.21
[ 31.580189] kdump[687]: Bad kdump network destination: 192.123.1.21
This is not user-friendly and users may think kdump has got stuck. So
also show how much time kdump has waited for the network to be ready,
[ 29.546258] kdump[673]: Waiting for network to be ready (50s / 10min)
...
[ 32.608967] kdump[697]: Waiting for network to be ready (56s / 10min)
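For illustration, a progress message of this shape could be built from the
elapsed seconds and a timeout given in seconds. The helper name wait_msg
and its arguments are hypothetical, not taken from the patch:

```shell
# wait_msg <elapsed seconds> <timeout seconds>
# Hypothetical helper: formats the progress message shown above.
wait_msg() {
    echo "Waiting for network to be ready (${1}s / $(($2 / 60))min)"
}

wait_msg 50 600
```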
Note kdump_get_ip_route no longer prints an error message and it's up to
the caller to determine the log level and print relevant messages. And
kdump_collect_netif_usage aborts when kdump_get_ip_route fails.
Reported-by: Martin Pitt <mpitt@redhat.com>
Signed-off-by: Coiby Xu <coxu@redhat.com>
Reviewed-by: Philipp Rudo <prudo@redhat.com>
Signed-off-by: Coiby Xu <coxu@redhat.com>
Resolves: bz2076416
Upstream: Fedora
Conflict: None
commit 9792994f2f
Author: Coiby Xu <coxu@redhat.com>
Date: Thu Sep 22 22:31:47 2022 +0800
Wait for the network to be truly ready before dumping vmcore
nm-wait-online-initrd.service installed by dracut's 35-networkmanager
module calls nm-online with "-s" which means it returns immediately when
NetworkManager logs "startup complete". Thus it doesn't truly wait for
network connectivity to be established [1]. Wait for the network to be
truly ready before dumping vmcore. There are two benefits brought by
this approach,
- ssh/nfs dumping won't fail because the network is not
ready e.g. [2][3]
- users don't need to use workarounds like rd.net.carrier.timeout to
make sure the network is ready
[1] https://bugzilla.redhat.com/show_bug.cgi?id=1485712
[2] https://bugzilla.redhat.com/show_bug.cgi?id=1909014
[3] https://bugzilla.redhat.com/show_bug.cgi?id=2035451
Signed-off-by: Coiby Xu <coxu@redhat.com>
Reviewed-by: Thomas Haller <thaller@redhat.com>
Reviewed-by: Philipp Rudo <prudo@redhat.com>
Signed-off-by: Coiby Xu <coxu@redhat.com>
Resolves: bz2076416
Upstream: Fedora
Conflict: None
commit 568623e69a
Author: Coiby Xu <coxu@redhat.com>
Date: Thu Sep 23 14:25:01 2021 +0800
Address the cases where a NIC has a different name in kdump kernel
A NIC may get a different name in the kdump kernel from 1st kernel
in cases like,
- kernel assigned network interface names are not persistent e.g. [1]
- there is an udev rule to rename the NIC in the 1st kernel but the
kdump initrd may not have that rule e.g. [2]
If NM tries to match a NIC with a connection profile based on NIC name
i.e. connection.interface-name, it will fail in the above cases. A simple
solution is to ask NM to match a connection profile by MAC address.
Note we don't need to do this for user-created NICs like vlan, bridge and
bond.
A remaining issue is passing the name of a NIC via the kdumpnic dracut
command line parameter, which requires passing ifname=<interface>:<MAC> to
have a fixed NIC name. But we can simply drop this requirement. kdumpnic
is needed because kdump needs to get the IP by NIC name and use the IP
to create a dumping folder named "{IP}-{DATE}". We can simply pass the
IP to the kdump kernel directly via a new dracut command line parameter
kdumpip instead. In addition to the benefit of simplifying the code,
there are three other benefits brought by this approach,
- make use of whatever network to transfer the vmcore. As long as we
have a working network, we don't care which NIC is active.
- if the IP obtained in the kdump kernel is different from the one in the
1st kernel, "{IP}-{DATE}" would better tell where the dumped vmcore
comes from.
- without passing ifname=<interface>:<MAC> to kdump initrd, the
issue of two interfaces having the same MAC address for
Azure Hyper-V NIC SR-IOV [3] is resolved automatically.
[1] https://bugzilla.redhat.com/show_bug.cgi?id=1121778
[2] https://bugzilla.redhat.com/show_bug.cgi?id=810107
[3] https://bugzilla.redhat.com/show_bug.cgi?id=1962421
Signed-off-by: Coiby Xu <coxu@redhat.com>
Reviewed-by: Thomas Haller <thaller@redhat.com>
Reviewed-by: Philipp Rudo <prudo@redhat.com>
Signed-off-by: Coiby Xu <coxu@redhat.com>
related: bz2083475
upstream: fedora
conflict: none
commit bea6143178
Author: Tao Liu <ltao@redhat.com>
Date: Sat Oct 8 14:53:21 2022 +0800
Fix the sync issue for dump_fs
Previously the sync for dump_fs was problematic: it always
returned success, as described in man 2 sync. So it could not detect
the error when the dump target is full and not all of the vmcore
data has been written back to the disk, which leaves the vmcore
incomplete while reporting the misleading log "saving vmcore
complete".
In this patch, we use "sync -f vmcore" instead, which
returns an error if syncfs on the dump target fails. In
this way, vmcore sync related failures, such as a failed autoextend
of an lvm2 thinpool, can be detected and handled properly.
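A minimal sketch of checking the exit status of "sync -f", assuming GNU
coreutils sync. The temporary file is a hypothetical stand-in for the
dumped vmcore and the messages are illustrative:

```shell
# Hypothetical stand-in for the dumped vmcore.
vmcore=$(mktemp)
echo "dummy vmcore data" > "$vmcore"

# "sync -f FILE" syncs the file system that contains FILE and exits
# non-zero on failure, so the result can be checked.
if sync -f "$vmcore"; then
    result="saving vmcore complete"
else
    result="saving vmcore failed"
fi
echo "$result"
rm -f "$vmcore"
```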
Signed-off-by: Tao Liu <ltao@redhat.com>
Reviewed-by: Coiby Xu <coxu@redhat.com>
Signed-off-by: Tao Liu <ltao@redhat.com>
upstream: fedora
resolves: bz2085347
conflict: yes, small conflict due to patch
"kdumpctl: drop DUMP_TARGET variable" not
backported to rhel9.
commit c743881ae6
Author: Tao Liu <ltao@redhat.com>
Date: Fri Sep 23 18:13:11 2022 +0800
virtiofs support for kexec-tools
This patch adds virtiofs support for kexec-tools by introducing a new option
for /etc/kdump.conf:
virtiofs myfs
Where myfs is a variable tag name specified in qemu cmdline
"-device vhost-user-fs-pci,tag=myfs".
The patch covers the following cases:
1) Dumping VM's vmcore to a virtiofs shared directory;
2) When the VM's rootfs is a virtiofs shared directory and dumping the
VM's vmcore to its subdirectory, such as /var/crash;
3) The combination of case 1 & 2: The VM's rootfs is a virtiofs shared
directory and dumping the VM's vmcore to another virtiofs shared
directory.
Case 2 & 3 need dracut >= 057, otherwise the VM cannot boot from a virtiofs
shared rootfs. But that is not an issue of kexec-tools.
Reviewed-by: Philipp Rudo <prudo@redhat.com>
Signed-off-by: Tao Liu <ltao@redhat.com>
Signed-off-by: Tao Liu <ltao@redhat.com>
upstream: fedora
resolves: bz2003832
conflict: none
commit ee337c6f49
Author: Kairui Song <kasong@redhat.com>
Date: Mon Sep 13 03:38:14 2021 +0800
Add header comment for POSIX compliant scripts
To make things cleaner and more human readable, add a short comment for
the POSIX scripts.
Signed-off-by: Kairui Song <kasong@redhat.com>
Acked-by: Philipp Rudo <prudo@redhat.com>
Signed-off-by: Tao Liu <ltao@redhat.com>
upstream: fedora
resolves: bz2003832
conflict: none
commit 7c76611abb
Author: Kairui Song <kasong@redhat.com>
Date: Wed Sep 15 23:10:07 2021 +0800
dracut-kdump.sh: reformat with shfmt
This is done with `shfmt -w -s dracut-kdump.sh`. There is no behaviour
change.
Signed-off-by: Kairui Song <kasong@redhat.com>
Acked-by: Philipp Rudo <prudo@redhat.com>
Signed-off-by: Tao Liu <ltao@redhat.com>
upstream: fedora
resolves: bz2003832
conflict: none
commit b1339c3b8a
Author: Kairui Song <kasong@redhat.com>
Date: Wed Aug 18 21:06:52 2021 +0800
dracut-kdump.sh: make it POSIX compatible
POSIX doesn't support the keyword `local`, so this commit reduces variable
usage. The here-string ("<<<") operator is also not supported, so kdump.conf
is now pre-parsed into a temp file. Also fixes many POSIX syntax errors.
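A sketch of the temp file approach, under the assumption that the config
text is already available in a variable. The sample options and variable
names are illustrative, and mktemp is widely available but not itself part
of POSIX:

```shell
# Bash-only form (not POSIX):
#   while read -r key value; do ...; done <<< "$conf"
# POSIX form: write the pre-parsed text to a temp file and redirect from it.
conf='path /var/crash
core_collector makedumpfile -l -d 31'

tmp=$(mktemp)
printf '%s\n' "$conf" > "$tmp"

# The loop reads from a redirect, not a pipe, so it runs in the current
# shell and variables set inside it remain visible afterwards.
while read -r key value; do
    echo "option=$key value=$value"
    last_key=$key
done < "$tmp"
rm -f "$tmp"
```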
Signed-off-by: Kairui Song <kasong@redhat.com>
Acked-by: Philipp Rudo <prudo@redhat.com>
Signed-off-by: Tao Liu <ltao@redhat.com>
upstream: fedora
resolves: bz2003832
conflict: none
commit 725027b735
Author: Kairui Song <kasong@redhat.com>
Date: Thu Aug 12 02:55:32 2021 +0800
dracut-kdump.sh: POSIX doesn't support pipefail
Setting pipefail will cause a POSIX shell to exit with failure. So only do
that in bash.
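One way to guard the option is to test a variable that only bash sets.
This is a sketch; checking BASH_VERSION is an assumption and not
necessarily the test used in the patch:

```shell
# BASH_VERSION is set only by bash, so other shells skip the option.
if [ -n "${BASH_VERSION:-}" ]; then
    set -o pipefail
    mode="pipefail enabled"
else
    mode="pipefail skipped"
fi
echo "$mode"
```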
Signed-off-by: Kairui Song <kasong@redhat.com>
Acked-by: Philipp Rudo <prudo@redhat.com>
Signed-off-by: Tao Liu <ltao@redhat.com>
upstream: fedora
resolves: bz2003832
conflict: none
commit b1c794a2cf
Author: Kairui Song <kasong@redhat.com>
Date: Tue Sep 14 03:00:48 2021 +0800
dracut-kdump.sh: Use stat instead of ls to get vmcore size
ls output is fragile, so use stat instead.
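For illustration, the size in bytes can be read directly with GNU stat.
The temporary file below is a hypothetical stand-in for the vmcore:

```shell
f=$(mktemp)
printf '12345' > "$f"

# "%s" prints the size in bytes; -c is the GNU coreutils format option.
size=$(stat -c %s "$f")
echo "$size"
rm -f "$f"
```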
Signed-off-by: Kairui Song <kasong@redhat.com>
Acked-by: Philipp Rudo <prudo@redhat.com>
Signed-off-by: Tao Liu <ltao@redhat.com>
upstream: fedora
resolves: bz2003832
conflict: none
commit 7a9823b42e
Author: Kairui Song <kasong@redhat.com>
Date: Tue Aug 3 13:23:26 2021 +0800
dracut-kdump.sh: simplify dump_ssh
There is a workaround for `scp`, which expects an IPv6 address to be
quoted with [ ... ]. Only apply the workaround once and store the
updated `scp` address for reuse.
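The bracket workaround can be sketched as below. to_scp_addr is a
hypothetical helper that assumes a user@host argument and treats any host
containing a colon as an IPv6 address:

```shell
# to_scp_addr <user@host>
# Hypothetical helper: brackets the host part when it looks like IPv6.
# Assumes the argument always has the form user@host.
to_scp_addr() {
    user=${1%@*}
    host=${1#*@}
    case "$host" in
    *:*) echo "$user@[$host]" ;;
    *) echo "$1" ;;
    esac
}

# Compute the address once and reuse it for later scp invocations.
scp_addr=$(to_scp_addr "root@2001:db8::1")
echo "$scp_addr"
to_scp_addr "root@192.168.0.10"
```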
Signed-off-by: Kairui Song <kasong@redhat.com>
Acked-by: Philipp Rudo <prudo@redhat.com>
Signed-off-by: Tao Liu <ltao@redhat.com>
upstream: fedora
resolves: bz2003832
conflict: none
commit 8f89e89071
Author: Kairui Song <kasong@redhat.com>
Date: Mon Aug 2 01:25:17 2021 +0800
dracut-kdump.sh: remove add_dump_code
`add_dump_code "<op>"` is just `DUMP_INSTRUCTION="<op>"`, so there is no
need for an extra wrapper.
Signed-off-by: Kairui Song <kasong@redhat.com>
Acked-by: Philipp Rudo <prudo@redhat.com>
Signed-off-by: Tao Liu <ltao@redhat.com>
upstream: fedora
resolves: bz2003832
conflict: none
commit 0675edbadb
Author: Kairui Song <kasong@redhat.com>
Date: Mon Aug 2 01:19:44 2021 +0800
dracut-kdump.sh: don't put KDUMP_SCRIPT_DIR in PATH
monitor_dd_progress is the only extra binary in KDUMP_SCRIPT_DIR, no
need to change PATH environment variable, just call it directly.
Signed-off-by: Kairui Song <kasong@redhat.com>
Acked-by: Philipp Rudo <prudo@redhat.com>
Signed-off-by: Tao Liu <ltao@redhat.com>
upstream: fedora
resolves: bz2003832
conflict: none
commit a1205effaa
Author: Kairui Song <kasong@redhat.com>
Date: Thu Aug 5 00:59:29 2021 +0800
kdump-lib-initramfs.sh: move dump related functions to kdump.sh
These dump related functions are only used by dracut-kdump.sh.
Signed-off-by: Kairui Song <kasong@redhat.com>
Acked-by: Philipp Rudo <prudo@redhat.com>
Signed-off-by: Tao Liu <ltao@redhat.com>
upstream: fedora
resolves: bz2003832
conflict: none
commit e7118d1de8
Author: Kairui Song <kasong@redhat.com>
Date: Mon Aug 2 00:50:22 2021 +0800
Merge kdump-error-handler.sh into kdump.sh
kdump-error-handler.sh does nothing except calling three functions,
it can be easily merged into kdump.sh by using a parameter to run the
error handling routine.
kdump-lib-initramfs.sh was created to hold the three shared functions
and related code, so by merging these two files, kdump-lib-initramfs.sh
can be simplified by a lot.
Follow-up commits will clean up kdump-lib-initramfs.sh.
Signed-off-by: Kairui Song <kasong@redhat.com>
Acked-by: Philipp Rudo <prudo@redhat.com>
Signed-off-by: Tao Liu <ltao@redhat.com>
upstream: fedora
resolves: bz2003832
conflict: none
commit a0282ab22c
Author: Kairui Song <kasong@redhat.com>
Date: Tue Aug 3 19:49:51 2021 +0800
kdump-lib.sh: add a config format and read helper
Add a helper `kdump_read_conf` to replace read_strip_comments.
`kdump_read_conf` does a few more things:
- remove trailing spaces.
- format the content, remove duplicated spaces between name and value.
- read from KDUMP_CONFIG_FILE (/etc/kdump.conf) directly, avoid pasting
"/etc/kdump.conf" path everywhere in the code.
- check if config file exists, just in case.
Also unify the environmental variable, now KDUMP_CONFIG_FILE stands for
the default config location.
This helps avoid some shell pitfalls about spaces when reading config.
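A rough sketch of what such a read helper might look like. It is not the
actual `kdump_read_conf`: the function name read_conf_sketch, the sample
config and the sed expressions are assumptions, and stripping everything
after "#" is a simplification:

```shell
# Hypothetical config file standing in for /etc/kdump.conf.
KDUMP_CONFIG_FILE=$(mktemp)
printf '%s\n' '# a comment' 'path    /var/crash   ' '' \
    'core_collector  makedumpfile -l' > "$KDUMP_CONFIG_FILE"

read_conf_sketch() {
    # Bail out quietly if the config file is missing.
    [ -f "$KDUMP_CONFIG_FILE" ] || return 1
    # Strip comments, trim both ends, collapse the blanks between option
    # name and value to a single space, and skip empty lines.
    sed -n \
        -e 's/#.*//' \
        -e 's/[[:space:]]*$//' \
        -e 's/^[[:space:]]*//' \
        -e 's/^\([^[:space:]]\{1,\}\)[[:space:]]\{1,\}/\1 /' \
        -e '/./p' \
        "$KDUMP_CONFIG_FILE"
}

read_conf_sketch
```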
Signed-off-by: Kairui Song <kasong@redhat.com>
Acked-by: Philipp Rudo <prudo@redhat.com>
Signed-off-by: Tao Liu <ltao@redhat.com>
Related: rhbz#1938165
Upstream: fedora
Conflict: none
commit 00785873ef
Author: Tao Liu <ltao@redhat.com>
Date: Fri Mar 19 18:07:51 2021 +0800
Fix incorrect vmcore permissions when dumped through ssh
Previously when dumping vmcore to a remote machine through ssh,
the files were created remotely with permissions taken
from the default umask value, which made the files accessible to
anyone on the remote machine.
This patch fixes the security issue by setting a customized umask value
before the file creation on the remote machine.
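The effect of a restrictive umask can be shown locally. This is a sketch:
the value 0077 and the file name are assumptions for illustration, and the
final check uses GNU stat:

```shell
dir=$(mktemp -d)

# Run in a subshell so the umask change does not leak; 0077 removes all
# group/other permission bits from newly created files.
(umask 0077 && : > "$dir/vmcore")

perm=$(stat -c %a "$dir/vmcore")
echo "$perm"
rm -rf "$dir"
```

Over ssh, the same idea would apply to the command line executed on the
remote side, with the umask set before the file is created.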
Signed-off-by: Tao Liu <ltao@redhat.com>
Acked-by: Kairui Song <kasong@redhat.com>
Signed-off-by: Tao Liu <ltao@redhat.com>