According to udev's man page, PROGRAM is either used to determine
device's name or whether the device matches the rule. So we should
use RUN insteand. Meanwhile, both RUN / PROGRAM only accepts very
short-running foreground tasks, but kdump restart may take a long
time if there are any device changes that will lead to image rebuild,
which may lead to buggy behavior.
On the other hand, memory / CPU hot plug should never trigger a
initramfs rebuild.
To solve this problem, we will use new introduced "kdumpctl reload"
instead, and use systemd-run to create a transient service unit for
the reload and run it in no-block mode, so udev won't be blocked by
anything.
We need to make systemd-run execute in non-blocking mode, and do not
synchronously wait for the operation to finish, because udev expect
the command line in RUN to be finished immediately, however, kdumpctl
reload may take 0.5-1s for an ordinary reload, or even slower on some
machines. So we give systemd-run an explicit --no-block option to run
in non-blocking mode. Without --no-blocking, systemd-run will verify,
enqueue and wait for the operation to finish. By using the --no-block
option, systemd-run will only verify and enqueue the unit then
return. In this way, we make sure the command is executed
asynchronously, and the status will be monitored and logged by
systemd, which is reliable and non-blocking.
Another thing to mention is that --no-block is only needed after
systemd-v220, before v220 systemd-run uses non-blocking mode by
default and --no-block option is not available on earlier systemd
versions.
Also reformat the udev rules to a more maintanceable format.
Signed-off-by: Kairui Song <kasong@redhat.com>
Acked-by: Dave Young <dyoung@redhat.com>
Add reload support to kdumpctl, reload will simply unload current
loaded kexec crash kernel and initramfs, and load it again.
Changes in /etc/sysconfig/kdump will take effect with kdumpctl
reload, but reloading will not check the content of
/etc/kdump.conf and won't rebuild anything. reload is fast, the only
time-consuming part of kdumpctl reload is loading kernel and initramfs
with kexec which is always necessary.
Signed-off-by: Kairui Song <kasong@redhat.com>
Acked-by: Dave Young <dyoung@redhat.com>
In dracut-049, a new squash module is introduced, it can reduce the
memory usage of kdump initramfs in the capture kernel, this helps a lot
on lowering the risk of OOM failure.
Tested with latest rawhide with NFS, SSH and local dump.
Signed-off-by: Kairui Song <kasong@redhat.com>
Currently the kdumpctl script doesn't check if the raw device is
formatted which might destroy existing data at the time of dump
capture.
This patch addresses this issue, by ensuring kdumpctl prints
a warning in case it finds the raw device to be formatted.
Signed-off-by: Kenneth D'souza <kdsouza@redhat.com>
Acked-by: Kairui Song <kasong@redhat.com>
Currently the script does not check if the dump target is read-only and would
always mount to read-write mode. This caused an issue with nfs mount as the
fstab options would be reconsidered while remounting to read-write mode.
The remount would fail with the below error as all options cannot be changed
runtime.
mount.nfs: mount(2): Invalid argument
mount.nfs: an incorrect mount option was specified
Which in result would not save the vmcore on the dump target.
This patch addresses this issue by checking the dump target status for read-only.
If yes, remount to read-write mode without reconsidering the fstab options.
Signed-off-by: Kenneth D'souza <kdsouza@redhat.com>
Acked-by: Kairui Song <kasong@redhat.com>
Resolves: bz1619122
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1619122
This patch fixes the "Unhandled rela relocation: R_X86_64_PLT32" error
that we are seeing with Fedora 29 (and newer kernels > 4.18) which
trying to run kexec/kdump on x86_64 machines.
The patch is being discussed upstream and has been ACK'ed by Baoquan and
myself (see <https://www.spinics.net/lists/kexec/msg21255.html>) and I
have also tested the same on Fedora 29/rawhide x86_64 machine as well:
Before the patch:
----------------
[root@hp-bl480c-01 ~]# kdumpctl restart
kexec: unloaded kdump kernel
Stopping kdump: [OK]
Unhandled rela relocation: R_X86_64_PLT32
kexec: failed to load kdump kernel
Starting kdump: [FAILED]
After the patch:
---------------
[root@hp-bl480c-01 ~]# kdumpctl restart
kexec: unloaded kdump kernel
Stopping kdump: [OK]
kexec: loaded kdump kernel
Starting kdump: [OK]
Suggested Upstream Fix:
In response to a change in binutils, commit b21ebf2fb4c
(x86: Treat R_X86_64_PLT32 as R_X86_64_PC32) was applied to
the linux kernel during the 4.16 development cycle and has
since been backported to earlier stable kernel series. The
change results in the failure message in $SUBJECT when
rebooting via kexec.
Fix this by replicating the change in kexec.
Signed-off-by: Chris Clayton <chris2553@googlemail.com>
Signed-off-by: Bhupesh Sharma <bhsharma@redhat.com>
Acked-by: Kairui Song <kasong@redhat.com>
Currently the kdumpctl script doesn't check if the path option is
set more than once due to which a vmcore is not captured.
This patch addresses this issue by ensuring that only one path
is specified in /etc/kdump.conf file.
Signed-off-by: Kenneth D'souza <kdsouza@redhat.com>
Acked-by: Kairui Song <kasong@redhat.com>
If nofail or nobootwait option is used, systemd's local-fs.target won't
wait for the mounting to complete, and kdump might start before the
required mount point is ready and then fail.
The host might use nofail for reasons like the device may get unpluged,
and if the device is not mounted and it is set as kdump target as the same
time then kdump service won't start, we will never enter the capture
kernel. By the time we have entered the capture kernel, the target device
must exist and ready to use, or else kdump would fail anyway. So force
remove nofail and nobootwait option.
Also drop rootflags=nofail option, as we don't depend on rootfs anymore
if the dump target don't required it. So the nofail option is no longer
needed.
Signed-off-by: Kairui Song <kasong@redhat.com>
Acked-by: Dave Young <dyoung@redhat.com>
We test if to include the drm module or not by testing if there are any
drm entry in sysfs. But there is an exception for hyper-v, DRM module
take care of hyperv's framebuffer driver as well but hyperv_fb will
not create any drm entry. So currently we got black screen on
hyperv guest.
Fix by detect hyperv's special entry as well.
Signed-off-by: Kairui Song <kasong@redhat.com>
Kdump anaconda has been included as a subpackage for a long time, which
is not a good practice, as the anaconda plugin should be built as
noarch and it does not belong to kexec-tools. We have created a new
package 'kdump-anaconda-addon', so remove it here.
The release version should be bumped later so that kdump-anaconda-addon
could mark previous versions as obsoleted.
Signed-off-by: Kairui Song <kasong@redhat.com>
armv7hl build failed because no makedumpfile* built but the latest commit
tries to install them.
Exclude armv7hl in the code chunk.
Signed-off-by: Dave Young <dyoung@redhat.com>
kexec_test seems to be no longer used upstream, so we had introduced
the 'kexec-tools-2.0.3-disable-kexec-test.patch' earlier to disable the
same from fedora kexec-tools as well.
However an earlier patch "Remove obsolete kdump tool" now explicitly
installs needed files via appropriate logic in .spec file, so we can
drop this patch now to reduce the maintenance burden.
Signed-off-by: Bhupesh Sharma <bhsharma@redhat.com>
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1441677
Testing: On x86_64 Fedora machine. After this patch kdump utility and related
man page cannot be found on this machine:
[root@tyan-gt24-09 ~]# which kdump
/usr/bin/which: no kdump in
(/usr/lib64/qt-3.3/bin:/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin:/root/bin)
[root@tyan-gt24-09 ~]# man kdump
No manual entry for kdump
Update the fedora 'kexec-tools.spec' to not install the obsolete
kdump tool.
I have submitted an upstream patch to obsolete the kdump tool from
upstream kexec-tools (which has been accepted), but after an internal
discussion we decided not to backport the upstream 'kexec-tools' patch
(which does the same) for fedora, as we would prefer to manage the
changes directly in the .spec file itself.
Signed-off-by: Bhupesh Sharma <bhsharma@redhat.com>
This commit basically reverts commit c755499fad,
and make use of new introduced tri-state hostonly mode.
Following dracut commits merged multipath-hostonly into multipath
module, and introduced a tri-state hostonly mode.
commit 35e86ac117acbfd699f371f163cdda9db0ebc047
Author: Kairui Song <kasong@redhat.com>
Date: Thu Jul 5 16:20:04 2018 +0800
Merge 90-multipath-hostonly and 90-multipath
commit a695250ec7db21359689e50733c6581a8d211215
Author: Kairui Song <kasong@redhat.com>
Date: Wed Jul 4 17:21:37 2018 +0800
Introduce tri-state hostonly mode
multipath-hostonly module was introduced only for kdump, because kdump
need a more strict hostonly policy for multipath device to save memory.
Now multipath module will provide the behave we wanted by setting
hostonly mode to strict.
Currently, we only rebuilt kdump initramfs on config file change,
fs change, or watchdog related change. This will not cover the case
that hardware changed but fs layout and other configurations still
stays the same, and kdump may fail.
To cover such case, we can detect and compare loaded kernel modules,
if a hardware change requires the image to be rebuilt, loaded kernel
modules must have changed.
Starting from commit 7047294 dracut will record loaded kernel modules
when the image is built if hostonly mode is enabled. With this patch,
kdumpctl will compare the recorded value with currently loaded kernel
modules, and rebuild the image on change.
"kdumpctl start" will be a bit slower, as we have to call lsinitrd one
more time to get the loaded kernel modules list. I measure the time
consumption and we have an overall 0.2s increased loading time.
Time consumption of command "kdumpctl restart":
Before:
real 0m0.587s
user 0m0.481s
sys 0m0.102s
After:
real 0m0.731s
user 0m0.591s
sys 0m0.133s
Time comsumption of command "kdumpctl restart" with image rebuild:
Before (force rebuild):
real 0m10.972s
user 0m8.966s
sys 0m1.318s
After (inserted ~100 new modules):
real 0m11.220s
user 0m9.387s
sys 0m1.337s
Signed-off-by: Kairui Song <kasong@redhat.com>
Kdump always use _proto=dhcp for both ipv4 and ipv6. But for ipv6
the dhcp address assignment is not like ipv4, there are different ways
for it, stateless and stateful, see below document:
https://fedoraproject.org/wiki/IPv6Guide
In case stateless, kernel can do the address assignment, dracut use
_proto=auto6; for stateful case, dracut use _proto=dhcp6.
But it is hard to decide whether stateless or stateful takes effect,
hence, dracut introduces ip=either6 option, which can try both of these
method automatically for us. For detail, refer to dracut:
commit 67354ee 40network: introduce ip=either6 option
We do not see bug reports before because for the most auto6 cases
kernel assign ip address before dhclient, kdump just happened to work.
Signed-off-by: Pingfan Liu <piliu@redhat.com>
When using fence_kdump, module-setup will create a kdump.conf with
fence_kdump_nodes. The node name comes from the cluster xml, which may
use the hostname alias. Later in kdump stage, "fence_kdump_send alias_1
alias_2" sends out notification to peers. Hence it requires /etc/hosts
and nsswitch.conf to make alias work.
Signed-off-by: Pingfan Liu <piliu@redhat.com>
Kdump service starts too late, so early crashes will have no chance
to get kdump kernel booting, this will cause crash information to be
lost. It is necessary to add a dracut module in order to load crash
kernel and initramfs as early as possible. You can provide "rd.early
kdump" in grub commandline to enable, then the early kdump will load
those files like the normal kdump, which is disabled by default.
For the normal kdump service, it can check whether the early kdump
has loaded the crash kernel and initramfs. It has no conflict with
the early kdump.
If you rebuild the new initramfs for early kdump, the new initramfs
size will become large, because it will put the vmlinuz and kdump
initramfs into the new initramfs.
In addition, early kdump doesn't support fadump.
Signed-off-by: Lianbo Jiang <lijiang@redhat.com>
Reviewed-by: Kazuhito Hagio <khagio@redhat.com>
Acked-by: Dave Young <dyoung@redhat.com>
we move some common functions from kdumpctl to kdump-lib.sh, the
functions could be used in other modules, such as early kdump.
It has no bad effect.
Signed-off-by: Lianbo Jiang <lijiang@redhat.com>
Reviewed-by: Kazuhito Hagio <khagio@redhat.com>
Acked-by: Dave Young <dyoung@redhat.com>
port from rhel, original patch is contributed by Minfei Huang:
Using /sys to determines crashkernel actual size is confusing since
there is no unit of measure.
Add a new command "kdumpctl showmem" to show the reserved memory kindly.
Signed-off-by: Pingfan Liu <piliu@redhat.com>
Signed-off-by: Minfei Huang <mhuang@redhat.com>
Acked-by: Bhupesh Sharma <bhsharma@redhat.com>
Acked-by: Dave Young <dyoung@redhat.com>
This reverts commit 2f4149f276.
It is not proved to be right to get auto6 or dhcpv6 in 1st kernel,
pingfan is working on a dracut fix to do some fallback in 2nd kernel initramfs.
So revert this commit
For ppc64(le), the default behavior of kexec always copy "root=" param,
but if dump target is ssh, there will no tools installed in kdump
rd, which help to mount root. As a result, kdump service will fail to
start. So explicitly disable the default behavior with --dt_no_old_root
option.
Signed-off-by: Pingfan Liu <piliu@redhat.com>
Acked-by: Dave Young <dyoung@redhat.com>
When core_collector is changed, the kdump initramfs needs to
be rebuilt before it is loaded.
Signed-off-by: Lianbo Jiang <lijiang@redhat.com>
Acked-by: Dave Young <dyoung@redhat.com>
Kdump always use _proto=dhcp for both ipv4 and ipv6. But for ipv6
the dhcp address assignment is not like ipv4, there are different ways
for it, stateless and stateful, see below document:
https://fedoraproject.org/wiki/IPv6Guide
In case stateless, kernel can do the address assignment, dracut use
_proto=auto6; for stateful case, dracut use _proto=dhcp6.
We do not see bug reports before because for the most auto6 cases
kernel assign ip address before dhclient, kdump just happened to work.
Here we use auto6 if possible first. And we take the assumption that
host use auto6 if /proc/sys/net/ipv6/conf/$netdev/autoconf is enabled
Signed-off-by: Pingfan Liu <piliu@redhat.com>
When using "dracut_args --mount" to specify dump target, e.g. nfs like:
path /
core_collector makedumpfile -d 31
dracut_args --mount "host:/path /var/crash nfs defaults"
kdump service should neither guarantees the correctness, nor relabels it.
For current code, since dracut_args dump targets are likely not mounted
so kdump service mistakenly relabel the rootfs, which is meanless and
takes very long time.
Signed-off-by: Pingfan Liu <piliu@redhat.com>
This reverts commit 2040103bd7.
Reason is it's based on the environment of 1st kernel where all
present devices could be active and initialized during bootup.
Then all pci devices will request irqs. While kdump only brings
up those devices which are necessary for vmcore dumping. So this
commit is not meaningful and helpless to very large extent. And
it will print out 'Warning' when calculated result is larger than
1 cpu, actually it's a false positive report most of the time.
So revert the commit, and can check the git history for later
reference.
[dyoung]: on some machine this warning message shows up but
later we found the irq numbers with and without nr_cpus=1 is
quite different so this need more investigation since
the formula is not accurate.
Signed-off-by: Baoquan He <bhe@redhat.com>
Acked-by: Dave Young <dyoung@redhat.com>