This patch introduces a new kdump-capture.service which is used to run
kdump.sh.
kdump-capture.service has OnFailure=emergency.target and
OnFailureIsolate=yes set. When kdump.sh fails, the kdump emergency
service will be triggered and enter the error handling path.
In the 2nd kernel, the default target for systemd is initrd.target, so we
put kdump-capture.service in initrd.target.wants/ so that systemd
starts kdump-capture as part of the boot process.
kdump.sh used to run in dracut-pre-pivot hook. Now kdump-capture.service
is placed after dracut-pre-pivot.service and other dependencies are all
copied from dracut-pre-pivot.service. So the start point of
kdump.sh will be almost the same as it used to be.
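As a rough illustration of the layout described above, the following
shell sketch writes such a unit into an initrd tree and hooks it into
initrd.target.wants/. The function name, the unit directory, the
Description text and the /bin/kdump.sh path are assumptions made for the
example; only the OnFailure/OnFailureIsolate settings and the ordering
after dracut-pre-pivot.service come from the description.

```sh
# Sketch only: paths and most unit contents are assumptions.
# install_kdump_capture <initrd-root> writes the unit and links it
# into initrd.target.wants/ so that initrd.target pulls it in.
install_kdump_capture() {
    unitdir="$1/usr/lib/systemd/system"
    mkdir -p "$unitdir/initrd.target.wants"

    cat > "$unitdir/kdump-capture.service" <<'EOF'
[Unit]
Description=Kdump Vmcore Save Service
# Start where the dracut pre-pivot hook used to run kdump.sh.
After=dracut-pre-pivot.service
# If kdump.sh fails, isolate to the emergency target (error handling path).
OnFailure=emergency.target
OnFailureIsolate=yes

[Service]
Type=oneshot
ExecStart=/bin/kdump.sh
EOF

    # initrd.target is the default target in the 2nd kernel.
    ln -sf ../kdump-capture.service \
        "$unitdir/initrd.target.wants/kdump-capture.service"
}
```

Calling install_kdump_capture with a scratch directory is enough to
inspect the generated unit and symlink without touching a real initrd.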
Signed-off-by: WANG Chao <chaowang@redhat.com>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
Acked-by: Dave Young <dyoung@redhat.com>
Currently, upon failure the kdump script might not be called at all, so
it cannot execute the default action, and the system hangs.
This is because we disable the emergency shell and rely on kdump.sh
being invoked through the dracut-pre-pivot hook. But we may never call
into the dracut-pre-pivot hook, because certain systemd targets cannot
be reached when their dependencies fail. In those cases the error
handling code does not run and the system hangs. For example:
sysroot-var-crash.mount --> initrd-root-fs.target --> initrd.target \
--> dracut-pre-pivot.service --> kdump.sh
If the /sysroot/var/crash mount fails, initrd-root-fs.target will not be
reached. Then initrd.target will not be reached, so
dracut-pre-pivot.service will not run, and finally kdump.sh will not run.
To solve this problem, we need to separate the error handling code from
dracut-pre-pivot hook, and every time when a failure shows up, the
separated code can be called by the emergency service.
By default systemd provides an emergency service which will drop us into
shell every time upon a critical failure. It's very convenient for us to
re-use the framework of systemd emergency, because we don't have to
touch the other parts of systemd. We can use our own script instead of
the default one.
This new scheme overrides the emergency shell and replaces it with the
kdump error handling code, which does the error handling as needed.
We no longer rely on the dracut-pre-pivot hook always running. Instead,
whenever an error is serious enough that the emergency shell would have
run, the kdump error handler runs.
dracut-emergency is also replaced by the kdump error handler and it's
enabled again all the way down. So all failures (including systemd
and dracut) in the 2nd kernel can be captured and trigger the kdump
error handler.
dracut-initqueue is a special case, which calls "systemctl start
emergency" directly, not via "OnFailure=emergency". In case of failure,
emergency is started, but not in isolation mode, which means
dracut-initqueue is still running. On the other hand, emergency will
call dracut-initqueue again when the default action is dump_to_rootfs.
systemd would block on the second dracut-initqueue, waiting for the
first instance to exit, which leaves us hanging. It looks like the
following:
dracut-initqueue (running)
--> call dracut-emergency:
--> dracut-emergency (running)
--> kdump-error-handler.sh (running)
--> call dracut-initqueue:
--> blocking and waiting for the original instance to exit.
To fix this, I'd like to introduce a wrapper emergency service. This
emergency service will replace both the systemd and dracut emergency
services, and it does nothing but isolate to the real kdump error
handler service:
dracut-initqueue (running)
--> call dracut-emergency:
--> dracut-emergency isolate to kdump-error-handler.service
--> dracut-emergency and dracut-initqueue will both be stopped
and kdump-error-handler.service will run kdump-error-handler.sh.
In a normal failure case, this still works:
foo.service fails
--> trigger emergency.service
--> emergency.service isolates to kdump-error-handler.service
--> kdump-error-handler.service will run kdump-error-handler.sh
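A minimal sketch of what such a wrapper unit could contain, emitted
from a shell helper so it can be inspected easily. The helper name, the
Description text and the /usr/bin/systemctl path are assumptions; the
only part taken from the description is that the wrapper does nothing
except isolate to kdump-error-handler.service.

```sh
# Sketch only: unit contents are assumptions, not the actual patch.
# print_kdump_emergency_unit prints a wrapper emergency unit whose
# single job is to isolate to the real error handler service.
print_kdump_emergency_unit() {
    cat <<'EOF'
[Unit]
Description=Kdump Emergency
DefaultDependencies=no

[Service]
Type=oneshot
# Isolating stops every unit that kdump-error-handler.service does not
# depend on, including a still-running dracut-initqueue.
ExecStart=/usr/bin/systemctl --no-block isolate kdump-error-handler.service
EOF
}
```

For the isolate request to be accepted, kdump-error-handler.service
itself would presumably need AllowIsolate=yes in its [Unit] section.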
Signed-off-by: WANG Chao <chaowang@redhat.com>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
Acked-by: Dave Young <dyoung@redhat.com>
Currently, when a mount in /etc/fstab fails, systemd does not consider
it critical and continues to boot. In fact, the emergency service is
triggered, but not in isolation mode, so the emergency service gets
shut down at some later point in the boot process. We need isolation,
otherwise we won't see any emergency service.
That is because in kdump initramfs, mount units specified in /etc/fstab
are required by "local-fs.target". When any of these mounts fails,
local-fs.target fails.
For kdump initramfs, we need to isolate to the emergency service on any
mount failure; that is, every service should be stopped and only the
emergency service should run. But local-fs.target won't trigger that on
its failure. That means in case of a mount failure, local-fs.target
enters the failed state, but all the services continue without any
interruption.
After looking into the source code of systemd-fstab-generator, I found
that using "x-initrd.mount" in an initramfs mount makes the mount unit
required by "initrd-root-fs.target" rather than by "local-fs.target"
as before.
"initrd-root-fs.target" is suitable for us because if it fails, it will
isolate to the emergency service. That means in case of any mount failure,
the emergency service will start and everything else will stop. We want
this effect because we need to take the kdump fail-safe action when
there's a mount failure.
From systemd unit point of view, "initrd-root-fs.target" has
OnFailureIsolate=yes, but "local-fs.target" doesn't. From
systemd.unit(5):
OnFailureIsolate=
Takes a boolean argument. If true, the unit listed in OnFailure=
will be enqueued in isolation mode, i.e. all units that are not its
dependency will be stopped. If this is set, only a single unit may
be listed in OnFailure=. Defaults to false.
NOTE: Harald, who contributed "x-initrd.mount" to systemd, confirmed that
this feature will stay.
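To illustrate the option change, here is a small sketch that appends
x-initrd.mount to the options field of a single fstab line. The helper
name and the sample fstab entries are made up for the example; how the
real scripts build their mount options may differ.

```sh
# Sketch only: add_x_initrd_mount is a hypothetical helper.
# Appends x-initrd.mount to the 4th (options) field of one fstab line,
# unless the option is already present.
add_x_initrd_mount() {
    printf '%s\n' "$1" | awk '{
        if ($4 !~ /(^|,)x-initrd\.mount(,|$)/)
            $4 = $4 ",x-initrd.mount"
        print
    }'
}
```

For example, add_x_initrd_mount "/dev/sda1 /sysroot/var/crash ext4 defaults 0 2"
yields a line whose options field is defaults,x-initrd.mount, which
systemd-fstab-generator would then attach to initrd-root-fs.target.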
Signed-off-by: WANG Chao <chaowang@redhat.com>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
This patch does the following change in 2nd kernel:
- dump target is mounted under /sysroot
With this change, we don't need to track what we've mounted in 2nd
kernel. We can just umount recursively every mount in /sysroot by
command:
umount -R /sysroot
It's very convenient to do so, because it's hard to track what we've
mounted when we're in the error handling path (later patches). So mounting
everything under /sysroot is reasonable and practical for us.
Also clean up a bit along with this patch.
Signed-off-by: WANG Chao <chaowang@redhat.com>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
Acked-by: Dave Young <dyoung@redhat.com>
Extract functions from kdump.sh, and construct kdump-lib-initramfs.sh as
a library of common kdump functions/variables.
kdump-lib-initramfs.sh will include kdump-lib.sh, because it will use
the functions from there. IOW, kdump-lib-initramfs.sh will be a superset
of kdump-lib.sh.
So after this cleanup:
- scripts running in 1st kernel only have to include kdump-lib.sh
- scripts running in 2nd kernel only have to include kdump-lib-initramfs.sh
Signed-off-by: WANG Chao <chaowang@redhat.com>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
Acked-by: Dave Young <dyoung@redhat.com>
Recently somebody reported an issue where vmcore-dmesg.txt was saved
successfully but later saving vmcore failed due to lack of space on disk.
The system rebooted, but after reboot there was nothing on disk, not even
vmcore-dmesg.txt.
Issue a sync after saving vmcore-dmesg.txt to solve this issue.
I think this is happening because we are doing "reboot -f" instead of going
through systemd reboot path. Anyway, doing a sync now should take care of
this.
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Acked-by: WANG Chao <chaowang@redhat.com>
Resending this patch by updating 'fadump control flow' to reflect
the latest changes.
This patch adds fadump howto document to kexec-tools. The document
is prepared in reference to kexec-kdump-howto.txt document.
Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Hari Bathini <hbathini@linux.vnet.ibm.com>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
The script dracut-kdump.sh is responsible for capturing vmcore during
second kernel boot. Currently this script gets installed into kdump
initrd as part of kdumpbase dracut module.
With fadump support, 'dracut-kdump.sh' script also gets installed into
default initrd to capture vmcore generated by firmware assisted dump.
Thus in fadump case, the same initrd is going to be used for normal
boot as well as boot after system crash. Hence a check is required to
see if it is a normal boot or boot after crash.
Firmware adds a new node, "ibm,kernel-dump", to the device tree to
notify the kernel that it is booting after a crash. This patch adds a
check for this node before executing the steps to capture vmcore. The
check bypasses the vmcore capture steps during the normal boot
process.
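The check could be sketched roughly as follows. The location of the
node under rtas/ in /proc/device-tree, the DT_ROOT override (added only
so the logic can be exercised without the firmware node) and both
function names are assumptions; the description above only names the
"ibm,kernel-dump" node.

```sh
# Sketch only: node path and helper names are assumptions.
# Succeeds if firmware left the "ibm,kernel-dump" node in the device
# tree, i.e. this boot follows a crash.
is_fadump_capture_boot() {
    [ -e "${DT_ROOT:-/proc/device-tree}/rtas/ibm,kernel-dump" ]
}

# Decide whether the vmcore capture steps should run on this boot.
maybe_capture_vmcore() {
    if is_fadump_capture_boot; then
        echo "capture"   # boot after crash: run the capture steps
    else
        echo "skip"      # normal boot: bypass the capture steps
    fi
}
```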
Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Hari Bathini <hbathini@linux.vnet.ibm.com>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
The current kdump infrastructure builds a separate initrd which then
gets loaded into memory by kexec-tools for use by kdump kernel. But
firmware assisted dump (FADUMP) does not use kexec-based approach.
After a crash, firmware reboots the partition and loads the grub loader
as in the normal boot process. Hence in the FADUMP approach,
the second kernel (after crash) will always use the default initrd
(OS built). So, to support FADUMP, the dump capturing steps must be
added to this initrd.
The current kdumpctl script implementation already has the code to
build initrd using mkdumprd. This patch uses the new '--rebuild'
option introduced in dracut to incrementally build the initramfs
image. Before rebuilding, we may need to probe the initrd image for
fadump support, to avoid rebuilding the initrd image multiple times
unnecessarily. This can be done using the "lsinitrd" tool with the newly
proposed '--mod' option and checking for "kdumpbase" in
the list of modules of the default initrd image. We rebuild the image
only if the "kdumpbase" module is missing from the initrd image. Also,
before rebuilding, a backup of the default initrd image is taken.
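The probe described above might look something like this sketch. The
function names are hypothetical, and the module list is read from
standard input so the matching logic can be tried without a real
initrd; the rebuild and backup steps are intentionally left out.

```sh
# Sketch only: helper names are hypothetical.
# Reads a module list (one name per line) on stdin and succeeds if
# the kdumpbase module is among them.
has_kdumpbase() {
    grep -qx 'kdumpbase'
}

# needs_fadump_rebuild <initrd>: succeeds when the image lacks
# kdumpbase and therefore has to be rebuilt.
needs_fadump_rebuild() {
    if lsinitrd --mod "$1" | has_kdumpbase; then
        return 1
    fi
    return 0
}
```

Matching whole lines with grep -x avoids false positives from module
names that merely contain "kdumpbase" as a substring.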
The kexec-tools package in rhel7 is now enhanced to insert an out-of-tree
kdump module for dracut, which is responsible for adding vmcore
capture steps into initrd, if dracut is invoked with the "IN_KDUMP"
environment variable set to 1. The mkdumprd script exports "IN_KDUMP=1"
before invoking dracut to build the kdump initrd.
This patch relies on this existing mechanism of the kdump init script.
Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Hari Bathini <hbathini@linux.vnet.ibm.com>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
During service kdump stop, if firmware assisted dump is enabled
and running, then stop firmware assisted dump by echo'ing 0 to
'/sys/kernel/fadump_registered' file.
Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Hari Bathini <hbathini@linux.vnet.ibm.com>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
During service kdump start, if firmware assisted dump is not enabled then
fall back to starting the existing kexec based kdump. If firmware assisted
dump is enabled but not running, then start firmware assisted dump by
echo'ing 1 to the '/sys/kernel/fadump_registered' file.
Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Hari Bathini <hbathini@linux.vnet.ibm.com>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
This patch enables kdump script to check if firmware-assisted dump is
enabled or not by reading value from '/sys/kernel/fadump_enabled'. The
determine_dump_mode() routine sets dump_mode to 'fadump', if fadump is
enabled. By default, dump_mode is set to 'kdump' mode.
Modify status routine to check if firmware assisted dump is registered
or not by reading value from '/sys/kernel/fadump_registered' file. If
it is set to '1' then return status=0 else return status=1.
0 <= Firmware assisted dump is enabled and running
1 <= Firmware assisted dump is enabled but not running
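A minimal sketch of the mode detection and status check described
above. The FADUMP_SYS variable is an assumption added so the sysfs
directory can be redirected for testing, and fadump_status is a
hypothetical name for the status helper; determine_dump_mode and the
two sysfs file names come from the description.

```sh
# Sketch only: FADUMP_SYS and fadump_status are assumptions.
FADUMP_SYS="${FADUMP_SYS:-/sys/kernel}"

# Sets dump_mode to "fadump" if fadump is enabled, else "kdump".
determine_dump_mode() {
    dump_mode="kdump"    # default mode
    if [ -f "$FADUMP_SYS/fadump_enabled" ] &&
       [ "$(cat "$FADUMP_SYS/fadump_enabled")" = "1" ]; then
        dump_mode="fadump"
    fi
}

# Returns 0 if fadump is registered (running), 1 otherwise.
fadump_status() {
    [ "$(cat "$FADUMP_SYS/fadump_registered" 2>/dev/null)" = "1" ]
}
```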
Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Hari Bathini <hbathini@linux.vnet.ibm.com>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
We met a problem where eth0 ends up being eth1 and eth1 being eth0
between the 1st and 2nd kernel. Because we pass ifname=eth0:$mac to force
the name eth0, and since "eth0" is already taken by the other NIC, udev
fails to bring up the NIC we want, and thus kdump fails.
Kernel assigned network interface names are not persistent. So if the first
kernel is using kernel assigned interface names, then force it to use
"kdump-" prefixed names in the second kernel.
For ethX, we put a "kdump-" prefix before it, so in the 2nd kernel, ethX
will be named "kdump-ethX". That way we avoid the naming conflict.
We only need to change the ethernet card name; for bridge, vlan, bond
and team devices we never prefix the name, because these
names are assigned by userspace when the devices are created.
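The renaming rule can be sketched as below. Both helper names and the
MAC address in the usage note are illustrative assumptions; the rule
itself (prefix ethX with "kdump-", leave other names alone) is the one
described above.

```sh
# Sketch only: helper names are hypothetical.
# Prints the interface name to use in the 2nd kernel.
kdump_ifname() {
    case "$1" in
        eth[0-9]*) echo "kdump-$1" ;;  # kernel-assigned, not persistent
        *)         echo "$1" ;;        # bridge/vlan/bond/team etc.: keep
    esac
}

# Prints the ifname= argument for interface $1 with MAC address $2.
kdump_ifname_arg() {
    echo "ifname=$(kdump_ifname "$1"):$2"
}
```

For instance, kdump_ifname_arg eth0 52:54:00:12:34:56 prints
ifname=kdump-eth0:52:54:00:12:34:56, while kdump_ifname bond0 prints
bond0 unchanged.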
Signed-off-by: WANG Chao <chaowang@redhat.com>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
We handle different types of device for vlan. For each type, it should
write different options for vlan.conf in each control path.
Signed-off-by: WANG Chao <chaowang@redhat.com>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
This cleanup patch removes the unnecessary keyword "function" everywhere in
the kdumpctl script. It also corrects a couple of typos in the script.
Signed-off-by: Hari Bathini <hbathini@linux.vnet.ibm.com>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
Vivek suggested we should display a message while waiting for the lock,
because the wait could be long and the user would have no idea what's
going on.
So we will repeat the following message every 5 seconds while waiting:
"Another app is currently holding the kdump lock; waiting for it to exit..."
Thanks Vivek for providing a more comprehensive message.
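One way to get this behaviour with flock(1) is sketched below. The
function name, the use of file descriptor 9 and the lock file path
passed as an argument are assumptions for the example.

```sh
# Sketch only: function name, fd number and lock path are assumptions.
# wait_for_kdump_lock <lockfile>: takes the lock, printing a message
# every 5 seconds while another process holds it.
wait_for_kdump_lock() {
    command -v flock >/dev/null 2>&1 || return 1
    exec 9>"$1" || return 1

    # Try once without blocking, so no message is shown when the lock
    # is free.
    flock -n 9 && return 0

    while :; do
        echo "Another app is currently holding the kdump lock; waiting for it to exit..."
        # Wait at most 5 seconds before repeating the message.
        flock -w 5 9 && return 0
    done
}
```

The lock stays held for as long as file descriptor 9 remains open in
the calling shell.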
Signed-off-by: WANG Chao <chaowang@redhat.com>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
This is a backport of the following upstream commit.
commit 0b732828091a545185ad13d0b2e6800600788d61
Author: Atsushi Kumagai <kumagai-atsushi@mxc.nes.nec.co.jp>
Date: Tue Jun 10 13:57:29 2014 +0900
[PATCH 3/3] Stop maximizing the bitmap buffer to reduce the risk of OOM.
We tried to maximize the bitmap buffer to get the best performance,
but the performance degradation caused by multi-cycle processing
looks very small according to the benchmark on 2TB memory:
https://lkml.org/lkml/2013/3/26/914
This result means we don't need to make an effort to maximize the
bitmap buffer, it will just increase the risk of OOM.
This patch sets a small fixed value (4MB) as a safety limit,
it may be safer and enough in most cases.
Signed-off-by: Atsushi Kumagai <kumagai-atsushi@mxc.nes.nec.co.jp>
Signed-off-by: Baoquan He <bhe@redhat.com>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
This is a backport of the following upstream commit.
commit 2648a8f7caa63e3ec82fd4bce471cec0a895b704
Author: Atsushi Kumagai <kumagai-atsushi@mxc.nes.nec.co.jp>
Date: Mon Jun 9 17:48:30 2014 +0900
[PATCH 2/3] Move counting pfn_memhole for cyclic mode.
In cyclic mode, memory holes are checked in initialize_2nd_bitmap_cyclic()
in both the kdump path and the ELF path, so pfn_memhole should be
counted there.
Signed-off-by: Atsushi Kumagai <kumagai-atsushi@mxc.nes.nec.co.jp>
Signed-off-by: Baoquan He <bhe@redhat.com>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
This is a backport of the following upstream commit.
commit 16b94ab7fad6744d8b77f2b26838f220307e3118
Author: Atsushi Kumagai <kumagai-atsushi@mxc.nes.nec.co.jp>
Date: Mon Jun 9 17:44:43 2014 +0900
[PATCH 1/3] Remove the 1st bitmap buffer from the ELF path in cyclic mode.
We can create the 2nd bitmap without creating the 1st bitmap by commit
363d53fc8, so we don't need to create the 1st bitmap in cyclic mode
in the ELF path since it isn't used. Thus, we can use the whole bitmap
buffer only for the 2nd bitmap like the kdump path.
Signed-off-by: Atsushi Kumagai <kumagai-atsushi@mxc.nes.nec.co.jp>
Signed-off-by: Baoquan He <bhe@redhat.com>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
This is a backport of the following upstream commit. It fixes freeing
the wrong bitmap, which could increase the risk of OOM when the system is
on the edge of OOM.
commit 0e7b1a6e3c1919c9222b662d458637ddf802dd04
Author: Arthur Zou <zzou@redhat.com>
Date: Wed May 7 17:54:16 2014 +0900
[PATCH v3] Fix free bitmap_buffer_cyclic error.
Description:
In create_dump_bitmap() and write_kdump_pages_and_bitmap_cyclic(),
What should be freed is info->partial_bitmap instead of info->bitmap.
Solution:
Add two functions to free the bitmap_buffer_cyclic. info->partial_bitmap1
is freed by free_bitmap1_buffer_cyclic(). info->partial_bitmap2 is
freed by free_bitmap2_buffer_cyclic(). At the same time, remove
those frees that free partial_bitmap1 or partial_bitmap2 at the end
of main() because partial_bitmap1 and partial_bitmap2 has been freed
after dump file has been written out, so there is no need to free it
again at the end of main.
Signed-off-by: Arthur Zou <zzou@redhat.com>
Signed-off-by: Baoquan He <bhe@redhat.com>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
This is a backport of the following upstream commit. A later backported
commit depends on it.
commit 9dc6440c63320066bc6344c6e3ca3c3af88bcc42
Author: Petr Tesarik <ptesarik@suse.cz>
Date: Thu Apr 24 10:58:43 2014 +0900
[PATCH v3] Introduce the mdf_pfn_t type.
Replace unsigned long long with mdf_pfn_t where:
a. the variable denotes a PFN
b. the variable is a number of pages
The number of pages is converted to a mdf_pfn_t, because it is a result
of subtracting two PFNs or incremented in a loop over a range of PFNs,
so it can get as large as a PFN.
Note: The mdf_ (i.e. makedumpfile) prefix is used to prevent possible
conflicts with other software that defines a pfn_t type.
Signed-off-by: Petr Tesarik <ptesarik@suse.cz>
Signed-off-by: Baoquan He <bhe@redhat.com>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
Backport from the following commit from upstream makedumpfile:
commit 45fc42c
Author: WANG Chao <chaowang@redhat.com>
Date: Tue Jun 10 14:11:27 2014 +0900
[PATCH] Fix Makefile for eppic_makedumpfile.so build.
When libeppic isn't installed on a standard location, building
eppic_makedumpfile.so with -leppic directly doesn't work.
Add LDFLAGS to build arguments, so that one can pass LDFLAGS="-Ldir
-Idir" to tell where to search for libeppic library and its header
files.
For example, if eppic source is installed on the same directory level
with makedumpfile as the following:
makedumpfile
|--- arch
+--- eppic_scripts
eppic
|--- applications
+--- libeppic
After compiling libeppic, one can use the following command to build
eppic_makedumpfile.so:
make LDFLAGS="-I../eppic/libeppic -L../eppic/libeppic" eppic_makedumpfile.so
Signed-off-by: WANG Chao <chaowang@redhat.com>
With this patch, we don't need to use a fedora-specific patch for building
eppic_makedumpfile.so.
Signed-off-by: WANG Chao <chaowang@redhat.com>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
NetworkManager changed the format of ifcfg-device files. They may define
static IP addresses with the following format:
IPADDR0=192.168.122.100
PREFIX0=24
There may be up to 255 IP addresses for a network device, each with a unique
number appended to IPADDR and PREFIX.
Prior to this fix, kdump only handled static IP addresses defined with
IPADDR=192.168.122.100
PREFIX=24
i.e. without the number.
The solution is to use "ip" commands to find the correct network information.
Tested with both static and dynamic IP addresses.
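As an illustration of the "ip" based approach, the sketch below pulls
the address and prefix out of one-line "ip" output instead of parsing
IPADDR/PREFIX from the ifcfg file. The helper name is hypothetical and
the exact commands used by the fix may differ.

```sh
# Sketch only: first_ipv4_cidr is a hypothetical helper.
# Reads `ip -o -4 addr show dev <if>` output on stdin and prints the
# first IPv4 address in addr/prefix form.
first_ipv4_cidr() {
    awk '$3 == "inet" { print $4; exit }'
}
```

It would be used as: ip -o -4 addr show dev eth0 | first_ipv4_cidr.
Because the address is read from the live interface, it does not matter
whether the configuration used IPADDR, IPADDR0 or DHCP.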
v2: Fixed a local variable that was set incorrectly
v3: Fix iscsi case
Signed-off-by: Marc Milgram <mmilgram@redhat.com>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
Acked-by: Baoquan He <bhe@redhat.com>
Acked-by: WANG Chao <chaowang@redhat.com>
Rename the subpackage kdump-anaconda-addon to kexec-tools-anaconda-addon
to keep consistency and make fedpkg build happy.
Every time fedpkg builds a new release, the package version number
should increase. But kdump-anaconda-addon keeps the same version, so let's
rename it to kexec-tools-anaconda-addon; kexec-tools- is the default prefix.
For the version, let's use the default top level version.
At the same time, rename the kdump-anaconda-addon directory to anaconda-addon
to make it more standard. Use the current date instead of the version number as
the suffix of the kdump-anaconda-addon tarball, just like kexec-tools-po did.
Signed-off-by: Arthur Zou <zzou@redhat.com>
Acked-by: Dave Young <dyoung@redhat.com>
makedumpfile_eppic.so (provided by kexec-tools-eppic) is built against
makedumpfile (provided by kexec-tools). kexec-tools-eppic must depend on
the same version.release of kexec-tools, otherwise there could be an ABI
compatibility issue.
Signed-off-by: WANG Chao <chaowang@redhat.com>
Just merge kdump-anaconda-addon.pot into the po files of the original
firstboot so that we can reuse most of the translations. But there
are still three sentences that have no translation.
Signed-off-by: Arthur Zou <zzou@redhat.com>
Acked-by: Dave Young <dyoung@redhat.com>
Currently this work is done by firstboot. Now we move to an anaconda addon
to do the configuration during the system installation process.
Signed-off-by: Arthur Zou <zzou@redhat.com>
Acked-by: Dave Young <dyoung@redhat.com>
While debugging another problem, issues were noted with the kdump udev
rules. The kdump service is restarted on memory add and remove events.
These are the wrong events for these types of devices and result in an overly
aggressive restarting of the kdump service.
There are four udev events to consider, "add", "remove", "online", and
"offline". The remove event is a complete removal from the system -- neither
the hardware nor the kernel know about the hardware; it has been physically
removed. The add event is associated with hardware being physically added to
the system. The kernel has some limited knowledge of the device; however,
it is not available for the kernel to use until it is brought online. Online
events refer to the device being available for the kernel to use. Opposite
to that is the offline event, which occurs when a device is no longer in
use by the kernel.
Note that in all four events the kernel *may* have some remaining information
stored about the device.
In the case of memory hotplug, kdump should be restarted when a memory module
is onlined or offlined. This is because the memory is not in use by the
kernel until the memory is onlined, and it is unused when the memory is
offlined.
Making these modifications results in smooth service on systems that do
heavy memory onlining and offlining.
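The resulting memory rules might look roughly like the output of this
sketch. The helper name, the use of the PROGRAM key and the
"/bin/systemctl try-restart kdump.service" command line are assumptions
about the rules file; the choice of the "online" and "offline" events
for the memory subsystem is what the description above calls for.

```sh
# Sketch only: the rule contents are assumptions about the rules file.
# Prints udev rules that restart kdump on memory online/offline
# events rather than on add/remove events.
print_kdump_memory_rules() {
    cat <<'EOF'
SUBSYSTEM=="memory", ACTION=="online", PROGRAM="/bin/systemctl try-restart kdump.service"
SUBSYSTEM=="memory", ACTION=="offline", PROGRAM="/bin/systemctl try-restart kdump.service"
EOF
}
```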
Cc: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Prarit Bhargava <prarit@redhat.com>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
In the current systemd implementation, a nofail mount will not block
local-fs.target, which means our kdump.sh (in dracut-pre-pivot.service)
can't wait for a nofail mount, and kdump.sh could run earlier than the
nofail mount happens.
For the short term, let's stop passing nofail to mount. As for
sysroot.mount, since we have explicitly specified to wait for it, "nofail"
isn't a problem.
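Dropping the option could be done along these lines. The helper name is
hypothetical and the real scripts may assemble their mount options
differently; this only shows removing "nofail" from a comma-separated
option list.

```sh
# Sketch only: strip_nofail is a hypothetical helper.
# strip_nofail <comma-separated mount options>: prints the options
# with any "nofail" entry removed.
strip_nofail() {
    printf '%s\n' "$1" | tr ',' '\n' | grep -vx 'nofail' | paste -sd, -
}
```

For example, strip_nofail rw,nofail,relatime prints rw,relatime, and an
option list without nofail is printed unchanged.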
Signed-off-by: WANG Chao <chaowang@redhat.com>
Acked-by: Vivek Goyal <vgoyal@redhat.com>