We only allow one instance of kdump service running at each time by
flock /var/lock/kdump which is opened as fd 9 in kdumpctl script.
However a leaking fd issue has been discovered by SELinux:
When executing a specific shell command (not the shell built-in but
provided by other packages, in this case - restorecon) in kdumpctl,
current shell will fork a new subshell for executing and
the subshell will inherit open fd 9 from parent shell. And SELinux
detects that subshell is holding the open fd and consider fd 9 is
leaked.
To avoid this kind of leaking, the most easy way seems to be breaking our
kdumpctl code out into two parts:
- A top level parent shell, which is only used to deal with the lock and
invoking the subshell below.
- A 2nd tier level subshell, which is closing the inherited open fd at
very first and doing the rest of the kdumpctl job. So that it isn't
leaking fd to its subshell when executing like restorecon, etc.
To be easy to understand, the callgraph is roughly like below:
[..]
--> open(9)
--> flock(9)
--> fork
--> close(9) <-- we close 9 right here
--> main() <-- we're now doing the real job
--> [..]
--> fork()
--> restorecon <-- we don't leak fd 9 to child process
--> [..]
--> [..]
As shown above, a wrapper main() is added as the 2nd tier level shell in
this kind of call model. So we can completely avoid leaking fd to
subshell.
Signed-off-by: WANG Chao <chaowang@redhat.com>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
Description:
Currently we only added memdebug code before different dracut
hooks ie. pre-udev pre-pivot etc. Add memdebug in kdump.sh before
capturing vmcore is also good for debugging.
solution:
Add make_trace_mem before saving vmcore.
Signed-off-by: arthur <zzou@redhat.com>
Acked-by: Dave Young <dyoung@redhat.com>
There's a kernel bug for mapping mem ranges which end with
an address not aligned to page boundry. It's still not resolved
in upstream, so let's disable mmap read for now as a workaround.
Once upstream got a right fix we can revert this patch.
Signed-off-by: Dave Young <dyoung@redhat.com>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
Description of Problem:
This is a REGRESSION issue.
At fedora makedumpfile has been updated toward v1.5.4. Unfortunately,
this version fails calculating phys_base on sadump format and then
fails converting vmcore.
x86_64 kernel is relocatable kernel and there can be a gap between
the physical address statically assigned to kernel data and texts
and the address that is really assigned to each object corresponding
to the kernel symbols. The gap is phys_base. makedump calculates the
phys_base in an ad-hoc way that comparing the addresses of some of
occurrences of "Linux kernel" strings in certain range of vmcore.
Resolution:
Fix patch has already been posted in upstream. so just back port.
The commit ID are:
commit e23dc0a1aa5fa7a4429f72ff1c2fe87a87291065
commit 92563d7a7a5175ef78c4a94ee269b1b455331b4c
Signed-off-by: arthur <zzou@redhat.com>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
Boot with "crashkernel=128M,high", kernel now uses "Crash kernel" in
/proc/iomem for crash kernel memory reservations at both low and high:
commit 157752d
Author: Yinghai Lu <yinghai@kernel.org>
kexec: use Crash kernel for Crash kernel low
But kexec is still scanning for "Crash kernel low" in /proc/iomem, and
will fail immediately when load/unload crash kernel.
So let's pull the following commit from kexec upstream to make
it compatible with our kernel:
commit e25e6e7
Author: Yinghai Lu <yinghai@kernel.org>
kdump, x86: Process multiple Crash kernel in /proc/iomem
(This patch from upstream is untouched and can be applied cleanly)
Signed-off-by: WANG Chao <chaowang@redhat.com>
Acked-by: Baoquan He <bhe@redhat.com>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
Description of Problem:
In cyclic mode, makedumpfile recalculates cyclic buffer size as the
largest multiple of the largest block size managed by buddy
allocator, i.e. 4MB, smaller than the cyclic buffer size in order to
enable to process each unit of blocks managed by buddy allocator in
each cycle.
However, makedumpfile does two wrong things in the recalculations:
1) While updating size of cyclic buffer, makedumpfile doesn't update
length of range of cycle in page frame numbers, due to which, if
cyclic buffer size is updated, because cyclic buffer size is always
reduced during udpate, some buffer overrun can happen on the cyclic
buffer. This can cause segmentation violation in the worst case.
2) roundup() is used to calculate bitmap size for maximum block size
managed by buddy allocator, here divideup() is correct, due to
which, although memory filtering is not affected, cyclic buffer size
get too much aligned and less efficient.
Fix patches has already been posted and merged in makedumpfile
development devel branch.
git://git.code.sf.net/p/makedumpfile/code
f8c8218856effc43ea01cd9394761cfb8aeaa8df
a785fa7dd7a7bd7dcbb017d0bea8848243b0924f
Signed-off-by: Baoquan He <bhe@redhat.com>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
Acked-by: WANG Chao <chaowang@redhat.com>
The dumpfile header has this field, which was inherited from
the old "diskdump" facility:
struct disk_dump_header {
...
unsigned int max_mapnr; /* = max_mapnr */
...
and which, among other things, is used by the crash utility as a
delimiter to determine whether a physical address read request is
legitimate. And obviously the field cannot handle PFN values greater
than 32-bits.
The makedumpfile source code does have its own max_mapnr representation
in its DumpInfo structure in "makedumpfile.h":
struct DumpInfo {
...
unsigned long long max_mapnr; /* number of page descriptor */
...
But in its "diskdump_mod.h" file, it carries forward the old diskdump
header format, which has the 32-bit field:
struct disk_dump_header {
...
unsigned int max_mapnr; /* = max_mapnr */
...
And here in "makedumpfile.c", the inadvertent truncation occurs
when the PFN is greater than 32-bits:
int
write_kdump_header(void)
{
...
dh->max_mapnr = info->max_mapnr;
...
Now upstream has below commit to fix this, back port it:
commit 8e124174b62376b17ac909bc68622ef07bde6840
Author: Jingbai Ma <jingbai.ma@hp.com>
Date: Fri Oct 18 18:53:38 2013 +0900
Signed-off-by: Baoquan He <bhe@redhat.com>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
Acked-by: WANG Chao <chaowang@redhat.com>
In mkdumprd, strip_comments is not implemented correctly. Since arguments
passed, strip_comments only take $1 and misses others. This caused
problems. Such as below line, current code will only get "makedumpfile"
and pass it to $config_val finally, then parameters for makedumpfile
are missed.
core_collector makedumpfile -c --message-level 1 -d 31
Now modify function strip_comments.
Signed-off-by: Baoquan He <bhe@redhat.com>
Acked-by: WANG Chao <chaowang@redhat.com>
Acked-by: Dave Young <dyoung@redhat.com>
In 2.0.4, Cliff from HP posted 2 patches:
e35aa29 kexec: include reserved e820 sections in crash kernel
4932034 kexec: lengthen the kernel command line image
However, with both of them kdump kernel may fail to boot, and
are useless because of restriction in kernel side. In upstream,
they have been reverted. Now back port these 2 revert commits.
Also since the commit 1a4e90b has dependency, back port commit
dc607e4 which is depended on by commit 1a4e90b too.
1a4e90b Revert "kexec: include reserved e820 sections in crash kernel"
dc607e4 kexec: i386: Add cmdline_add_memmap_internal() to reduce the code duplication
8274916 Revert: "kexec: lengthen the kernel command line image"
Currently we have two issues against mounting filesystems by systemd.
1. If any failure in sysroot.mount, initrd.target won't be reached.
2. If any failure in mounting /etc/fstab, initrd.target won't be reached
Our kdump.sh is in dracut-pre-pivot hook which is ordered after
initrd.target. That means if systemd doesn't reach initrd.target,
pre-pivot service will not run.
Based on above, we can conclude that in order to run kdump.sh,
initrd.target must be reached.
To fix issue 1), we can add rootflags=nofail to 2nd kernel cmdline, so
that initrd.target will not require sysroot.mount. initrd.target
wouldn't care about the failures in sysroot.mount. That means
initrd.target can always be reached whether or not sysroot.mount fails.
So when initrd.target is reached, kdump.sh can be run.
To fix issue 2), we can append "nofail" mount options to every entry in
/etc/fstab. It has almost the same affects as to sysroot.mount.
initrd.target can be reached whether or not mount /etc/fstab fails. So
when initrd.target is reached, kdump.sh can be run.
If the mount failures block kdump from working properly (for example,
the dump target isn't mounted), the error handling will be done by
"default" action specified in /etc/kdump.conf. Otherwise kdump will
ignore the mount failures and dump as expected.
Signed-off-by: WANG Chao <chaowang@redhat.com>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
Acked-by: Dave Young <dyoung@redhat.com>
From: Wade Mealing <wmealing@redhat.com>
The RHEL 5 release of mkdumprd allowed for comments in the kdump config
file as shown below:
net 192.168.1.1 # this is the comment part
This patch strips them out during processing, but leaves the configuration
file in original condition.
Signed-off-by: Wade Mealing <wmealing@redhat.com>
Signed-off-by: Baoquan He <bhe@redhat.com>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
Add function strip_comments into kdump-lib.sh, since it's used by
several files.
Signed-off-by: Wade Mealing <wmealing@redhat.com>
Signed-off-by: Baoquan He <bhe@redhat.com>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
Currently in the whole kdump framework, we have some common functions
used across not only mkdumprd context and dracut context, but also 1st
kernel and 2nd kernel. We defined these functions at each script, which
is obviously not decent.
So let's introduce kdump-lib.sh for the shared functions and put it
to /lib/kdump/kdump-lib.sh.
It starts small, as you can see, only 3 functions are extracted. But in
the future more and more common functions can be added.
Signed-off-by: WANG Chao <chaowang@redhat.com>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
Now kdump.service runs "After" network.target. But network.target
doesn't mean network is setup and online[1]. We should use
network-online.target instead for ssh/nfs dump.
And also because nfs dump requires a mounted nfs when rebuilding kdump
initrd, kdump.service should also run "After" remote-fs.target (this
means all remote fs configured in /etc/fstab is mounted).
The downside of this patch is we always need to wait for network-online.target
even when dump target is a local disk. If network fails to come up,
kdump.service have to be stuck until network-online.target timeout (90
seconds by default).
Signed-off-by: WANG Chao <chaowang@redhat.com>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
In kdump_setup_bridge/bond/team(), we use _dev as a global variable.
That causes following issues when network is br0 over bond0:
-> kdump_setup_bridge br0: _dev to be "bond0" as a brif
-> kdump_setup_bond bond0: _dev is modified to be eth0 as a bond slave
-> (jump back) kdump_setup_bridge br0: we really need _dev is
"bond0" not "eth0".
_dev must be a local variable because it has been used multiple places.
Signed-off-by: WANG Chao <chaowang@redhat.com>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
Acked-by: Dave Young <dyoung@redhat.com>
There will be a race condition if multiple kdumpctl instances are
running at the same time.
By introducing a global mutex lock, only one instance can acquire this
lock and run, others will be waiting for the lock in queue. Now each
kdump instance will be run in serial order and there won't any race
condition.
This is a patch backported from RHEL6.
Signed-off-by: WANG Chao <chaowang@redhat.com>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
Acked-by: Dave Young <dyoung@redhat.com>
Chaowang measured the selinux load_policy memory usage, it need ~50M
It's too much under kdump 2nd kernel, it cause more OOM then before.
Here is the findings from Vivek:
- If we don't load policy or don't do restorecon, kernel automatically
uses a label for file as specified by file
/sys/fs/selinux/initial_contexts/file
On my system this value is "system_u:object_r:file_t:s0". Kernel
enforces this label on a file if it is not labeled. That's the reason
that you see above label on vmcore file when selinux policy was not
loaded in second kernel or restorecon was not done.
Note: I did some testing with rhel6 and there also I see file_t context.
Not sure why that's the case.
- Relabeling of root file system over boot happens if there is a file
/.autorelabel present. This file is touched by systemd service
fedora-autorelabel-mark.service. And this file comes from initscritps
package.
So if this service thinks that system was booted with selinux disabled
it will put this file on root and when next time system boots with
selinux enabled, relabeling is enforced by fedora-autorelabel.service
service.
- In our case relabeling is not happening after saving vmcore because
there does not seem be any fedora-autorelabel-mark.service running
from initramfs context. Looks like this service runs after switching
to real root.
Aug 08 10:44:13 vm9-f19 systemd[1]: Started Mark the need to relabel after reboot.
- selinux poicy is now loaded by systemd after root switch has taken
place.
Aug 08 10:44:10 vm9-f19 systemd[1]: Successfully loaded SELinux policy in 357.693ms.
So now we know that why selinux relabeling is not taking place. Reason
being that systemd service which marks the file system for autorelabeling
does not run from initramfs context.
And it might not make to run this service from initramfs context before
switch root. In general it makes sense to first switch to root, load
selinux policy if needed and then check whether to mark this filesystem
for relabel or not. Ideally root is mourted read only before that. It is
just that we break this rule for kdump. So as long as we make sure we
relabel files created by kdump after booting back, things should be fine.
Since we will relabel the vmcore dir after reboot so let's remove
the selinux dracut module dependency to avoid load_policy in 2nd kernel.
If in the future load_policy memory usage shrinks to an acceptable level
or there's a better solution we can add selinux load_policy back later.
Signed-off-by: Dave Young <dyoung@redhat.com>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
do_dump() takes care of dump procedure. It'll error out if failing to
save vmcore.
Signed-off-by: WANG Chao <chaowang@redhat.com>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
Currently in initrd, hardware clock is always considered to use UTC time
format and system time zone is also UTC. Thus system time isn't correct
if hw clock is localtime or we're using other time zone in real root.
To fix this, install /etc/adjtime and /etc/localtime to initrd.
Previously, this functionality was implemented in dracut base module:
commit 77364fd
Author: WANG Chao <chaowang@redhat.com>
base: setup correct system time and time zone in initrd
But some people complains about a normal boot initrd needs to rebuild
every time if time zone is changed. So let's fix it on our side.
Signed-off-by: WANG Chao <chaowang@redhat.com>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
2nd kernel has very limited memory. Allocating huge pages will probably
trigger OOM. So let's remove hugepages and hugepagesz kernel parameters
for 2nd kernel when 1st kernel are using them.
If user wants huge pages cmdline in 2nd kernel, he/she can still specify
it through KERNEL_COMMANDLINE_APPEND in /etc/sysconfig/kdump.
This patch adds a new function remove_cmdline_param(). It takes a list
of kernel parameters as its arguments and remove them from given kernel
cmdline.
update:
1. Add description of remove_cmdline_param() per Vivek.
2. Remove_cmdline_param() will take kernel cmdline as $1, then strip it
and print the result.
Signed-off-by: WANG Chao <chaowang@redhat.com>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
Currently when action_on_fail is enabled, the emergency_shell won't be called
either. In kdump even though user specify the default action as emergency_shell,
dracut still skip it. Now change the implementation of action_on_fail to depend
on a file which is created by kdump when making kdump initrd, then remove it
at the beginning of kdump. This can solve the explicit emergency_shell problem.
Signed-off-by: Baoquan He <bhe@redhat.com>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
Acked-by: WANG Chao <chaowang@redhat.com>
This reverts commit 9e964ff4c6.
Currently, because of dracut implementation, in kdump 'default shell' will
call emergency_shell of dracut. If action_on_fail is enabled, emergency_shell
is skipped. Then 'default shell' won't work either.
Here revert the old commit 9e964ff4 so that take other implementation.
Signed-off-by: Baoquan He <bhe@redhat.com>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
Acked-by: WANG Chao <chaowang@redhat.com>
Currently in kdump.sh, we redirect stdout to stderr, because dracut
pre-pivot service (which kdump.sh is running within) only output stderr
to console. That behavior is defined in dracut-pre-pivot.service:
[Service]
...
StandardInput=null
StandardOutput=syslog
StandardError=syslog+console
...
But during testing, it has been observed that systemd will cache stderr
buffer, and first record to syslog (and it's own journal), then copy the
logs to /dev/console. And this practice is somehow unexpected in our
kdump script. We may have suppressed stdout/stderr that hasn't been
write to /dev/console before we run a force reboot.
With this change of redirecting stdout/stderr to /dev/console, kdump.sh
will output everything immediately to console, not cached/hidden by
systemd.
Signed-off-by: WANG Chao <chaowang@redhat.com>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
Build on ppc/ppc64 failed after makedumpfile-1.5.4 is pulled, since the
variable vmap_area_list is not defined. Back port below commit from
upstream to add it.
commit 150b58eb299066c65ef7713a93effc35c00be03a
Author: Baoquan He <bhe@redhat.com>
Date: Mon Jul 15 20:37:14 2013 +0800
[PATCH] Add vmap_area_list definition for ppc/ppc64.
vmap_area_list is added to get vmalloc_start for ppc/ppc64, but its
definition is missing, now add them.
Signed-off-by: Baoquan He <bhe@redhat.com>
Currently some functions are used in subshell to assign string to a
variable. For example:
_mnt=$(to_mount "$1")
In this case if we call perror_exit in the subshell, subshell will exit
1, but the parent process (mkdumprd) won't exit.
So we should handle the exit code of a subshell if the subshell calls
perror_exit over a failure.
Signed-off-by: WANG Chao <chaowang@redhat.com>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
kvm virtio-blk device, for example /dev/vdb, doesn't have serial id by
default. So there's no persistent device node under /dev/disk/ for
/dev/vdb.
In case no persistent dev for dump target, we should use the original
device name directly, not failing the mkdumprd.
v2: update warn message from Vivek.
Signed-off-by: WANG Chao <chaowang@redhat.com>
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
dump_to_rootfs is a special handling of dump_fs. It's better we merge them
together to cleaup code.
Now dump_fs() function takes two types of $1, a mount point like
/sysroot or a dump target device like /dev/mapper/vg-lv_kdump.
v2: remove -F option in makedumpfile case from Vivek
Signed-off-by: WANG Chao <chaowang@redhat.com>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
After dump the vmcore, explicitly commit changed cache to disk in case
umount fail or chances we'll have an incomplete vmcore.
Signed-off-by: WANG Chao <chaowang@redhat.com>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
Percent signs in .spec files get expanded as macros. Currently in kexec-tools.spec,
'%{dist}' are appended to changelog item. This older changelog is not correctly for
rhel7 with this. Let's remove it to make it clearer.
stdout if line buffered, thus even it's redirect to stderr, it will not show
on console automaticly. Because monitor_dd_progress is only for rawdump
currently, so I think we can just use "echo" instead of "echo -n".
Another problem is sometimes CURRENT_SIZE does not get value when it's used
in $(($CURRENT_SIZE / 1048576)), fix this issue as well.
Signed-off-by: Dave Young <dyoung@redhat.com>
Acked-by: WANG Chao <chaowang@redhat.com>
When using makedumpfile as core_collector, makedumpfile will show its
own progress bar, it will mix with the monitor_dd_progress and cause confusion.
In this patch just call monitor_dd_progress when core_collector is not
makedumpfile
Signed-off-by: Dave Young <dyoung@redhat.com>
Acked-by: WANG Chao <chaowang@redhat.com>
Fedora 18 changes the way how to work with services in spec files.
It introduces new macros - %systemd_post, %systemd_preun and
%systemd_postun. These macros are functionally equivalent to the
manual scriptlets used in older versions of Fedora.
By using the new unified RPM macros the .spec file code is
simplified a lot.
Signed-off-by: Baoquan He <bhe@redhat.com>
Acked-by: WANG Chao <chaowang@redhat.com>
Currently umount fs happens right after saving vmcore. Therefore vmcore
isn't directly accessible in kdump_post script or shell. This patch moves
the umount fs operation down to the very last part, right before kdump
exits.
The patch adds a global variable MOUNTS to keep track which filesystem
is used. And umount these filesystems at do_default_action() and
do_final_action().
Signed-off-by: WANG Chao <chaowang@redhat.com>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
Certain dracut module will mount fs under real root(/sysroot/ or
$NEWROOT/). Thus root fs can not be umounted by `umount /sysroot/`.
We should use `umount -R /sysroot/` to recursively umount root and
its submounts.
v2: do the same for dump_fs() from Baoquan
Signed-off-by: WANG Chao <chaowang@redhat.com>
Signed-off-by: Baoquan He <bhe@redhat.com>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
Replace $1/$2 with local variable names in dump_raw() and dump_ssh()
Signed-off-by: WANG Chao <chaowang@redhat.com>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
When makedumpfile failed, it could still generate a invalid vmcore. It's
better to suffix these invalid vmcore files with "-incomplete", as we do
in RHEL6.
Signed-off-by: WANG Chao <chaowang@redhat.com>
Acked-by: Vivek Goyal <vgoyal@redhat.com>