Commit Graph

1653 Commits

Author SHA1 Message Date
Kairui Song
b82c35c842 Partially Revert "Don't mount the dump target unless needed"
This partially reverts commit 6dee286467.

There are reports that NFSv3 is failing after this commit, and after
more debug, I found NFSv4 may not work properly if
"nfs4_disable_idmapping" is set to 0.

The cause of the failure is that kdump.sh runs after dracut's pre-pivot
and clean up hook, many dracut module will install hooks to kill some
running services, so if the dump target requires a service to be running
but it's killed, mount will fail.

Dracut ensures the configured mount points are ready before pre-pivot.
After pre-pivot, any further mounting operation may not work as expected.

Although there is no report of other type of dump target failure except
NFSv3, it's better to revert this, to avoid other potential risk, and wait
for a proper fix for that systemd/kernel issue.

Else, this may bring more trouble for further development.

But still keep the change in kdump-lib-initramfs.sh for better
robustness.

Signed-off-by: Kairui Song <kasong@redhat.com>
Acked-by: Pingfan Liu <piliu@redhat.com>
2020-05-11 14:22:22 +08:00
Hari Bathini
d624a0326f fadump: update fadump-howto.txt with some troubleshooting help
Add recommendations on how much memory is required for FADump. Also,
mention the optimizations applied to default initrd when FADump is
used and how to workaround it.

Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
Acked-by: Pingfan Liu <piliu@redhat.com>
2020-05-11 14:08:14 +08:00
Lianbo Jiang
ce0305d4f9 Add a new option 'rd.znet_ifname' in order to use it in udev rules
In most cases, it always provides a persistent MAC address. But for
the s390 Arch, sometimes, kernel could run in the LPAR mode and it
doesn't provide a persistent MAC address, which caused the kdump
failure.

Currently, some rules rely on the persistent MAC address, for the
above case, which won't work in kdump kernel because non-persistent
MAC could not match with udev rules.

To fix this issue, need to add a new option 'rd.znet_ifname' in order
to provide extra parameters such as 'ifname' and 'subchannels' for
some rules, which ensures kdump can also work appropriately without
the persistent MAC. Please refer to the following commit in dracut:

872eb69936bd ("95znet: Add a rd.znet_ifname= option")

Signed-off-by: Lianbo Jiang <lijiang@redhat.com>
Acked-by: Kairui Song <kasong@redhat.com>
2020-04-27 18:21:22 +08:00
Kazuhito Hagio
74081a2b64 Don't unmount the dump target just after saving vmcore
Since commit 6dee286467 ("Don't mount the dump target unless needed"),
dump_fs function unmounts the dump target just after saving vmcore.

This broke the condition that it's mounted when executing "kdump_post",
which had been stable since RHEL5, and a certain tool which uses the
kdump_post hook to save information of 2nd kernel to the dump target
started to fail.

As unmounting it is done by systemd-shutdown before reboot without
the umount command as below, so let's don't unmount it in dump_fs.

  systemd-shutdown[1]: Unmounting file systems.
  [547]: Remounting '/sysroot' read-only in with options '(null)'.
  EXT4-fs (dm-0): re-mounted. Opts: (null)
  [548]: Unmounting '/sysroot'.

Signed-off-by: Kazuhito Hagio <k-hagio-ab@nec.com>
Acked-by: Kairui Song <kasong@redhat.com>
2020-04-27 16:30:59 +08:00
Pingfan Liu
f33f30eb61 dracut-module-setup.sh: fix breakage in get_pcs_fence_kdump_nodes()
pcs cluster and cluster cib-upgrade may throw some information and disturb
the parsing. Mute them

Signed-off-by: Pingfan Liu <piliu@redhat.com>
Acked-by: Kairui Song <kasong@redhat.com>
2020-04-20 11:24:46 +08:00
Pingfan Liu
6348398743 dracut-module-setup.sh: ensure cluster info is ready before query
There is a race issue between "pcs" and "kdumpctl restart"

-1. set up cluster
 # pcs cluster setup --start mycluster node1 node2
 # pcs stonith create kdump fence_kdump pcmk_reboot_action="off"
 # pcs stonith level add 1 node1 kdump
 # pcs stonith level add 1 node2 kdump

-2. Then here comes the command _immediately_ in kdumpctl
 # pcs cluster cib

But due to some pcs internal mechanism, "pcs cluster cib" can not
fetch the updated info in time.

Fix these issue by forcing the upgrade of cib.

Signed-off-by: Pingfan Liu <piliu@redhat.com>
Acked-by: Kairui Song <kasong@redhat.com>
2020-04-08 15:46:06 +08:00
Kairui Song
b5b252ae27 Release 2.0.20-12
Signed-off-by: Kairui Song <kasong@redhat.com>
2020-04-02 01:48:46 +08:00
Kairui Song
3b09c4910d Remove adjust_bind_mount_path call
If user configured target is used, path should be used as the absolute
path within the dump target direct, and user should be fully aware of
the path structure within the target device. The adjust_bind_mount_path
call here make it very hard to control the behavior.

Especially, if it's a cross device bind mount, this will likely create a
invalid path in the target. And for atomic case, adjust_bind_mount_path call
here assumes user will always pass root device as the explicitly configured
dump target, which is not true.

If user configured target device is used, the path is always be the
absolute path inside of given target. If user don't know about the path
structure in the target device, then user should either use the path
based config, or carefully exam the target device before using it as a
dump target.

Signed-off-by: Kairui Song <kasong@redhat.com>
Acked-by: Lianbo Jiang <lijiang@redhat.com>
2020-03-30 22:06:46 +08:00
Kairui Song
bde4b7af3b No longer treat atomic/silverblue specially
This commit remove almost all special workaround for atomic, and treat
all bind mounts in any environment equally.

Use a helper get_bind_mount_directory_from_path to get the bind mount
source path of given path.

is_atomic function now only used to determine the right /boot path
for atomic/silverblue environment.

And remove get_mntpoint_from_path(), it's the only function that never
ignore bind mount, and it have no caller after this clean up.

Signed-off-by: Kairui Song <kasong@redhat.com>
Acked-by: Lianbo Jiang <lijiang@redhat.com>
2020-03-30 22:06:37 +08:00
Kairui Song
b5b0b90521 mkdumprd: Simplify handling of user specified target
For user specified target, the config value is used as the dump target,
and SAVE_PATH (path in kdump.conf) value is used as the dump path within
the dump target, no need to do anything extra with the path value.

Current code logic is not only complicated, it also wrongly generate
an redundantly long path in atomic/silverblue environment.

The right way is only check two things, and do nothing else:

 1. The path exists within the target;
 2. The target is large enough to hold to contain the vmcore.

Currently checking the target still requires it to be mounted so it will
error out if it's not mounted. Will implement some auto mount as next
step.

Signed-off-by: Kairui Song <kasong@redhat.com>
Acked-by: Lianbo Jiang <lijiang@redhat.com>
2020-03-30 22:06:28 +08:00
Kairui Song
fd7e7be483 mkdumprd: Use get_save_path instead of parsing config
get_save_path provides default value fail back and error check, no need
to repeat it again.

Also remove a redundant echo and grep in get_save_path

Signed-off-by: Kairui Song <kasong@redhat.com>
Acked-by: Lianbo Jiang <lijiang@redhat.com>
2020-03-30 22:06:07 +08:00
Kairui Song
c1c7f004c8 Remove is_dump_target_configured
It's basically same with is_user_configured_dump_target and only have
one caller. And the name is confusing, the dump target is always
configured, it's either user configured or path based.

Signed-off-by: Kairui Song <kasong@redhat.com>
Acked-by: Lianbo Jiang <lijiang@redhat.com>
2020-03-30 22:05:31 +08:00
Pingfan Liu
3be5f74df0 dracut-module-setup.sh: improve get_alias()
In /etc/hosts, the alias name can come at the 2nd column, regardless of the
recommendation.

E.g. the following format is valid although not recommended
cat /etc/hosts
        127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
        ::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
        192.168.22.21	fastvm-rhel-7-6-21	fastvm-rhel-7-6-21.localdomain
        192.168.22.22	fastvm-rhel-7-6-22	fastvm-rhel-7-6-22.localdomain

        192.168.22.21   node1_hb
        192.168.22.22   node2_hb

So filtering out both 2nd and 3rd column for matching.

Signed-off-by: Pingfan Liu <piliu@redhat.com>
Acked-by: Kairui Song <kasong@redhat.com>
2020-03-24 15:36:26 +08:00
Kairui Song
4b43ba063d Release 2.0.20-11
Signed-off-by: Kairui Song <kasong@redhat.com>
2020-03-24 01:47:45 +08:00
Kairui Song
424ac0bf80 Fix a potential syntax error
Process substitution is not POSIX standard syntax, so if bash is configured
to strictly follow POSIC, this will fail.

Just use a POSIX friendly syntax instead.

Fixes: bz1708321

Signed-off-by: Kairui Song <kasong@redhat.com>
Acked-by: Lianbo Jiang <lijiang@redhat.com>
2020-03-23 10:26:20 +08:00
Kairui Song
e78639b46f Use read_strip_comments to filter the installed kdump.conf
This help remove redundant spaces and tailing comment in installed
kdump.conf, currently installed kdump.conf always contain extra empty
lines.

Signed-off-by: Kairui Song <kasong@redhat.com>
Acked-by: Lianbo Jiang <lijiang@redhat.com>
2020-03-23 10:26:08 +08:00
Kairui Song
632c369ec2 kdumpctl: fix driver change detection on latest Fedora
Now modinfo will return "(builtin)" instead of empty string for builtin
module. Sync the code logic.

Signed-off-by: Kairui Song <kasong@redhat.com>
Acked-by: Pingfan Liu <piliu@redhat.com>
2020-03-23 10:24:57 +08:00
Kairui Song
2fbcdf41e3 kdumpctl: check hostonly-kernel-modules.txt for kernel module
Since Dracut commit a0d9ad6 loaded-kernel-modules is renamed to
hostonly-kernel-modules and contains all hostonly modules. So check
hostonly-kernel-modules instead for module change.

Signed-off-by: Kairui Song <kasong@redhat.com>
Acked-by: Pingfan Liu <piliu@redhat.com>
2020-03-18 15:11:43 +08:00
Kairui Song
c9c50f9a36 dracut-module-setup.sh: Ensure initrd.target.wants dir exists
Latest dracut release stopped creating
$systemdsystemunitdir/initrd.target.wants dir for us, so ensure it
exists before creating the symlink.

Signed-off-by: Kairui Song <kasong@redhat.com>
Tested-and-Reviewed-by: Bhupesh Sharma <bhsharma@redhat.com>
2020-03-18 15:10:59 +08:00
Bhupesh Sharma
760beb7e57 mkdumprd: Use DUMP_TARGET which printing error message during ssh
When building kdump initramfs for a SSH dump target, mkdumprd would
check whether it has the write permission on the SSH Server's
$DUMP_TARGET.

However $DUMP_TARGET is missing in the actual error message when the
user doesn't not have the write permission. For example:

 # kdumpctl restart
   kexec: unloaded kdump kernel
   Stopping kdump: [OK]
   Could not create temporary directory on :/home/bhsharma/test. Make
   sure user has write permission on destination
   mkdumprd: failed to make kdump initrd
   Starting kdump: [FAILED]

This patch using $1 value passed to mkdumprd, to print the
$DUMP_TARGET inside mkdir_save_path_ssh() function to
fix the issue.

Signed-off-by: Bhupesh Sharma <bhsharma@redhat.com>
Acked-by: Kairui Song <kasong@redhat.com>
2020-03-18 14:28:41 +08:00
HAGIO KAZUHITO(萩尾 一仁)
64d30f54b9 kdump-lib.sh: Fix is_user_configured_dump_target()
Currently, is_user_configured_dump_target() doesn't work as expected
due to lack of grep -E option.

As a result, kdump service with a ssh dump configuration can unnecessarily
fail to start due to the non-existence of a directory at where the path
option specifies on the local system:

  kdumpctl[9760]: Rebuilding /boot/initramfs-5.4.19-200.fc31.x86_64kdump.img
  kdumpctl[9760]: Dump path /var/crash/ssh does not exist.
  kdumpctl[9760]: mkdumprd: failed to make kdump initrd
  kdumpctl[9760]: Starting kdump: [FAILED]

Signed-off-by: Kazuhito Hagio <k-hagio-ab@nec.com>
Acked-by: Kairui Song <kasong@redhat.com>
2020-03-12 23:32:25 +08:00
Kazuhito Hagio
a1c28126ef mkdumprd: Use makedumpfile --check-params option
In order to check whether the specified makedumpfile parameters are
valid or not when generating initramfs, use the --check-params option,
which was recently added.

With the patch, kdumpctl can point out mistakes in core_collector
option and failed.  For example, if there is an practical mistake
that dump_level is -1:

  # cat /etc/kdump.conf
  core_collector makedumpfile -l --message-level 1 -d -1

  # kdumpctl start
  Detected change(s) in the following file(s):

    /etc/kdump.conf
  Rebuilding /boot/initramfs-5.4.19-200.fc31.x86_64kdump.img
  Dump_level(-1) is invalid.
  makedumpfile parameter check failed.
  mkdumprd: failed to make kdump initrd
  Starting kdump: [FAILED]

Signed-off-by: Kazuhito Hagio <k-hagio-ab@nec.com>
Acked-by: Kairui Song <kasong@redhat.com>
2020-03-12 23:32:22 +08:00
Kazuhito Hagio
476a2b50f6 makedumpfile: Introduce --check-params option
Backport from the upstream makedumpfile devel branch.

commit 989152e113bfcb4fbfbad6f3aed6f43be4455919
Author: Kazuhito Hagio <k-hagio-ab@nec.com>
Date:   Tue Feb 25 16:04:55 2020 -0500

    [PATCH] Introduce --check-params option

    Currently it's difficult to check whether a makedumpfile command-line
    is valid or not without an actual panic.  This is inefficient and if
    a wrong configuration is not tested, you will miss the vmcore when an
    actual panic occurs.

    In order for kdump facilities like kexec-tools to be able to check
    the specified command-line parameters in advance, introduce the
    --check-params option that only checks them and exits immediately.

    Signed-off-by: Kazuhito Hagio <k-hagio-ab@nec.com>

Signed-off-by: Kazuhito Hagio <k-hagio-ab@nec.com>
Acked-by: Kairui Song <kasong@redhat.com>
2020-03-12 23:32:18 +08:00
HAGIO KAZUHITO(萩尾 一仁)
b229a9773c Improves the early-kdump-howto.txt document in several points:
(1) explain early kdump a little clearer in "Introduction"
(2) move the notes out to a new "Notes" section for readability and
    add a note about need of reconfiguration after kernel update
(3) change journalctl -x option to -b option because -x is unnecessary
    and -b will make it very faster if persistent journal is available
(4) shorten the example messages for readability
(5) add a note to "Limitation" about the earliness of early kdump

Signed-off-by: Kazuhito Hagio <k-hagio-ab@nec.com>
Acked-by: Lianbo Jiang <lijiang@redhat.com>
2020-03-12 23:31:34 +08:00
Kairui Song
16d2e4274d Release 2.0.20-10
Signed-off-by: Kairui Song <kasong@redhat.com>
2020-02-13 16:19:54 +08:00
s.morishima@fujitsu.com
a878ad0100 Add --force option to step 2 in early-kdump-howto.txt
For step2 in early-kdump-howto.txt, --force option of dracut
is necessary to rebuild system initramfs. Without --force option,
executing step2 fails because system initramfs already exists.

Signed-off-by: Shigeki Morishima <s.morishima@jp.fujitsu.com>
Acked-by: Kairui Song <kasong@redhat.com>
2020-02-13 14:17:18 +08:00
s.morishima@fujitsu.com
2af3537dfa Fix typo in early-kdump-howto.txt
Signed-off-by: Shigeki Morishima <s.morishima@jp.fujitsu.com>
Acked-by: Kairui Song <kasong@redhat.com>
2020-02-13 14:17:12 +08:00
Bhupesh Sharma
a01270b64e kexec-tools/module-setup: Ensure eth devices get IP address for VLAN
Currently while trying to save vmcore via vlan eth interface, the Kdump
kernel fails with network unreachable message.

This is because mkdumprd produces a vlan config that does not get
ip address for vlan on eth device.

Fix the same via this patch.

Signed-off-by: Bhupesh Sharma <bhsharma@redhat.com>
Acked-by: Kairui Song <kasong@redhat.com>
2020-02-13 14:13:59 +08:00
Hari Bathini
e3f2f926dd powerpc: enable the scripts to capture dump on POWERNV platform
With FADump support added on POWERNV paltform, enable the scripts to
capture /proc/vmcore. Also, if CONFIG_OPAL_CORE is enabled, OPAL core
is preserved and exported on POWERNV platform. So, offload OPAL core,
if it is available.

Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
Acked-by: Kairui Song <kasong@redhat.com>
2020-02-06 22:13:06 +08:00
Lianbo Jiang
6a20bd5447 kdump-lib: switch to the kexec_file_load() syscall on x86_64 by default
UEFI Secure boot is a signature verification mechanism, designed to
prevent malicious code being loaded and executed at the early boot
stage. This makes sure that code executed is trusted by firmware.

Previously, with kexec_file_load() interface, kernel prevents unsigned
kernel image from being loaded if secure boot is enabled. So kdump will
detect whether secure boot is enabled firstly, then decide which interface
is chosen to execute, kexec_load() or kexec_file_load(). Otherwise unsigned
kernel loading will fail if secure boot enabled, and kexec_file_load() is
entered.

Now, the implementation of kexec_file_load() is adjusted in below commit.
With this change, if CONFIG_KEXEC_SIG_FORCE is not set, unsigned kernel
still has a chance to be allowed to load under some conditions.

commit 99d5cadfde2b ("kexec_file: split KEXEC_VERIFY_SIG into KEXEC_SIG
and KEXEC_SIG_FORCE")

And in the current Fedora, the CONFIG_KEXEC_SIG_FORCE is not set, only the
CONFIG_KEXEC_SIG and CONFIG_BZIMAGE_VERIFY_SIG are set on x86_64 by default.
It's time to spread kexec_file_load() onto all systems of x86_64, including
Secure-boot platforms and legacy platforms. Please refer to the following
form.

.----------------------------------------------------------------------.
| .                    |     signed kernel     |    unsigned kernel    |
|    .      types      |-----------------------|-----------------------|
|       .              |Secure boot|  Legacy   |Secure boot|  Legacy   |
|          .           |-----------|-----------|-----------|-----------|
| options     .        | prev| now | prev| now |     |     | prev| now |
|                .     |(file|(file|(only|(file| prev| now |(only|(file|
|                    . |load)|load)|load)|load)|     |     |load)|load)|
|----------------------|-----|-----|-----|-----|-----|-----|-----|-----|
|KEXEC_SIG=y           |     |     |     |     |     |     |     |     |
|SIG_FORCE is not set  |succ |succ |succ |succ |  X  |  X  |succ |succ |
|BZIMAGE_VERIFY_SIG=y  |     |     |     |     |     |     |     |     |
|----------------------|-----|-----|-----|-----|-----|-----|-----|-----|
|KEXEC_SIG=y           |     |     |     |     |     |     |     |     |
|SIG_FORCE is not set  |     |     |     |     |     |     |     |     |
|BZIMAGE_VERIFY_SIG is |fail |fail |succ |fail |  X  |  X  |succ |fail |
|not set               |     |     |     |     |     |     |     |     |
|----------------------|-----|-----|-----|-----|-----|-----|-----|-----|
|KEXEC_SIG=y           |     |     |     |     |     |     |     |     |
|SIG_FORCE=y           |succ |succ |succ |fail |  X  |  X  |succ |fail |
|BZIMAGE_VERIFY_SIG=y  |     |     |     |     |     |     |     |     |
|----------------------|-----|-----|-----|-----|-----|-----|-----|-----|
|KEXEC_SIG=y           |     |     |     |     |     |     |     |     |
|SIG_FORCE=y           |     |     |     |     |     |     |     |     |
|BZIMAGE_VERIFY_SIG is |fail |fail |succ |fail |  X  |  X  |succ |fail |
|not set               |     |     |     |     |     |     |     |     |
|----------------------|-----|-----|-----|-----|-----|-----|-----|-----|
|KEXEC_SIG is not set  |     |     |     |     |     |     |     |     |
|SIG_FORCE is not set  |     |     |     |     |     |     |     |     |
|BZIMAGE_VERIFY_SIG is |fail |fail |succ |succ |  X  |  X  |succ |succ |
|not set               |     |     |     |     |     |     |     |     |
 ----------------------------------------------------------------------
Note:
[1] The 'X' indicates that the 1st kernel(unsigned) can not boot when the
    Secure boot is enabled.

Hence, in this patch, if on x86_64, let's use the kexec_file_load() only.
See if anything wrong happened in this case, in Fedora firstly for the
time being.

Signed-off-by: Lianbo Jiang <lijiang@redhat.com>
Acked-by: Kairui Song <kasong@redhat.com>
2020-02-06 21:57:14 +08:00
Kairui Song
1b89cc245f Release 2.0.20-9
Signed-off-by: Kairui Song <kasong@redhat.com>
2020-01-29 14:44:16 +08:00
Kairui Song
47f2a819d5 Fix builing failure on Fedora 32
Signed-off-by: Kairui Song <kasong@redhat.com>
2020-01-29 14:43:47 +08:00
Kairui Song
71d20dc825 Release 2.0.20-8
Signed-off-by: Kairui Song <kasong@redhat.com>
2020-01-29 08:48:04 +08:00
Kairui Song
69b920db5d Update makedumpfile to 1.6.7
Signed-off-by: Kairui Song <kasong@redhat.com>
2020-01-29 08:47:42 +08:00
Kairui Song
cee618593c Add a hook to wait for kdump target in initqueue
The dracut initqueue may quit immediately and won't trigger any hook if
there is no "finished" hook still pending (finished hook will be deleted
once it return 0).

This issue start to appear with latest dracut, latest dracut use
network-manager to configure the network,
network-manager module only install "settled" hook, and we didn't
install any other hook. So NFS/SSH dump will fail. iSCSI dump works
because dracut iscsi module will install a "finished" hook to detect if
the iscsi target is up.

So for NFS/SSH we keep initqueue running until the host successfully get
a valid IP address, which means the network is ready.

Signed-off-by: Kairui Song <kasong@redhat.com>
Acked-by: Pingfan Liu <piliu@redhat.com>
2020-01-29 08:12:45 +08:00
Kairui Song
24b00298d0 Always install sed and awk
sed and awk is heavily used everywhere in the code, but it's not
explicitely installed by kdump dracut module. If the module in dracut
stop installing them (which already happened with latest dracut
upstream), kdump will break.

Signed-off-by: Kairui Song <kasong@redhat.com>
Acked-by: Pingfan Liu <piliu@redhat.com>
2020-01-03 16:35:19 +08:00
Kairui Song
bcdcf35759 Fix potential ssh/nfs kdump failure of missing "ip" command
For ssh/nfs dump, kdump need the 'ip' tool to get the host ip address
for naming the vmcore. But kdump-module-setup.sh never installed this
tool. kdump-module-setup.sh worked so far as dracut network module will
help install it.

After dracut changed to use 35network-manager for network setup, "ip"
command won't be installed in second kernel by default. So need to
ensure "ip" is installed when installing kdump dracut module.

Signed-off-by: Kairui Song <kasong@redhat.com>
Acked-by: Pingfan Liu <piliu@redhat.com>
2020-01-03 16:35:15 +08:00
Kairui Song
b76c567325 kdump-lib.sh: Fix is_nfs_dump_target
Previously is_nfs_dump_target didn't cover the case that 'path <value>'
where <value> points to a nfs mount point. This function is never used
in first kernel before so so far it worked.

Signed-off-by: Kairui Song <kasong@redhat.com>
Acked-by: Pingfan Liu <piliu@redhat.com>
2020-01-03 16:35:12 +08:00
Kairui Song
03111c797b Always use get_save_path to get the 'path' option
This help deduplicate the code. Use a single function instead of
repeat the same logic.

Signed-off-by: Kairui Song <kasong@redhat.com>
Acked-by: Pingfan Liu <piliu@redhat.com>
2020-01-03 16:35:07 +08:00
Kairui Song
d10f202996 kdump-lib: Don't abuse echo, and clean up
Some of echo $(...) code segment is pointless, just call the command
directly, and remove some useless if / return statement.

Signed-off-by: Kairui Song <kasong@redhat.com>
Acked-by: Pingfan Liu <piliu@redhat.com>
2020-01-03 16:34:34 +08:00
Kairui Song
d01c6de338 Release 2.0.20-7
Signed-off-by: Kairui Song <kasong@redhat.com>
2019-12-29 19:36:08 +08:00
Kairui Song
b301d6c4f4 Fix building failure due to makedumpfile's compile flag
From: Pingfan Liu <piliu@redhat.com>

makedumpfile: remove -lebl

-lebl has been removed from elfutils.

Signed-off-by: Kairui Song <kasong@redhat.com>
2019-12-28 03:12:27 +08:00
Kairui Song
68dcfcfb47 mkdumprd: Fix dracut args parsing
Previous commit f13eab6 ('mkdumprd: simplify dracut args parsing')
break dracut arguments parsing for some use case, this should fix it
well.

Passed nfs/local/iscsi/ssh dump test, and with extra dracut_argss.

Signed-off-by: Kairui Song <kasong@redhat.com>
Acked-by: Dave Young <dyoung@redhat.com>
2019-12-27 12:29:43 +08:00
Kairui Song
ee742fcf6b Release 2.0.20-6
Signed-off-by: Kairui Song <kasong@redhat.com>
2019-11-28 17:53:30 +08:00
Kairui Song
d13fd38df4 kdump-error-handler.service: Remove ExecStopPost
Currently "systemctl --fail --no-block default" will be executed on
kdump-error-handler exit due to the config. This may makes systemd try
to isolate to the default target again.

The execute chain will be like:

initrd.target -> kdump.sh (failed) -> kdump-error-handler.service ->
failure_action -> final_action -> ExecStopPost (go to initrd.target again)

Currently, reboot/shutdown/halt is called by either by executing
failure_action or final_action. So the loop will be stopped. However
nont of the reboot/shutdown/halt call is blocking so it might lead to
race issue.

Just drop the ExecStopPost to fix this potential issue.

Signed-off-by: Kairui Song <kasong@redhat.com>
Acked-by: Pingfan Liu <piliu@redhat.com>
Tested-by: HATAYAMA Daisuke <d.hatayama@fujitsu.com>
2019-11-28 17:46:00 +08:00
Kairui Song
f13eab60cb mkdumprd: simplify dracut args parsing
Previously dracut_args is stored as an array in the shell code,
so we can just pass this array as argument to dracut.
But when trying to append new dracut argument, we have to manually
handle the quotes and spaces to ensure the appended argument is
splitted into array elements in the right way.

This is complex and hard to read or maintain. Instead, just store the
dracut_args as a plain string, and let xargs help parse it and call
dracut.

Simply passing $dracut_args or "$dracut_args" will either let dracut
consider the whole string as a single argument, or loss all the
quotes. xargs can handle it well and cover more corner cases.

Eg. one corner case before, if we have:
dracut_args --mount "/dev/sda1 /mnt/test xfs rw"

Kdump will fail, because the function add_dracut_arg() will wrongly
change the arguments into: (Notice the extra space.)
dracut_args --mount " /dev/sda1 /mnt/test xfs rw"

Instead of fixing it just use xargs instead. Tested with above config
and multiple other dracut_args values.

Resolves: bz1700136
Signed-off-by: Kairui Song <kasong@redhat.com>
Acked-by: Dave Young <dyoung@redhat.com>
2019-11-27 23:45:54 +08:00
Kairui Song
5633e83318 Always set vm.zone_reclaim_mode = 3 in kdump kernel
By default kernel have vm.zone_reclaim_mode = 0 and large page
allocation might fail as kernel is very conservative on memory
reclaiming. If the page allocation failure is not handled carefully
it could lead to more serious problems.

This issue can be reproduced by change with following steps:

- Fill up page cache use:
  # dd if=/dev/urandom of=/test bs=1M count=1300

- Now the memory is filled with write cache:
  # free -m
                total        used        free      shared  buff/cache   available
  Mem:           1790         184         132           2        1473        1348
  Swap:          2119           7        2112

- Insert a module which simply calls "kmalloc(SZ_1M, GFP_KERNEL)" for
  512 times: (Notice: vmalloc don't have such problem)
  # insmod debug_module.ko

- Got following allocation failure:
  insmod: page allocation failure: order:8, mode:0x40cc0(GFP_KERNEL|__GFP_COMP), nodemask=(null),cpuset=/,mems_allowed=0

- Clean up and repeat again with vm.zone_reclaim_mode = 3, OOM is not
  observed.

In kdump kernel there is usually only one online CPU and limited memory,
so we set vm.zone_reclaim_mode = 3 to let kernel reclaim memory more
aggresively to avoid such issue.

Signed-off-by: Kairui Song <kasong@redhat.com>
Acked-by: Pingfan Liu <piliu@redhat.com>
2019-11-13 11:35:46 +08:00
Hari Bathini
0a9aabaadd kdumpctl: make reload fail proof
When large amount of memory, about 1TB, is removed with DLPAR memory
remove operation, kdump reload could fail due to race condition with
device tree property update. In such scenario, the subsequent kdump
reload requests would also fail as reload() only proceeds if current
load status is active. Since the possibility of this race condition
couldn't be wished away due to the nature of the scenario, workaround
it by proceeding to load even if current load status is not active as
long as kdump service is active, which kdump udev rules already check
for.

Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
Acked-by: Kairui Song <kasong@redhat.com>
2019-11-12 13:22:52 +08:00
Pingfan Liu
d9d4483b7a spec: move binaries from /sbin to /usr/sbin
Before this patch
$rpm -ql kexec-tools | grep sbin
/sbin/kexec
/sbin/makedumpfile
/sbin/mkdumprd
/sbin/vmcore-dmesg

After this patch
$rpm -ql kexec-tools | grep sbin
/usr/sbin/kexec
/usr/sbin/makedumpfile
/usr/sbin/mkdumprd
/usr/sbin/vmcore-dmesg

Signed-off-by: Pingfan Liu <piliu@redhat.com>
Acked-by: Kairui Song <kasong@redhat.com>
2019-11-08 15:23:33 +08:00
Kairui Song
39352d0cfc Don't execute final_action if failure_action terminates the system
If failure_action is shutdown/reboot/halt, final_action is pointless as
the system will be already stopping. And if final_action is different
from failure_action, it will trigger a systemd race problem and cause
unexpected behavior to occur.

So let the error handler stop and exit after performing failure_action
successfully if failure_action is one of shutdown/reboot/halt.
This way, final_action will not be executed.

Signed-off-by: Kairui Song <kasong@redhat.com>
Acked-by: Pingfan Liu <piliu@redhat.com>
2019-11-01 11:21:58 +08:00