Commit Graph

927 Commits

Author SHA1 Message Date
Xunlei Pang
9df0cbbeed kdumpctl: use "apicid" other than "initial apicid"
We met a problem on AMD machines, when using "nr_cpus=4" for
kdump, and crash happens on cpus other than cpu0, kdump kernel
will fail to boot and eventually reset.

After some debugging, we found that it stuck at the kernel path
do_boot_cpu()-> ... ->wakeup_secondary_cpu_via_init():
 apic_icr_write(APIC_INT_LEVELTRIG|APIC_INT_ASSERT|APIC_DM_INIT,
            phys_apicid);
that is, it stuck at sending INIT from AP to BP and reset, which
is actually what "disable_cpu_apicid=X" tries to solve. Printing
the value of @phys_apicid showed that it was the value of "apicid"
other that of "initial apicid" showed by /proc/cpuinfo.

As described in x86 specification:
"In MP systems, the local APIC ID is also used as a processor ID by the
BIOS and the operating system. Some processors permit software to modify
the APIC ID. However, the ability of software to modify the APIC ID is
processor model specific. Because of this, operating system software
should avoid writing to the local APIC ID register. The value returned by
bits 31-24 of the EBX register (when the CPUID instruction is executed with a
source operand value of 1 in the EAX register) is always the Initial APIC ID
(determined by the platform initialization). This is true even if software
has changed the value in the Local APIC ID register."

From kernel commit 151e0c7de("x86, apic, kexec: Add disable_cpu_apicid
kernel parameter"), we can see in generic_processor_info(), it uses
a)read_apic_id() and b)@apicid to compare with @disabled_cpu_apicid.

a)@apicid which is actually @phys_apicid above-mentioned is from the
  following calltrace(on the problematic AMD machine):
    generic_processor_info+0x37/0x300
    acpi_register_lapic+0x30/0x90
    acpi_parse_lapic+0x40/0x50
    acpi_table_parse_entries_array+0x171/0x1de
    acpi_boot_init+0xed/0x50f
  The value of @apicid(from acpi MADT) is equal to the value of "apicid"
  showed by /proc/cpuinfo as proved by our debug printk.
b)read_apic_id() gets the value from LAPIC ID register which is "apicid"
  as well.

While the value of "initial apicid" is from cpuid instruction.

One example of "apicid" and "initial apicid" of cpu0 from /proc/cpuinfo
on AMD machine:
  apicid          : 32
  initial apicid  : 0

Therefore, we should assign /proc/cpuifo "apicid" to "disable_cpu_apicid=X".

We've never met such issue before, because we usually tested "nr_cpus=1",
and mostly on Intel machines, and "apicid" and "initial apicid" have the
same value in most cases on Intel machines.

Signed-off-by: Xunlei Pang <xlpang@redhat.com>
Acked-by: Dave Young <dyoung@redhat.com>
2017-07-19 09:50:04 +08:00
Dave Young
06c776f36b Release 2.0.15-3 2017-07-14 16:07:11 +08:00
Xunlei Pang
a431a7e354 module-setup: fix 99kdumpbase network dependency
I noticed that network is still enabled for local dumping,
like the following kdump boot message on my test machine
using local disk as the dump target:
  tg3.c:v3.137 (May 11, 2014)
  tg3 0000:02:00.0 eth2: Tigon3 [partno(BCM95720) rev
  (PCI Express) MAC address c8:1f:66:c9:35:0d
  tg3 0000:02:00.0 eth2: attached PHY is 5720C

After some debugging, found it due to a misuse in code below:
  if [ is_generic_fence_kdump -o is_pcs_fence_kdump ]; then
      _dep="$_dep network"
  fi
The "if" condition always results in "true", and should be
changed as follows:
  if is_generic_fence_kdump -o is_pcs_fence_kdump; then
      _dep="$_dep network"
  fi

After this, network won't be involved in non-network dumping,
as for dumpings require network such as nfs/ssh/iscsi/fcoe/etc,
dracut will add network accordingly. And kdump initramfs size
can be reduced from 24MB to 17MB tested on some real hardware,
and from 19MB to 14MB on my kvm. Moreover, it could avoid the
network (driver) initialization thereby saving us more memory.

Signed-off-by: Xunlei Pang <xlpang@redhat.com>
Acked-by: Dave Young <dyoung@redhat.com>
2017-07-14 14:54:59 +08:00
Xunlei Pang
fb522e972c mkdumprd: omit dracut modules in case of network dumping
In case of only network target, we can clearly and safely
remove more unnecessary modules to reduce initramfs size,
and to enhance stability.

Signed-off-by: Xunlei Pang <xlpang@redhat.com>
Acked-by: Dave Young <dyoung@redhat.com>
2017-07-14 14:54:48 +08:00
Xunlei Pang
821d1af080 mkdumprd: omit dracut modules in case of no dm target
In case of on dm related target, we can clearly and safely
remove many unnecessary modules to reduce initramfs size,
and to enhance stability.

Signed-off-by: Xunlei Pang <xlpang@redhat.com>
Acked-by: Dave Young <dyoung@redhat.com>
2017-07-14 14:54:40 +08:00
Xunlei Pang
54a5bcc4ee mkdumprd: omit crypt when there is no crypt kdump target
Resolves: bz1451717
https://bugzilla.redhat.com/1451717

When there is no crypt related kdump target, we can safely
omit "crypt" dracut module, this can avoid the pop asking
disk password during kdump boot in some cases.

This patch introduces omit_dracut_modules() before calling
dracut, we can omit more modules to reduce initrd size in
the future.

We don't want to omit any module for fadump, thus we move
is_fadump_capable() into kdump-lib.sh as a helper to use.

Signed-off-by: Xunlei Pang <xlpang@redhat.com>
Acked-by: Dave Young <dyoung@redhat.com>
2017-07-14 14:54:31 +08:00
Xunlei Pang
cb38b32dfc kdumpctl: use generated rd.lvm.lv=X
Resolves: bz1451717
https://bugzilla.redhat.com/1451717

When there is any "rd.lvm.lv=X", "lvm" dracut module will
try to recognize all the lvm volumes which is unnecessary
and probably cause trouble for us.

See https://bugzilla.redhat.com/show_bug.cgi?id=1451717#c2

Remove all the rd.lvm.lv=X inherited from the kernel cmdline,
and generate the corresponding cmdline as needed for kdump.
Because prepare_cmdline() is only used by kdump, we don't need
to add any fadump judgement(also remove the existing judgement
in passing).

Currently, we don't handle "rd.lvm.vg=X", we can add it in
when there is some bug reported in the future.

Signed-off-by: Xunlei Pang <xlpang@redhat.com>
Acked-by: Dave Young <dyoung@redhat.com>
2017-07-14 14:54:23 +08:00
Xunlei Pang
9e7b64dc90 mkdumprd: change for_each_block_target() to use get_kdump_targets()
Resolves: bz1451717
https://bugzilla.redhat.com/1451717

Now that we have get_kdump_targets(), use it to simplify the code.

Signed-off-by: Xunlei Pang <xlpang@redhat.com>
Acked-by: Dave Young <dyoung@redhat.com>
2017-07-14 14:54:15 +08:00
Xunlei Pang
3ee00cd384 kdump-lib.sh: introduce get_kdump_targets()
Resolves: bz1451717
https://bugzilla.redhat.com/1451717

We need to know all the kdump targets including the dump
target and root in case of "dump_to_rootfs".

This is useful for us to do some extra work related to the
type of different targets.

Signed-off-by: Xunlei Pang <xlpang@redhat.com>
Acked-by: Dave Young <dyoung@redhat.com>
2017-07-14 14:54:06 +08:00
Xunlei Pang
1bc78e025f kdump-lib.sh: fix improper get_block_dump_target()
Resolves: bz1451717
https://bugzilla.redhat.com/1451717

This patch improves get_block_dump_target as follows:
-Consider block device in the special "--dracut-args --mount ..."
 in get_user_configured_dump_disk().
-Consider save path instead of root fs in get_block_dump_target(),
 and move it into kdump-lib.sh because we will have another user
 in the following patch.
-For nfs/ssh dumping, there is no need to check the root device.
-Move get_save_path into kdump-lib.sh.

After this patch, get_block_dump_target() can always return the
correct block dump target specified.

Signed-off-by: Xunlei Pang <xlpang@redhat.com>
Acked-by: Dave Young <dyoung@redhat.com>
2017-07-14 14:53:43 +08:00
Ziyue Yang
0933f89f65 kdumpctl: fix infinite loop caused by running under bash
Description of problem
(https://bugzilla.redhat.com/show_bug.cgi?id=1465735):
Run `kdumpctl status` as normal user, get below error messages:

Another app is currently holding the kdump lock; waiting for it to exit...
flock: 9: Bad file descriptor
Another app is currently holding the kdump lock; waiting for it to exit...
flock: 9: Bad file descriptor
...

The bug is caused by behavior difference between bash
and sh (bash in posix).

In the function single_instance_lock in kdumpctl script,
there is

exec 9>/var/lock/kdump

which will fail in user mode. However, this fail will cause
script exiting under bash but not exiting under sh, causing
infinite loop because the flock will always fail.

According to the 16th item in
ftp://ftp.gnu.org/old-gnu/Manuals/bash-2.02/html_node/bashref_66.html

If a POSIX.2 special builtin returns an error status, a non-
interactive shell exits.

And according to
https://www.gnu.org/software/bash/manual/html_node/Special-Builtins.html

exec is one of the POSIX.2 special builtin's.

This patch fixes the bug by checking exec return value.

Fixes: 9fb2996d05 ("kdumpctl: change the shebang header to use /bin/bash")
Signed-off-by: Ziyue Yang <ziyang@redhat.com>
Acked-by: Dave Young <dyoung@redhat.com>
Reviewed-by: Xunlei Pang <xlpang@redhat.com>
2017-07-14 14:47:37 +08:00
Dave Young
151e0b5345 Release 2.0.15-2 2017-06-28 14:36:01 +08:00
Pratyush Anand
7845cdd89c aarch64: Add makedumpfile executable
Add makedumpfile executable for aarch64 as well.

Signed-off-by: Pratyush Anand <panand@redhat.com>
Acked-by: Dave Young <dyoung@redhat.com>
2017-06-28 14:35:42 +08:00
Dave Young
ff28c8fe08 Release kexec-tools 2.0.15-1 2017-06-23 11:33:25 +08:00
Dave Young
d6dfe2cc1e Release 2.0.14-13 2017-06-15 09:43:11 +08:00
Bhupesh Sharma
f513f035f3 kexec-tools.spec: Fix whitespace errors
This patch fixes the whitespace errors reported by
'rpmlint' or 'fedpkg lint' when they are run on kexec-tools srpm:

kexec-tools.spec:242: W: mixed-use-of-spaces-and-tabs (spaces: line 107,
tab: line 242)

Signed-off-by: Bhupesh Sharma <bhsharma@redhat.com>
Acked-by: Dave Young <dyoung@redhat.com>
2017-06-15 09:41:33 +08:00
Benjamin Berg
f6303a2a93 dracut-module-setup: Fix test for inclusion of DRM modules
The /sys/modules/*/drivers sysfs entries do not exist anymore on newer
kernels which means that the DRM moduels would never be included.
Instead check if there is any device with a "drm" sysfs directory to
decide on whether DRM modules need to be included.

Acked-by: Dave Young <dyoung@redhat.com>
2017-06-15 09:40:03 +08:00
Pingfan Liu
dcbab02752 kdump.conf.5: clarify the fence_kdump_nodes option
fence_kdump_nodes should include list of cluster node(s) except localhost.

Signed-off-by: Pingfan Liu <piliu@redhat.com>
Reviewed-by: Xunlei Pang <xlpang@redhat.com>
Acked-by: Dave Young <dyoung@redhat.com>
2017-06-09 09:47:22 +08:00
Dave Young
70a9f58336 Release 2.0.14-12 2017-05-18 16:50:48 +08:00
Pingfan Liu
1ca4c9f009 kdumpctl: for fence_kdump, the ipaddr of this node should be excluded from list
kdump should not send fence_kdump notifications to local host, because
the role of the falied node (i.e local host) is to send fence_kdump
notifications to other nodes to tell them I'm kdumping, tell to itself is
nonsense. And we have excluded hostname of local host but when one use ip
address we also need exclude it.

Signed-off-by: Pingfan Liu <piliu@redhat.com>
Reviewed-by: Xunlei Pang <xlpang@redhat.com>
Acked-by: Dave Young <dyoung@redhat.com>
2017-05-18 16:44:42 +08:00
Dave Young
b74479521c Release 2.0.14-11 2017-05-12 11:00:23 +08:00
Xunlei Pang
9fb2996d05 kdumpctl: change the shebang header to use /bin/bash
We met one issue that when changing softlink of "/usr/bin/sh"
to point to "ksh" instead of the default "bash", kdumpctl will
not work and go wrong.

kdumpctl is expected to run under bash like dracut, we should
change its shebang header from "#!/bin/sh" to "#!/bin/bash".

Signed-off-by: Xunlei Pang <xlpang@redhat.com>
Acked-by: Dave Young <dyoung@redhat.com>
2017-05-12 09:58:51 +08:00
Xunlei Pang
2b4b7a6374 kdumpctl: call strip_comments only when necessary to speedup
The "time kdumpctl start" command shows that strip_comments()
consumes lots of cpu time. By only calling it when necessary,
it saves us nearly half second.

Tested on my Fedora kvm machine.
Before this patch:
$ time kdumpctl start
kexec: loaded kdump kernel
Starting kdump: [OK]

real	0m1.849s
user	0m1.497s
sys	0m0.462s

After this patch:
$ time kdumpctl start
kexec: loaded kdump kernel
Starting kdump: [OK]

real	0m1.344s
user	0m1.195s
sys	0m0.195s

Signed-off-by: Xunlei Pang <xlpang@redhat.com>
Acked-by: Dave Young <dyoung@redhat.com>
2017-05-12 09:57:29 +08:00
Xunlei Pang
7f725ef13a Revert "kdumpctl: improve "while read" time for /etc/kdump.conf"
Resolves: bz1449801

"cat $KDUMP_CONFIG_FILE|grep -v "^#"|while read ..." use pipes
to invoke subshells, as a result we met "the dreaded inaccessible
variables within a subshell problem" as described in the book
"Advanced Bash-Scripting Guide".

It cause regressions, so revert it.

Signed-off-by: Xunlei Pang <xlpang@redhat.com>
Acked-by: Dave Young <dyoung@redhat.com>
2017-05-12 09:57:29 +08:00
Dave Young
c3ab408727 Release 2.0.14-10
Speedup kdump service startup
2017-05-05 16:16:25 +08:00
Xunlei Pang
38c0b4aa3a kdumpctl: improve "while read" time for /etc/kdump.conf
I found using "cat $KDUMP_CONFIG_FILE|grep -v "^#"|while read ..."
instead of "while read ... do ...; done < $KDUMP_CONFIG_FILE" will
make the script run faster, it saves us nearly half second.

Signed-off-by: Xunlei Pang <xlpang@redhat.com>
Acked-by: Pratyush Anand <panand@redhat.com>
Acked-by: Dave Young <dyoung@redhat.com>
2017-05-05 16:14:33 +08:00
Xunlei Pang
88a3385e96 kdumpctl: update check_dump_fs_modified() to use "lsinitrd -f"
We use faster "lsinitrd XXX -f usr/lib/dracut/build-parameter.txt"
instead of "lsinitrd XXX | grep "^Arguments:" | head -1", this can
save us around one second.

Signed-off-by: Xunlei Pang <xlpang@redhat.com>
Acked-by: Pratyush Anand <panand@redhat.com>
Acked-by: Dave Young <dyoung@redhat.com>
2017-05-05 16:14:33 +08:00
Xunlei Pang
2a5f362521 kdumpctl: improve check_wdt_modified()
Use the logic of dracut 04watchdog/module-setup.sh to check,
then we only need to compare the content of 00-watchdog.conf,
so we can save one operation of lsinitrd.

Signed-off-by: Xunlei Pang <xlpang@redhat.com>
Acked-by: Pratyush Anand <panand@redhat.com>
Acked-by: Dave Young <dyoung@redhat.com>
2017-05-05 16:14:33 +08:00
Xunlei Pang
c63c0a1084 kdumpctl: remove is_mode_switched()
handle_mode_switch() can ensure the correct logic, so remove
the needless is_mode_switched(). This helps to save one slow
lsinitrd operation for each boot.

Improved backup_default_initrd() to judge DEFAULT_INITRD.

Signed-off-by: Xunlei Pang <xlpang@redhat.com>
Acked-by: Pratyush Anand <panand@redhat.com>
Acked-by: Dave Young <dyoung@redhat.com>
2017-05-05 16:14:33 +08:00
Xunlei Pang
071ea2a277 kdumpctl: bail out earlier in case of no reserved memory
Some cloud people complained that VM boot speed is slower than
Ubuntu and other distributions, after some debugging, we found
that one of the causes is kdump service starts too slow(7seconds
according to the test result on the test VM), actually there is
no "crashkernel=X" specified. Although kdump service is parallel,
it affects the speed more or less especially on VMs with few cpus,
which is unacceptable. It is even worse when kdump initramfs is
built out in case of no reserved memory at first boot.

Commit afa4a35d3 ("kdumpctrl: kdump feasibility should fail if no
crash memory") can actually solve this issue.

This patch is a supplement of above-mentioned commit, we bail out
start() even earlier in case of no reserved memory.

Also made some cosmatic changes for check_crash_mem_reserved().

1) Before this patch
$ time kdumpctl start
No memory reserved for crash kernel.
Starting kdump: [FAILED]

real    0m0.282s
user    0m0.184s
sys     0m0.146s

2) After this patch
$ time kdumpctl start
No memory reserved for crash kernel
Starting kdump: [FAILED]

real    0m0.010s
user    0m0.008s
sys     0m0.001s

Signed-off-by: Xunlei Pang <xlpang@redhat.com>
Acked-by: Pratyush Anand <panand@redhat.com>
Acked-by: Dave Young <dyoung@redhat.com>
2017-05-02 17:23:57 +08:00
Dave Young
841ea5be0b Release 2.0.14-9 2017-04-27 14:01:45 +08:00
Bhupesh Sharma
a284fa9005 kdump: Introduce 'force_no_rebuild' option
This patch introduces the 'force_no_rebuild' option
inside the 'kdump.conf' and its handling inside the 'kdumpctl'
script.

There might be several use cases, where a system admin
decides that he doesn't need to rebuild the kdump initrd
and wants to use an existing version of the same. In such cases,
he can set the 'force_no_rebuild' option inside 'kdump.conf'
to 1, to force the 'kdumpctl' script not to rebuild the kdump
initrd.

Signed-off-by: Bhupesh Sharma <bhsharma@redhat.com>
Acked-by: Xunlei Pang <xlpang@redhat.com>
Acked-by: Dave Young <dyoung@redhat.com>
2017-04-27 13:59:49 +08:00
Pingfan Liu
0165cfa332 kdump-lib-initramfs.sh: ignore the failure of echo
The kdump-capture.service will fail, if the following conds are meet up.
-1. boot up a VM with the following cmd:
qemu-kvm   -name 'avocado-vt-vm1'  -sandbox off    -machine pc   -nodefaults  -vga cirrus \
    -drive id=drive_image1,if=none,snapshot=off,aio=native,cache=none,format=qcow2,file=$guest_img \
    -device virtio-blk-pci,id=image1,drive=drive_image1,bootindex=0,bus=pci.0,addr=04 \
    -device virtio-net-pci,mac=9a:4d:4e:4f:50:51,id=id3DveCw,vectors=4,netdev=idgW5YRp,bus=pci.0,addr=05 \
    -netdev tap,id=idgW5YRp \
    -m 2048  \
    -smp 4,maxcpus=4,cores=2,threads=1,sockets=2  \
    -cpu 'SandyBridge',+kvm_pv_unhalt \
    -vnc :0  \
    -rtc base=utc,clock=host,driftfix=slew  \
    -boot order=cdn,once=c,menu=off,strict=off \
    -enable-kvm \
    -monitor stdio \
    -qmp tcp:localhost:4444,server,nowait
-2. in kernel cmdline with the following options: console=tty0 console=ttyS0,

Because the  "-nodefaults" option in qemu cmd excludes the emulation of serial port, the ttyS0 will
have no real backend device. We can observe such issue in 1st kernel by:
	echo teststring > /dev/console or
	echo teststring > /dev/ttyS0,
It gets the error "-bash: echo: write error: Input/output error".
Such conds cause small issue in 1st kernel, but it is a big problem for kdump-capture and emergency
service.

This patch aims to work aroundthe issue in kdump-capture service:
dump_fs() return value will affect the following code in dracut-kdump.sh
	DUMP_RETVAL=$?    <---
	do_kdump_post $DUMP_RETVAL
	if [ $? -ne 0 ]; then
	    echo "kdump: kdump_post script exited with non-zero status!"
	fi

Although kdump-capture saves the vmcore successfully, but it exit 1 and
fall on emergency service.

Signed-off-by: Pingfan Liu <piliu@redhat.com>
Reviewed-by: Xunlei Pang <xlpang@redhat.com>
Acked-by: Dave Young <dyoung@redhat.com>
2017-04-27 13:59:49 +08:00
Baoquan He
c3602e32a2 kdump.sysconfig/x86_64: Add nokaslr to kdump kernel cmdline
KASLR is to enhance security on OS kernel. While kdump kernel is
working after normal kernel corrupted. There's no need to do kaslr
in kdump kernel, so add 'nokaslr' to disable kaslr.

Signed-off-by: Baoquan He <bhe@redhat.com>
Acked-by: Dave Young <dyoung@redhat.com>
2017-04-27 13:59:49 +08:00
Dave Young
414b2a7dbb Release 2.0.14-8 2017-04-11 16:33:38 +08:00
Xunlei Pang
3b311653f2 Revert "kdumpctl: filter 'root' kernel parameter when running in live images"
This reverts commit 892bea7aa

We already eliminated the root filesystem by removing "root=X"
in case of non-root dumping, and for livecd it must be non-root
dumping according to "live-image-kdump-howto.txt".

So it's time to revert this commit.

Also update "live-image-kdump-howto.txt", make sure users do not
configure "default dump_to_rootfs".

Signed-off-by: Xunlei Pang <xlpang@redhat.com>
Acked-by: Pratyush Anand <panand@redhat.com>
Acked-by:Dave Young <dyoung@redhat.com>
2017-04-11 16:03:12 +08:00
Xunlei Pang
b40c1f96cf kdumpctl: remove "root=X" for kdump boot
Since the current dracut of Fedora already supports not always
mounting root device, we can remove "root=X" from the command
line directly, and always get the dump target specified in
"/etc/kdump.conf" and mount it. If the dump target is located
at root filesystem, we will add the root mount info explicitly
from kdump side instead of from dracut side.

For example, in case of nfs/ssh/usb/raw/etc(non-root) dumping,
kdump will not mount the unnecessary root fs after this change.

This patch removes "root=X" via the "KDUMP_COMMANDLINE_REMOVE"
(if "default dump_to_rootfs" is specified, don't remove "root=X"),
and mounts non-root target under "/kdumproot", the root target
still under "/sysroot"(to be align with systemd sysroot.mount).

After removing "root=X", we now add root fs mount information
explicitly from the kdump side.

Changed check_dump_fs_modified() a little to avoid rebuild when
dump target is root, since we add root fs mount explicitly now.

Signed-off-by: Xunlei Pang <xlpang@redhat.com>
Acked-by: Pratyush Anand <panand@redhat.com>
Acked-by:Dave Young <dyoung@redhat.com>
2017-04-11 16:02:12 +08:00
Xunlei Pang
bf5b3da107 kdumpctl: fix a bug in remove_cmdline_param()
For the following scripts:
  cmdline="root=/dev/mapper/fedora-root rd.lvm.lv=fedora/root rw"
  remove_cmdline_param $cmdline "root"

  cmdline="root=nfs4:192.168.122.9:/ ip=ens3:dhcp rw"
  remove_cmdline_param $cmdline "root"

The current implementation will get the wrong results:
  "rd.lvm.lv=fedora/ rw"
  ":/ ip=ens3:dhcp rw"

After this patch we can get the correct results:
  "rd.lvm.lv=fedora/root rw"
  "ip=ens3:dhcp rw"

Signed-off-by: Xunlei Pang <xlpang@redhat.com>
Acked-by: Pratyush Anand <panand@redhat.com>
Acked-by:Dave Young <dyoung@redhat.com>
2017-04-11 16:01:50 +08:00
Pratyush Anand
21dcf7e3b1 kdumpctl: fix status check when CONFIG_CRASH_DUMP is not enabled in kernel
When kexec_crash_loaded does not exist, means kdump was not enabled in
kernel we get

$ kdumpctl status
cat: /sys/kernel/kexec_crash_loaded: No such file or directory
/usr/bin/kdumpctl: line 879: [: ==: unary operator expected
Kdump is not operational

After this patch:
$ kdumpctl status
Perhaps CONFIG_CRASH_DUMP is not enabled in kernel
Kdump is not operational

Signed-off-by: Pratyush Anand <panand@redhat.com>
Reviewed-by: Bhupesh Sharma <bhsharma@redhat.com>
Reviewed-by: Xunlei Pang <xlpang@redhat.com>
Acked-by: Dave Young <dyoung@redhat.com>
2017-04-11 16:01:25 +08:00
Dave Young
961bda9152 Release 2.0.14-7 2017-03-31 11:57:36 +08:00
Xunlei Pang
5c87d73cf3 kdump-emergency: fix "Transaction is destructive" emergency failure
We met a problem that the kdump emergency service failed to
start when the target dump timeout(we passed "rd.timeout=30"
to kdump), it reported "Transaction is destructive" messages:

  [ TIME ] Timed out waiting for device dev-mapper-fedora\x2droot.device.
  [DEPEND] Dependency failed for Initrd Root Device.
  [ SKIP ] Ordering cycle found, skipping System Initialization
  [DEPEND] Dependency failed for /sysroot.
  [DEPEND] Dependency failed for Initrd Root File System.
  [DEPEND] Dependency failed for Reload Configuration from the Real Root.
  [ SKIP ] Ordering cycle found, skipping System Initialization
  [ SKIP ] Ordering cycle found, skipping Initrd Default Target
  [DEPEND] Dependency failed for File System Check on /dev/mapper/fedora-root.
  [  OK  ] Reached target Initrd File Systems.
  [  OK  ] Stopped dracut pre-udev hook.
  [  OK  ] Stopped dracut cmdline hook.
           Starting Setup Virtual Console...
           Starting Kdump Emergency...
  [  OK  ] Reached target Initrd Default Target.
  [  OK  ] Stopped dracut initqueue hook.
  Failed to start kdump-error-handler.service: Transaction is destructive.
  See system logs and 'systemctl status kdump-error-handler.service' for details.
  [FAILED] Failed to start Kdump Emergency.
  See 'systemctl status emergency.service' for details.
  [DEPEND] Dependency failed for Emergency Mode.

This is because in case of root failure, initrd-root-fs.target
will trigger systemd emergency target which requires the systemd
emergency service actually is kdump-emergency.service, then our
kdump-emergency.service starts kdump-error-handler.service with
"systemctl isolate"(see 99kdumpbase/kdump-emergency.service, we
replace systemd's with this one under kdump).

This will lead to systemd two contradictable jobs queued as an
atomic transaction:
job 1) the emergency service gets started by initrd-root-fs.target
job 2) the emergency service gets stopped due to "systemctl isolate"
thereby throwing "Transaction is destructive".

In order to solve it, we can utilize "IgnoreOnIsolate=yes" for both
kdump-emergency.service and kdump-emergency.target. Unit with attribute
"IgnoreOnIsolate=yes" won't be stopped when isolating another unit,
they can keep going as expected in case be triggered by any failure.

We add kdump-emergency.target dedicated to kdump the similar way
as did for kdump-emergency.service(i.e. will replace systemd's
emergency.target with kdump-emergency.target under kdump), and
adds "IgnoreOnIsolate=yes" into both of them.

Signed-off-by: Xunlei Pang <xlpang@redhat.com>
Acked-by: Dave Young <dyoung@redhat.com>
Acked-by: Pratyush Anand <panand@redhat.com>
[bhe: improve the patch log about IgnoreOnIsolate="]
2017-03-31 11:54:30 +08:00
Xunlei Pang
ae45e6f1bb mkdumprd: reduce lvm2 memory under kdump
We replace "reserved_memory = XXXX"(default value is 8192) with
"reserved_memory = 1024" in /etc/lvm/lvm.conf used by "lvm2", it
can save 7MB peak memory consumption, so lower the possibility of
OOM under kdump.

For kdump, we don't have too many lvm targets, lvm2 locates in the
RAM(rootfs), so don't need that much memory, as discussed with lvm
people, they agreed that we use 1MB under kdump as long as there
are not that many lvm targets invloved.

We modify /etc/lvm/lvm.conf when "99kdumpbase" install() is executed,
because it is parsed after "90lvm" by dracut.

We add the code unconditionally with &>/dev/null to ignore errors, it
doesn't matter in case of "lvm" not included(i.e. there is no lvm.conf).

Signed-off-by: Xunlei Pang <xlpang@redhat.com>
Acked-by: Dave Young <dyoung@redhat.com>
2017-03-31 11:53:57 +08:00
Dave Young
00ed76e7e2 Release 2.0.14-6 2017-03-17 10:21:49 +08:00
Pratyush Anand
86a81de3e5 Fix makedumpfile --mem-usage /proc/kcore
Patches have been taken from kexec-tools and makedumpfile to fix issue
with `makedumpfile --mem-usage /proc/kcore`.

Two of the patches is from kexec-tools and rest are from makedumpfile.
All the patches have been acked upstream and applies without conflict.

Kexec-tools patches:
(kexec-tools-2.0.14-x86-x86_64-Fix-format-warning-with-die.patch), which
fixes koji build issue.

kexec-tools-2.0.14-build_mem_phdrs-check-if-p_paddr-is-invalid.patch fixes
the regresssion caused by kernel /proc/kcore fix to use -1 as default value
of p_paddr for pt_loads. Without his patch kexec -p will fail with latest
kernel.

Other makedumpfile patches are backported to support --mem-usage while
kernel kaslr being enabled. Details please see the patch log of the individual
patches.

All the patches are backport of upstream commits.

Patches has been tested with kernel 4.11.0-0.rc1.git0.1.fc26.x86_64.

    # makedumpfile --mem-usage /proc/kcore -f
    The kernel version is not supported.
    The makedumpfile operation may be incomplete.

    TYPE            PAGES                   EXCLUDABLE      DESCRIPTION
    ----------------------------------------------------------------------
    ZERO            1960                    yes             Pages filled
    with zero
    NON_PRI_CACHE   22850                   yes             Cache pages
    without private flag
    PRI_CACHE       1517                    yes             Cache pages with
    private flag
    USER            32522                   yes             User process
    pages
    FREE            1898981                 yes             Free pages
    KERN_DATA       78721                   no              Dumpable kernel
    data

    page size:              4096
    Total pages on system:  2036551
    Total size on system:   8341712896       Byte

We won't need to pass -f once fedora kernel is rebased with v4.12.

Signed-off-by: Pratyush Anand <panand@redhat.com>
Acked-by: Dave Young <dyoung@redhat.com>
2017-03-17 10:14:17 +08:00
Dave Young
bbe10baeb2 Release 2.0.14-5 2017-03-09 16:59:31 +08:00
Pingfan Liu
14ec322b4b kdump-lib.sh: fix incorrect usage with pipe as input for grep -q in is_pcs_fence_kdump()
For -q option, as man grep says:  Exit immediately with zero status if
any match is found, even if an error was detected.
So when matching, the read side of pipe is closed by "grep -q", while
the write side still try to write more data, which cause SIGPIPE to the
process, and the shell can not exit with 0. It depends on the kernel's
implementation of pipe to decide how much data written by the producer
can trigger the bug.

Bash test script:
  #!/bin/sh
  set -o pipefail
  dd if=/dev/zero of=text.file bs=1M count=1
  sed -i '1s/^/keyword /' text.file
  cat text.file | grep -q keyword
  echo $?

Notice the "set -o pipefail" is set by dracut, so
mkdumprd -> dracut -> dracut-module-setup.sh -> is_pcs_fence_kdump()
trigger the bug.

Signed-off-by: Pingfan Liu <piliu@redhat.com>
Acked-by: Dave Young <dyoung@redhat.com>
Acked-by: Pratyush Anand <panand@redhat.com>
2017-03-08 13:07:20 +08:00
Pingfan Liu
ad404977eb Document: fix incorrect link in fadump-how.txt
The file fadump-howto.txt has an incorrect link for further information
about sysrq usage. Fix it.

Signed-off-by: Pingfan Liu <piliu@redhat.com>
Acked-by: Dave Young <dyoung@redhat.com>
2017-03-08 13:07:20 +08:00
Dave Young
00bd88cd58 Release 2.0.14-4 2017-01-23 16:00:59 +08:00
Tong Li
1c27a3d827 drop kdump script rhcrashkernel-param in kexec-tools repo
Resolves: bz1399436

Since currently crashkernel= will be handled in kdump anaconda addon
we can safely remove rhcrashkernel-param callback.

Signed-off-by: Tong Li <tonli@redhat.com>
Acked-by: Dave Young <dyoung@redhat.com>
2017-01-23 15:59:47 +08:00
Xunlei Pang
2040103bd7 kdumpctl: sanity check of nr_cpus for x86_64 in case running out of vectors
Check the number of cpus for x86_64 kdump kernel to boot with.
We met an issue on x86_64: kdump runs out of vectors with the
default "nr_cpus=1", when requesting tons of irqs.

This patch detects such situation and warns users about the risk.

The total number of vectors percpu is 256 defined by x86 architecture.
The available vectors can be allocated to io devices percpu starts
from FIRST_EXTERNAL_VECTOR(see kernel code), and some high-numbered
ones are consumed by some system interrupts. As a result, the vectors
for io device are within [FIRST_EXTERNAL_VECTOR, FIRST_SYSTEM_VECTOR),
with one known exception, 0x80 within the range is reserved specially
as the syscall vector.

FIRST_EXTERNAL_VECTOR is invariably 32, while FIRST_SYSTEM_VECTOR can
vary between different kernel versions. E.g. FIRST_SYSTEM_VECTOR gets
0xef(with CONFIG_X86_LOCAL_APIC on)for linux-4.10, that is 17 vectors
reserved, considering it may increase in the future and the special
vectors, we use a flexible variance and assume there are 32 reserved
from FIRST_EXTERNAL_VECTOR. Then the max vectors for device interrupts
percpu is: (256-32)-32=192, we acquire the number N of device interrupts
from /proc/irq/, then the number of minimal cpus required is calculated:
(N + 192 - 1) / 192

Signed-off-by: Xunlei Pang <xlpang@redhat.com>
Acked-by: Pratyush Anand <panand@redhat.com>
Acked-by: Dave Young <dyoung@redhat.com>
2017-01-23 15:52:24 +08:00