Commit Graph

1112 Commits

Author SHA1 Message Date
Kairui Song
b301d6c4f4 Fix building failure due to makedumpfile's compile flag
From: Pingfan Liu <piliu@redhat.com>

makedumpfile: remove -lebl

-lebl has been removed from elfutils.

Signed-off-by: Kairui Song <kasong@redhat.com>
2019-12-28 03:12:27 +08:00
Kairui Song
68dcfcfb47 mkdumprd: Fix dracut args parsing
Previous commit f13eab6 ('mkdumprd: simplify dracut args parsing')
break dracut arguments parsing for some use case, this should fix it
well.

Passed nfs/local/iscsi/ssh dump test, and with extra dracut_argss.

Signed-off-by: Kairui Song <kasong@redhat.com>
Acked-by: Dave Young <dyoung@redhat.com>
2019-12-27 12:29:43 +08:00
Kairui Song
ee742fcf6b Release 2.0.20-6
Signed-off-by: Kairui Song <kasong@redhat.com>
2019-11-28 17:53:30 +08:00
Kairui Song
d13fd38df4 kdump-error-handler.service: Remove ExecStopPost
Currently "systemctl --fail --no-block default" will be executed on
kdump-error-handler exit due to the config. This may makes systemd try
to isolate to the default target again.

The execute chain will be like:

initrd.target -> kdump.sh (failed) -> kdump-error-handler.service ->
failure_action -> final_action -> ExecStopPost (go to initrd.target again)

Currently, reboot/shutdown/halt is called by either by executing
failure_action or final_action. So the loop will be stopped. However
nont of the reboot/shutdown/halt call is blocking so it might lead to
race issue.

Just drop the ExecStopPost to fix this potential issue.

Signed-off-by: Kairui Song <kasong@redhat.com>
Acked-by: Pingfan Liu <piliu@redhat.com>
Tested-by: HATAYAMA Daisuke <d.hatayama@fujitsu.com>
2019-11-28 17:46:00 +08:00
Kairui Song
f13eab60cb mkdumprd: simplify dracut args parsing
Previously dracut_args is stored as an array in the shell code,
so we can just pass this array as argument to dracut.
But when trying to append new dracut argument, we have to manually
handle the quotes and spaces to ensure the appended argument is
splitted into array elements in the right way.

This is complex and hard to read or maintain. Instead, just store the
dracut_args as a plain string, and let xargs help parse it and call
dracut.

Simply passing $dracut_args or "$dracut_args" will either let dracut
consider the whole string as a single argument, or loss all the
quotes. xargs can handle it well and cover more corner cases.

Eg. one corner case before, if we have:
dracut_args --mount "/dev/sda1 /mnt/test xfs rw"

Kdump will fail, because the function add_dracut_arg() will wrongly
change the arguments into: (Notice the extra space.)
dracut_args --mount " /dev/sda1 /mnt/test xfs rw"

Instead of fixing it just use xargs instead. Tested with above config
and multiple other dracut_args values.

Resolves: bz1700136
Signed-off-by: Kairui Song <kasong@redhat.com>
Acked-by: Dave Young <dyoung@redhat.com>
2019-11-27 23:45:54 +08:00
Kairui Song
5633e83318 Always set vm.zone_reclaim_mode = 3 in kdump kernel
By default kernel have vm.zone_reclaim_mode = 0 and large page
allocation might fail as kernel is very conservative on memory
reclaiming. If the page allocation failure is not handled carefully
it could lead to more serious problems.

This issue can be reproduced by change with following steps:

- Fill up page cache use:
  # dd if=/dev/urandom of=/test bs=1M count=1300

- Now the memory is filled with write cache:
  # free -m
                total        used        free      shared  buff/cache   available
  Mem:           1790         184         132           2        1473        1348
  Swap:          2119           7        2112

- Insert a module which simply calls "kmalloc(SZ_1M, GFP_KERNEL)" for
  512 times: (Notice: vmalloc don't have such problem)
  # insmod debug_module.ko

- Got following allocation failure:
  insmod: page allocation failure: order:8, mode:0x40cc0(GFP_KERNEL|__GFP_COMP), nodemask=(null),cpuset=/,mems_allowed=0

- Clean up and repeat again with vm.zone_reclaim_mode = 3, OOM is not
  observed.

In kdump kernel there is usually only one online CPU and limited memory,
so we set vm.zone_reclaim_mode = 3 to let kernel reclaim memory more
aggresively to avoid such issue.

Signed-off-by: Kairui Song <kasong@redhat.com>
Acked-by: Pingfan Liu <piliu@redhat.com>
2019-11-13 11:35:46 +08:00
Hari Bathini
0a9aabaadd kdumpctl: make reload fail proof
When large amount of memory, about 1TB, is removed with DLPAR memory
remove operation, kdump reload could fail due to race condition with
device tree property update. In such scenario, the subsequent kdump
reload requests would also fail as reload() only proceeds if current
load status is active. Since the possibility of this race condition
couldn't be wished away due to the nature of the scenario, workaround
it by proceeding to load even if current load status is not active as
long as kdump service is active, which kdump udev rules already check
for.

Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
Acked-by: Kairui Song <kasong@redhat.com>
2019-11-12 13:22:52 +08:00
Pingfan Liu
d9d4483b7a spec: move binaries from /sbin to /usr/sbin
Before this patch
$rpm -ql kexec-tools | grep sbin
/sbin/kexec
/sbin/makedumpfile
/sbin/mkdumprd
/sbin/vmcore-dmesg

After this patch
$rpm -ql kexec-tools | grep sbin
/usr/sbin/kexec
/usr/sbin/makedumpfile
/usr/sbin/mkdumprd
/usr/sbin/vmcore-dmesg

Signed-off-by: Pingfan Liu <piliu@redhat.com>
Acked-by: Kairui Song <kasong@redhat.com>
2019-11-08 15:23:33 +08:00
Kairui Song
39352d0cfc Don't execute final_action if failure_action terminates the system
If failure_action is shutdown/reboot/halt, final_action is pointless as
the system will be already stopping. And if final_action is different
from failure_action, it will trigger a systemd race problem and cause
unexpected behavior to occur.

So let the error handler stop and exit after performing failure_action
successfully if failure_action is one of shutdown/reboot/halt.
This way, final_action will not be executed.

Signed-off-by: Kairui Song <kasong@redhat.com>
Acked-by: Pingfan Liu <piliu@redhat.com>
2019-11-01 11:21:58 +08:00
Kairui Song
5e76e53a70 module-setup.sh: Simplify the network setup code
Merge kdump_setup_netdev into kdump_install_net.

kdump_install_net is a wrapper of calling kdump_setup_netdev, and
it do following three extra things:

  1. Sanitize and resolve the hostname
  2. Resolve the route to the destination
  3. Set the default gateway for once

There is currently only one caller of kdump_setup_netdev, the iscsi
network setup code, and it's doing 1 and 2 by itself. And there should
only be one default gateway in kdump enviroment, so applying 3 here is
fine.

And the comment of kdump_install_net is wrong and obsoleted, update the
comment too.

Just merge kdump_setup_netdev into kdump_install_net and always use
kdump_install_net instead.

Signed-off-by: Kairui Song <kasong@redhat.com>
Acked-by: Pingfan Liu <piliu@redhat.com>
2019-10-24 17:00:02 +08:00
Kairui Song
18ee888eab mkdumprd: ensure ssh path exists before check size
check_size checks if the specified dump path on the ssh target have
enough space. And if the path doesn't exits, it will fail and exit.

mkdir_save_path_ssh should be called first to check if the path
exists, and create the path if it doesn't exits, so the size check
can always work properly.

Signed-off-by: Kairui Song <kasong@redhat.com>
Acked-by: Pingfan Liu <piliu@redhat.com>
2019-10-24 16:59:22 +08:00
Pingfan Liu
882b920c2f module-setup: re-fix 99kdumpbase network dependency
In commit a431a7e354 (module-setup: fix 99kdumpbase network dependency),
the statement for OR operation is still wrong.

The OR condition statement should be:  if a || b

Signed-off-by: Pingfan Liu <piliu@redhat.com>
Acked-by: Kairui Song <kasong@redhat.com>
2019-10-22 16:09:43 +08:00
Pingfan Liu
72ed97683f kdumpctl: bail out immediately if host key verification failed
In kdump.conf, if sshkey points to an invalid ssh key, 'kdumpctl restart'
can bail out immediately instead of retry.

Signed-off-by: Pingfan Liu <piliu@redhat.com>
Acked-by: Kairui Song <kasong@redhat.com>
2019-10-22 15:14:37 +08:00
Kairui Song
e7a207d166 Release 2.0.20-5
Signed-off-by: Kairui Song <kasong@redhat.com>
2019-10-15 13:54:32 +08:00
Kairui Song
6dee286467 Don't mount the dump target unless needed
For fadump, this helps to reduce the risk of boot failure, and
may also help speed up the boot by a bit.

For normal kdump, this will delay the dump target mounting, and no
longer depend on systemd to do the mounting job.

And currently there is a failure that caused by some mount handling
bug with kernel and systemd that is failing the system booting:

[FAILED] Failed to mount /kdumproot/home.
See 'systemctl status kdumproot-home.mount' for details.
[DEPEND] Dependency failed for Local File Systems.
[  OK  ] Reached target Remote File Systems (Pre).
[  OK  ] Reached target Remote File Systems.
         Starting udev Coldplug all Devices...
         Starting Create Volatile Files and Directories...
         Starting Kdump Emergency...

This patch can bypass it. The fix of root cause is still WIP, but this
patch itself is a nice to have optimization so it's reasonable to do so.

Signed-off-by: Kairui Song <kasong@redhat.com>
Acked-by: Pingfan Liu <piliu@redhat.com>
2019-09-29 17:12:54 +08:00
Kairui Song
367ca85d1f Merge #3 kdump-lib: strip grub device from kdump_bootdir 2019-09-25 02:40:45 +00:00
Kairui Song
022166630f Merge #2 Add systemd-udev require. 2019-09-25 02:40:35 +00:00
Yuval Turgeman
4714c7c8a3 kdump-lib: strip grub device from kdump_bootdir
When trying to setup kdump for fedora-coreos, kdumpctl start fails to
find the correct boot directory since BOOT_IMAGE start with the grub
device name

Signed-off-by: Yuval Turgeman <yturgema@redhat.com>
2019-09-24 12:19:43 +03:00
Kairui Song
e31d5baf59 Release 2.0.20-4
Signed-off-by: Kairui Song <kasong@redhat.com>
2019-09-24 14:58:09 +08:00
Pingfan Liu
e07fc3e071 kdumpctl: echo msg when waiting for connection
Print some message during the long wait period to reflect the process.
The message will look like:

  Network dump target is not usable, waiting for it to be ready
  ...

Signed-off-by: Pingfan Liu <piliu@redhat.com>
Acked-by: Kairui Song <kasong@redhat.com>
2019-09-24 13:17:16 +08:00
Kazuhito Hagio
a0db00d575 makedumpfile: Fix inconsistent return value from find_vmemmap()
Backport from the makedumpfile devel branch in upstream.

commit 8425342a52b23d462f10bceeeb1c8a3a43d56bf0
Author: Kazuhito Hagio <k-hagio@ab.jp.nec.com>
Date:   Fri Sep 6 09:50:34 2019 -0400

    [PATCH] Fix inconsistent return value from find_vmemmap()

    When -e option is given, the find_vmemmap() returns FAILED(1) if
    it failed on x86_64, but on architectures other than that, it is
    stub_false() and returns FALSE(0).

              if (info->flag_excludevm) {
                      if (find_vmemmap() == FAILED) {
                              ERRMSG("Can't find vmemmap pages\n");

      #define find_vmemmap()          stub_false()

    As a result, on the architectures other than x86_64, the -e option
    does some unnecessary processing with no effect, and marks the dump
    DUMP_DH_EXCLUDED_VMEMMAP unexpectedly.

    Also, the functions for the -e option return COMPLETED or FAILED,
    which are for command return value, not for function return value.

    So let's fix the issue by following the common style that returns
    TRUE or FALSE, and avoid confusion.

    Signed-off-by: Kazuhito Hagio <k-hagio@ab.jp.nec.com>

Signed-off-by: Kazuhito Hagio <k-hagio@ab.jp.nec.com>
Acked-by: Kairui Song <kasong@redhat.com>
2019-09-20 16:37:44 +08:00
Kazuhito Hagio
bdd3061883 makedumpfile: Fix exclusion range in find_vmemmap_pages()
Backport from the makedumpfile devel branch in upstream.

commit b461971bfac0f193a0c274c3b657d158e07d4995
Author: Kazuhito Hagio <k-hagio@ab.jp.nec.com>
Date:   Thu Aug 29 14:51:56 2019 -0400

    [PATCH] Fix exclusion range in find_vmemmap_pages()

    In the function, since pfn ranges are literally start and end, not start
    and end+1, if the struct page of endpfn is at the last in a vmemmap page,
    the vmemmap page is dropped by the following code, and not excluded.

        npfns_offset = endpfn - vmapp->rep_pfn_start;
        vmemmap_offset = npfns_offset * size_table.page;
        // round down to page boundary
        vmemmap_offset -= (vmemmap_offset % pagesize);

    We can use (endpfn+1) here to fix.

    Signed-off-by: Kazuhito Hagio <k-hagio@ab.jp.nec.com>

Signed-off-by: Kazuhito Hagio <k-hagio@ab.jp.nec.com>
Acked-by: Kairui Song <kasong@redhat.com>
2019-09-20 16:37:39 +08:00
Kazuhito Hagio
68f9e69a16 makedumpfile: x86_64: Fix incorrect exclusion by -e option with KASLR
Backport from the makedumpfile devel branch in upstream.

commit aa5ab4cf6c7335392094577380d2eaee8a0a8d52
Author: Kazuhito Hagio <k-hagio@ab.jp.nec.com>
Date:   Thu Aug 29 12:26:34 2019 -0400

    [PATCH] x86_64: Fix incorrect exclusion by -e option with KASLR

    The -e option uses info->vmemmap_start for creating a table to determine
    the positions of page structures that should be excluded, but it is a
    hardcoded value even with KASLR-enabled vmcore.  As a result, the option
    excludes incorrect pages from it.

    To fix this, get the vmemmap start address from info->mem_map_data.

    Signed-off-by: Kazuhito Hagio <k-hagio@ab.jp.nec.com>

Signed-off-by: Kazuhito Hagio <k-hagio@ab.jp.nec.com>
Acked-by: Kairui Song <kasong@redhat.com>
2019-09-20 16:37:23 +08:00
Pingfan Liu
680c0d3414 kdumpctl: distinguish the failed reason of ssh
On a host with ipaddr not ready before kdump service, ssh return errno 255.
While if no ssh-key, ssh also return errno 255. For both of cases, the
current kdump code promote user to run 'kdumpctl propagate'. This confuses
user who already installs ssh-key.

In order to tell these two cases from each other, the ssh warning message
should be involved, and parsed.

For the no ssh-key case , warning message is "Permission denied" or "No
such file or directory". For the other, warning message is "Network
Unreachable"

This patch also does a slight change to enlarge the timeout from 60s to
180s. This value can meet test at the time being

Signed-off-by: Pingfan Liu <piliu@redhat.com>
Acked-by: Kairui Song <kasong@redhat.com>
2019-09-02 17:06:21 +08:00
Kairui Song
3e8526cf04 kexec-kdump-howto.txt: Add notes about device dump
Currently there are two issues with device dump:
  - It may use too much memory
  - kdump won't automatically include required driver in second kernel

User should manually reserve enough memory, and include the required
driver by using extra_modules. Add some notes about the issues in
kexec-kdump-howto.txt

Signed-off-by: Kairui Song <kasong@redhat.com>
Acked-by: Dave Young <dyoung@redhat.com>
2019-09-02 17:06:16 +08:00
Kairui Song
ff329689b3 Disable device dump by default
Device dump may use a log of memory and cause OOM issue, so append
"novmcoredd" option for second kernel and disable it by default.
To use device dump, user should remove the vmcoredd parameter
manually.

Signed-off-by: Kairui Song <kasong@redhat.com>
Acked-by: Dave Young <dyoung@redhat.com>
2019-09-02 17:06:09 +08:00
Kairui Song
75297d6f20 dracut-module-setup: fix bond ifcfg processing
Bond options in ifcfg is space separated, dracut expected it to be comma
separated, so it have to be parsed and converted during initramfs
building.

The currently parsing and convert pattern is flawed, for example:
" downdelay=0 miimon=100 mode=802.3ad updelay=0 "

is converted to :
":,downdelay=0 miimon=100 mode=802.3ad updelay=0 "

should be:
":downdelay=0,miimon=100,mode=802.3ad,updelay=0"

So fix this issue by using more simple but robust method for processing
the options.

Signed-off-by: Kairui Song <kasong@redhat.com>
Acked-by: Dave Young <dyoung@redhat.com>
2019-09-02 17:05:43 +08:00
Pingfan Liu
a5ea190af2 dracut-module-setup: filter out localhost for generic_fence_kdump
The localhost is filtered out in case of is_pcs_fence_kdump, do it too in
case of is_generic_fence_kdump.

Signed-off-by: Pingfan Liu <piliu@redhat.com>
Acked-by: Kairui Song <kasong@redhat.com>
2019-08-28 14:19:17 +08:00
Pingfan Liu
f0b5493b2e dracut-module-setup: get localhost alias by manual
'hostname -A' can not get the alias, meanwhile 'hostname -a' is deprecated.
So we should do it by ourselves.

The parsing is based on the format of /etc/hosts, i.e.
  IP_address canonical_hostname [aliases...]

Signed-off-by: Pingfan Liu <piliu@redhat.com>
Acked-by: Kairui Song <kasong@redhat.com>
2019-08-28 14:19:01 +08:00
Kairui Song
d9c0c2f68f Release 2.0.20-3
Signed-off-by: Kairui Song <kasong@redhat.com>
2019-08-12 18:08:37 +08:00
Pingfan Liu
c1a06343df kdumpctl: wait a while for network ready if dump target is ssh
If dump target is ipv6 address, a host should have ipv6 address ready
before starting kdump service. Otherwise, kdump service fails to start due
to the failure "ssh dump_server_ip mkdir -p $SAVE_PATH".
And user can see message like:
 "Could not create root@2620:52:0:10da:46a8:42ff:fe23:3272/var/crash"

I observe a long period (about 30s) on some machine before they got ipv6
address dynamiclly, which is never seen on ipv4 host.

Hence kdump service has a dependency on ipv6 address. But there is no good
way to resolve it. One way is asking user to run the cmd "nmcli connection
modify eth0 ipv6.may-fail false". But this will block systemd until ipv6
address is ready. Despite doing so, kdump can try its best (wait 1 minutes
after it starts up) before failure.

How to implement the wait is arguable. It will involve too many technique
details if explicitly waiting on ipv6 address, instead, just lean on 'ssh'
return value to see the availability of network.

Signed-off-by: Pingfan Liu <piliu@redhat.com>
Acked-by: Kairui Song <kasong@redhat.com>
2019-08-12 16:13:08 +08:00
Kazuhito Hagio
15f6d2627f makedumpfile: Increase SECTION_MAP_LAST_BIT to 4
Backport from the makedumpfile devel branch in upstream.

commit 7bdb468c2c99dd780c9a5321f93c79cbfdce2527
Author: Kazuhito Hagio <k-hagio@ab.jp.nec.com>
Date:   Tue Jul 23 12:24:47 2019 -0400

    [PATCH] Increase SECTION_MAP_LAST_BIT to 4

    kernel commit 326e1b8f83a4 ("mm/sparsemem: introduce a SECTION_IS_EARLY
    flag") added the flag to mem_section->section_mem_map value, and it caused
    makedumpfile an error like the following:

      readmem: Can't convert a virtual address(fffffc97d1000000) to physical address.
      readmem: type_addr: 0, addr:fffffc97d1000000, size:32768
      __exclude_unnecessary_pages: Can't read the buffer of struct page.
      create_2nd_bitmap: Can't exclude unnecessary pages.

    To fix this, SECTION_MAP_LAST_BIT needs to be updated. The bit has not
    been used until the addition, so we can just increase the value.

    Signed-off-by: Kazuhito Hagio <k-hagio@ab.jp.nec.com>

Signed-off-by: Kazuhito Hagio <k-hagio@ab.jp.nec.com>
Acked-by: Dave Young <dyoung@redhat.com>
Acked-by: Kairui Song <kasong@redhat.com>
2019-08-06 11:15:14 +08:00
Kazuhito Hagio
076b839dd4 makedumpfile: Do not proceed when get_num_dumpable_cyclic() fails
Backport from the makedumpfile devel branch in upstream.

commit c1b834f80311706db2b5070cbccdcba3aacc90e5
Author: Kazuhito Hagio <k-hagio@ab.jp.nec.com>
Date:   Tue Jul 23 11:50:52 2019 -0400

    [PATCH] Do not proceed when get_num_dumpable_cyclic() fails

    Currently, when get_num_dumpable_cyclic() fails and returns FALSE in
    create_dump_bitmap(), info->num_dumpable is set to 0 and makedumpfile
    proceeds to write a broken dumpfile slowly with incorrect progress
    indicator due to the value.

    It should not proceed when get_num_dumpable_cyclic() fails.

    Signed-off-by: Kazuhito Hagio <k-hagio@ab.jp.nec.com>

Signed-off-by: Kazuhito Hagio <k-hagio@ab.jp.nec.com>
Acked-by: Dave Young <dyoung@redhat.com>
Acked-by: Kairui Song <kasong@redhat.com>
2019-08-06 11:14:40 +08:00
Kairui Song
4a0f9763c0 Don't forward and drop journalctl logs for fadump
fadump will alter the normal boot initramfs and we don't want a normal
boot to foward and drop the journalctl logs.

Signed-off-by: Kairui Song <kasong@redhat.com>
Acked-by: Dave Young <dyoung@redhat.com>
2019-08-06 11:14:15 +08:00
Kairui Song
80de723566 Release 2.0.20-2
Signed-off-by: Kairui Song <kasong@redhat.com>
2019-08-02 14:51:21 +08:00
Kairui Song
cb1e5463b5 x86: Fix broken multiboot2 buliding for i386
When building for i386, an error occured:

kexec/arch/i386/kexec-x86.c:39:22: error: 'multiboot2_x86_probe'
undeclared here (not in a function); did you mean 'multiboot_x86_probe'?
39 |  { "multiboot2-x86", multiboot2_x86_probe, multiboot2_x86_load,
   |                      ^~~~~~~~~~~~~~~~~~~~
   |                      multiboot_x86_probe

kexec/arch/i386/kexec-x86.c:39:44: error: 'multiboot2_x86_load'
undeclared here (not in a function); did you mean 'multiboot_x86_load'?
39 |  { "multiboot2-x86", multiboot2_x86_probe, multiboot2_x86_load,
   |                                            ^~~~~~~~~~~~~~~~~~~
   |                                            multiboot_x86_load
kexec/arch/i386/kexec-x86.c:40:4: error: 'multiboot2_x86_usage'
 undeclared here (not in a function); did you mean 'multiboot_x86_usage'?
40 |    multiboot2_x86_usage },
   |    ^~~~~~~~~~~~~~~~~~~~
   |    multiboot_x86_usage

Fix this issue by putting the definition in the right header, also tidy
up Makefile.

Signed-off-by: Kairui Song <kasong@redhat.com>
2019-08-02 11:24:03 +08:00
Pingfan Liu
88bbab963f dracut-module-setup.sh: skip alias of localhost in get_pcs_fence_kdump_nodes()
The current code only exclude the hostname, while localhost can have alias in
/etc/hosts. All of the alias should be excluded from the fence dump node to
avoid deadlock issue.

Signed-off-by: Pingfan Liu <piliu@redhat.com>
Acked-by: Kairui Song <kasong@redhat.com>
2019-08-01 13:38:53 +08:00
Kairui Song
03fd19454b Release 2.0.20-1
Rebase to latest upstream and make a release

Signed-off-by: Kairui Song <kasong@redhat.com>
2019-07-31 15:54:46 +08:00
Kairui Song
4b7198f651 Update makedumpfile to 1.6.6
Signed-off-by: Kairui Song <kasong@redhat.com>
2019-07-31 15:54:42 +08:00
Kairui Song
17981c14ec kexec-tools.spec: Use a macro for makedumpfile version
Don't repeat it again and again and make it easier to maintain.

Signed-off-by: Kairui Song <kasong@redhat.com>
2019-07-31 15:53:26 +08:00
Fedora Release Engineering
603cd09b76 - Rebuilt for https://fedoraproject.org/wiki/Fedora_31_Mass_Rebuild
Signed-off-by: Fedora Release Engineering <releng@fedoraproject.org>
2019-07-25 11:23:07 +00:00
Kairui Song
cf5d362dca dracut-module-setup.sh: Don't use squash module for fadump
Squash module is used to save memory. For fadump this is not neccessary
and may slow down the build time, and make it more fragile.

fadump initramfs is used for normal boot as well, although squash module
is capable of being used for generic normal boot, but there are cases
where is doesn't work well. So disable it and make fadump more robust.

Signed-off-by: Kairui Song <kasong@redhat.com>
Tested-by: Hari Bathini <hbathini@linux.ibm.com>
Acked-by: Dave Young <dyoung@redhat.com>
2019-07-16 16:46:23 +08:00
Kairui Song
5b26c1f8b2 Forward logs in kdump kernel to console directly
Don't use any log storage and forward to console directly, this make
console output more useful, and also save more memory. On a fresh
installed Fedora 30 it saved ~5M of memory, and the amount of log being
printed to console is still accetable.

Signed-off-by: Kairui Song <kasong@redhat.com>
Acked-by: Dave Young <dyoung@redhat.com>
2019-07-16 14:11:16 +08:00
Kairui Song
b998b90197 kdump.sysconfig/x86_64: Disable HEST by default
Some firmware will provide a ACPI HEST table with massive amount of
entries, and the way how kernel handles these entries will consume a lot
of memory which will lead to OOM issue in kdump kernel. During testing
on certain machine, disable HEST saved ~60M of memory.

Kdump is only for emergency use in case of a kernel panic, so temporarily
disable hardware error report & recovery related feature is acceptable
in general. So disable HEST support in kdump kernel to save memory.

Currently such issue is only observed on x86_64, so limit this change to
x86_64 only.

Signed-off-by: Kairui Song <kasong@redhat.com>
Acked-by: Dave Young <dyoung@redhat.com>
2019-07-09 14:11:36 +08:00
Pingfan Liu
ace23737ab dracut-kdump-capture.service: Use OnFailureJobMode instead of deprecated OnFailureIsolate
systemd has the following message "OnFailureIsolate is deprecated. Please
use OnFailureJobMode= instead"

Changing the file to meet systemd's requirement

Signed-off-by: Pingfan Liu <piliu@redhat.com>
Acked-by: Kairui Song <kasong@redhat.com>
2019-06-14 09:44:09 +08:00
Vasiliy Glazov
c4a2ecb6e4 Add systemd-udev require.
It is needed to proper owning of /usr/lib/udev/rules.d directory.
2019-06-11 07:46:47 +00:00
Lianbo Jiang
b1250de389 makedumpfile: x86_64: Add support for AMD Secure Memory Encryption
Backport from the makedumpfile devel branch in upstream.

commit d222b01e516bba73ef9fefee4146734a5f260fa1 (HEAD -> devel)
Author: Lianbo Jiang <lijiang@redhat.com>
Date:   Wed Jan 30 10:48:53 2019 +0800

    [PATCH] x86_64: Add support for AMD Secure Memory Encryption

    On AMD machine with Secure Memory Encryption (SME) feature, if SME is
    enabled, page tables contain a specific attribute bit (C-bit) in their
    entries to indicate whether a page is encrypted or unencrypted.

    So get NUMBER(sme_mask) from vmcoreinfo, which stores the value of
    the C-bit position, and drop it to obtain the true physical address.

    Signed-off-by: Lianbo Jiang <lijiang@redhat.com>

Signed-off-by: Lianbo Jiang <lijiang@redhat.com>
Acked-by: Kairui Song <kasong@redhat.com>
2019-06-10 16:13:38 +08:00
Bhupesh Sharma
fd1eccf920 aarch64/kdump.sysconfig: Make config options similar to x86_64
Looking at the difference of the x86_64 and aarch64 kdump.sysconfig
options for Fedora, one can see the following options which are
different:

Present in kdump.sysconfig.x86_64 but not in kdump.sysconfig.aarch64:
---------------------------------------------------------------------
 cgroup_disable=memory
 mce=off
 numa=off
 udev.children-max=2
 panic=10
 acpi_no_memhotplug
 transparent_hugepage=never
 nokaslr

Present in kdump.sysconfig.aarch64 but not in kdump.sysconfig.x86_64:
---------------------------------------------------------------------
 swiotlb=noforce

After going through all the options, it makes sense to add the
following options added to kdump.sysconfig.aarch64:

  KDUMP_COMMANDLINE_APPEND="cgroup_disable=memory udev.children-max=2
                            panic=10 irqpoll nr_cpus=1
			    swiotlb=noforce reset_devices"

This has helped reduce the memory footprint of crashkernel on several
aarch64 machines available in the beaker lab. For e.g. I was seeing
OOM issues on large aws ec2 instances with the default crashkernel size
of 512M, and I had to use an increased crashkernel size of 786M on the
same to boot the crash dump kernel.

Signed-off-by: Bhupesh Sharma <bhsharma@redhat.com>
Acked-by: Pingfan Liu <piliu@redhat.com>
2019-05-29 17:04:27 +08:00
Bhupesh Sharma
3001788f4c Add aarch64 specific kdump.sysconfig and use 'nr_cpus' instead of 'maxcpus'
'maxcpus' setting normally don't work on several kdump enabled systems
due to a known udev issue.

Currently the fedora kdump configuration is set as the following on the
aarch64 systems:

 # cat /etc/sysconfig/kdump
<..snip..>
  # This variable lets us append arguments to the current kdump
  # commandline after processed by KDUMP_COMMANDLINE_REMOVE
  # KDUMP_COMMANDLINE_APPEND="irqpoll maxcpus=1 reset_devices"
<..snip..>

Since the 'maxcpus' setting doesn't limit the number of SMP CPUs,
so the kdump kernel still boots with all CPUs available on the system.
For e.g on the qualcomm amberwing its 46 CPUs:

  # lscpu
Architecture:        aarch64
Byte Order:          Little Endian
CPU(s):              46
On-line CPU(s) list: 0-45
Thread(s) per core:  1
Core(s) per socket:  46
Socket(s):           1
NUMA node(s):        1
Vendor ID:           Qualcomm
Model:               1
Model name:          Falkor
Stepping:            0x0
CPU max MHz:         2600.0000
CPU min MHz:         600.0000
BogoMIPS:            40.00
L1d cache:           32K
L1i cache:           64K
L2 cache:            512K
L3 cache:            58880K
NUMA node0 CPU(s):   0-45
Flags:               fp asimd evtstrm aes pmull sha1 sha2 crc32 cpuid asimdrdm

This causes the memory consumption in the kdump kernel to swell up and
we can end up having OOM issues in the kdump kernel boot.

Whereas if we use 'nr_cpus=1' in the bootargs, the number of SMP CPUs in
the kdump kernel get limited to 1.

The 'swiotlb=noforce' setting in bootargs provide us extra guarding, to
ensure the crash kernel size requirements do not swell on systems
which support swiotlb.

With the above settings, crashkernel boots properly (without OOM) on all
the aarch64 boards I could test on - qualcomm amberwings, hp-moonshots
and hpe-apache (thunderx2) for crash dump saving on local disk.

Signed-off-by: Bhupesh Sharma <bhsharma@redhat.com>
Acked-by: Pingfan Liu <piliu@redhat.com>
2019-05-29 17:04:20 +08:00
Kairui Song
f0fa5c8e91 kdumpctl: check for ssh path availability when rebuild
Currently kdumpctl rebuild will simply rebuild the initramfs, and
only perform basic config syntax check. But it should also check if the
target path is available when using SSH target, else kdump may fail.
is second kernel. kdumpctl rebuild should cover this case, and create
the path if it doesn't exist.

This patch make rebuild and restart behaves the same, rebuild is
now equal to restart, except it won't check config change or reload
kdump resource.

Signed-off-by: Kairui Song <kasong@redhat.com>
Acked-by: Dave Young <dyoung@redhat.com>
2019-05-27 16:13:29 +08:00