The Zstandard (zstd) compression method is not enabled:
$ makedumpfile -v
makedumpfile: version 1.7.0 (released on 8 Nov 2021)
lzo enabled
snappy enabled
zstd disabled
This patch will enable it when building kexec-tools rpm package.
Signed-off-by: Tao Liu <ltao@redhat.com>
Acked-by: Coiby Xu <coxu@redhat.com>
kdump-error-handler.sh does nothing except calling three functions,
it can be easily merged into kdump.sh by using a parameter to run the
error handling routine.
kdump-lib-initramfs.sh was created to hold the three shared functions
and related code, so by merging these two files, kdump-lib-initramfs.sh
can be simplified by a lot.
Following up commits will clean up kdump-lib-initramfs.sh.
Signed-off-by: Kairui Song <kasong@redhat.com>
Acked-by: Philipp Rudo <prudo@redhat.com>
At least, this is a plausible suspect for #1993505 - thanks to
@kevin for identifying it - and fixing it should be safe and
correct, so we may as well do it and see if it helps.
While kdump migration action is registered for LPM event, ensure it is
cleared as appropriate to avoid duplicate/stale notification entries.
Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
Acked-by: Kairui Song <kasong@redhat.com>
Dump capture initramfs needs rebuild after partition migration (LPM).
Use servicelog notification mechanism to invoke kdump rebuild after
migration.
Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
Reviewed-by: Pingfan Liu <piliu@redhat.com>
Acked-by: Kairui Song <kasong@redhat.com>
Packaging guidelines have been amended to not require systemd for scriptlets,
see https://docs.fedoraproject.org/en-US/packaging-guidelines/Scriptlets/#_scriptlets.
The comment duplicates what the macro contains.
systemd-sysv-convert binary was removed in 2013, trying to call it is
unlikely to succeed.
chkconfig binary is provided by the chkconfig package, which is not in
Requires. (And makes little sense to call nowadays anyway.)
To track and manage kernel's crashkernel usage by kernel version,
each kernel package will include a crashkernel.default containing the
default `crashkernel=` value of that kernel. So we can use a hook to
update the kernel cmdline of new installed kernel accordingly.
Put it after all other grub boot loader setup hooks, so it can simply
call grubby to modify the kernel cmdline.
Signed-off-by: Kairui Song <kasong@redhat.com>
Acked-by: Baoquan He <bhe@redhat.com>
In case of fadump, the initramfs image has to be built to boot into
the production environment as well as to offload the active crash dump
to the specified dump target (for boot after crash). As the same image
would be used for both boot scenarios, it could not be built optimally
while accommodating both cases.
Use --include to include the initramfs image built for offloading
active crash dump to the specified dump target. Also, introduce a new
out-of-tree dracut module (99zz-fadumpinit) that installs a customized
init program while moving the default /init to /init.dracut. This
customized init program is leveraged to isolate fadump image within
the default initramfs image by kicking off default boot process
(exec /init.dracut) for regular boot scenario and activating fadump
initramfs image, if the system is booting after a crash.
If squash is available, ensure default initramfs image is also built
with squash module to reduce memory consumption in capture kernel.
Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
Signed-off-by: Kairui Song <kasong@redhat.com>
Acked-by: Kairui Song <kasong@redhat.com>
Backport from upstream.
commit 9a6f589d99dcef114c89fde992157f5467028c8f
Author: Tao Liu <ltao@redhat.com>
Date: Fri Jun 18 18:28:04 2021 +0800
[PATCH] check for invalid physical address of /proc/kcore when making ELF dumpfile
Previously when executing makedumpfile with -E option against
/proc/kcore, makedumpfile will fail:
# makedumpfile -E -d 31 /proc/kcore kcore.dump
...
write_elf_load_segment: Can't convert physaddr(ffffffffffffffff) to an offset.
makedumpfile Failed.
It's because /proc/kcore contains PT_LOAD program headers which have
physaddr (0xffffffffffffffff). With -E option, makedumpfile will
try to convert the physaddr to an offset and fails.
Skip the PT_LOAD program headers which have such physaddr.
Signed-off-by: Tao Liu <ltao@redhat.com>
Signed-off-by: Kazuhito Hagio <k-hagio-ab@nec.com>
Signed-off-by: Tao Liu <ltao@redhat.com>
Acked-by: Kairui Song <kasong@redhat.com>
Backport from upstream.
commit 38d921a2ef50ebd36258097553626443ffe27496
Author: Coiby Xu <coxu@redhat.com>
Date: Tue Jun 15 18:26:31 2021 +0800
[PATCH] check for invalid physical address of /proc/kcore when finding max_paddr
Kernel commit 464920104bf7adac12722035bfefb3d772eb04d8 ("/proc/kcore:
update physical address for kcore ram and text") sets an invalid paddr
(0xffffffffffffffff = -1) for PT_LOAD segments of not direct mapped
regions:
$ readelf -l /proc/kcore
...
Program Headers:
Type Offset VirtAddr PhysAddr
FileSiz MemSiz Flags Align
NOTE 0x0000000000000120 0x0000000000000000 0x0000000000000000
0x0000000000002320 0x0000000000000000 0x0
LOAD 0x1000000000010000 0xd000000000000000 0xffffffffffffffff
^^^^^^^^^^^^^^^^^^
0x0001f80000000000 0x0001f80000000000 RWE 0x10000
makedumpfile uses max_paddr to calculate the number of sections for
sparse memory model thus wrong number is obtained based on max_paddr
(-1). This error could lead to the failure of copying /proc/kcore
for RHEL-8.5 on ppc64le machine [1]:
$ makedumpfile /proc/kcore vmcore1
get_mem_section: Could not validate mem_section.
get_mm_sparsemem: Can't get the address of mem_section.
makedumpfile Failed.
Let's check if the phys_start of the segment is a valid physical
address to fix this problem.
[1] https://bugzilla.redhat.com/show_bug.cgi?id=1965267
Reported-by: Xiaoying Yan <yiyan@redhat.com>
Signed-off-by: Coiby Xu <coxu@redhat.com>
Signed-off-by: Tao Liu <ltao@redhat.com>
Acked-by: Kairui Song <kasong@redhat.com>
Backport from upstream.
commit 646456862df8926ba10dd7330abf3bf0f887e1b6
Author: Kazuhito Hagio <k-hagio-ab@nec.com>
Date: Wed May 26 14:31:26 2021 +0900
[PATCH] Increase SECTION_MAP_LAST_BIT to 5
* Required for kernel 5.12
Kernel commit 1f90a3477df3 ("mm: teach pfn_to_online_page() about
ZONE_DEVICE section collisions") added a section flag
(SECTION_TAINT_ZONE_DEVICE) and causes makedumpfile an error on
some machines like this:
__vtop4_x86_64: Can't get a valid pmd_pte.
readmem: Can't convert a virtual address(ffffe2bdc2000000) to physical address.
readmem: type_addr: 0, addr:ffffe2bdc2000000, size:32768
__exclude_unnecessary_pages: Can't read the buffer of struct page.
create_2nd_bitmap: Can't exclude unnecessary pages.
Increase SECTION_MAP_LAST_BIT to 5 to fix this. The bit had not
been used until the change, so we can just increase the value.
Signed-off-by: Kazuhito Hagio <k-hagio-ab@nec.com>
Signed-off-by: Tao Liu <ltao@redhat.com>
Acked-by: Kairui Song <kasong@redhat.com>
The wrapper is introduced in commit 002337c, according to the commit
message, the only usage of the wrapper is when dracut-initqueue calls
"systemctl start emergency" directly. In that case, emergency
is started, but not in a isolation mode, which means dracut-initqueue
is still running. On the other hand, emergency will call
"systemctl start dracut-initqueue" again when default action is dump_to_rootfs.
systemd would block on the last dracut-initqueue, waiting for the first
instance to exit, which leaves us hang.
In previous commit we added initqueue status detect in dump_to_rootfs,
so now even without the wrapper, it will not hang.
And actually, previously, with the wrapper, emergency might still hang
for like 30s. When dracut called emergency service because initqueue
timed out, dump_to_rootfs will try start initqueue again and timeout
again. Now with the wrapper removed, we can avoid these two kinds of
hangs, bacause without the isolation we can detect initqueue service
status correctly in such case.
Also remove the invalid header comments in service file, the service
is not part of systemd code. And sync the service spec with dracut.
Signed-off-by: Kairui Song <kasong@redhat.com>
Acked-by: Coiby Xu <coxu@redhat.com>
Backport from upstream:
commit 0ef2ca6c9fa2f61f217a4bf5d7fd70f24e12b2eb
Author: Kazuhito Hagio <k-hagio-ab@nec.com>
Date: Thu Feb 4 16:29:06 2021 +0900
[PATCH] Show write byte size in report messages
Show write byte size in report messages. This value can be different
from the size of the actual file because of some holes on dumpfile
data structure.
$ makedumpfile --show-stats -l -d 1 vmcore dump.ld1
...
Total pages : 0x0000000000080000
Write bytes : 377686445
...
# ls -l dump.ld1
-rw------- 1 root root 377691573 Feb 4 16:28 dump.ld1
Note that this value should not be used with /proc/kcore to determine
how much disk space is needed for crash dump, because the real memory
usage when a crash occurs can vary widely.
Signed-off-by: Kazuhito Hagio <k-hagio-ab@nec.com>
Signed-off-by: Tao Liu <ltao@redhat.com>
Acked-by: Kairui Song <kasong@redhat.com>
Backport from upstream:
commit 6f3e75a558ed50d6ff0b42e3f61c099b2005b7bb
Author: Julien Thierry <jthierry@redhat.com>
Date: Tue Nov 24 10:45:25 2020 +0000
[PATCH 2/2] Add shorthand --show-stats option to show report stats
Provide shorthand --show-stats option to enable report messages
without needing to set a particular value for message-level.
Signed-off-by: Julien Thierry <jthierry@redhat.com>
Signed-off-by: Kazuhito Hagio <k-hagio-ab@nec.com>
Signed-off-by: Tao Liu <ltao@redhat.com>
Acked-by: Kairui Song <kasong@redhat.com>
Backport from upstream.
commit 3422e1d6bc3511c5af9cb05ba74ad97dd93ffd7f
Author: Julien Thierry <jthierry@redhat.com>
Date: Tue Nov 24 10:45:24 2020 +0000
[PATCH 1/2] Add --dry-run option to prevent writing the dumpfile
Add a --dry-run option to run all operations without writing the
dump to the output file.
Signed-off-by: Julien Thierry <jthierry@redhat.com>
Signed-off-by: Kazuhito Hagio <k-hagio-ab@nec.com>
Signed-off-by: Tao Liu <ltao@redhat.com>
Acked-by: Kairui Song <kasong@redhat.com>
The `/boot` directory on some operating systems might be read-only.
If we cannot write to `$KDUMP_BOOTDIR` when generating the kdump
initrd, attempt to place the generated initrd at `/var/lib/kdump`
instead.
Signed-off by: Kelvin Fan <kelvinfan001@gmail.com>
Acked-by: Kairui Song <kasong@redhat.com>
ipcalc is needed for generating 45route-static.conf. However,
on newer Fedora, e.g. 34, dracut-network drops dependency on
dhcp-client which requires ipcalc. Make kexec-tools explicitly
depends on ipcalc.
Reported-by: Jie Li <jieli@redhat.com>
Signed-off-by: Coiby Xu <coxu@redhat.com>
Acked-by: Kairui Song <kasong@redhat.com>
Backports:
commit 54aec3878b3f91341e6bc735eda158cca5c54ec9
Author: Alexander Egorenkov <egorenar@linux.ibm.com>
Date: Fri Sep 18 13:55:56 2020 +0200
[PATCH] make use of 'uts_namespace.name' offset in VMCOREINFO
* Required for kernel 5.11
The offset of the field 'init_uts_ns.name' has changed since
kernel commit 9a56493f6942 ("uts: Use generic ns_common::count").
Make use of the offset 'uts_namespace.name' if available in
VMCOREINFO.
Signed-off-by: Alexander Egorenkov <egorenar@linux.ibm.com>
Signed-off-by: Kairui Song <kasong@redhat.com>
commit 44b073b7ec467aee0d7de381d455b8ace1199184
Author: John Ogness <john.ogness@linutronix.de>
Date: Wed Nov 25 10:10:31 2020 +0106
[PATCH 2/2] printk: use committed/finalized state values
* Required for kernel 5.10
The ringbuffer entries use 2 state values (committed and finalized)
rather than a single flag to represent being available for reading.
Copy the definitions and state lookup function directly from the
kernel source and use the new states.
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Kairui Song <kasong@redhat.com>
Backports:
commit c617ec63339222f3a44d73e36677a9acc8954ccd
Author: John Ogness <john.ogness@linutronix.de>
Date: Thu Nov 19 02:41:21 2020 +0000
[PATCH 1/2] printk: add support for lockless ringbuffer
* Required for kernel 5.10
Linux 5.10 introduces a new lockless ringbuffer. The new ringbuffer
is structured completely different to the previous iterations.
Add support for retrieving the ringbuffer from debug information
and/or using vmcoreinfo. The new ringbuffer is detected based on
the availability of the "prb" symbol.
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Kazuhito Hagio <k-hagio-ab@nec.com>
Signed-off-by: Kairui Song <kasong@redhat.com>
The dracut module is opportunistic about using the built-in squashfs
support only when available, but the spec file hard requires it. Demote
it to a weak dep to truly make it optional.
This caters to environments which strive to stay minimal, like FCOS and
RHCOS. See https://github.com/coreos/fedora-coreos-config/pull/708 for
details.
Since the logger was introduced into kdump, let's enable it for kdump
so that we can output kdump messages according the log level and save
these messages for debugging.
Signed-off-by: Lianbo Jiang <lijiang@redhat.com>
Acked-by: Kairui Song <kasong@redhat.com>
Kdump service will create kdump initramfs when needed, but it won't
clean up the kdump initramfs on kernel uninstall. So create a kernel
install hook to do the clean up job.
Signed-off-by: Kairui Song <kasong@redhat.com>
Acked-by: Dave Young <dyoung@redhat.com>
If the directory /etc/kdump/pre.d is optional, then it is hard
to tell between the following cases:
-1. no directory "/etc/kdump/pre.d"
-2. "rm -rf /etc/kdump/pre.d", which removes all scripts under pre.d
For the second case, kdump.img should be rebuilt.
To bail out from this corner case, always creating pre.d and post.d
during rpm installation.
Signed-off-by: Pingfan Liu <piliu@redhat.com>
Acked-by: Kairui Song <kasong@redhat.com>
Switch root is never used for kdump image, and this will be helpful to
reduce the initramfs size.
Also increase dracut dependency version and the function is
dracut_no_switch_root is new introduced.
This commit is applied to RHEL some time ago, but missing in Fedora as
Fedora's Dracut didn't backport this feature at that time. Now apply
this missing commit.
Signed-off-by: Kairui Song <kasong@redhat.com>
Acked-by: Pingfan Liu <piliu@redhat.com>
This reverts commit cee618593c.
Upstream dracut have provided a parameter for adding mandantory network
requirement by appending "rd.neednet" parameter, so we should use that
instead.
Signed-off-by: Kairui Song <kasong@redhat.com>
Acked-by: Dave Young <dyoung@redhat.com>
Backport from the upstream makedumpfile devel branch.
commit 989152e113bfcb4fbfbad6f3aed6f43be4455919
Author: Kazuhito Hagio <k-hagio-ab@nec.com>
Date: Tue Feb 25 16:04:55 2020 -0500
[PATCH] Introduce --check-params option
Currently it's difficult to check whether a makedumpfile command-line
is valid or not without an actual panic. This is inefficient and if
a wrong configuration is not tested, you will miss the vmcore when an
actual panic occurs.
In order for kdump facilities like kexec-tools to be able to check
the specified command-line parameters in advance, introduce the
--check-params option that only checks them and exits immediately.
Signed-off-by: Kazuhito Hagio <k-hagio-ab@nec.com>
Signed-off-by: Kazuhito Hagio <k-hagio-ab@nec.com>
Acked-by: Kairui Song <kasong@redhat.com>
The dracut initqueue may quit immediately and won't trigger any hook if
there is no "finished" hook still pending (finished hook will be deleted
once it return 0).
This issue start to appear with latest dracut, latest dracut use
network-manager to configure the network,
network-manager module only install "settled" hook, and we didn't
install any other hook. So NFS/SSH dump will fail. iSCSI dump works
because dracut iscsi module will install a "finished" hook to detect if
the iscsi target is up.
So for NFS/SSH we keep initqueue running until the host successfully get
a valid IP address, which means the network is ready.
Signed-off-by: Kairui Song <kasong@redhat.com>
Acked-by: Pingfan Liu <piliu@redhat.com>
Backport from the makedumpfile devel branch in upstream.
commit 8425342a52b23d462f10bceeeb1c8a3a43d56bf0
Author: Kazuhito Hagio <k-hagio@ab.jp.nec.com>
Date: Fri Sep 6 09:50:34 2019 -0400
[PATCH] Fix inconsistent return value from find_vmemmap()
When -e option is given, the find_vmemmap() returns FAILED(1) if
it failed on x86_64, but on architectures other than that, it is
stub_false() and returns FALSE(0).
if (info->flag_excludevm) {
if (find_vmemmap() == FAILED) {
ERRMSG("Can't find vmemmap pages\n");
#define find_vmemmap() stub_false()
As a result, on the architectures other than x86_64, the -e option
does some unnecessary processing with no effect, and marks the dump
DUMP_DH_EXCLUDED_VMEMMAP unexpectedly.
Also, the functions for the -e option return COMPLETED or FAILED,
which are for command return value, not for function return value.
So let's fix the issue by following the common style that returns
TRUE or FALSE, and avoid confusion.
Signed-off-by: Kazuhito Hagio <k-hagio@ab.jp.nec.com>
Signed-off-by: Kazuhito Hagio <k-hagio@ab.jp.nec.com>
Acked-by: Kairui Song <kasong@redhat.com>
Backport from the makedumpfile devel branch in upstream.
commit b461971bfac0f193a0c274c3b657d158e07d4995
Author: Kazuhito Hagio <k-hagio@ab.jp.nec.com>
Date: Thu Aug 29 14:51:56 2019 -0400
[PATCH] Fix exclusion range in find_vmemmap_pages()
In the function, since pfn ranges are literally start and end, not start
and end+1, if the struct page of endpfn is at the last in a vmemmap page,
the vmemmap page is dropped by the following code, and not excluded.
npfns_offset = endpfn - vmapp->rep_pfn_start;
vmemmap_offset = npfns_offset * size_table.page;
// round down to page boundary
vmemmap_offset -= (vmemmap_offset % pagesize);
We can use (endpfn+1) here to fix.
Signed-off-by: Kazuhito Hagio <k-hagio@ab.jp.nec.com>
Signed-off-by: Kazuhito Hagio <k-hagio@ab.jp.nec.com>
Acked-by: Kairui Song <kasong@redhat.com>
Backport from the makedumpfile devel branch in upstream.
commit aa5ab4cf6c7335392094577380d2eaee8a0a8d52
Author: Kazuhito Hagio <k-hagio@ab.jp.nec.com>
Date: Thu Aug 29 12:26:34 2019 -0400
[PATCH] x86_64: Fix incorrect exclusion by -e option with KASLR
The -e option uses info->vmemmap_start for creating a table to determine
the positions of page structures that should be excluded, but it is a
hardcoded value even with KASLR-enabled vmcore. As a result, the option
excludes incorrect pages from it.
To fix this, get the vmemmap start address from info->mem_map_data.
Signed-off-by: Kazuhito Hagio <k-hagio@ab.jp.nec.com>
Signed-off-by: Kazuhito Hagio <k-hagio@ab.jp.nec.com>
Acked-by: Kairui Song <kasong@redhat.com>
Backport from the makedumpfile devel branch in upstream.
commit 7bdb468c2c99dd780c9a5321f93c79cbfdce2527
Author: Kazuhito Hagio <k-hagio@ab.jp.nec.com>
Date: Tue Jul 23 12:24:47 2019 -0400
[PATCH] Increase SECTION_MAP_LAST_BIT to 4
kernel commit 326e1b8f83a4 ("mm/sparsemem: introduce a SECTION_IS_EARLY
flag") added the flag to mem_section->section_mem_map value, and it caused
makedumpfile an error like the following:
readmem: Can't convert a virtual address(fffffc97d1000000) to physical address.
readmem: type_addr: 0, addr:fffffc97d1000000, size:32768
__exclude_unnecessary_pages: Can't read the buffer of struct page.
create_2nd_bitmap: Can't exclude unnecessary pages.
To fix this, SECTION_MAP_LAST_BIT needs to be updated. The bit has not
been used until the addition, so we can just increase the value.
Signed-off-by: Kazuhito Hagio <k-hagio@ab.jp.nec.com>
Signed-off-by: Kazuhito Hagio <k-hagio@ab.jp.nec.com>
Acked-by: Dave Young <dyoung@redhat.com>
Acked-by: Kairui Song <kasong@redhat.com>
Backport from the makedumpfile devel branch in upstream.
commit c1b834f80311706db2b5070cbccdcba3aacc90e5
Author: Kazuhito Hagio <k-hagio@ab.jp.nec.com>
Date: Tue Jul 23 11:50:52 2019 -0400
[PATCH] Do not proceed when get_num_dumpable_cyclic() fails
Currently, when get_num_dumpable_cyclic() fails and returns FALSE in
create_dump_bitmap(), info->num_dumpable is set to 0 and makedumpfile
proceeds to write a broken dumpfile slowly with incorrect progress
indicator due to the value.
It should not proceed when get_num_dumpable_cyclic() fails.
Signed-off-by: Kazuhito Hagio <k-hagio@ab.jp.nec.com>
Signed-off-by: Kazuhito Hagio <k-hagio@ab.jp.nec.com>
Acked-by: Dave Young <dyoung@redhat.com>
Acked-by: Kairui Song <kasong@redhat.com>
When building for i386, an error occured:
kexec/arch/i386/kexec-x86.c:39:22: error: 'multiboot2_x86_probe'
undeclared here (not in a function); did you mean 'multiboot_x86_probe'?
39 | { "multiboot2-x86", multiboot2_x86_probe, multiboot2_x86_load,
| ^~~~~~~~~~~~~~~~~~~~
| multiboot_x86_probe
kexec/arch/i386/kexec-x86.c:39:44: error: 'multiboot2_x86_load'
undeclared here (not in a function); did you mean 'multiboot_x86_load'?
39 | { "multiboot2-x86", multiboot2_x86_probe, multiboot2_x86_load,
| ^~~~~~~~~~~~~~~~~~~
| multiboot_x86_load
kexec/arch/i386/kexec-x86.c:40:4: error: 'multiboot2_x86_usage'
undeclared here (not in a function); did you mean 'multiboot_x86_usage'?
40 | multiboot2_x86_usage },
| ^~~~~~~~~~~~~~~~~~~~
| multiboot_x86_usage
Fix this issue by putting the definition in the right header, also tidy
up Makefile.
Signed-off-by: Kairui Song <kasong@redhat.com>
Backport from the makedumpfile devel branch in upstream.
commit d222b01e516bba73ef9fefee4146734a5f260fa1 (HEAD -> devel)
Author: Lianbo Jiang <lijiang@redhat.com>
Date: Wed Jan 30 10:48:53 2019 +0800
[PATCH] x86_64: Add support for AMD Secure Memory Encryption
On AMD machine with Secure Memory Encryption (SME) feature, if SME is
enabled, page tables contain a specific attribute bit (C-bit) in their
entries to indicate whether a page is encrypted or unencrypted.
So get NUMBER(sme_mask) from vmcoreinfo, which stores the value of
the C-bit position, and drop it to obtain the true physical address.
Signed-off-by: Lianbo Jiang <lijiang@redhat.com>
Signed-off-by: Lianbo Jiang <lijiang@redhat.com>
Acked-by: Kairui Song <kasong@redhat.com>
'maxcpus' setting normally don't work on several kdump enabled systems
due to a known udev issue.
Currently the fedora kdump configuration is set as the following on the
aarch64 systems:
# cat /etc/sysconfig/kdump
<..snip..>
# This variable lets us append arguments to the current kdump
# commandline after processed by KDUMP_COMMANDLINE_REMOVE
# KDUMP_COMMANDLINE_APPEND="irqpoll maxcpus=1 reset_devices"
<..snip..>
Since the 'maxcpus' setting doesn't limit the number of SMP CPUs,
so the kdump kernel still boots with all CPUs available on the system.
For e.g on the qualcomm amberwing its 46 CPUs:
# lscpu
Architecture: aarch64
Byte Order: Little Endian
CPU(s): 46
On-line CPU(s) list: 0-45
Thread(s) per core: 1
Core(s) per socket: 46
Socket(s): 1
NUMA node(s): 1
Vendor ID: Qualcomm
Model: 1
Model name: Falkor
Stepping: 0x0
CPU max MHz: 2600.0000
CPU min MHz: 600.0000
BogoMIPS: 40.00
L1d cache: 32K
L1i cache: 64K
L2 cache: 512K
L3 cache: 58880K
NUMA node0 CPU(s): 0-45
Flags: fp asimd evtstrm aes pmull sha1 sha2 crc32 cpuid asimdrdm
This causes the memory consumption in the kdump kernel to swell up and
we can end up having OOM issues in the kdump kernel boot.
Whereas if we use 'nr_cpus=1' in the bootargs, the number of SMP CPUs in
the kdump kernel get limited to 1.
The 'swiotlb=noforce' setting in bootargs provide us extra guarding, to
ensure the crash kernel size requirements do not swell on systems
which support swiotlb.
With the above settings, crashkernel boots properly (without OOM) on all
the aarch64 boards I could test on - qualcomm amberwings, hp-moonshots
and hpe-apache (thunderx2) for crash dump saving on local disk.
Signed-off-by: Bhupesh Sharma <bhsharma@redhat.com>
Acked-by: Pingfan Liu <piliu@redhat.com>
On powerpc, after hot add cpu and trigger crash on the hot-added cpu, the
kdump kernel hangs after "I'm in purgatory".
The current udev rules expects the dtb to be rebuit on cpu add/remove event.
But since powerpc does not follow the standard cpu hot add framework, it
only ejects online/offline event to user space when cpu is hot
added/removed, instead of add/remove event. Pingfan tried fixing that but
it didn't please the maintainer as it breaks some old userspace tools.
Due to the failure of dtb's rebuilding, KDump kernel fails to get the
'boot_cpuid' and eventually fails to boot [see early_init_dt_scan_cpus() in
arch/powerpc/kernel/prom.c file] if system crashes on hot-added CPU.
Work around it by changing udev rules on powerpc to onlne/offline.
As for offline message, it is even useless on powerpc, and can be dropped.
See the explain: On powerpc, /sys/devices/system/cpu/cpuX nodes are present
for all "possible", irrespective of whether a CPU is hot-added/removed.
crash_notes are already built for all /sys/devices/system/cpu/cpuX nodes and
these nodes are present for all "possible" CPUs
(online/offline/could-be-hot-removed/could-be-hot-added)
Signed-off-by: Pingfan Liu <piliu@redhat.com>
Acked-by: Kairui Song <kasong@redhat.com>