Commit Graph

1710 Commits

Author SHA1 Message Date
Hari Bathini
fa9201b240 fadump: isolate fadump initramfs image within the default one
In case of fadump, the initramfs image has to be built to boot into
the production environment as well as to offload the active crash dump
to the specified dump target (for boot after crash). As the same image
would be used for both boot scenarios, it could not be built optimally
while accommodating both cases.

Use --include to include the initramfs image built for offloading
active crash dump to the specified dump target. Also, introduce a new
out-of-tree dracut module (99zz-fadumpinit) that installs a customized
init program while moving the default /init to /init.dracut. This
customized init program is leveraged to isolate fadump image within
the default initramfs image by kicking off default boot process
(exec /init.dracut) for regular boot scenario and activating fadump
initramfs image, if the system is booting after a crash.

If squash is available, ensure default initramfs image is also built
with squash module to reduce memory consumption in capture kernel.

Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
Signed-off-by: Kairui Song <kasong@redhat.com>
Acked-by: Kairui Song <kasong@redhat.com>
2021-06-29 21:35:58 +08:00
Kairui Song
c4749f9c57 Release 2.0.22-4
Signed-off-by: Kairui Song <kasong@redhat.com>
2021-06-29 21:24:19 +08:00
Coiby Xu
ad6f60d70d fix format issue in find_online_znet_device
Change spaces to tab to fix alignment issue.

Fixes: commit 7d47251568
       ("Iterate /sys/bus/ccwgroup/devices to tell if we should set up rd.znet")
Signed-off-by: Coiby Xu <coxu@redhat.com>
Acked-by: Kairui Song <kasong@redhat.com>
2021-06-29 17:11:07 +08:00
Coiby Xu
03f9b91351 check the existence of /sys/bus/ccwgroup/devices before trying to find online network device
/sys/bus/ccwgroup/devices doesn't exist for non-s390x machines which leads to
the warning "find: '/sys/bus/ccwgroup/devices': No such file or directory".
This warning can be eliminated by checking the existence of
"/sys/bus/ccwgroup/devices" beforehand.

Fixes: commit 7d47251568
       ("Iterate /sys/bus/ccwgroup/devices to tell if we should set up rd.znet")

Reported-by: Ruowen Qin <ruqin@redhat.com>
BugLink: https://bugzilla.redhat.com/show_bug.cgi?id=1974618
Signed-off-by: Coiby Xu <coxu@redhat.com>
Acked-by: Kairui Song <kasong@redhat.com>
2021-06-29 17:11:00 +08:00
Tao Liu
50bb8b701f check for invalid physical address of /proc/kcore when making ELF dumpfile
Backport from upstream.

commit 9a6f589d99dcef114c89fde992157f5467028c8f
Author: Tao Liu <ltao@redhat.com>
Date:   Fri Jun 18 18:28:04 2021 +0800

    [PATCH] check for invalid physical address of /proc/kcore when making ELF dumpfile

    Previously when executing makedumpfile with -E option against
    /proc/kcore, makedumpfile will fail:

      # makedumpfile -E -d 31 /proc/kcore kcore.dump
      ...
      write_elf_load_segment: Can't convert physaddr(ffffffffffffffff) to an offset.

      makedumpfile Failed.

    It's because /proc/kcore contains PT_LOAD program headers which have
    physaddr (0xffffffffffffffff).  With -E option, makedumpfile will
    try to convert the physaddr to an offset and fails.

    Skip the PT_LOAD program headers which have such physaddr.

    Signed-off-by: Tao Liu <ltao@redhat.com>
    Signed-off-by: Kazuhito Hagio <k-hagio-ab@nec.com>

Signed-off-by: Tao Liu <ltao@redhat.com>
Acked-by: Kairui Song <kasong@redhat.com>
2021-06-28 15:52:21 +08:00
Tao Liu
0feb109818 check for invalid physical address of /proc/kcore when finding max_paddr
Backport from upstream.

commit 38d921a2ef50ebd36258097553626443ffe27496
Author: Coiby Xu <coxu@redhat.com>
Date:   Tue Jun 15 18:26:31 2021 +0800

    [PATCH] check for invalid physical address of /proc/kcore when finding max_paddr

    Kernel commit 464920104bf7adac12722035bfefb3d772eb04d8 ("/proc/kcore:
    update physical address for kcore ram and text") sets an invalid paddr
    (0xffffffffffffffff = -1) for PT_LOAD segments of not direct mapped
    regions:

      $ readelf -l /proc/kcore
      ...
      Program Headers:
        Type           Offset             VirtAddr           PhysAddr
                       FileSiz            MemSiz              Flags  Align
        NOTE           0x0000000000000120 0x0000000000000000 0x0000000000000000
                       0x0000000000002320 0x0000000000000000         0x0
        LOAD           0x1000000000010000 0xd000000000000000 0xffffffffffffffff
                                                             ^^^^^^^^^^^^^^^^^^
                       0x0001f80000000000 0x0001f80000000000  RWE    0x10000

    makedumpfile uses max_paddr to calculate the number of sections for
    sparse memory model thus wrong number is obtained based on max_paddr
    (-1).  This error could lead to the failure of copying /proc/kcore
    for RHEL-8.5 on ppc64le machine [1]:

      $ makedumpfile /proc/kcore vmcore1
      get_mem_section: Could not validate mem_section.
      get_mm_sparsemem: Can't get the address of mem_section.

      makedumpfile Failed.

    Let's check if the phys_start of the segment is a valid physical
    address to fix this problem.

    [1] https://bugzilla.redhat.com/show_bug.cgi?id=1965267

    Reported-by: Xiaoying Yan <yiyan@redhat.com>
    Signed-off-by: Coiby Xu <coxu@redhat.com>

Signed-off-by: Tao Liu <ltao@redhat.com>
Acked-by: Kairui Song <kasong@redhat.com>
2021-06-28 15:52:16 +08:00
Tao Liu
18b9b763de Increase SECTION_MAP_LAST_BIT to 5
Backport from upstream.

commit 646456862df8926ba10dd7330abf3bf0f887e1b6
Author: Kazuhito Hagio <k-hagio-ab@nec.com>
Date:   Wed May 26 14:31:26 2021 +0900

    [PATCH] Increase SECTION_MAP_LAST_BIT to 5

    * Required for kernel 5.12

    Kernel commit 1f90a3477df3 ("mm: teach pfn_to_online_page() about
    ZONE_DEVICE section collisions") added a section flag
    (SECTION_TAINT_ZONE_DEVICE) and causes makedumpfile an error on
    some machines like this:

      __vtop4_x86_64: Can't get a valid pmd_pte.
      readmem: Can't convert a virtual address(ffffe2bdc2000000) to physical address.
      readmem: type_addr: 0, addr:ffffe2bdc2000000, size:32768
      __exclude_unnecessary_pages: Can't read the buffer of struct page.
      create_2nd_bitmap: Can't exclude unnecessary pages.

    Increase SECTION_MAP_LAST_BIT to 5 to fix this.  The bit had not
    been used until the change, so we can just increase the value.

    Signed-off-by: Kazuhito Hagio <k-hagio-ab@nec.com>

Signed-off-by: Tao Liu <ltao@redhat.com>
Acked-by: Kairui Song <kasong@redhat.com>
2021-06-28 15:52:02 +08:00
Kairui Song
302be5c34b Release 2.0.22-3
Signed-off-by: Kairui Song <kasong@redhat.com>
2021-06-20 02:38:04 +08:00
Coiby Xu
62578ace21 selftest: ignore all spaces when compare the dmesg files
For the log entry that has multiple lines, "makedumpfile --dump-dmesg"
would indent the remaining lines while vmcore-dmesg doesn't. For
example, vmcore-dmesg.txt and vmcore-dmesg.txt.2 are the outputs of
vmcore-dmesg and "makedumpfile --dump-dmesg" respectively,

    ```
    diff -u vmcore-dmesg.txt vmcore-dmesg.txt.2
    --- vmcore-dmesg.txt    2021-03-28 22:13:09.986000000 -0400
    +++ vmcore-dmesg.txt.2  2021-03-28 22:13:39.920106131 -0400
    @@ -397,9 +397,9 @@
     [    1.710742] vc vcsa: hash matches
     [    1.711938] RAS: Correctable Errors collector initialized.
     [    1.713736] Unstable clock detected, switching default tracing clock to "global"
    -If you want to keep using the local clock, then add:
    -  "trace_clock=local"
    -on the kernel command line
    +               If you want to keep using the local clock, then add:
    +                 "trace_clock=local"
    +               on the kernel command line
     [    1.750539] ata1.01: NODEV after polling detection
     [    1.750973] ata1.00: ATA-7: QEMU HARDDISK, 2.5+, max UDMA/100
     [    1.752885] ata1.00: 8388608 sectors, multi 16: LBA48
    ```

Quite often, all three tests could fail because of the above difference. So
let's ignore all the spaces. This patch could fix bz1952299 [1].

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1952299

Signed-off-by: Coiby Xu <coxu@redhat.com>
Acked-by: Kairui Song <kasong@redhat.com>
2021-06-08 22:21:47 +08:00
Coiby Xu
560c0e8a7b selftest: Make test_base_image depends on EXTRA_RPMS
test_base_image should depend on EXTRA_RPMS so it gets rebuild when
EXTRA_RPMS changes.

Fixes: commit bbc064f958
       ("selftest: add EXTRA_RPMs so dracut RPMs can be installed onto
        the image to run the tests")
Signed-off-by: Coiby Xu <coxu@redhat.com>
Acked-by: Kairui Song <kasong@redhat.com>
2021-06-08 22:21:47 +08:00
Coiby Xu
0a15d859bb selftest: fix the error of misplacing double quotes
Signed-off-by: Coiby Xu <coxu@redhat.com>
Acked-by: Kairui Song <kasong@redhat.com>
2021-06-08 22:21:47 +08:00
Lianbo Jiang
2d9504c4a4 mkdumprd: display the absolute path of dump location in the check_user_configured_target()
When kdump service fails, the current errors do not display the
absolute path of dump location(marked it as "^"), for example:

kdump: kexec: unloaded kdump kernel
kdump: Stopping kdump: [OK]
kdump: Detected change(s) in the following file(s):  /etc/kdump.conf
kdump: Rebuilding /boot/initramfs-4.18.0-304.el8.x86_64kdump.img
kdump: Dump path "/var1/crash" does not exist in dump target "UUID=c202ef45-3ac3-4adb-85e7-307a916757f0"
                  ^^^^^^^^^^^
kdump: mkdumprd: failed to make kdump initrd
kdump: Starting kdump: [FAILED]

Here, it should output the absolute path of dump location with this
format: "<mount path>/<path>". To fix it, let's extend the relative
pathname to the absolute pathname in check_user_configured_target().

Signed-off-by: Lianbo Jiang <lijiang@redhat.com>
Acked-by: Kairui Song <kasong@redhat.com>
2021-06-08 10:49:24 +08:00
Coiby Xu
7d47251568 Iterate /sys/bus/ccwgroup/devices to tell if we should set up rd.znet
This patch fixes bz1941106 and bz1941905 which passed empty rd.znet to the
kernel command line in the following cases,
 - The IBM (Z15) KVM guest uses virtio for all devices including network
   device, so there is no znet device for IBM KVM guest. So we can't
   assume a s390x machine always has a znet device.
 - When a bridged network is used, kexec-tools tries to obtain the znet
   configuration from the ifcfg script of the bridged network rather than
   from the ifcfg script of znet device.

We can iterate /sys/bus/ccwgroup/devices to tell if there if there is
a znet network device. By getting an ifname from znet, we can also avoid
mistaking the slave netdev as a znet network device in a bridged network
or bonded network.

Note: This patch also assumes there is only one znet device as commit
7148c0a30d ("add s390x netdev setup")
which greatly simplifies the code. According to IBM [1], there could be
more than znet devices for a z/VM system and a z/VM system may have a
non-znet network device like ConnectX. Since kdump_setup_znet was
introduced in 2012 and so far there is no known customer complaint that
invalidates this assumption I think it's safe to assume an IBM z/VM
system only has one znet device. Besides, there is no z/VM system found
on beaker to test the alternative scenarios.

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1941905#c13

Signed-off-by: Coiby Xu <coxu@redhat.com>
Acked-by: Kairui Song <kasong@redhat.com>
2021-06-08 10:48:34 +08:00
Kairui Song
41980f30d9 Use a customized emergency shell
Use a modified and minimized version of emergency shell.
The differences of this kdump shell and dracut emergency shell are:

  - Kdump shell won't generate a rdsosreport automatically
  - Customized prompts
  - Never ask root password
  - Won't tangle with dracut's emergency_action. If emergency_action is
    set, dracut emergency shell will perform dracut's emergency_action
    instead of kdump final_action on exit.
  - If rd.shell=no is set, kdump shell will still work, dracut emergency
    shell won't, even if kdump failure_action is set to shell.

Signed-off-by: Kairui Song <kasong@redhat.com>
Acked-by: Coiby Xu <coxu@redhat.com>
2021-06-04 14:26:51 +08:00
Kairui Song
a2306346bc Remove the kdump error handler isolation wrapper
The wrapper is introduced in commit 002337c, according to the commit
message, the only usage of the wrapper is when dracut-initqueue calls
"systemctl start emergency" directly. In that case, emergency
is started, but not in a isolation mode, which means dracut-initqueue
is still running. On the other hand, emergency will call
"systemctl start dracut-initqueue" again when default action is dump_to_rootfs.

systemd would block on the last dracut-initqueue, waiting for the first
instance to exit, which leaves us hang.

In previous commit we added initqueue status detect in dump_to_rootfs,
so now even without the wrapper, it will not hang.

And actually, previously, with the wrapper, emergency might still hang
for like 30s. When dracut called emergency service because initqueue
timed out, dump_to_rootfs will try start initqueue again and timeout
again. Now with the wrapper removed, we can avoid these two kinds of
hangs, bacause without the isolation we can detect initqueue service
status correctly in such case.

Also remove the invalid header comments in service file, the service
is not part of systemd code. And sync the service spec with dracut.

Signed-off-by: Kairui Song <kasong@redhat.com>
Acked-by: Coiby Xu <coxu@redhat.com>
2021-06-04 14:26:45 +08:00
Kairui Song
108258139a Don's try to restart dracut-initqueue if it's already there
kdump's dump_to_rootfs will try to start initqueue unconditionally.
dump_to_rootfs will run after systemd isolate to emergency
target, so this is currently accetable.

But there is a problem when initqueue starts the emergency action
because of initqueue timeout. dump_to_rootfs will start initqueue and
lead to timeout again.

So following patch will remove the previous isolation wrapper, and
detect the service status here. Previous isolation makes the detection
impossible. Now this detection will be valid and helpful to prevent
double timeout or hang.

Signed-off-by: Kairui Song <kasong@redhat.com>
Acked-by: Coiby Xu <coxu@redhat.com>
2021-06-04 14:25:37 +08:00
Hari Bathini
39a642b66b kdump-lib.sh: fix a warning in prepare_kdump_bootinfo()
Fix the warning observed when KDUMP_KERNELVER is specified:

  kdumpctl[10926]: /lib/kdump/kdump-lib.sh: line 697: [: missing `]'

Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
Acked-by: Kairui Song <kasong@redhat.com>
2021-06-04 02:53:10 +08:00
Pingfan Liu
45377836b0 kdump-lib.sh: fix the case if no enough total RAM for kdump in get_recommend_size()
For crashkernel=auto policy, if total RAM size is under a throttle,
there is no memory reserved for kdump.

Also correct a trivial bug by correcting the arch name.

Signed-off-by: Pingfan Liu <piliu@redhat.com>
Acked-by: Kairui Song <kasong@redhat.com>
2021-05-25 10:07:55 +08:00
Kairui Song
e9e6a2c745 kdumpctl: Add kdumpctl estimate
Add a rough esitimation support, currently, following memory usage are
checked by this sub command:

- System RAM
- Kdump Initramfs size
- Kdump Kernel image size
- Kdump Kernel module size
- Kdump userspace user and other runtime allocated memory (currently
  simply using a fixed value: 64M)
- LUKS encryption memory usage

The output of kdumpctl estimate looks like this:
  # kdumpctl estimate
  Reserved crashkernel:    256M
  Recommanded crashkernel: 160M

  Kernel image size:   47M
  Kernel modules size: 12M
  Initramfs size:      19M
  Runtime reservation: 64M
  Large modules:
      xfs: 1892352
      nouveau: 2318336

And if the kdump target is encrypted:
  # kdumpctl estimate
  Encrypted kdump target requires extra memory, assuming using the keyslot with minimun memory requirement

  Reserved crashkernel:    256M
  Recommanded crashkernel: 655M

  Kernel image size:   47M
  Kernel modules size: 12M
  Initramfs size:      19M
  Runtime reservation: 64M
  LUKS required size:  512M
  Large modules:
      xfs: 1892352
      nouveau: 2318336
  WARNING: Current crashkernel size is lower than recommanded size 655M.

The "Recommanded" value is calculated based on memory usages mentioned
above, and will be adjusted accodingly to be no less than the value provided
by kdump_get_arch_recommend_size.

Signed-off-by: Kairui Song <kasong@redhat.com>
Acked-by: Pingfan Liu <piliu@redhat.com>
2021-05-19 15:27:43 +08:00
Kairui Song
85c725813b mkdumprd: make use of the new get_luks_crypt_dev helper
Simplfy the code and also improve the performance. udevadm call is
heavy.

Signed-off-by: Kairui Song <kasong@redhat.com>
Acked-by: Pingfan Liu <piliu@redhat.com>
2021-05-19 15:27:37 +08:00
Kairui Song
1c70cf51c7 kdump-lib.sh: introduce a helper to get all crypt dev used by kdump
Signed-off-by: Kairui Song <kasong@redhat.com>
Acked-by: Pingfan Liu <piliu@redhat.com>
2021-05-19 15:27:19 +08:00
Kairui Song
3423bbc17f kdump-lib.sh: introduce a helper to get underlying crypt device
Signed-off-by: Kairui Song <kasong@redhat.com>
Acked-by: Pingfan Liu <piliu@redhat.com>
2021-05-19 15:26:58 +08:00
Kairui Song
13796ca93a Release 2.0.22-2
Signed-off-by: Kairui Song <kasong@redhat.com>
2021-05-13 17:14:38 +08:00
Tao Liu
d5fe96cd7a Disable CMA in kdump 2nd kernel
kexec-tools needs to disable CMA for kdump kernel cmdline,
otherwise kdump kernel may run out of memory.

This patch strips the inherited cma=, hugetlb_cma= cmd
line from 1st kernel, and sets to be 0 for 2nd kernel.

Signed-off-by: Tao Liu <ltao@redhat.com>
Acked-by: Kairui Song <kasong@redhat.com>
2021-05-13 17:13:39 +08:00
Coiby Xu
8178d7a5a1 Warn the user if network scripts are used
Signed-off-by: Coiby Xu <coxu@redhat.com>
Acked-by: Kairui Song <kasong@redhat.com>
2021-05-13 17:13:31 +08:00
Coiby Xu
d5f6d38173 Set up bond cmdline by "nmcli --get-values"
Now kdumpctl will exit if failing to set up bond cmdline.

Signed-off-by: Coiby Xu <coxu@redhat.com>
Acked-by: Kairui Song <kasong@redhat.com>
2021-05-13 17:13:28 +08:00
Coiby Xu
6f1badec78 Set up dns cmdline by parsing "nmcli --get-values"
Signed-off-by: Coiby Xu <coxu@redhat.com>
Acked-by: Kairui Song <kasong@redhat.com>
2021-05-13 17:13:25 +08:00
Coiby Xu
8b08b4f17b Set up s390 znet cmdline by "nmcli --get-values"
Now kdumpctl will abort when failing to set up znet.

Signed-off-by: Coiby Xu <coxu@redhat.com>
Acked-by: Kairui Song <kasong@redhat.com>
2021-05-13 17:13:15 +08:00
Coiby Xu
0c292f49c7 Add helper to get nmcli connection show cmd by ifname
Signed-off-by: Coiby Xu <coxu@redhat.com>
Acked-by: Kairui Song <kasong@redhat.com>
2021-05-13 17:13:08 +08:00
Coiby Xu
c69578ca43 Add helper to get nmcli connection apath by ifname
apath (a D-Bus active connection path) is used for nmcli connection operations, e.g.
  $ nmcli connection show $apath

Signed-off-by: Coiby Xu <coxu@redhat.com>
Acked-by: Kairui Song <kasong@redhat.com>
2021-05-13 17:13:01 +08:00
Coiby Xu
10c309b5f7 Add helper to get value by field using "nmcli --get-values"
nmcli --get-values <field> connection show /org/freedesktop/NetworkManager/ActiveConnection/1
returns the following value for the corresponding field respectively,
  Field                                  Value
  IP4.DNS                                "10.19.42.41 | 10.11.5.19 | 10.5.30.160"
  802-3-ethernet.s390-subchannels        ""
  bond.options                           "mode=balance-rr"

Signed-off-by: Coiby Xu <coxu@redhat.com>
Acked-by: Kairui Song <kasong@redhat.com>
2021-05-13 17:11:38 +08:00
Kairui Song
c05d8a16a0 Update makedumpfile to 1.6.9
Signed-off-by: Kairui Song <kasong@redhat.com>
2021-05-13 16:45:36 +08:00
Kairui Song
dece041609 Release 2.0.22-1
Update kexec-tools to 2.0.22

Signed-off-by: Kairui Song <kasong@redhat.com>
2021-05-11 02:12:50 +08:00
Coiby Xu
8a33ffffbc rd.route should use the name from kdump_setup_ifname
This fixes bz1854037 which happens because kexec-tools generates rd.route for
eth0 instead of for kdump-eth0,
 1. "rd.route=168.63.129.16:10.0.0.1:eth0 rd.route=169.254.169.254:10.0.0.1:eth0" is passed to the dracut cmdline by kexec-tools
 2. In the 2rd kernel, dracut/modules.d/35network-manager/nm-config.sh calls
    /usr/libexec/nm-initrd-generator to generate two .nmconnection files
    based on the dracut cmdline, i.e. kdump-eth0.nmconnection and eth0.nmconnection,
    - /run/NetworkManager/system-connections/kdump-eth0.nmconnection
        [connection]
        id=kdump-eth0
        uuid=3ef53b1b-3908-437e-a15f-cf1f3ea2678b
        type=ethernet
        autoconnect-retries=1
        interface-name=kdump-eth0
        multi-connect=1
        permissions=
        wait-device-timeout=60000
        [ethernet]
        mac-address-blacklist=
        [ipv4]
        address1=10.0.0.4/24,10.0.0.1
        dhcp-timeout=90
        dns=168.63.129.16;
        dns-search=
        may-fail=false
        method=manual
        [ipv6]
        addr-gen-mode=eui64
        dhcp-timeout=90
        dns-search=
        method=disabled
        [proxy]

    - /run/NetworkManager/system-connections/eth0.nmconnection
        [connection]
        id=eth0
        uuid=f224dc22-2891-4d7b-8f66-745029df4b53
        type=ethernet
        autoconnect-retries=1
        interface-name=eth0
        multi-connect=1
        permissions=
        [ethernet]
        mac-address-blacklist=
        [ipv4]
        dhcp-timeout=90
        dns=168.63.129.16;
        dns-search=
        method=auto
        route1=168.63.129.16/32,10.0.0.1
        route2=169.254.169.254/32,10.0.0.1
        [ipv6]
        addr-gen-mode=eui64
        dhcp-timeout=90
        dns-search=
        method=auto
        [proxy]

 3. Since there's eth0.nmconnection, NetworkManager will try to get an IP for eth0 regardless of the fact it's a slave NIC and time out
    ```
    $ ip link show
    2: kdump-eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
       link/ether 00:0d:3a:11:86:8b brd ff:ff:ff:ff:ff:ff
    3: eth0: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc mq master kdump-eth0 state UP mode DEFAULT group default qlen 1000
    ```

Reported-by: Huijing Hei <hhei@redhat.com>
Signed-off-by: Coiby Xu <coxu@redhat.com>
Acked-by: Kairui Song <kasong@redhat.com>
2021-05-11 02:11:50 +08:00
Coiby Xu
97ee5dc64c get kdump ifname once in kdump_install_net
Signed-off-by: Coiby Xu <coxu@redhat.com>
Acked-by: Kairui Song <kasong@redhat.com>
2021-05-11 02:11:50 +08:00
Tao Liu
ca05b754af Fix incorrect file permissions of vmcore-dmesg-incomplete.txt
vmcore-dmesg-incomplete.txt is generated by shell redirection,
which taking the default umask value. When dmesg collector exits
with non-zero, the file will exist and anyone can have access to
it.

This patch fixed the issue by chmod the file, making it accessible
only to its owner.

Signed-off-by: Tao Liu <ltao@redhat.com>
Acked-by: Kairui Song <kasong@redhat.com>
2021-05-11 02:11:50 +08:00
Kairui Song
ee160bf04d Revert "Always set vm.zone_reclaim_mode = 3 in kdump kernel"
This reverts commit 5633e83318.

vm.zone_reclaim_mode may cause trashing on some machines. And after
second thought, vm.zone_reclaim_mode is barely helpful for machines
with high mem stress, so just revert it.

Signed-off-by: Kairui Song <kasong@redhat.com>
Acked-by: Pingfan Liu <piliu@redhat.com>
2021-04-28 18:05:23 +08:00
Kairui Song
6137956f79 kdumpctl: fix check_config error when kdump.conf is empty
Kdump scirpt already have default values for core_collector, path in
many other place. Empty kdump.conf still works. Fix this corner case and
fix the error message.

Signed-off-by: Kairui Song <kasong@redhat.com>
Acked-by: Pingfan Liu <piliu@redhat.com>
2021-04-28 18:05:12 +08:00
Kairui Song
d0a301aa3a Release 2.0.21-9
Signed-off-by: Kairui Song <kasong@redhat.com>
2021-04-28 16:51:52 +08:00
Tao Liu
475e33030b Make dracut-squash required for kexec-tools
This patch reverts commit "Make dracut-squash a weak dep".

Although kexec-tools can work without dracut-squash, it is essential
for kdump to run properly in cases [1][2] where minimal amount of memory
consumption is expected. Thus dracut-squash is needed for it.

[1] https://lists.fedoraproject.org/archives/list/kexec@lists.fedoraproject.org/message/SJX7CW3WLOYSFI2YJKGTUGDBWSCMZXVZ/
[2] https://www.spinics.net/lists/systemd-devel/msg05864.html

Signed-off-by: Tao Liu <ltao@redhat.com>
Acked-by: Kairui Song <kasong@redhat.com>
2021-04-28 16:13:39 +08:00
Tao Liu
0db060c4e2 Show write byte size in report messages
Backport from upstream:

commit 0ef2ca6c9fa2f61f217a4bf5d7fd70f24e12b2eb
Author: Kazuhito Hagio <k-hagio-ab@nec.com>
Date:   Thu Feb 4 16:29:06 2021 +0900

    [PATCH] Show write byte size in report messages

    Show write byte size in report messages.  This value can be different
    from the size of the actual file because of some holes on dumpfile
    data structure.

      $ makedumpfile --show-stats -l -d 1 vmcore dump.ld1
      ...
      Total pages     : 0x0000000000080000
      Write bytes     : 377686445
      ...
      # ls -l dump.ld1
      -rw------- 1 root root 377691573 Feb  4 16:28 dump.ld1

    Note that this value should not be used with /proc/kcore to determine
    how much disk space is needed for crash dump, because the real memory
    usage when a crash occurs can vary widely.

    Signed-off-by: Kazuhito Hagio <k-hagio-ab@nec.com>

Signed-off-by: Tao Liu <ltao@redhat.com>
Acked-by: Kairui Song <kasong@redhat.com>
2021-04-28 16:13:23 +08:00
Tao Liu
8973bd7ed0 Add shorthand --show-stats option to show report stats
Backport from upstream:

commit 6f3e75a558ed50d6ff0b42e3f61c099b2005b7bb
Author: Julien Thierry <jthierry@redhat.com>
Date:   Tue Nov 24 10:45:25 2020 +0000

    [PATCH 2/2] Add shorthand --show-stats option to show report stats

    Provide shorthand --show-stats option to enable report messages
    without needing to set a particular value for message-level.

    Signed-off-by: Julien Thierry <jthierry@redhat.com>
    Signed-off-by: Kazuhito Hagio <k-hagio-ab@nec.com>

Signed-off-by: Tao Liu <ltao@redhat.com>
Acked-by: Kairui Song <kasong@redhat.com>
2021-04-28 15:45:25 +08:00
Tao Liu
e1ab0275c0 Add --dry-run option to prevent writing the dumpfile
Backport from upstream.

commit 3422e1d6bc3511c5af9cb05ba74ad97dd93ffd7f
Author: Julien Thierry <jthierry@redhat.com>
Date:   Tue Nov 24 10:45:24 2020 +0000

    [PATCH 1/2] Add --dry-run option to prevent writing the dumpfile

    Add a --dry-run option to run all operations without writing the
    dump to the output file.

    Signed-off-by: Julien Thierry <jthierry@redhat.com>
    Signed-off-by: Kazuhito Hagio <k-hagio-ab@nec.com>

Signed-off-by: Tao Liu <ltao@redhat.com>
Acked-by: Kairui Song <kasong@redhat.com>
2021-04-28 15:45:08 +08:00
Hari Bathini
d0e9c51e0d fadump: fix dump capture failure to root disk
If the dump target is the root disk, kdump scripts add an entry in
/etc/fstab for root disk with /sysroot as the mount point. The root
disk, passed through root=<> kernel commandline parameter, is mounted
at /sysroot in read-only mode before switching from initial ramdisk.
So, in fadump mode, a remount of /sysroot to read-write mode is needed
to capture dump successfully, because /sysroot is already mounted as
read-only based on root=<> boot parameter.

Commit e8ef4db8ff ("Fix dump_fs mount point detection and fallback
mount") removed initialization of $_op variable, the variable holding
the options the dump target was mounted with, leading to the below
error as remount was skipped:

  kdump[586]: saving to /sysroot/var/crash/127.0.0.1-2021-04-22-07:22:08/
  kdump.sh[587]: mkdir: cannot create directory '/sysroot/var/crash/127.0.0.1-2021-04-22-07:22:08/': Read-only file system
  kdump[589]: saving vmcore failed

Restore $_op variable initialization in dump_fs() function to fix this.

Fixes: e8ef4db8ff ("Fix dump_fs mount point detection and fallback mount")
Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
Acked-by: Kairui Song <kasong@redhat.com>
2021-04-28 15:41:33 +08:00
Kairui Song
8d0ef743e0 Revert "get kdump ifname once in kdump_install_net"
This reverts commit afda4f4961.
2021-04-28 15:18:53 +08:00
Kairui Song
b0156e9b64 Revert "pass kdumpnic to kdump_setup_/bond/bridge/vlan directly"
This reverts commit 586d767697.
2021-04-28 15:18:53 +08:00
Kairui Song
4753ab2c70 Revert "rd.route should use the name from kdump_setup_ifname"
This reverts commit 18ffd3cb17.
2021-04-28 15:18:53 +08:00
Coiby Xu
18ffd3cb17 rd.route should use the name from kdump_setup_ifname
This fix bz1854037 which happens because kexec-tools generates rd.route for
eth0 instead of for kdump-eth0,
 1. "rd.route=168.63.129.16:10.0.0.1:eth0 rd.route=169.254.169.254:10.0.0.1:eth0" is passed to the dracut cmdline by kexec-tools
 2. In the 2rd kernel,
    - dracut/modules.d/40network/net-lib.sh will write /tmp/net.route.eth0 based on rd.route
    - dracut/modules.d/45ifcfg/write-ifcfg.sh will copy /tmp/net.route.eth0 to /tmp/icfg and then copytree /tmp/ifcfg to /run/initramfs/state/etc/sysconfig/network-scripts
 3. NetworkManager will try to get an IP for eth0 regardless of the fact it's a slave NIC and time out
    ```
    $ ip link show
    2: kdump-eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
       link/ether 00:0d:3a:11:86:8b brd ff:ff:ff:ff:ff:ff
    3: eth0: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc mq master kdump-eth0 state UP mode DEFAULT group default qlen 1000
    ```

Reported-by: Huijing Hei <hhei@redhat.com>
Signed-off-by: Coiby Xu <coxu@redhat.com>
Acked-by: Kairui Song <kasong@redhat.com
2021-04-25 16:56:32 +08:00
Coiby Xu
586d767697 pass kdumpnic to kdump_setup_/bond/bridge/vlan directly
This avoids calling kdump_setup_ifname repeatedly.

Signed-off-by: Coiby Xu <coxu@redhat.com>
Acked-by: Kairui Song <kasong@redhat.com
2021-04-25 16:52:29 +08:00
Coiby Xu
afda4f4961 get kdump ifname once in kdump_install_net
Signed-off-by: Coiby Xu <coxu@redhat.com>
Acked-by: Kairui Song <kasong@redhat.com
2021-04-25 16:52:21 +08:00