upstream: fedora
resolves: bz2003832
conflict: none
commit 09ccf88405
Author: Kairui Song <kasong@redhat.com>
Date: Mon Aug 16 23:25:14 2021 +0800
kdump-lib.sh: add a config value retrieve helper
Add a helper kdump_get_conf_val to replace get_option_value.
It covers more corner cases, such as multiple spaces in the config
file, a config value separated by a tab, leading spaces, or trailing
comments.
It also uses a sed group command and the sed hold buffer, which makes
it much faster than the previous `grep <config> | tail -1`.
This helper is meant to provide a universal way for kexec-tools
scripts to read config values. Currently, different scripts read
the config in many different, fragile ways.
For example, the following code is found in the kexec-tools script code base:
1. grep ^force_rebuild $KDUMP_CONFIG_FILE
   echo $_force_rebuild | cut -d' ' -f2
2. grep ^kdump_post $KDUMP_CONFIG_FILE | cut -d\  -f2
3. awk '/^sshkey/ {print $2}' $conf_file
4. grep ^path $KDUMP_CONFIG_FILE | cut -d' ' -f2-
1, 2, and 4 will fail if the space is replaced by, e.g., a tab.
1 and 2 might fail if there are multiple spaces between the config name
and the config value:
"kdump_post /var/crash/scripts/kdump-post.sh"
An empty field will be read instead of the config value.
1, 2, and 3 will fail if there is a space in the file path, like:
"kdump_post /var/crash/scripts dir/kdump-post.sh"
4 will fail if there are trailing comments:
"path /var/crash # some comment here"
All of them will fail if there are leading spaces:
" path /var/crash"
And all of them will most likely cause problems if the config file
contains the same option more than once.
All of them are also slower than the new sed call. The old
get_option_value is very slow as well and doesn't handle leading spaces.
Although we never claimed to support leading spaces or trailing comments
before, it's harmless to be more robust when reading the config, and many
conf files in /etc support leading spaces. Having a faster and
safer config reading helper also makes it easier to clean up the code.
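The sed technique described above can be sketched roughly as follows.
This is a minimal illustration under stated assumptions, not necessarily
the code added by this commit: the function name conf_val_sketch and its
second argument (the config file path) are hypothetical, and the \s and
\+ escapes assume GNU sed.

```shell
# Hypothetical sketch, assuming GNU sed.
# $1: option name, $2: path to the config file
conf_val_sketch() {
    [ -f "$2" ] || return 1
    # For each line that starts with the option name followed by whitespace
    # (the "group command" in braces): strip the name, strip a trailing
    # comment and trailing whitespace, then copy the result to the hold
    # buffer (h). On the last line, swap the hold buffer back (x) and print
    # it (p), so only the last occurrence of the option is printed.
    sed -n -e "/^\s*$1\s\+/{s/^\s*$1\s\+//;s/\s*#.*//;s/\s*$//;h};\${x;p}" "$2"
}
```

With a config file containing " path<TAB>/var/crash # comment" followed
by "path /var/other", `conf_val_sketch path <file>` prints only the last
value, and the whole file is read in a single sed process.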
Signed-off-by: Kairui Song <kasong@redhat.com>
Acked-by: Philipp Rudo <prudo@redhat.com>
Signed-off-by: Tao Liu <ltao@redhat.com>
upstream: fedora
resolves: bz2003832
conflict: none
commit a0282ab22c
Author: Kairui Song <kasong@redhat.com>
Date: Tue Aug 3 19:49:51 2021 +0800
kdump-lib.sh: add a config format and read helper
Add a helper `kdump_read_conf` to replace read_strip_comments.
`kdump_read_conf` does a few more things:
- remove trailing spaces.
- format the content, removing duplicated spaces between name and value.
- read from KDUMP_CONFIG_FILE (/etc/kdump.conf) directly, to avoid pasting
  the "/etc/kdump.conf" path everywhere in the code.
- check if the config file exists, just in case.
Also unify the environment variable: KDUMP_CONFIG_FILE now stands for
the default config location.
This helps avoid some shell pitfalls about spaces when reading config.
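As a rough illustration of the normalization steps listed above, a
single sed call could look like the sketch below. The function name
read_conf_sketch and its file argument are hypothetical (the real helper
reads KDUMP_CONFIG_FILE), and the \s, \S and \+ escapes assume GNU sed.

```shell
# Hypothetical sketch, assuming GNU sed.
# $1: path to the config file
read_conf_sketch() {
    [ -f "$1" ] || return 1
    # In order: drop comments, drop trailing whitespace, drop leading
    # whitespace, then print non-empty lines as "<name> <value>" with a
    # single space between the option name and its value.
    sed -n -e 's/#.*//;s/\s*$//;s/^\s*//;s/\(\S\+\)\s*\(.*\)/\1 \2/p' "$1"
}
```

Callers can then split each output line on the first space without
worrying about tabs, repeated spaces, or comment-only lines.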
Signed-off-by: Kairui Song <kasong@redhat.com>
Acked-by: Philipp Rudo <prudo@redhat.com>
Signed-off-by: Tao Liu <ltao@redhat.com>
Resolves: bz1982474
Upstream: Fedora
Conflict: None
commit b2bbb54d89
Author: Coiby Xu <coxu@redhat.com>
Date: Thu Jul 15 09:18:33 2021 +0800
Check the existence of /sys/bus/ccwgroup/devices/*/online beforehand
On s390x KVM machines, the following errors are shown when building a kdump
initramfs that dumps the vmcore to a remote target:
$ kdumpctl rebuild
/usr/lib/dracut/modules.d/99kdumpbase/module-setup.sh: line 475: /sys/bus/ccwgroup/devices/online: No such file or directory
/usr/lib/dracut/modules.d/99kdumpbase/module-setup.sh: line 476: [: -ne: unary operator expected
This happens because s390x KVM machines use a virtual network, so
/sys/bus/ccwgroup/devices/ exists but is empty. Fix it by checking
the existence of the file "/sys/bus/ccwgroup/devices/*/online".
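The kind of guard described here can be sketched as below. This is only
an illustration, not the code from module-setup.sh: the helper name
first_online_dev_sketch is hypothetical, and the base directory is passed
as an argument so the sketch can be tried outside of an s390x machine.

```shell
# Hypothetical helper, not the code from this commit.
# $1: base directory (this would be /sys/bus/ccwgroup/devices on s390x)
first_online_dev_sketch() {
    for _dev_online in "$1"/*/online; do
        # If the directory is empty, the glob stays unexpanded and names a
        # file that does not exist, so test for the file before reading it.
        [ -f "$_dev_online" ] || continue
        if [ "$(cat "$_dev_online")" = "1" ]; then
            # Print the directory of the first device that is online.
            dirname "$_dev_online"
            return 0
        fi
    done
    return 1
}
```

On an empty directory the function simply returns 1 instead of failing
with "No such file or directory".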
Fixes: commit 7d47251568
("Iterate /sys/bus/ccwgroup/devices to tell if we should set up rd.znet")
BugLink: https://bugzilla.redhat.com/show_bug.cgi?id=1982474
Reported-by: Jie Li <jieli@redhat.com>
Signed-off-by: Coiby Xu <coxu@redhat.com>
Acked-by: Kairui Song <kasong@redhat.com>
Signed-off-by: Coiby Xu <coxu@redhat.com>
Resolves: bz1924115
Conflict: None
Upstream: Fedora
commit fa9201b240 (devel)
Author: Hari Bathini <hbathini@linux.ibm.com>
Date: Wed Jun 23 20:06:48 2021 +0530
fadump: isolate fadump initramfs image within the default one
In case of fadump, the initramfs image has to be built to boot into
the production environment as well as to offload the active crash dump
to the specified dump target (for boot after crash). As the same image
would be used for both boot scenarios, it could not be built optimally
while accommodating both cases.
Use --include to include the initramfs image built for offloading
active crash dump to the specified dump target. Also, introduce a new
out-of-tree dracut module (99zz-fadumpinit) that installs a customized
init program while moving the default /init to /init.dracut. This
customized init program is leveraged to isolate fadump image within
the default initramfs image by kicking off default boot process
(exec /init.dracut) for regular boot scenario and activating fadump
initramfs image, if the system is booting after a crash.
If squash is available, ensure default initramfs image is also built
with squash module to reduce memory consumption in capture kernel.
Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
Signed-off-by: Kairui Song <kasong@redhat.com>
Acked-by: Kairui Song <kasong@redhat.com>
Signed-off-by: Kairui Song <kasong@redhat.com>
Related: bz1977543
Upstream: Fedora
Conflict: None
commit ad6f60d70d
Author: Coiby Xu <coxu@redhat.com>
Date: Mon Jun 28 18:37:11 2021 +0800
fix format issue in find_online_znet_device
Change spaces to tabs to fix the alignment issue.
Fixes: commit 7d47251568
("Iterate /sys/bus/ccwgroup/devices to tell if we should set up rd.znet")
Signed-off-by: Coiby Xu <coxu@redhat.com>
Acked-by: Kairui Song <kasong@redhat.com>
Signed-off-by: Coiby Xu <coxu@redhat.com>
Resolves: bz1977543
Upstream: Fedora
Conflict: None
commit 03f9b91351
Author: Coiby Xu <coxu@redhat.com>
Date: Mon Jun 28 18:37:10 2021 +0800
check the existence of /sys/bus/ccwgroup/devices before trying to find online network device
/sys/bus/ccwgroup/devices doesn't exist on non-s390x machines, which leads to
the warning "find: '/sys/bus/ccwgroup/devices': No such file or directory".
This warning can be eliminated by checking the existence of
"/sys/bus/ccwgroup/devices" beforehand.
Fixes: commit 7d47251568
("Iterate /sys/bus/ccwgroup/devices to tell if we should set up rd.znet")
Reported-by: Ruowen Qin <ruqin@redhat.com>
BugLink: https://bugzilla.redhat.com/show_bug.cgi?id=1974618
Signed-off-by: Coiby Xu <coxu@redhat.com>
Acked-by: Kairui Song <kasong@redhat.com>
Signed-off-by: Coiby Xu <coxu@redhat.com>
Resolves: bz1941905
Upstream: Fedora
Conflict: None
commit 7d47251568
Author: Coiby Xu <coxu@redhat.com>
Date: Mon Jun 7 07:26:03 2021 +0800
Iterate /sys/bus/ccwgroup/devices to tell if we should set up rd.znet
This patch fixes bz1941106 and bz1941905, in which an empty rd.znet was
passed to the kernel command line in the following cases:
- The IBM (Z15) KVM guest uses virtio for all devices, including the network
  device, so there is no znet device for an IBM KVM guest. So we can't
  assume an s390x machine always has a znet device.
- When a bridged network is used, kexec-tools tries to obtain the znet
  configuration from the ifcfg script of the bridged network rather than
  from the ifcfg script of the znet device.
We can iterate /sys/bus/ccwgroup/devices to tell if there is
a znet network device. By getting an ifname from znet, we can also avoid
mistaking the slave netdev for a znet network device in a bridged network
or bonded network.
Note: This patch also assumes there is only one znet device, as commit
7148c0a30d ("add s390x netdev setup")
does, which greatly simplifies the code. According to IBM [1], there could be
more than one znet device on a z/VM system, and a z/VM system may have a
non-znet network device like ConnectX. Since kdump_setup_znet was
introduced in 2012 and so far there is no known customer complaint that
invalidates this assumption, I think it's safe to assume an IBM z/VM
system only has one znet device. Besides, there is no z/VM system available
on beaker to test the alternative scenarios.
[1] https://bugzilla.redhat.com/show_bug.cgi?id=1941905#c13
Signed-off-by: Coiby Xu <coxu@redhat.com>
Acked-by: Kairui Song <kasong@redhat.com>
Signed-off-by: Coiby Xu <coxu@redhat.com>
Resolves: bz1901024
Upstream: Fedora
Conflict: None
commit a2306346bc
Author: Kairui Song <kasong@redhat.com>
Date: Mon Apr 26 17:09:56 2021 +0800
Remove the kdump error handler isolation wrapper
The wrapper was introduced in commit 002337c. According to the commit
message, the only use of the wrapper is when dracut-initqueue calls
"systemctl start emergency" directly. In that case, emergency
is started, but not in isolation mode, which means dracut-initqueue
is still running. On the other hand, emergency will call
"systemctl start dracut-initqueue" again when the default action is dump_to_rootfs.
systemd would block on the last dracut-initqueue, waiting for the first
instance to exit, which leaves us hanging.
In the previous commit we added initqueue status detection to dump_to_rootfs,
so now it will not hang even without the wrapper.
In fact, even with the wrapper, emergency might still hang
for about 30s: when dracut called the emergency service because initqueue
timed out, dump_to_rootfs would try to start initqueue again and time out
again. With the wrapper removed, we can avoid both kinds of
hangs, because without the isolation we can detect the initqueue service
status correctly in such cases.
Also remove the invalid header comments in the service file, since the
service is not part of the systemd code, and sync the service spec with dracut.
Signed-off-by: Kairui Song <kasong@redhat.com>
Acked-by: Coiby Xu <coxu@redhat.com>
Signed-off-by: Kairui Song <kasong@redhat.com>
Resolves: bz1896247
Upstream: fedora
Conflict: none
commit ee160bf04d
Author: Kairui Song <kasong@redhat.com>
Date: Mon Apr 19 23:00:10 2021 +0800
Revert "Always set vm.zone_reclaim_mode = 3 in kdump kernel"
This reverts commit 5633e83318.
vm.zone_reclaim_mode may cause thrashing on some machines. On
second thought, vm.zone_reclaim_mode is barely helpful for machines
under high memory stress, so just revert it.
Signed-off-by: Kairui Song <kasong@redhat.com>
Acked-by: Pingfan Liu <piliu@redhat.com>
Signed-off-by: Kairui Song <kasong@redhat.com>
Resolves: bz1947928
Upstream: fedora
Conflict: none
commit 475e33030b
Author: Tao Liu <ltao@redhat.com>
Date: Sun Apr 25 17:05:42 2021 +0800
Make dracut-squash required for kexec-tools
This patch reverts commit "Make dracut-squash a weak dep".
Although kexec-tools can work without dracut-squash, it is essential
for kdump to run properly in cases [1][2] where minimal memory
consumption is expected. Thus dracut-squash is made a hard requirement.
[1] https://lists.fedoraproject.org/archives/list/kexec@lists.fedoraproject.org/message/SJX7CW3WLOYSFI2YJKGTUGDBWSCMZXVZ/
[2] https://www.spinics.net/lists/systemd-devel/msg05864.html
Signed-off-by: Tao Liu <ltao@redhat.com>
Acked-by: Kairui Song <kasong@redhat.com>
Signed-off-by: Tao Liu <ltao@redhat.com>
Resolves: bz1919052
Upstream: Fedora
Conflict: None
commit d5f6d38173
Author: Coiby Xu <coxu@redhat.com>
Date: Thu Apr 1 15:32:13 2021 +0800
Set up bond cmdline by "nmcli --get-values"
kdumpctl now exits if it fails to set up the bond cmdline.
Signed-off-by: Coiby Xu <coxu@redhat.com>
Acked-by: Kairui Song <kasong@redhat.com>
Signed-off-by: Coiby Xu <coxu@redhat.com>
Resolves: bz1919052
Upstream: Fedora
Conflict: None
commit 8b08b4f17b
Author: Coiby Xu <coxu@redhat.com>
Date: Thu Apr 1 15:32:11 2021 +0800
Set up s390 znet cmdline by "nmcli --get-values"
kdumpctl now aborts if it fails to set up znet.
Signed-off-by: Coiby Xu <coxu@redhat.com>
Acked-by: Kairui Song <kasong@redhat.com>
Signed-off-by: Coiby Xu <coxu@redhat.com>
Resolves: bz1950932
Upstream: Fedora
Conflict: None
commit 8a33ffffbc
Author: Coiby Xu <coxu@redhat.com>
Date: Thu May 6 09:20:27 2021 +0800
rd.route should use the name from kdump_setup_ifname
This fixes bz1854037, which happens because kexec-tools generates rd.route for
eth0 instead of for kdump-eth0:
1. "rd.route=168.63.129.16:10.0.0.1:eth0 rd.route=169.254.169.254:10.0.0.1:eth0" is passed to the dracut cmdline by kexec-tools
2. In the 2nd kernel, dracut/modules.d/35network-manager/nm-config.sh calls
/usr/libexec/nm-initrd-generator to generate two .nmconnection files
based on the dracut cmdline, i.e. kdump-eth0.nmconnection and eth0.nmconnection,
- /run/NetworkManager/system-connections/kdump-eth0.nmconnection
[connection]
id=kdump-eth0
uuid=3ef53b1b-3908-437e-a15f-cf1f3ea2678b
type=ethernet
autoconnect-retries=1
interface-name=kdump-eth0
multi-connect=1
permissions=
wait-device-timeout=60000
[ethernet]
mac-address-blacklist=
[ipv4]
address1=10.0.0.4/24,10.0.0.1
dhcp-timeout=90
dns=168.63.129.16;
dns-search=
may-fail=false
method=manual
[ipv6]
addr-gen-mode=eui64
dhcp-timeout=90
dns-search=
method=disabled
[proxy]
- /run/NetworkManager/system-connections/eth0.nmconnection
[connection]
id=eth0
uuid=f224dc22-2891-4d7b-8f66-745029df4b53
type=ethernet
autoconnect-retries=1
interface-name=eth0
multi-connect=1
permissions=
[ethernet]
mac-address-blacklist=
[ipv4]
dhcp-timeout=90
dns=168.63.129.16;
dns-search=
method=auto
route1=168.63.129.16/32,10.0.0.1
route2=169.254.169.254/32,10.0.0.1
[ipv6]
addr-gen-mode=eui64
dhcp-timeout=90
dns-search=
method=auto
[proxy]
3. Since there's an eth0.nmconnection, NetworkManager will try to get an IP for eth0 even though it's a slave NIC, and will time out
```
$ ip link show
2: kdump-eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
link/ether 00:0d:3a:11:86:8b brd ff:ff:ff:ff:ff:ff
3: eth0: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc mq master kdump-eth0 state UP mode DEFAULT group default qlen 1000
```
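The intended result can be sketched as follows. This is a hypothetical
illustration rather than the actual kexec-tools code: the helper name
rd_route_args_sketch and its arguments are assumptions, the
"rd.route=<destination>:<gateway>:<interface>" layout is inferred from the
example in step 1, and only IPv4 addresses are considered.

```shell
# Hypothetical helper: build rd.route arguments for one interface.
# $1: interface name as it will appear in the kdump kernel (e.g. kdump-eth0)
# $2: gateway; remaining arguments: destination addresses
rd_route_args_sketch() {
    _ifname=$1
    _gateway=$2
    shift 2
    _args=""
    for _dest in "$@"; do
        # Always use the renamed interface, never the original NIC name.
        _args="${_args:+$_args }rd.route=$_dest:$_gateway:$_ifname"
    done
    echo "$_args"
}
```

Called as `rd_route_args_sketch kdump-eth0 10.0.0.1 168.63.129.16 169.254.169.254`,
it emits both routes for kdump-eth0, so no separate eth0 connection would
be generated from rd.route.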
Reported-by: Huijing Hei <hhei@redhat.com>
Signed-off-by: Coiby Xu <coxu@redhat.com>
Acked-by: Kairui Song <kasong@redhat.com>
Signed-off-by: Coiby Xu <coxu@redhat.com>
Resolves: bz1947347
Upstream: Fedora
Conflict: None
commit 1ca1b71780
Author: Coiby Xu <coxu@redhat.com>
Date: Thu Apr 8 11:44:26 2021 +0800
Implement IP netmask calculation to replace "ipcalc -m"
Recently, dracut-network dropped its dependency on dhcp-client, which
requires ipcalc. Thus the dependency chain
"kexec-tools -> dracut-network -> dhcp-client -> ipcalc"
is broken. When a NIC is configured with a static IP, kexec-tools depended
on "ipcalc -m" to get the netmask. This commit implements the shell
equivalent of "ipcalc -m".
The following test code shows that cal_netmask_by_prefix is consistent with
"ipcalc -m":
#!/bin/bash
. dracut-module-setup.sh
# IPv6: compare against "ipcalc -m" for every valid prefix length
for i in {0..128}; do
    mask_expected=$(ipcalc -m fe::/$i | cut -d"=" -f2)
    mask_actual=$(cal_netmask_by_prefix $i "-6")
    if [[ "$mask_expected" != "$mask_actual" ]]; then
        echo "prefix=$i, expected=$mask_expected, actual=$mask_actual"
        exit 1
    fi
done
echo "IPv6 tests passed"
# IPv4: same comparison for prefixes 0..32
for i in {0..32}; do
    mask_expected=$(ipcalc -m 8.8.8.8/$i | cut -d"=" -f2)
    mask_actual=$(cal_netmask_by_prefix $i "")
    if [[ "$mask_expected" != "$mask_actual" ]]; then
        echo "prefix=$i, expected=$mask_expected, actual=$mask_actual"
        exit 1
    fi
done
echo "IPv4 tests passed"
# Out-of-range prefixes must make cal_netmask_by_prefix fail with status 1
i=-2
res=$(cal_netmask_by_prefix "$i" "")
if [[ $? -ne 1 ]]; then
    echo "cal_netmask_by_prefix should exit when prefix<0"
    exit 1
fi
i=33
res=$(cal_netmask_by_prefix "$i" "")
if [[ $? -ne 1 ]]; then
    echo "cal_netmask_by_prefix should exit when prefix>32 for IPv4"
    exit 1
fi
i=129
res=$(cal_netmask_by_prefix "$i" "-6")
if [[ $? -ne 1 ]]; then
    echo "cal_netmask_by_prefix should exit when prefix>128 for IPv6"
    exit 1
fi
echo "Bad prefixes tests passed"
echo "All tests passed"
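For readers unfamiliar with the conversion being tested, the IPv4 case can
be sketched as below. This is a hypothetical, IPv4-only illustration and
not the cal_netmask_by_prefix added by this commit; the name
ipv4_netmask_sketch is an assumption.

```shell
# Hypothetical IPv4-only sketch; not the function from this commit.
# $1: prefix length (0..32); prints the dotted netmask, returns 1 on bad input
ipv4_netmask_sketch() {
    _prefix=$1
    # Reject empty, negative, or non-numeric input, then out-of-range values.
    case $_prefix in
        '' | *[!0-9]*) return 1 ;;
    esac
    [ "$_prefix" -le 32 ] || return 1
    _mask=""
    for _i in 1 2 3 4; do
        if [ "$_prefix" -ge 8 ]; then
            _octet=255
            _prefix=$((_prefix - 8))
        else
            # Set the top $_prefix bits of this octet; the rest stay zero.
            _octet=$((256 - (1 << (8 - _prefix))))
            _prefix=0
        fi
        _mask="${_mask:+$_mask.}$_octet"
    done
    echo "$_mask"
}
```

For example, a prefix of 20 fills two full octets and leaves 4 bits for
the third, giving 255.255.240.0.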
Reported-by: Jie Li <jieli@redhat.com>
Signed-off-by: Coiby Xu <coxu@redhat.com>
Acked-by: Kairui Song <kasong@redhat.com>
Signed-off-by: Coiby Xu <coxu@redhat.com>
Resolves: rhbz#1938165
Upstream: fedora
Conflict: none
commit 91c802ff52
Author: Tao Liu <ltao@redhat.com>
Date: Thu Mar 18 16:52:46 2021 +0800
Fix incorrect permissions on kdump dmesg file
Also known as CVE-2021-20269. The kdump dmesg log files (kexec-dmesg.log,
vmcore-dmesg.txt) are generated by shell redirection, which applies the
default umask value, making the files readable by group and others.
This patch runs chmod on these files, making them accessible only to the owner.
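The pattern can be sketched as follows. This is a minimal illustration,
not the patch itself: the helper name save_private_log_sketch is
hypothetical, and the mode 600 is an assumption based on "accessible only
to the owner".

```shell
# Hypothetical helper: save stdin to a log file readable only by its owner.
# $1: destination file
save_private_log_sketch() {
    # The redirection creates the file with permissions derived from the
    # current umask, which typically leaves it readable by group and others.
    cat > "$1" || return 1
    # Restrict the file to its owner afterwards.
    chmod 600 "$1"
}
```

A stricter alternative is to run the redirection in a subshell after
`umask 077`, so the file is never created with wider permissions.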
Signed-off-by: Tao Liu <ltao@redhat.com>
Acked-by: Kairui Song <kasong@redhat.com>
Signed-off-by: Tao Liu <ltao@redhat.com>