Most watchdogs have a parameter pretimeout, if set to non-zero, it means
before the watchdog really reset the system, it will try to panic the
kernel first, so kdump could kick in, or, just print a panic stacktrace
and then kernel should reset it self.
If we are already in kdump kernel, this is not really helpful, only
increase kernel hanging chance. And it also make thing become complex
as some watchdog triggers the kernel panic in NMI context, which
could also hang the kernel in strange ways, and fail the watchdog it
self. So just disable this parameter.
Also for hpwdt, it have another parameter kdumptimeout, which is
just designed for first kernel. The default behaviour is the watchdog
will simply stop working if timeouted, trigger a panic, and leave the
kernel to kdump. Again, if we are already in kdump this is not helpful.
So also disable that.
Signed-off-by: Kairui Song <kasong@redhat.com>
Acked-by: Lianbo Jiang <lijiang@redhat.com>
Currently the watchdog detection code is broken already, it
get the list of active watchdog drivers, then check if they are
set in the /etc/cmdline.d/* as preload module. But after we
switched to use squash module, /etc/cmdline.d/* is not directly visible.
So just detect whether current needed driver is installed.
Signed-off-by: Kairui Song <kasong@redhat.com>
Acked-by: Lianbo Jiang <lijiang@redhat.com>
In check_fs_modified, is_nfs_dump_target is already called, the dump
target can't be nfs. No need to check here.
Signed-off-by: Kairui Song <kasong@redhat.com>
Acked-by: Lianbo Jiang <lijiang@redhat.com>
The driver detection have nothing to do with fs detection, and currently
if the dump target is raw, the block driver detection is skipped which
is wrong. Just split it out and run the block driver detection when dump
target is fs or raw.
Also simplfied the code a bit.
Signed-off-by: Kairui Song <kasong@redhat.com>
Acked-by: Lianbo Jiang <lijiang@redhat.com>
- ssh-copy-id is bugged and not working, use a more robust way to sync
ssh keys
- systemd-resolvd will bind on port 53 so DHCP server won't work,
disable systemd-resolvd's builtin DNS server
Signed-off-by: Kairui Song <kasong@redhat.com>
Let's remove some redundant descriptions in the usage documentation
of the logger, and make it clear.
Signed-off-by: Lianbo Jiang <lijiang@redhat.com>
Acked-by: Kairui Song <kasong@redhat.com>
Some unused log levels have been removed, and kdump has used the
different options to control the log levels for the first kernel
and the second kernel. Therefore, let's update the kdump sysconfig
accordingly.
Signed-off-by: Lianbo Jiang <lijiang@redhat.com>
Acked-by: Kairui Song <kasong@redhat.com>
In the /etc/sysconfig/kdump, we usually use the uppercase configuration
name for all options. So let's use the same method to handle this.
Signed-off-by: Lianbo Jiang <lijiang@redhat.com>
Acked-by: Kairui Song <kasong@redhat.com>
Let's add the rd.kdumploglvl option to control log level in the second
kernel, which can make us avoid rebuilding the kdump initramfs after we
change the log level in /etc/sysconfig/kdump.
Signed-off-by: Lianbo Jiang <lijiang@redhat.com>
Acked-by: Kairui Song <kasong@redhat.com>
The kdump-logger will be used by the system service(daemons), so let's
appropriately convert the logger numeric level to syslog level with the
facility(daemon). The number is constructed by multiplying the facility
by 8 and then adding the level.
About The Syslog Protocol, please refer to the RFC5424 for more details.
Signed-off-by: Lianbo Jiang <lijiang@redhat.com>
Acked-by: Kairui Song <kasong@redhat.com>
Previously, the range of log level is from 1 to 6, and the TRACE
level and FATAL level are not used, therefore, let's remove these
unused log levels.
Now it has only four log levels: error(1), warn(2), info(3)
and debug(4). We have to remap the numeric log level to the logger
priority or syslog log level, which is finished in kdump-logger.sh
module, it is invisible for user.
Signed-off-by: Lianbo Jiang <lijiang@redhat.com>
Acked-by: Kairui Song <kasong@redhat.com>
Let's add sanity checks for the log levels in order to avoid
passing illegal log levels to the logger.
Signed-off-by: Lianbo Jiang <lijiang@redhat.com>
Acked-by: Kairui Song <kasong@redhat.com>
depend() in module-setup.sh is a better place to setup dracut module
dependency, it will do early check, and fail early if needed module is
missing. Also remove a unneeded helper add_dracut_module.
Also remove the unnecessary return in depend() function.
Signed-off-by: Kairui Song <kasong@redhat.com>
Acked-by: Lianbo Jiang <lijiang@redhat.com>
Let's add some code comments to help better understanding, and
no code changes.
Signed-off-by: Lianbo Jiang <lijiang@redhat.com>
Acked-by: Baoquan He <bhe@redhat.com>
When using ssh dump target, scp is always used, correct the comment.
Signed-off-by: Kairui Song <kasong@redhat.com>
Acked-by: Dave Young <dyoung@redhat.com>
Dracut only check if a module failed installtion if the module is listed
in --add params. Without this param, if kdumpbase failed to install due
to any reason, dracut will still build the initramfs only print a
warning. Add this param to ensure it fail early.
Signed-off-by: Kairui Song <kasong@redhat.com>
Acked-by: Dave Young <dyoung@redhat.com>
The dracut module is opportunistic about using the built-in squashfs
support only when available, but the spec file hard requires it. Demote
it to a weak dep to truly make it optional.
This caters to environments which strive to stay minimal, like FCOS and
RHCOS. See https://github.com/coreos/fedora-coreos-config/pull/708 for
details.
Because otherwise, `kdumpctl start` will fail anyway. This makes it
easier to enable kdump by simply adding the mandatory karg and leaving
the service enabled.
This reverts commit fa8aa52d94.
For the s390x, the vmlinuz image has only single signature according
to the kernel.spec. The dual signature issue doesn't happens on s390x,
therefore, let's restore it in order to enable the file load on s390x
by default.
Signed-off-by: Lianbo Jiang <lijiang@redhat.com>
Acked-by: Kairui Song <kasong@redhat.com>
Currently, the makedumpfile option '--message-level' is set to 1 when
dumping the vmcore, it only displays the progress indicator message,
but there are no common message and error message, it is important to
report some additional messages, especially for the error message,
which is very useful for the debugging.
In view of this, let's change the message level to 7 by default.
Signed-off-by: Lianbo Jiang <lijiang@redhat.com>
Acked-by: Kairui Song <kasong@redhat.com>
Commit 08276e9 wrongly raise this warning message to error level, fix
this.
Fixes: 08276e9 ('Rework check_config and warn on any duplicated option')
Signed-off-by: Kairui Song <kasong@redhat.com>
Previously journalctl logs are directly dropped to save memory, but this
make journalctl unusable in kdump kernel and diffcult to debug. So
instead just don't let it read kmsg but keep other logs stored as volatile.
Kernel message are already stored in the kernel log ring buffer,
no need to let journalctl make a copy, especially when in kdump
kernel, ususlly there won't be too much kernel log overlapping
the old ring buffer.
Signed-off-by: Kairui Song <kasong@redhat.com>
Acked-by: Lianbo Jiang <lijiang@redhat.com>
Because the logger is introduced to output the kdump logs, need to
add a documentation for this change and describe how to use it.
Signed-off-by: Lianbo Jiang <lijiang@redhat.com>
Acked-by: Kairui Song <kasong@redhat.com>
Let's use the logger in the second kernel and collect the kernel ring
buffer(dmesg) of the second kernel.
Signed-off-by: Lianbo Jiang <lijiang@redhat.com>
Acked-by: Kairui Song <kasong@redhat.com>
Currently, the kexec option '--debug/-d' is not enabled by default, which
means that users need to set it manually and wait for the next failure to
capture the additional information.
Therefore, let's enable the option '-d' for kexec loading by default.
Signed-off-by: Lianbo Jiang <lijiang@redhat.com>
Acked-by: Kairui Song <kasong@redhat.com>
The kdump logger has the default values of the log levels, but
sometimes, need to change the value of log level in order to
get more debugging messages for troubleshooting.
Here, user will have a chance to reconfigure it.
Signed-off-by: Lianbo Jiang <lijiang@redhat.com>
Acked-by: Kairui Song <kasong@redhat.com>
Since the logger was introduced into kdump, let's enable it for kdump
so that we can output kdump messages according the log level and save
these messages for debugging.
Signed-off-by: Lianbo Jiang <lijiang@redhat.com>
Acked-by: Kairui Song <kasong@redhat.com>
Currently, all messages are directly printed to the console, sometimes,
we also need to output these messages to the journal log according to
the log level.
In view of this, introduce the kdump logger from the dracut module.
Signed-off-by: Lianbo Jiang <lijiang@redhat.com>
Acked-by: Kairui Song <kasong@redhat.com>
Instead of read and parse the kdump.conf multiple times, only read once
and use a single loop to handle the error check, which is faster.
Also check for any duplicated config otion, and error out if there are
duplicated ones.
Now it checks for following errors, most are unchanged from before:
- Any duplicated config options. (New added)
- Deprecated/Invalid kdump config option.
- Duplicated kdump target, will have a different error message of
other duplicated config options.
- Duplicated --mount options in dracut_args.
- Empty config values. All kdump configs should be in
"<config_opt> <config_value>" format.
- Check If raw target is used in fadump mode.
And removed detect of lines start with space, it will not break kdump
anyway.
The performance is measurable better than before for the check_config
function.
Signed-off-by: Kairui Song <kasong@redhat.com>
Acked-by: Pingfan Liu <piliu@redhat.com>
On s390, if Secure-IPL is enabled, then "kexec -s -l" is required.
Otherwise kdump kernel can not be loaded.
Signed-off-by: Pingfan Liu <piliu@redhat.com>
Acked-by: Kairui Song <kasong@redhat.com>
If failure action is set to "shell", user will need more debug info
available in kdump kernel. Especially when serial console is not
available, manually retrieve the log from journalctl is very useful
for debugging kdump issue.
Else, we can still drop journalctl content to save memory assuming
nothing will use it.
Signed-off-by: Kairui Song <kasong@redhat.com>
Acked-by: Lianbo Jiang <lijiang@redhat.com>
Make the test script print following line when the test is finished and vmcore is successfully dumped:
You can retrive the verify the vmcore file using following command:
./scripts/copy-from-image.sh \
/home/kasong/fedpkg/kexec-tools/tests/output/ssh-kdump/0-server.img \
/var/crash/192.168.77.62-2020-09-02-05:16:26/vmcore.flat ./
Kernel package verion is: kernel-core-5.6.6-300.fc32.x86_64
Also add a helper to copy files out of the VM image.
Signed-off-by: Kairui Song <kasong@redhat.com>
Acked-by: Dave Young <dyoung@redhat.com>
Now, by execute `make test-run` in tests/, kexec-tools sanity tests
will be performed using VMs. There are currently 3 test cases, for local
kdump, nfs kdump and ssh kdump.
For each test VM, the selftest framework will create a snapshot layer,
do setup as required by the test case, this ensure each test runs in a
clean VM.
This framework will install a custom systemd service that starts when
system have finished booting, and the service will do basic routine
(fetch and set boot counter, etc..), then call the test case which is
installed in /kexec-kdump-test/test.sh in VM.
Each VM will have two serial consoles, one for ordinary console usage,
one for the test communication and log. The test script will watch the
second test console to know the test status.
The test cases are located in tests/scripts/testcases, documents about
the test cases structure will be provided in following commits.
Signed-off-by: Kairui Song <kasong@redhat.com>
Acked-by: Dave Young <dyoung@redhat.com>
The Makefile In tests/ could help build a VM image using Fedora cloud
image as base image, or, user can specify a base image using
BASE_IMAGE=<path/to/file>. The current repo will be packeged and
installed in the image, so the image could be used as a test image to
test kexec-tools.
The image building is splited into two steps:
The first step, it either convert the base image to qcow2 or create
a snapshot on it, and install basic packages (dracut, grubby, ...)
and do basic setups (setup crashkernel=, disable selinux, ...).
See tests/scripts/build-scripts/base-image.sh for detail.
The second step, it creates a snapshot on top of the image produced by
the previous step, and install the packaged kexec-tools of current
repo. See tests/scripts/build-scripts/test-base-image.sh for detail.
In this way, if repo's content is changes, `make` will detect it and
only rebuild the second snapshot which speed up the rebuild by a lot.
The image will be located as tests/output/test-base-image, and in qcow2
format. And default user/password is set to root/fedora.
Signed-off-by: Kairui Song <kasong@redhat.com>
Acked-by: Dave Young <dyoung@redhat.com>
Encrypted target have many issues, so let user check
kexec-kdump-howto.txt, which have more details.
Signed-off-by: Kairui Song <kasong@redhat.com>
Acked-by: Pingfan Liu <piliu@redhat.com>
Now all atomic special workaround is removed, we can remove the atomic
detection function.
Signed-off-by: Kairui Song <kasong@redhat.com>
Acked-by: Dave Young <dyoung@redhat.com>
kernel installation is not always in a fixed location /boot, there are
multiple different style of kernel installation, and initramfs location
changes with kernel. The two files should be detected together and adapt
to different style.
To do so we use a list of known installation destinations, and a list
of possible kernel image and initrd names. Iterate the two list to
detect the installation location of the two files. If GRUB is in use,
the BOOT_IMAGE= cmdline from GRUB will also be considered. And also
prefers user specified config if given.
Previous atomic workaround is no longer needed as the new detection
method can cover that case.
Signed-off-by: Kairui Song <kasong@redhat.com>
Acked-by: Dave Young <dyoung@redhat.com>