kexec-tools

Author	SHA1	Message	Date
Coiby Xu	12d9eff9dc	Show how much time kdump has waited for the network to be ready Relates: https://bugzilla.redhat.com/show_bug.cgi?id=2151504 Currently, when the network isn't ready, kdump would repeatedly print the same info, [ 29.537230] kdump[671]: Bad kdump network destination: 192.123.1.21 [ 30.559418] kdump[679]: Bad kdump network destination: 192.123.1.21 [ 31.580189] kdump[687]: Bad kdump network destination: 192.123.1.21 This is not user-friendly and users may think kdump has got stuck. So also show how much time kdump has waited for the network to be ready, [ 29.546258] kdump[673]: Waiting for network to be ready (50s / 10min) ... [ 32.608967] kdump[697]: Waiting for network to be ready (56s / 10min) Note kdump_get_ip_route no longer prints an error message and it's up to the caller to determine the log level and print relevant messages. And kdump_collect_netif_usage aborts when kdump_get_ip_route fails. Reported-by: Martin Pitt <mpitt@redhat.com> Signed-off-by: Coiby Xu <coxu@redhat.com> Reviewed-by: Philipp Rudo <prudo@redhat.com>	2023-04-15 06:39:17 +08:00
Coiby Xu	9792994f2f	Wait for the network to be truly ready before dumping vmcore nm-wait-online-initrd.service installed by dracut's 35-networkmanager module calls nm-online with "-s" which means it returns immediately when NetworkManager logs "startup complete". Thus it doesn't truly wait for network connectivity to be established [1]. Wait for the network to be truly ready before dumping vmcore. There are two benefits brought by this approach, - ssh/nfs dumping won't fail because the network is not ready e.g. [2][3] - users don't need to use workarounds like rd.net.carrier.timeout to make sure the network is ready [1] https://bugzilla.redhat.com/show_bug.cgi?id=1485712 [2] https://bugzilla.redhat.com/show_bug.cgi?id=1909014 [3] https://bugzilla.redhat.com/show_bug.cgi?id=2035451 Signed-off-by: Coiby Xu <coxu@redhat.com> Reviewed-by: Thomas Haller <thaller@redhat.com> Reviewed-by: Philipp Rudo <prudo@redhat.com>	2022-11-23 06:39:27 +08:00
Coiby Xu	568623e69a	Address the cases where a NIC has a different name in kdump kernel A NIC may get a different name in the kdump kernel from 1st kernel in cases like, - kernel assigned network interface names are not persistent e.g. [1] - there is an udev rule to rename the NIC in the 1st kernel but the kdump initrd may not have that rule e.g. [2] If NM tries to match a NIC with a connection profile based on NIC name i.e. connection.interface-name, it will fail in the above cases. A simple solution is to ask NM to match a connection profile by MAC address. Note we don't need to do this for user-created NICs like vlan, bridge and bond. A remaining issue is passing the name of a NIC via the kdumpnic dracut command line parameter which requires passing ifname=<interface>:<MAC> to have a fixed NIC name. But we can simply drop this requirement. kdumpnic is needed because kdump needs to get the IP by NIC name and use the IP to create a dumping folder named "{IP}-{DATE}". We can simply pass the IP to the kdump kernel directly via a new dracut command line parameter kdumpip instead. In addition to the benefit of simplifying the code, there are three other benefits brought by this approach, - make use of whatever network to transfer the vmcore. Because as long as we have the network, we don't care which NIC is active. - if the IP obtained in the kdump kernel is different from the one in the 1st kernel, "{IP}-{DATE}" would better tell where the dumped vmcore comes from. - without passing ifname=<interface>:<MAC> to kdump initrd, the issue of two interfaces having the same MAC address for Azure Hyper-V NIC SR-IOV [3] is resolved automatically. [1] https://bugzilla.redhat.com/show_bug.cgi?id=1121778 [2] https://bugzilla.redhat.com/show_bug.cgi?id=810107 [3] https://bugzilla.redhat.com/show_bug.cgi?id=1962421 Signed-off-by: Coiby Xu <coxu@redhat.com> Reviewed-by: Thomas Haller <thaller@redhat.com> Reviewed-by: Philipp Rudo <prudo@redhat.com>	2022-11-23 06:39:27 +08:00
Tao Liu	bea6143178	Fix the sync issue for dump_fs Previously the sync for dump_fs was problematic: according to man 2 sync, it always returns success. So it cannot detect the error where the dump target is full and not all of the vmcore data has been written back to the disk, which leaves the vmcore incomplete and reports a misleading log, "saving vmcore complete". In this patch, we will use "sync -f vmcore" instead, which will return an error if syncfs on the dump target fails. In this way, vmcore sync related failures, such as a failed autoextend of an lvm2 thinpool, can be detected and handled properly. Signed-off-by: Tao Liu <ltao@redhat.com> Reviewed-by: Coiby Xu <coxu@redhat.com>	2022-10-08 15:46:11 +08:00
Tao Liu	c743881ae6	virtiofs support for kexec-tools This patch add virtiofs support for kexec-tools by introducing a new option for /etc/kdump.conf: virtiofs myfs Where myfs is a variable tag name specified in qemu cmdline "-device vhost-user-fs-pci,tag=myfs". The patch covers the following cases: 1) Dumping VM's vmcore to a virtiofs shared directory; 2) When the VM's rootfs is a virtiofs shared directory and dumping the VM's vmcore to its subdirectory, such as /var/crash; 3) The combination of case 1 & 2: The VM's rootfs is a virtiofs shared directory and dumping the VM's vmcore to another virtiofs shared directory. Case 2 & 3 need dracut >= 057, otherwise VM cannot boot from virtiofs shared rootfs. But it is not the issue of kexec-tools. Reviewed-by: Philipp Rudo <prudo@redhat.com> Signed-off-by: Tao Liu <ltao@redhat.com>	2022-09-29 12:22:49 +08:00
Kairui Song	3d70f8b049	logger: save log after all kdump progress finished Make log saving the last step of kdump.sh, so it can catch more info, for example, the output of post.d hooks will be covered by the log now. Signed-off-by: Kairui Song <kasong@tencent.com> Reviewed-by: Philipp Rudo <prudo@redhat.com>	2022-04-29 16:22:41 +08:00
Philipp Rudo	7cd3f232d5	kdump-lib-initramfs: merge definitions for default ssh key There are currently three identical definitions for the default ssh key. Combine them into one in kdump-lib-initramfs.sh. Signed-off-by: Philipp Rudo <prudo@redhat.com> Reviewed-by: Tao Liu <ltao@redhat.com> Reviewed-by: Coiby Xu <coxu@redhat.com>	2022-04-02 16:24:32 +08:00
Kairui Song	ee337c6f49	Add header comment for POSIX compliant scripts To make things cleaner and more human readable, add a short comment for the POSIX scripts. Signed-off-by: Kairui Song <kasong@redhat.com> Acked-by: Philipp Rudo <prudo@redhat.com>	2021-09-15 23:11:37 +08:00
Kairui Song	7c76611abb	dracut-kdump.sh: reformat with shfmt This is done with `shfmt -w -s dracut-kdump.sh`. There is no behaviour change. Signed-off-by: Kairui Song <kasong@redhat.com> Acked-by: Philipp Rudo <prudo@redhat.com>	2021-09-15 23:10:57 +08:00
Kairui Song	b1339c3b8a	dracut-kdump.sh: make it POSIX compatible POSIX doesn't support keyword `local`, so this commit reduced variable usage. Here-string ("<<<") operation is also not supported, so kdump.conf is now pre-parsed into a temp file. Also fixes many POSIX syntax errors. Signed-off-by: Kairui Song <kasong@redhat.com> Acked-by: Philipp Rudo <prudo@redhat.com>	2021-09-14 03:25:54 +08:00
Kairui Song	725027b735	dracut-kdump.sh: POSIX doesn't support pipefail Setting pipefail will cause a POSIX shell to exit with failure. So only do that in bash. Signed-off-by: Kairui Song <kasong@redhat.com> Acked-by: Philipp Rudo <prudo@redhat.com>	2021-09-14 03:25:54 +08:00
Kairui Song	b1c794a2cf	dracut-kdump.sh: Use stat instead of ls to get vmcore size ls output is fragile, so use stat instead. Signed-off-by: Kairui Song <kasong@redhat.com> Acked-by: Philipp Rudo <prudo@redhat.com>	2021-09-14 03:25:54 +08:00
Kairui Song	7a9823b42e	dracut-kdump.sh: simplify dump_ssh There is a workaround for `scp` that it expects IPv6 address to be quoted with [ ... ], only apply the workaround once and store the updated `scp` address to reuse it. Signed-off-by: Kairui Song <kasong@redhat.com> Acked-by: Philipp Rudo <prudo@redhat.com>	2021-09-14 03:25:54 +08:00
Kairui Song	8f89e89071	dracut-kdump.sh: remove add_dump_code `add_dump_code "<op>"` is just `DUMP_INSTRUCTION="<op>"`, so there is no need for an extra wrapper. Signed-off-by: Kairui Song <kasong@redhat.com> Acked-by: Philipp Rudo <prudo@redhat.com>	2021-09-14 03:25:54 +08:00
Kairui Song	0675edbadb	dracut-kdump.sh: don't put KDUMP_SCRIPT_DIR in PATH monitor_dd_progress is the only extra binary in KDUMP_SCRIPT_DIR, no need to change PATH environment variable, just call it directly. Signed-off-by: Kairui Song <kasong@redhat.com> Acked-by: Philipp Rudo <prudo@redhat.com>	2021-09-14 03:25:54 +08:00
Kairui Song	a1205effaa	kdump-lib-initramfs.sh: move dump related functions to kdump.sh These dump related functions are only used by dracut-kdump.sh. Signed-off-by: Kairui Song <kasong@redhat.com> Acked-by: Philipp Rudo <prudo@redhat.com>	2021-09-14 03:25:54 +08:00
Kairui Song	e7118d1de8	Merge kdump-error-handler.sh into kdump.sh kdump-error-handler.sh does nothing except calling three functions, it can be easily merged into kdump.sh by using a parameter to run the error handling routine. kdump-lib-initramfs.sh was created to hold the three shared functions and related code, so by merging these two files, kdump-lib-initramfs.sh can be simplified by a lot. Following up commits will clean up kdump-lib-initramfs.sh. Signed-off-by: Kairui Song <kasong@redhat.com> Acked-by: Philipp Rudo <prudo@redhat.com>	2021-09-14 03:25:54 +08:00
Kairui Song	a0282ab22c	kdump-lib.sh: add a config format and read helper Add a helper `kdump_read_conf` to replace read_strip_comments. `kdump_read_conf` does a few more things: - remove trailing spaces. - format the content, remove duplicated spaces between name and value. - read from KDUMP_CONFIG_FILE (/etc/kdump.conf) directly, avoid pasting "/etc/kdump.conf" path everywhere in the code. - check if config file exists, just in case. Also unify the environmental variable, now KDUMP_CONFIG_FILE stands for the default config location. This helps avoid some shell pitfalls about spaces when reading config. Signed-off-by: Kairui Song <kasong@redhat.com> Acked-by: Philipp Rudo <prudo@redhat.com>	2021-09-14 03:25:29 +08:00
Tao Liu	00785873ef	Fix incorrect vmcore permissions when dumped through ssh Previously when dumping vmcore to a remote machine through ssh, the files are created remotely and file permissions are taken from the default umask value, which making the files accessible to anyone on the remote machine. This patch fixed the security issue by setting a customized umask value before the file creation on the remote machine. Signed-off-by: Tao Liu <ltao@redhat.com> Acked-by: Kairui Song <kasong@redhat.com>	2021-03-23 16:10:43 +08:00
Lianbo Jiang	a571b0da9f	fix kdump failure of saving vmcore with the scp + ipv6 method Currently, kdump will fail to save vmcore when using the scp and ipv6. The reason is that the scp requires IPv6 addresses to be enclosed in square brackets, but ssh doesn’t require this. Let's enclose the ipv6 address in square brackets for scp dump. Signed-off-by: Lianbo Jiang <lijiang@redhat.com> Acked-by: Pingfan Liu <piliu@redhat.com>	2021-01-21 15:03:20 +08:00
Lianbo Jiang	cd86148804	Save the final failure information to log file if saving vmcore failed Currently, if saving vmcore failed, the final failure information won't be saved to the kexec-dmesg.log, because the action of saving the log occurs before the final log is printed, it has no chance to save the log(marked it with the '^^^' below) to the log file(kexec-dmesg.log). For example: [1] console log: [ 3.589967] kdump[453]: saving vmcore-dmesg.txt to /sysroot//var/crash/127.0.0.1-2020-11-26-14:19:17/ [ 3.627261] kdump[458]: saving vmcore-dmesg.txt complete [ 3.633923] kdump[460]: saving vmcore [ 3.661020] kdump[465]: saving vmcore failed ^^^^^^^^^^^^^^^^^^^^ [2] kexec-dmesg.log: Nov 26 14:19:17 kvm-06-guest25.hv2.lab.eng.bos.redhat.com kdump[453]: saving vmcore-dmesg.txt to /sysroot//var/crash/127.0.0.1-2020-11-26-14:19:17/ Nov 26 14:19:17 kvm-06-guest25.hv2.lab.eng.bos.redhat.com kdump[458]: saving vmcore-dmesg.txt complete Nov 26 14:19:17 kvm-06-guest25.hv2.lab.eng.bos.redhat.com kdump[460]: saving vmcore Let's improve it in order to avoid the loss of important information. Signed-off-by: Lianbo Jiang <lijiang@redhat.com> Acked-by: Kairui Song <kasong@redhat.com>	2020-12-29 17:40:26 +08:00
Lianbo Jiang	d7054f4cd8	Improve debugging in the kdump kernel Let's use the logger in the second kernel and collect the kernel ring buffer(dmesg) of the second kernel. Signed-off-by: Lianbo Jiang <lijiang@redhat.com> Acked-by: Kairui Song <kasong@redhat.com>	2020-10-27 17:34:07 +08:00
Pingfan Liu	bc67c13651	kdump_pre: make notes more precise Signed-off-by: Pingfan Liu <piliu@redhat.com> Acked-by: Kairui Song <kasong@redhat.com>	2020-07-21 10:34:36 +08:00
Pingfan Liu	bda677c3d1	dracut-kdump.sh: exit shell when machine reboot The following scenario is observed: kdump: kdump_pre script exited with non-zero status! [ 5.104841] systemd[1]: Shutting down. [ 5.122162] printk: systemd-shutdow: 27 output lines suppressed due to ratelimiting kdump: dump target is /dev/mapper/rhel_hpe--dl380pgen8--02--vm--12-root kdump: saving to /sysroot//var/crash/127.0.0.1-2020-06-27-03:55:01/ kdump: saving vmcore-dmesg.txt kdump: saving vmcore-dmesg.txt complete kdump: saving vmcore Checking for memory holes : [ 0.0 %] / Checking for memory holes : [100.0 %] \| [ 5.516573] systemd-shutdown[1]: Syncing filesystems and block devices. [ 5.519515] systemd-shutdown[1]: Sending SIGTERM to remaining processes... It is caused by the following script if [ $? -ne 0 ]; then echo "kdump: kdump_pre script exited with non-zero status!" do_final_action fi When do_final_action runs, a systemd service is forked for reboot, then the subshell returns, and parent continues to execute. Place "exit 1" to stop executing and make kdump service failure. Signed-off-by: Pingfan Liu <piliu@redhat.com> Acked-by: Kairui Song <kasong@redhat.com>	2020-07-21 10:34:07 +08:00
Kairui Song	1a5d44d7f4	Fix kdump failure when mount target specified by dracut_args commit `61e0169` changed definition of dump_fs function, so need to do a mount target conversion before calling it. Signed-off-by: Kairui Song <kasong@redhat.com>	2020-06-11 14:00:54 +08:00
onitsuka.shinic@fujitsu.com	45e02e73fa	dracut-kdump.sh: Execute the binary and script files in /etc/kdump/{pre.d,post.d} This patch executes the binary and script files in /etc/kdump/{pre.d,post.d} just like the kdump_pre or kdump_post directive written in /etc/kdump.conf. Signed-off-by: Shinichi Onitsuka <onitsuka.shinic@fujitsu.com> Acked-by: Pingfan Liu <piliu@redhat.com>	2020-06-11 12:59:21 +08:00
Kairui Song	cfd93e2b7e	Revert "Add a hook to wait for kdump target in initqueue" This reverts commit `cee618593c`. Upstream dracut has provided a parameter for adding a mandatory network requirement by appending the "rd.neednet" parameter, so we should use that instead. Signed-off-by: Kairui Song <kasong@redhat.com> Acked-by: Dave Young <dyoung@redhat.com>	2020-05-28 16:26:00 +08:00
Hari Bathini	e3f2f926dd	powerpc: enable the scripts to capture dump on POWERNV platform With FADump support added on POWERNV platform, enable the scripts to capture /proc/vmcore. Also, if CONFIG_OPAL_CORE is enabled, OPAL core is preserved and exported on POWERNV platform. So, offload OPAL core, if it is available. Signed-off-by: Hari Bathini <hbathini@linux.ibm.com> Acked-by: Kairui Song <kasong@redhat.com>	2020-02-06 22:13:06 +08:00
Kairui Song	cee618593c	Add a hook to wait for kdump target in initqueue The dracut initqueue may quit immediately and won't trigger any hook if there is no "finished" hook still pending (finished hook will be deleted once it return 0). This issue start to appear with latest dracut, latest dracut use network-manager to configure the network, network-manager module only install "settled" hook, and we didn't install any other hook. So NFS/SSH dump will fail. iSCSI dump works because dracut iscsi module will install a "finished" hook to detect if the iscsi target is up. So for NFS/SSH we keep initqueue running until the host successfully get a valid IP address, which means the network is ready. Signed-off-by: Kairui Song <kasong@redhat.com> Acked-by: Pingfan Liu <piliu@redhat.com>	2020-01-29 08:12:45 +08:00
Kairui Song	75d9132417	Get rid of duplicated strip_comments when reading config When reading kdump configs, a single parsing should be enough and this saves a lot of duplicated stripping calls, which speeds up the total load time. Speeds up about 2 seconds when building and 0.1 second for reload in my tests. Signed-off-by: Kairui Song <kasong@redhat.com> Acked-by: Dave Young <dyoung@redhat.com>	2019-05-20 16:56:28 +08:00
Xunlei Pang	391969ced7	dracut-kdump: use POSIX shell syntax kdump.sh may run under sh/dash in kdump kernel. Signed-off-by: Xunlei Pang <xlpang@redhat.com> Acked-by: Dave Young <dyoung@redhat.com>	2016-11-28 10:41:15 +08:00
Xunlei Pang	74c6f46429	Support special mount information via "dracut_args" There are some complaints about nfs kdump that users must mount nfs beforehand, which may cause some overhead to nfs server. For example, there are thousands of diskless clients deployed with nfs dumping, each time the client is booted up, it will trigger kdump rebuilding so will mount nfs, thus resulting in thousands of nfs requests concurrently imposed on the same nfs server. We introduce a new way of specifying mount information via the already-existent "dracut_args" directive (so avoid adding extra directives in /etc/kdump.conf), we will skip all the filesystem mounting and checking stuff for it. So it can be used in the above-mentioned nfs scenario to avoid severe nfs server overhead. Specifically, if there is any "--mount" information specified via "dracut_args" in /etc/kdump.conf, always use it as the final mount without any validation (mounting or checking like mount options, fs size, etc), so users are expected to ensure its correctness. NOTE: -Only one mount target is allowed using "dracut_args" globally. -Dracut will create <mountpoint> if it doesn't exist in kdump kernel, <mountpoint> must be specified as an absolute path. -Users should do a test first and ensure it works because kdump does not prepare the mount or check all the validity. Reviewed-by: Pratyush Anand <panand@redhat.com> Suggested-by: Dave Young <dyoung@redhat.com> Acked-by: Dave Young <dyoung@redhat.com> Signed-off-by: Xunlei Pang <xlpang@redhat.com>	2016-08-26 14:03:48 +08:00
Minfei Huang	edec8a8266	dracut-kdump: Use the first filtered ip address as dump directory For now, Kdump will use ipv4 address as dump directory, and it works, if ipv4 is enabled. Once Kdump start to support ipv6 protocol, we may only setup the ipv6 address exclusively. Modify the code to make Kdump work in either ipv4 and ipv6 protocol. Signed-off-by: Minfei Huang <mhuang@redhat.com> Acked-by: Dave Young <dyoung@redhat.com>	2015-07-28 12:42:17 +08:00
Dave Young	977d20cd50	Revert commit `63476302` The ipv6 patchset is still under review, previously the commit was mistakenly merged, thus let's revert it. Revert "dracut-kdump: Use proper the known hosts entry in the file known_hosts" This reverts commit `63476302aa`. Conflicts: kdump-lib.sh Signed-off-by: Minfei Huang <mhuang@redhat.com> Signed-off-by: Dave Young <dyoung@redhat.com>	2015-06-26 10:14:14 +08:00
Baoquan He	6f4940f198	Revert "execute kdump_post after do_default_action" This reverts commit `f4c45236bf`. Since that commit will change the behaviour of kdump_post. That is not good. Signed-off-by: Baoquan He <bhe@redhat.com>	2015-04-08 15:50:16 +08:00
Baoquan He	f4c45236bf	execute kdump_post after do_default_action User complains that kdump_post script doesn't execute after mount failed. This happened since mount failure will trigger kdump-error-handler.service, and then start kdump-error-handler.sh. However in kdump-error-handler.sh it doesn't execute kdump_post. Hence add it in this patch. Surely the function do_kdump_post need be moved into kdump-lib-initramfs.sh to be a common function. v1->v2: Add a return value to do_kdump_post when invoked in kdump_error-handler.sh. And call do_kdump_post earlier than do_default_action, otherwise it may not execute if reboot/poweroff/halt. Signed-off-by: Baoquan He <bhe@redhat.com> Acked-by: Dave Young <dyoung@redhat.com> Acked-by: Meifei Huang <mhuang@redhat.com>	2015-02-11 17:11:02 +08:00
Minfei Huang	63476302aa	dracut-kdump: Use proper the known hosts entry in the file known_hosts Once login using ssh, the ssh will store the known hosts entry to the local ~/.ssh/known_hosts. From now, we can login using ssh automatically. The ssh will check the ~/.ssh/known_hosts entry, if set the option StrictHostKeyChecking=yes/ask in the config or command line, when you want to login the target. The default value of StrictHostKeyChecking is ask. And the kdump using the ssh will append the option StrictHostKeyChecking=yes in the command line. We can use the following ip to connect to the peer machine, if ipv6 is enabled. fe80::5054:ff:fe48:ca80%eth0 Obviously, above ip contains the ethX. Kdump will add the prefix "kdump-" before ethX to avoid flowing netdevice name in case netdevice names ethX in the 2nd kernel. So the ip address will change to fe80::5054:ff:fe48:ca80%kdump-eth0. Kdump will login the target manually in the 2nd kernel, because of the option StrictHostKeyChecking=yes and inexistence known hosts entry in the local ~/.ssh/known_hosts. Hence dumping core will fail. In order to login automatically using ssh, we should add the prefix "kdump-" before ethX in the local ~/.ssh/known_hosts. Signed-off-by: Minfei Huang <mhuang@redhat.com>	2014-12-11 14:19:49 +08:00
WANG Chao	1742affe2c	kdump-initramfs-lib: Fix core_collector issue In ssh or raw dump case, if user do not specify "core_collector" in kdump.conf, kdump will fail. Because global DEFAULT_CORE_COLLECTOR variable isn't applied to CORE_COLLECTOR. Now fix it and clean up the duplicate code in kdump.sh. Signed-off-by: WANG Chao <chaowang@redhat.com> Acked-by: Dave Young <dyoung@redhat.com> Acked-by: Vivek Goyal <vgoyal@redhat.com>	2014-12-05 11:02:31 +08:00
Hari Bathini	d1483f9b28	kdump: fix save vmcore path for fadump With fadump support, dracut-kdump.sh script is installed into default initrd to capture vmcore generated by firmware assisted dump. Thus in fadump case, the same initrd is being used for normal boot as well as boot after system crash. Hence a device node, added by firmware while system crashes, is checked to identify if it is a normal boot or boot after crash to determine whether or not to capture vmcore. While testing fadump in fedora21 alpha, observed that vmcore capture is initiated even during normal boot, in spite of this check, with the below error: "kdump.sh[451]: /bin/kdump.sh: line 5: return: can only `return' from a function or sourced script" The below patch tries to fix this issue. Signed-off-by: Hari Bathini <hbathini@linux.vnet.ibm.com> Acked-by: Dave Young <dyoung@redhat.com> Acked-by: WANG Chao <chaowang@redhat.com> Acked-by: Vivek Goyal <vgoyal@redhat.com>	2014-10-27 21:12:02 +08:00
WANG Chao	2276b8561c	Introduce kdump capture service This patch introduce a new kdump-capture.service which is used to run kdump.sh. kdump-capture.service has OnFailure=emergency.target and OnFailureIsolate=yes set. When kdump.sh fails, the kdump emergency service will be triggered and enter the error handling path. In 2nd kernel, the default target for systemd is initrd.target, so we put kdump-capture.service in initrd.target.wants/ and by that, system will start kdump-capture as part of the boot process. kdump.sh used to run in dracut-pre-pivot hook. Now kdump-capture.service is placed after dracut-pre-pivot.service and other dependencies are all copied from dracut-pre-pivot.service. So the start point of kdump.sh will be almost the same as it used to be. Signed-off-by: WANG Chao <chaowang@redhat.com> Acked-by: Vivek Goyal <vgoyal@redhat.com> Acked-by: Dave Young <dyoung@redhat.com>	2014-08-05 13:13:32 +08:00
WANG Chao	002337c671	Introduce kdump error handling service Now upon failure kdump script might not be called at all and it might not be able to execute default action. It results in a hang. Because we disable emergency shell and rely on kdump.sh being invoked through dracut-pre-pivot hook. But it might happen that we never call into dracut-pre-pivot hook because certain systemd targets could not reach due to failure in their dependencies. In those cases error handling code does not run and system hangs. For example: sysroot-var-crash.mount --> initrd-root-fs.target --> initrd.target \ --> dracut-pre-pivot.service --> kdump.sh If /sysroot/var/crash mount fails, initrd-root-fs.target will not be reached. And then initrd.target will not be reached, dracut-pre-pivot.service wouldn't run. Finally kdump.sh wouldn't run. To solve this problem, we need to separate the error handling code from dracut-pre-pivot hook, and every time when a failure shows up, the separated code can be called by the emergency service. By default systemd provides an emergency service which will drop us into shell every time upon a critical failure. It's very convenient for us to re-use the framework of systemd emergency, because we don't have to touch the other parts of systemd. We can use our own script instead of the default one. This new scheme will overwrite emergency shell and replace with kdump error handling code. And this code will do the error handling as needed. Now, we will not rely on dracut-pre-pivot hook running always. Instead whenever error happens and it is serious enough that emergency shell needed to run, now kdump error handler will run. dracut-emergency is also replaced by kdump error handler and it's enabled again all the way down. So all the failure (including systemd and dracut) in 2nd kernel could be captured, and trigger kdump error handler. dracut-initqueue is a special case, which calls "systemctl start emergency" directly, not via "OnFailure=emergency". 
In case of failure, emergency is started, but not in an isolation mode, which means dracut-initqueue is still running. On the other hand, emergency will call dracut-initqueue again when default action is dump_to_rootfs. systemd would block on the last dracut-initqueue, waiting for the first instance to exit, which leaves us hanging. It looks like the following: dracut-initqueue (running) --> call dracut-emergency: --> dracut-emergency (running) --> kdump-error-handler.sh (running) --> call dracut-initqueue: --> blocking and waiting for the original instance to exit. To fix this, I'd like to introduce a wrapper emergency service. This emergency service will replace both the systemd and dracut emergency. And this service does nothing but to isolate to real kdump error handler service: dracut-initqueue (running) --> call dracut-emergency: --> dracut-emergency isolate to kdump-error-handler.service --> dracut-emergency and dracut-initqueue will both be stopped and kdump-error-handler.service will run kdump-error-handler.sh. In a normal failure case, this still works: foo.service fails --> trigger emergency.service --> emergency.service isolates to kdump-error-handler.service --> kdump-error-handler.service will run kdump-error-handler.sh Signed-off-by: WANG Chao <chaowang@redhat.com> Acked-by: Vivek Goyal <vgoyal@redhat.com> Acked-by: Dave Young <dyoung@redhat.com>	2014-08-05 13:13:32 +08:00
WANG Chao	3b27570bea	cleanup: extract functions from kdump.sh to kdump-lib-initramfs.sh Extract functions from kdump.sh, and construct kdump-lib-initramfs.sh as kdump common functions/variables library. kdump-lib-initramfs.sh will include kdump-lib.sh, because it will use the functions from there. IOW, kdump-lib-initramfs.sh will be a superset of kdump-lib.sh. So after this cleanup: - scripts running in 1st kernel only have to include kdump-lib.sh - scripts running in 2nd kernel only have to include kdump-lib-initramfs.sh Signed-off-by: WANG Chao <chaowang@redhat.com> Acked-by: Vivek Goyal <vgoyal@redhat.com> Acked-by: Dave Young <dyoung@redhat.com>	2014-08-05 13:13:11 +08:00
Vivek Goyal	7f2717aa0a	dracut-kdump.sh: Issue a sync after saving vmcore-dmesg.txt Recently somebody reported an issue where vmcore-dmesg.txt was saved successfully but later saving vmcore failed due to lack of space on disk. System rebooted but after reboot there was nothing on disk. Not even vmcore-dmesg.txt. Issue a sync after saving vmcore-dmesg.txt to solve this issue. I think this is happening because we are doing "reboot -f" instead of going through systemd reboot path. Anyway, doing a sync now should take care of this. Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Acked-by: WANG Chao <chaowang@redhat.com>	2014-07-28 13:22:26 +08:00
Hari Bathini	78589a3207	kdump: Check whether or not to invoke capturing vmcore The script dracut-kdump.sh is responsible for capturing vmcore during second kernel boot. Currently this script gets installed into kdump initrd as part of kdumpbase dracut module. With fadump support, 'dracut-kdump.sh' script also gets installed into default initrd to capture vmcore generated by firmware assisted dump. Thus in fadump case, the same initrd is going to be used for normal boot as well as boot after system crash. Hence a check is required to see if it is a normal boot or boot after crash. A new node "ibm,kernel-dump" is added, to the device tree, by firmware to notify kernel if it is booting after crash. The below patch adds a check for this node before executing steps to capture vmcore. This check will help bypassing the vmcore capture steps during normal boot process. Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com> Signed-off-by: Hari Bathini <hbathini@linux.vnet.ibm.com> Acked-by: Vivek Goyal <vgoyal@redhat.com>	2014-07-28 13:03:48 +08:00
Martin Perina	2066e5f792	Add fence_kdump support for generic clusters Adds two new options to kdump.conf to be able to configure fence_kdump support for generic clusters: fence_kdump_args <arg(s)> - Command line arguments for fence_kdump_send (it can contain all valid arguments except hosts to send notification to) fence_kdump_nodes <node(s)> - List of cluster node(s) separated by space to send fence_kdump notification to (this option is mandatory to enable fence_kdump) Generic clusters fence_kdump configuration take precedence over older method of fence_kdump configuration for Pacemaker clusters. It means that if fence_kdump is configured using above options in kdump.conf, old Pacemaker configuration is not used even if it exists. Bug-Url: https://bugzilla.redhat.com/1078134 Signed-off-by: Martin Perina <mperina@redhat.com> Acked-by: Vivek Goyal <vgoyal@redhat.com>	2014-04-03 14:43:06 +08:00
Martin Perina	48f4375f2e	Rename FENCE_KDUMP_NODES to FENCE_KDUMP_NODES_FILE Renames FENCE_KDUMP_NODES variable to FENCE_KDUMP_NODES_FILE to distinguish it from values read from fence_kdump_nodes option in kdump.conf (introduced in following patches). Bug-Url: https://bugzilla.redhat.com/1078134 Signed-off-by: Martin Perina <mperina@redhat.com> Acked-by: Vivek Goyal <vgoyal@redhat.com>	2014-04-03 14:42:54 +08:00
Martin Perina	98d4be908a	Rename FENCE_KDUMP_CONFIG to FENCE_KDUMP_CONFIG_FILE Renames FENCE_KDUMP_CONFIG variable to FENCE_KDUMP_CONFIG_FILE to distinguish it from values read from fence_kdump_args option in kdump.conf (introduced in following patches). Bug-Url: https://bugzilla.redhat.com/1078134 Signed-off-by: Martin Perina <mperina@redhat.com> Acked-by: Vivek Goyal <vgoyal@redhat.com>	2014-04-03 14:42:13 +08:00
WANG Chao	e5e0507371	kdump.sh: send fence kdump message to other nodes in the cluster In 2nd kernel, to prevent the crashed system from being fenced off, fence kdump message must be sent to other nodes in the cluster periodically before dumping process. We preserve every node's name in /etc/fence_kdump_nodes in the initrd, so we parse this file and notify them. Signed-off-by: WANG Chao <chaowang@redhat.com> Tested-by: Zhi Zou <zzou@redhat.com> Tested-by: Marek Grac <mgrac@redhat.com> Acked-by: Vivek Goyal <vgoyal@redhat.com>	2014-01-29 16:20:06 +08:00
WANG Chao	fac2d59ae4	makedumpfile compression method default to lzo Lzo is proven faster than zlib, for large memory machine it will extremely shorten the time for saving vmcore. Let's switch to lzo as the default compression method for makedumpfile. The drawback is lzo has a little less compression ratio than zlib. But considering for most users, speed/time is a more serious concern than vmcore size. So I think default to lzo will benefit most of the users. v1->v2: update kdump.conf.5 [DaveY] Signed-off-by: WANG Chao <chaowang@redhat.com> Acked-by: Vivek Goyal <vgoyal@redhat.com> Acked-by: Dave Young <dyoung@redhat.com>	2013-12-24 14:25:07 +08:00
arthur	ef9f97dcad	Add rd.memdebug in kdump module Description: Currently we only added memdebug code before different dracut hooks ie. pre-udev pre-pivot etc. Add memdebug in kdump.sh before capturing vmcore is also good for debugging. solution: Add make_trace_mem before saving vmcore. Signed-off-by: arthur <zzou@redhat.com> Acked-by: Dave Young <dyoung@redhat.com>	2013-11-28 11:39:18 +08:00
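
The scp and IPv6 handling described in commits a571b0da9f and 7a9823b42e (scp expects an IPv6 address to be enclosed in square brackets, ssh does not) can be illustrated with a minimal POSIX shell sketch. This is not the code from dracut-kdump.sh: the helper name `scp_target` and the example addresses are hypothetical, and the sketch assumes the target is always given as user@host.

```shell
#!/bin/sh
# Hypothetical helper (not taken from dracut-kdump.sh).
# Assumption: $1 has the form user@host.
# If the host part looks like an IPv6 address (contains ':') and is not
# already bracketed, wrap it in [ ] so that scp accepts it.
scp_target() {
    _user=${1%@*}   # text before the last '@'
    _host=${1#*@}   # text after the first '@'
    case $_host in
        \[*\]) printf '%s\n' "$1" ;;                   # already bracketed
        *:*)   printf '%s@[%s]\n' "$_user" "$_host" ;; # IPv6: add brackets
        *)     printf '%s\n' "$1" ;;                   # IPv4 or hostname
    esac
}

# Compute the scp form once and reuse it, as commit 7a9823b42e describes.
scp_target "root@192.0.2.10"
scp_target "root@2001:db8::1"
```

Run as written, the two calls print `root@192.0.2.10` and `root@[2001:db8::1]`. The unmodified user@host string would still be the one passed to ssh, which needs no brackets.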


95 Commits