Commit Graph

4 Commits

Author SHA1 Message Date
Vivek Goyal b8b1667e2c udev-rules: Restart kdump service on cpu ADD/REMOVE events
This patch changes restart of kdump service from cpu online/offline events
to cpu add/remove events.

Some people have complained that they are running cpu online/offline tests
at high frequency and kdump restarts at high frequency and systemd disables
the service. As a temporary fix, we committed a patch to never disable
kdump service.

In general it probably is a good idea to restart kdump service on cpu
add/remove events.

Toshi Kani confirmed the following.

- The file /sys/devices/system/cpu/cpuX/crash_notes is created before the
  ADD event goes out. That means we cannot miss creating ELF notes for a
  newly added cpu.

- For the REMOVE event, files under /sys/devices/system/cpu/cpuX/ are removed
  first and then the REMOVE event goes out. That means we will remove the ELF
  note header for the removed cpu.

- There are some race conditions, for example a cpu is removed but the system
  crashes before the kdump service restarts. In that case vmcore.c has to be
  robust enough to inspect ELF notes and discard empty ones.

  It is also possible that after cpu remove, the crash notes memory got reused
  for something else, and after a crash vmcore.c might see some random data.
  It does basic size checks and discards ELF notes if the checks don't pass.

  The above race conditions can happen even with the OFFLINE event and there
  is no good way to remove them altogether. So making vmcore.c more robust
  is the right solution here.
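
For illustration, a udev rules fragment along the following lines would
restart kdump on cpu add/remove events. This is a minimal sketch and not the
rules file changed by this patch: the RUN+= key and the /usr/bin/systemctl
path are assumptions, and try-restart is carried over from the earlier
"udev rules fix" commit.

```
# Sketch only: react to cpu add/remove instead of online/offline.
# RUN+= and the systemctl path are assumptions, not taken from the patch.
SUBSYSTEM=="cpu", ACTION=="add", RUN+="/usr/bin/systemctl try-restart kdump.service"
SUBSYSTEM=="cpu", ACTION=="remove", RUN+="/usr/bin/systemctl try-restart kdump.service"
```

With rules of this shape, repeated cpu online/offline tests would no longer
match, so they would not trigger a kdump restart.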

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Acked-by: WANG Chao <chaowang@redhat.com>
2014-09-15 21:55:07 +08:00
Prarit Bhargava e8f71f38d5 Fix kdump udev memory event restarts
During debugging of another problem, issues were noted with the kdump udev
rules.  The kdump service is restarted on memory add and remove events.
These are the wrong events for these types of devices and result in an overly
aggressive restarting of the kdump service.

There are four udev events to consider, "add", "remove", "online", and
"offline".  The remove event is a complete removal from the system -- neither
the hardware nor the kernel knows about the hardware; it has been physically
removed.  The add event is associated with hardware being physically added to
the system.  The kernel has some limited knowledge of the device, however,
it is not available for the kernel to use until it is brought online.  Online
events refer to the device being available for the kernel to use.  Opposite
to that is the offline event, which occurs when a device is no longer in
use by the kernel.

Note that in all four events the kernel *may* have some remaining information
stored about the device.

In the case of memory hotplug, kdump should be restarted when a memory module
is onlined or offlined.  This is because the memory is not in use by the
kernel until the memory is onlined, and it is unused when the memory is
offlined.

Making these modifications results in smooth service on systems that do
heavy memory onlining and offlining.
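
For illustration, the memory rules described above could look roughly like
the following fragment. This is a minimal sketch and not the rules file
changed by this patch: the RUN+= key and the /usr/bin/systemctl path are
assumptions.

```
# Sketch only: restart kdump when memory is onlined or offlined,
# not when it is added or removed.
# RUN+= and the systemctl path are assumptions, not taken from the patch.
SUBSYSTEM=="memory", ACTION=="online", RUN+="/usr/bin/systemctl try-restart kdump.service"
SUBSYSTEM=="memory", ACTION=="offline", RUN+="/usr/bin/systemctl try-restart kdump.service"
```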

Cc: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Prarit Bhargava <prarit@redhat.com>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
2014-05-04 17:08:34 +08:00
Dave Young d123cc3f2b udev rules fix
Resolves: bz808817

Use systemctl try-restart kdump.service instead of the old /etc/init.d/kdump restart.
systemctl try-restart restarts the kdump service only if it is running. The original behavior is wrong when the user has not enabled the service with chkconfig on.
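
For illustration, the difference can be sketched as a single rule for the cpu
online event. This is not the rule text from this patch: the RUN+= key, the
match keys and the /usr/bin/systemctl path are assumptions.

```
# Old form: unconditional restart through the init script, which also
# starts kdump on systems where the service was never enabled.
#   RUN+="/etc/init.d/kdump restart"
# New form: try-restart does nothing unless kdump.service is already running.
SUBSYSTEM=="cpu", ACTION=="online", RUN+="/usr/bin/systemctl try-restart kdump.service"
```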

Tested the cpu online/offline events.

Signed-off-by: Dave Young <dyoung@redhat.com>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
2012-05-28 09:50:47 +08:00
Neil Horman 558bea7d40 Mass Update of RHEL5 patches
2008-06-05 15:18:53 +00:00