Backport various fixes from master.

- Backport fix for migration history cleanup causing resource recovery
- Backport fix for SIGABRT during pacemaker-fenced shutdown
- Resolves: rhbz2166393
- Resolves: rhbz2166967

parent 3161587759
commit 31b1074af1
File diff suppressed because it is too large
@@ -0,0 +1,61 @@
+From 45617b727e280cac384a28ae3d96145e066e6197 Mon Sep 17 00:00:00 2001
+From: Reid Wahl <nrwahl@protonmail.com>
+Date: Fri, 3 Feb 2023 12:08:57 -0800
+Subject: [PATCH] Fix: fencer: Prevent double g_source_remove of op_timer_one
+
+QE observed a rarely reproducible core dump in the fencer during
+Pacemaker shutdown, in which we try to g_source_remove() an op timer
+that's already been removed.
+
+free_stonith_remote_op_list()
+-> g_hash_table_destroy()
+-> g_hash_table_remove_all_nodes()
+-> clear_remote_op_timers()
+-> g_source_remove()
+-> crm_glib_handler()
+-> "Source ID 190 was not found when attempting to remove it"
+
+The likely cause is that request_peer_fencing() doesn't set
+op->op_timer_one to 0 after calling g_source_remove() on it, so if that
+op is still in the stonith_remote_op_list at shutdown with the same
+timer, clear_remote_op_timers() tries to remove the source for
+op_timer_one again.
+
+There are only five locations that call g_source_remove() on a
+remote_fencing_op_t timer.
+* Three of them are in clear_remote_op_timers(), which first 0-checks
+  the timer and then sets it to 0 after g_source_remove().
+* One is in remote_op_query_timeout(), which does the same.
+* The last is the one we fix here in request_peer_fencing().
+
+I don't know all the conditions of QE's test scenario at this point.
+What I do know:
+* have-watchdog=true
+* stonith-watchdog-timeout=10
+* no explicit topology
+* fence agent script is missing for the configured fence device
+* requested fencing of one node
+* cluster shutdown
+
+Fixes RHBZ2166967
+
+Signed-off-by: Reid Wahl <nrwahl@protonmail.com>
+---
+ daemons/fenced/fenced_remote.c | 1 +
+ 1 file changed, 1 insertion(+)
+
+diff --git a/daemons/fenced/fenced_remote.c b/daemons/fenced/fenced_remote.c
+index d61b5bd..b7426ff 100644
+--- a/daemons/fenced/fenced_remote.c
++++ b/daemons/fenced/fenced_remote.c
+@@ -1825,6 +1825,7 @@ request_peer_fencing(remote_fencing_op_t *op, peer_device_info_t *peer)
+     op->state = st_exec;
+     if (op->op_timer_one) {
+         g_source_remove(op->op_timer_one);
++        op->op_timer_one = 0;
+     }
+
+     if (!((stonith_watchdog_timeout_ms > 0)
+--
+2.31.1
+
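
The stale-ID problem described in the patch above can be reproduced in isolation. The following is a minimal sketch, not Pacemaker or GLib code: `fake_source_add()`, `fake_source_remove()`, `fake_op_t`, and the two `stop_timer_*` helpers are hypothetical stand-ins invented for illustration. The sketch assumes only what the commit message states, namely that removing an already-removed source ID produces a "was not found" complaint, and that zeroing the stored ID after removal lets a later 0-check skip the second removal.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for a source registry: slot i holds source ID i + 1. */
#define MAX_SOURCES 8
static bool active[MAX_SOURCES];
static int warnings = 0; /* number of "not found" complaints so far */

/* Register a source and return its nonzero ID (0 means "no timer"). */
static unsigned int fake_source_add(void)
{
    for (unsigned int i = 0; i < MAX_SOURCES; i++) {
        if (!active[i]) {
            active[i] = true;
            return i + 1;
        }
    }
    return 0;
}

/* Remove a source; complain if the ID is not currently registered. */
static bool fake_source_remove(unsigned int id)
{
    if (id == 0 || id > MAX_SOURCES || !active[id - 1]) {
        warnings++;
        fprintf(stderr,
                "Source ID %u was not found when attempting to remove it\n",
                id);
        return false;
    }
    active[id - 1] = false;
    return true;
}

/* Hypothetical operation object holding one timer ID. */
typedef struct {
    unsigned int op_timer_one;
} fake_op_t;

/* Pre-patch shape: the source is removed but the stale ID is kept. */
static void stop_timer_stale(fake_op_t *op)
{
    if (op->op_timer_one) {
        fake_source_remove(op->op_timer_one);
    }
}

/* Post-patch shape: the ID is cleared, so a later 0-check skips removal. */
static void stop_timer_cleared(fake_op_t *op)
{
    if (op->op_timer_one) {
        fake_source_remove(op->op_timer_one);
        op->op_timer_one = 0;
    }
}

/* Stop the same timer twice (request path, then shutdown cleanup) and
 * return how many "not found" complaints that produced. */
static int warnings_after_two_stops(void (*stop)(fake_op_t *))
{
    fake_op_t op = { .op_timer_one = fake_source_add() };
    int before = warnings;

    assert(op.op_timer_one != 0);
    stop(&op);
    stop(&op);
    return warnings - before;
}
```

Calling `warnings_after_two_stops(stop_timer_stale)` yields one complaint, while `warnings_after_two_stops(stop_timer_cleared)` yields none, which mirrors the one-line change in `request_peer_fencing()`.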
@@ -36,7 +36,7 @@
 ## can be incremented to build packages reliably considered "newer"
 ## than previously built packages with the same pcmkversion)
 %global pcmkversion 2.1.5
-%global specversion 5
+%global specversion 6

 ## Upstream commit (full commit ID, abbreviated commit ID, or tag) to build
 %global commit a3f44794f94e1571c6ba0042915ade369b4ce4b1
@@ -250,6 +250,8 @@ Source1: https://codeload.github.com/%{github_owner}/%{nagios_name}/tar.gz
 # upstream commits
 Patch001: 001-sync-points.patch
 Patch002: 002-remote-regression.patch
+Patch003: 003-history-cleanup.patch
+Patch004: 004-g_source_remove.patch

 Requires: resource-agents
 Requires: %{pkgname_pcmk_libs}%{?_isa} = %{version}-%{release}
@@ -852,6 +854,12 @@ exit 0
 %license %{nagios_name}-%{nagios_hash}/COPYING

 %changelog
+* Thu Feb 9 2023 Chris Lumens <clumens@redhat.com> - 2.1.5-6
+- Backport fix for migration history cleanup causing resource recovery
+- Backport fix for SIGABRT during pacemaker-fenced shutdown
+- Resolves: rhbz2166393
+- Resolves: rhbz2166967
+
 * Tue Jan 24 2023 Ken Gaillot <kgaillot@redhat.com> - 2.1.5-5
 - Backport fix for remote node shutdown regression
 - Resolves: rhbz2163450