- nfsserver: also stop rpc-statd for nfsv4_only to avoid stop failing
  in some cases
- podman: force-remove containers in stopping state if necessary

  Resolves: RHEL-59172
  Resolves: RHEL-58008
Oyvind Albrigtsen 2024-10-02 11:01:39 +02:00
parent 5a84bdea60
commit 5307e871ec
3 changed files with 94 additions and 1 deletions

@@ -0,0 +1,43 @@
From 2ab2c832180dacb2e66d38541beae0957416eb96 Mon Sep 17 00:00:00 2001
From: Antonio Romito <aromito@redhat.com>
Date: Mon, 9 Sep 2024 17:30:38 +0200
Subject: [PATCH] Improve handling of "stopping" container removal in
 remove_container()

- Added handling for containers in a stopping state by checking the state and force-removing if necessary.
- Improved log messages to provide clearer information when force removal is needed.

Related: https://issues.redhat.com/browse/RHEL-58008
---
 heartbeat/podman | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/heartbeat/podman b/heartbeat/podman
index 53867bff20..643ec4d894 100755
--- a/heartbeat/podman
+++ b/heartbeat/podman
@@ -254,6 +254,13 @@ remove_container()
 	ocf_run podman rm -v $CONTAINER
 	rc=$?
 	if [ $rc -ne 0 ]; then
+		if [ $rc -eq 2 ]; then
+			if podman inspect --format '{{.State.Status}}' $CONTAINER | grep -wq "stopping"; then
+				ocf_log err "Inactive container ${CONTAINER} is stuck in 'stopping' state. Force-remove it."
+				ocf_run podman rm -f $CONTAINER
+				rc=$?
+			fi
+		fi
 		# due to a podman bug (rhbz#1841485), sometimes a stopped
 		# container can still be associated with Exec sessions, in
 		# which case the "podman rm" has to be forced
@@ -517,8 +524,8 @@ podman_stop()
 		# but the associated container exit code is -1. If that's the case,
 		# assume there's no failure and continue with the rm as usual.
 		if [ $rc -eq 125 ] && \
-			podman inspect --format '{{.State.Status}}:{{.State.ExitCode}}' $CONTAINER | grep -wq "stopped:-1"; then
-			ocf_log warn "Container ${CONTAINER} had an unexpected stop outcome. Trying to remove it anyway."
+			podman inspect --format '{{.State.Status}}:{{.State.ExitCode}}' $CONTAINER | grep -Eq '^(exited|stopped):-1$'; then
+			ocf_log err "Container ${CONTAINER} had an unexpected stop outcome. Trying to remove it anyway."
 		else
 			ocf_exit_reason "Failed to stop container, ${CONTAINER}, based on image, ${OCF_RESKEY_image}."
 			return $OCF_ERR_GENERIC
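
As a rough illustration of the two checks this patch touches, the sketch below pulls them out into standalone shell functions. The names `needs_force_remove` and `is_tolerated_stop_outcome` are hypothetical and are not part of the podman agent; their arguments stand in for the `podman rm` exit code and the `podman inspect` output that the agent reads.

```shell
# Hypothetical helpers, not part of heartbeat/podman. They only mirror the
# conditions used in the patch so the matching rules can be tried in isolation.

# $1: exit code of "podman rm -v"
# $2: output of: podman inspect --format '{{.State.Status}}'
# Succeeds when the patched remove_container() would force-remove.
needs_force_remove() {
	[ "$1" -eq 2 ] && printf '%s\n' "$2" | grep -wq "stopping"
}

# $1: output of: podman inspect --format '{{.State.Status}}:{{.State.ExitCode}}'
# Succeeds for the outcomes podman_stop() tolerates (the agent additionally
# requires that "podman stop" returned 125).
is_tolerated_stop_outcome() {
	printf '%s\n' "$1" | grep -Eq '^(exited|stopped):-1$'
}

for state in "stopped:-1" "exited:-1" "exited:0" "stopping:-1"; do
	if is_tolerated_stop_outcome "$state"; then
		echo "$state: tolerated, continue with rm"
	else
		echo "$state: treated as a stop failure"
	fi
done
```

The anchored pattern accepts only the whole strings `exited:-1` and `stopped:-1`, whereas the previous `grep -wq "stopped:-1"` did not cover the `exited` status at all.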

@@ -0,0 +1,38 @@
From 38eaf00bc81af7530c56eba282918762a47a9326 Mon Sep 17 00:00:00 2001
From: Oyvind Albrigtsen <oalbrigt@redhat.com>
Date: Thu, 19 Sep 2024 13:01:53 +0200
Subject: [PATCH] nfsserver: also stop rpc-statd for nfsv4_only to avoid stop
 failing in some cases

E.g. nfs_no_notify=true nfsv4_only=true nfs_shared_infodir=/nfsmq/nfsinfo would cause a "Failed to unmount a bind mount" error
---
 heartbeat/nfsserver | 16 +++++++---------
 1 file changed, 7 insertions(+), 9 deletions(-)

diff --git a/heartbeat/nfsserver b/heartbeat/nfsserver
index 5793d7a70..fd9268afc 100755
--- a/heartbeat/nfsserver
+++ b/heartbeat/nfsserver
@@ -947,15 +947,13 @@ nfsserver_stop ()
 		sleep 1
 	done
 
-	if ! ocf_is_true "$OCF_RESKEY_nfsv4_only"; then
-		nfs_exec stop rpc-statd > /dev/null 2>&1
-		ocf_log info "Stop: rpc-statd"
-		rpcinfo -t localhost 100024 > /dev/null 2>&1
-		rc=$?
-		if [ "$rc" -eq "0" ]; then
-			ocf_exit_reason "Failed to stop rpc-statd"
-			return $OCF_ERR_GENERIC
-		fi
+	nfs_exec stop rpc-statd > /dev/null 2>&1
+	ocf_log info "Stop: rpc-statd"
+	rpcinfo -t localhost 100024 > /dev/null 2>&1
+	rc=$?
+	if [ "$rc" -eq "0" ]; then
+		ocf_exit_reason "Failed to stop rpc-statd"
+		return $OCF_ERR_GENERIC
 	fi
 
 	nfs_exec stop nfs-idmapd > /dev/null 2>&1
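
The check in this hunk can read as inverted at first glance: when `rpcinfo -t localhost 100024` succeeds, the status service (RPC program 100024, provided by rpc-statd) is still answering, so the stop is reported as failed. A minimal sketch of that logic follows, assuming a hypothetical `statd_still_running` wrapper that takes the probe command as its arguments so it can be tried without an NFS server.

```shell
# Hypothetical wrapper, not part of heartbeat/nfsserver. The probe command is
# passed as arguments; in the agent it is "rpcinfo -t localhost 100024".
statd_still_running() {
	"$@" > /dev/null 2>&1
}

# "true" and "false" stand in for a probe that succeeds or fails.
if statd_still_running true; then
	echo "probe answered: rpc-statd is still up, stop failed"
fi
if ! statd_still_running false; then
	echo "probe failed: rpc-statd is down, stop can continue"
fi
```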

@@ -45,7 +45,7 @@
Name: resource-agents
Summary: Open Source HA Reusable Cluster Resource Scripts
Version: 4.10.0
-Release: 65%{?rcver:%{rcver}}%{?numcomm:.%{numcomm}}%{?alphatag:.%{alphatag}}%{?dirty:.%{dirty}}%{?dist}
+Release: 66%{?rcver:%{rcver}}%{?numcomm:.%{numcomm}}%{?alphatag:.%{alphatag}}%{?dirty:.%{dirty}}%{?dist}
License: GPLv2+ and LGPLv2+
URL: https://github.com/ClusterLabs/resource-agents
Source0: %{upstream_prefix}-%{upstream_version}.tar.gz
@@ -136,6 +136,8 @@ Patch83: RHEL-43579-galera-mysql-redis-remove-Unpromoted-monitor-action.patch
Patch84: RHEL-22715-LVM-activate-fix-false-positive.patch
Patch85: RHEL-58038-Filesystem-dont-sleep-no-processes-only-send-force-net-fs-after-kill.patch
Patch86: RHEL-59576-Filesystem-try-umount-first-avoid-arguments-list-too-long.patch
Patch87: RHEL-59172-nfsserver-also-stop-rpc-statd-for-nfsv4_only.patch
Patch88: RHEL-58008-podman-force-remove-container-if-necessary.patch
# bundled ha-cloud-support libs
Patch500: ha-cloud-support-aliyun.patch
@@ -345,6 +347,8 @@ exit 1
%patch -p1 -P 84
%patch -p1 -P 85
%patch -p1 -P 86
%patch -p1 -P 87
%patch -p1 -P 88
# bundled ha-cloud-support libs
%patch -p1 -P 500
@@ -665,6 +669,14 @@ rm -rf %{buildroot}/usr/share/doc/resource-agents
%{_usr}/lib/ocf/lib/heartbeat/OCF_*.pm
%changelog
* Wed Oct 2 2024 Oyvind Albrigtsen <oalbrigt@redhat.com> - 4.10.0-66
- nfsserver: also stop rpc-statd for nfsv4_only to avoid stop failing
  in some cases
- podman: force-remove containers in stopping state if necessary

  Resolves: RHEL-59172
  Resolves: RHEL-58008

* Wed Sep 25 2024 Oyvind Albrigtsen <oalbrigt@redhat.com> - 4.10.0-65
- Filesystem: dont sleep during stop-action when there are no
processes to kill, and only use force argument for network