Fix: transport: Correctly find NLMSG_DONE and NLMSG_ERROR
Kernel v6.9 changed Netlink behavior to deliver NLMSG_DONE in the same recv() call as the data messages. This broke booth, causing an arbitrary to hang during startup. This was fixed upstream by booth commit 7d933651, which landed in booth v1.2. RHEL 9.6 contains the same kernel change. It was backported to kernel-5.14.0-535.el9 in RHEL 57755. So without this commit, booth is broken on RHEL 9.6 and later. (Omitted hyphen in RHEL issue to avoid GitLab pipeline failure due to unapproved ticket.) More details on the kernel change can be found here: https://lore.kernel.org/netdev/20240315124808.033ff58d@elisabeth/T/. Resolves: RHEL-133741 Signed-off-by: Reid Wahl <nwahl@redhat.com>
This commit is contained in:
parent
42c0385108
commit
7cb6fffa11
@ -0,0 +1,72 @@
|
||||
From 7d93365197f3df144ea007a0ce27cff3b59af8d3 Mon Sep 17 00:00:00 2001
|
||||
From: Jan Friesse <jfriesse@redhat.com>
|
||||
Date: Tue, 23 Apr 2024 18:01:02 +0200
|
||||
Subject: [PATCH] transport: Fix _find_myself for kernel 6.9
|
||||
|
||||
Kernel 6.9 seems to have changed AF_NETLINK behavior slightly making
|
||||
booth unable to start.
|
||||
|
||||
Previously it was expected only first item in
|
||||
the message can be NLMSG_DONE or NLMSG_ERROR type. And it looks this was
|
||||
true for Kernel < 6.9.
|
||||
|
||||
With kernel 6.9 this is no longer true, so any item can be type
|
||||
NLMSG_DONE or NLMSG_ERROR.
|
||||
|
||||
Result was loop was never terminated and booth was waiting for more
|
||||
messages from kernel which never arrived.
|
||||
|
||||
Solution is to change loop a bit so NLMSG_DONE, NLMSG_ERROR and
|
||||
RTM_NEWADDR are handled correctly.
|
||||
|
||||
Signed-off-by: Jan Friesse <jfriesse@redhat.com>
|
||||
---
|
||||
src/transport.c | 21 ++++++++++-----------
|
||||
1 file changed, 10 insertions(+), 11 deletions(-)
|
||||
|
||||
diff --git a/src/transport.c b/src/transport.c
|
||||
index 0d17f18..817a4dc 100644
|
||||
--- a/src/transport.c
|
||||
+++ b/src/transport.c
|
||||
@@ -208,17 +208,16 @@ int _find_myself(int family, struct booth_site **mep, int fuzzy_allowed)
|
||||
return 0;
|
||||
}
|
||||
|
||||
- h = (struct nlmsghdr *)rcvbuf;
|
||||
- if (h->nlmsg_type == NLMSG_DONE)
|
||||
- break;
|
||||
-
|
||||
- if (h->nlmsg_type == NLMSG_ERROR) {
|
||||
- close(fd);
|
||||
- log_error("netlink socket recvmsg error");
|
||||
- return 0;
|
||||
- }
|
||||
+ for (h = (struct nlmsghdr *)rcvbuf; NLMSG_OK(h, status); h = NLMSG_NEXT(h, status)) {
|
||||
+ if (h->nlmsg_type == NLMSG_DONE)
|
||||
+ goto out;
|
||||
+
|
||||
+ if (h->nlmsg_type == NLMSG_ERROR) {
|
||||
+ close(fd);
|
||||
+ log_error("netlink socket recvmsg error");
|
||||
+ return 0;
|
||||
+ }
|
||||
|
||||
- while (NLMSG_OK(h, status)) {
|
||||
if (h->nlmsg_type == RTM_NEWADDR) {
|
||||
struct ifaddrmsg *ifa = NLMSG_DATA(h);
|
||||
struct rtattr *tb[IFA_MAX+1];
|
||||
@@ -271,10 +270,10 @@ int _find_myself(int family, struct booth_site **mep, int fuzzy_allowed)
|
||||
}
|
||||
}
|
||||
}
|
||||
- h = NLMSG_NEXT(h, status);
|
||||
}
|
||||
}
|
||||
|
||||
+out:
|
||||
close(fd);
|
||||
|
||||
if (!me)
|
||||
--
|
||||
2.51.1
|
||||
|
||||
@ -41,7 +41,7 @@
|
||||
|
||||
Name: booth
|
||||
Version: 1.1
|
||||
Release: 2%{?dist}
|
||||
Release: 3%{?dist}
|
||||
Summary: Ticket Manager for Multi-site Clusters
|
||||
License: GPLv2+
|
||||
Url: https://github.com/%{github_owner}/%{name}
|
||||
@ -49,6 +49,7 @@ Source0: https://github.com/%{github_owner}/%{name}/releases/download/v%{
|
||||
Patch0: rhel-specific-0001-config-Add-enable-authfile-option.patch
|
||||
Patch1: RHEL-32613-1-attr-Fix-reading-of-server_reply.patch
|
||||
Patch2: RHEL-32613-2-auth-Check-result-of-gcrypt-gcry_md_get_algo_dlen.patch
|
||||
Patch3: RHEL-133741-0001-transport-Fix-_find_myself-for-kernel-6.9.patch
|
||||
|
||||
# direct build process dependencies
|
||||
BuildRequires: autoconf
|
||||
@ -297,6 +298,11 @@ VERBOSE=1 make check
|
||||
%{_usr}/lib/ocf/resource.d/booth/sharedrsc
|
||||
|
||||
%changelog
|
||||
* Wed Dec 04 2025 Reid Wahl <nwahl@redhat.com> - 1.1-3
|
||||
- Resolves: RHEL-133741
|
||||
|
||||
- transport: Fix _find_myself for kernel 6.9
|
||||
|
||||
* Tue Apr 30 2024 Jan Friesse <jfriesse@redhat.com> - 1.1-2
|
||||
- Resolves: RHEL-32613
|
||||
|
||||
|
||||
Loading…
Reference in New Issue
Block a user