Fix: transport: Correctly find NLMSG_DONE and NLMSG_ERROR

Kernel v6.9 changed Netlink behavior to deliver NLMSG_DONE in the same
recv() call as the data messages. This broke booth, causing an arbitrator
to hang during startup. This was fixed upstream by booth commit
7d933651, which landed in booth v1.2.

RHEL 9.6 contains the same kernel change. It was backported to
kernel-5.14.0-535.el9 in RHEL 57755. So without this commit, booth is
broken on RHEL 9.6 and later. (Omitted hyphen in RHEL issue to avoid
GitLab pipeline failure due to unapproved ticket.)

More details on the kernel change can be found here:
https://lore.kernel.org/netdev/20240315124808.033ff58d@elisabeth/T/.

Resolves: RHEL-133741

Signed-off-by: Reid Wahl <nwahl@redhat.com>
Reid Wahl 2025-12-02 23:42:44 -08:00
parent 42c0385108
commit 7cb6fffa11
2 changed files with 79 additions and 1 deletions

@@ -0,0 +1,72 @@
From 7d93365197f3df144ea007a0ce27cff3b59af8d3 Mon Sep 17 00:00:00 2001
From: Jan Friesse <jfriesse@redhat.com>
Date: Tue, 23 Apr 2024 18:01:02 +0200
Subject: [PATCH] transport: Fix _find_myself for kernel 6.9

Kernel 6.9 seems to have changed AF_NETLINK behavior slightly, making
booth unable to start.

Previously, it was expected that only the first item in the message
could be of type NLMSG_DONE or NLMSG_ERROR, and this appears to have
been true for kernels < 6.9.

With kernel 6.9 this is no longer true, so any item can be of type
NLMSG_DONE or NLMSG_ERROR.

As a result, the loop never terminated and booth kept waiting for more
messages from the kernel, which never arrived.

The solution is to change the loop slightly so that NLMSG_DONE,
NLMSG_ERROR and RTM_NEWADDR are handled correctly.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
---
src/transport.c | 21 ++++++++++-----------
1 file changed, 10 insertions(+), 11 deletions(-)
diff --git a/src/transport.c b/src/transport.c
index 0d17f18..817a4dc 100644
--- a/src/transport.c
+++ b/src/transport.c
@@ -208,17 +208,16 @@ int _find_myself(int family, struct booth_site **mep, int fuzzy_allowed)
 			return 0;
 		}

-		h = (struct nlmsghdr *)rcvbuf;
-		if (h->nlmsg_type == NLMSG_DONE)
-			break;
-
-		if (h->nlmsg_type == NLMSG_ERROR) {
-			close(fd);
-			log_error("netlink socket recvmsg error");
-			return 0;
-		}
+		for (h = (struct nlmsghdr *)rcvbuf; NLMSG_OK(h, status); h = NLMSG_NEXT(h, status)) {
+			if (h->nlmsg_type == NLMSG_DONE)
+				goto out;
+
+			if (h->nlmsg_type == NLMSG_ERROR) {
+				close(fd);
+				log_error("netlink socket recvmsg error");
+				return 0;
+			}

-		while (NLMSG_OK(h, status)) {
 			if (h->nlmsg_type == RTM_NEWADDR) {
 				struct ifaddrmsg *ifa = NLMSG_DATA(h);
 				struct rtattr *tb[IFA_MAX+1];
@@ -271,10 +270,10 @@ int _find_myself(int family, struct booth_site **mep, int fuzzy_allowed)
 					}
 				}
 			}
-			h = NLMSG_NEXT(h, status);
 		}
 	}

+out:
 	close(fd);

 	if (!me)
--
2.51.1

@@ -41,7 +41,7 @@
Name: booth
Version: 1.1
-Release: 2%{?dist}
+Release: 3%{?dist}
Summary: Ticket Manager for Multi-site Clusters
License: GPLv2+
Url: https://github.com/%{github_owner}/%{name}
@@ -49,6 +49,7 @@ Source0: https://github.com/%{github_owner}/%{name}/releases/download/v%{
Patch0: rhel-specific-0001-config-Add-enable-authfile-option.patch
Patch1: RHEL-32613-1-attr-Fix-reading-of-server_reply.patch
Patch2: RHEL-32613-2-auth-Check-result-of-gcrypt-gcry_md_get_algo_dlen.patch
+Patch3: RHEL-133741-0001-transport-Fix-_find_myself-for-kernel-6.9.patch
# direct build process dependencies
BuildRequires: autoconf
@@ -297,6 +298,11 @@ VERBOSE=1 make check
%{_usr}/lib/ocf/resource.d/booth/sharedrsc
%changelog
+* Wed Dec 04 2025 Reid Wahl <nwahl@redhat.com> - 1.1-3
+- Resolves: RHEL-133741
+- transport: Fix _find_myself for kernel 6.9
+
* Tue Apr 30 2024 Jan Friesse <jfriesse@redhat.com> - 1.1-2
- Resolves: RHEL-32613