Recreate RHEL 5.14.0-687.13.1 from CS9/upstream backports

Add the RHEL 687.13.1 backports (1253-1269) from centos-stream-9 and upstream
stable, on top of 687.12.1. RHEL now also ships the smb cifs.spnego fix
(CVE-2026-46243). Bump pkgrelease and specrelease to 687.13.1.
Andrew Lukoshko 2026-06-10 22:43:20 +00:00
parent cd9793b5a4
commit 60fd7c7780
18 changed files with 1878 additions and 2 deletions

@@ -0,0 +1,105 @@
From dbb0b8ec49fcd597c406f3b17f28b588e96cfa14 Mon Sep 17 00:00:00 2001
From: Nico Pache <npache@redhat.com>
Date: Sat, 4 Apr 2026 19:30:21 -0600
Subject: [PATCH] mm: document __GFP_NOFAIL must be blockable
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
commit 17d75422604f0b92869aa17cb44f60958212f033
Author: Barry Song <v-songbaohua@oppo.com>
Date: Sat Aug 31 08:28:22 2024 +1200
mm: document __GFP_NOFAIL must be blockable
Non-blocking allocation with __GFP_NOFAIL is not supported and may still
result in NULL pointers (if we don't return NULL, we result in busy-loop
within non-sleepable contexts):
static inline struct page *
__alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
struct alloc_context *ac)
{
...
/*
* Make sure that __GFP_NOFAIL request doesn't leak out and make sure
* we always retry
*/
if (gfp_mask & __GFP_NOFAIL) {
/*
* All existing users of the __GFP_NOFAIL are blockable, so warn
* of any new users that actually require GFP_NOWAIT
*/
if (WARN_ON_ONCE_GFP(!can_direct_reclaim, gfp_mask))
goto fail;
...
}
...
fail:
warn_alloc(gfp_mask, ac->nodemask,
"page allocation failure: order:%u", order);
got_pg:
return page;
}
Highlight this in the documentation of __GFP_NOFAIL so that non-mm
subsystems can reject any illegal usage of __GFP_NOFAIL with GFP_ATOMIC,
GFP_NOWAIT, etc.
Link: https://lkml.kernel.org/r/20240830202823.21478-3-21cnbao@gmail.com
Signed-off-by: Barry Song <v-songbaohua@oppo.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Davidlohr Bueso <dave@stgolabs.net>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: David Rientjes <rientjes@google.com>
Cc: "Eugenio Pérez" <eperezma@redhat.com>
Cc: Hailong.Liu <hailong.liu@oppo.com>
Cc: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Kees Cook <kees@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Maxime Coquelin <maxime.coquelin@redhat.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Uladzislau Rezki (Sony) <urezki@gmail.com>
Cc: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Xie Yongji <xieyongji@bytedance.com>
Cc: Yafang Shao <laoar.shao@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
JIRA: https://redhat.atlassian.net/browse/RHEL-148561
Signed-off-by: Nico Pache <npache@redhat.com>
diff --git a/include/linux/gfp_types.h b/include/linux/gfp_types.h
index 6583a58670c5..373d3871f61e 100644
--- a/include/linux/gfp_types.h
+++ b/include/linux/gfp_types.h
@@ -168,7 +168,8 @@ typedef unsigned int __bitwise gfp_t;
* the caller still has to check for failures) while costly requests try to be
* not disruptive and back off even without invoking the OOM killer.
* The following three modifiers might be used to override some of these
- * implicit rules
+ * implicit rules. Please note that all of them must be used along with
+ * %__GFP_DIRECT_RECLAIM flag.
*
* %__GFP_NORETRY: The VM implementation will try only very lightweight
* memory direct reclaim to get some memory under memory pressure (thus
@@ -199,6 +200,8 @@ typedef unsigned int __bitwise gfp_t;
* cannot handle allocation failures. The allocation could block
* indefinitely but will never return with failure. Testing for
* failure is pointless.
+ * It _must_ be blockable and used together with __GFP_DIRECT_RECLAIM.
+ * It should _never_ be used in non-sleepable contexts.
* New users should be evaluated carefully (and the flag should be
* used only when there is no reasonable failure policy) but it is
* definitely preferable to use the flag rather than opencode endless
--
2.50.1 (Apple Git-155)
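
The rule documented by this patch can be modelled in userspace as a simple mask check. This is only a sketch: the bit values, the `SKETCH_*` names and `sketch_nofail_mask_is_legal()` are hypothetical and do not match the kernel's real `___GFP_*` constants, and the two composite masks are simplified stand-ins for non-blocking (GFP_NOWAIT-like) and blocking (GFP_KERNEL-like) requests.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit values for this sketch; not the kernel's constants. */
typedef uint32_t sketch_gfp_t;
#define SKETCH_GFP_DIRECT_RECLAIM (1u << 0)
#define SKETCH_GFP_KSWAPD_RECLAIM (1u << 1)
#define SKETCH_GFP_NOFAIL         (1u << 2)

/* Simplified stand-ins for a non-blocking and a blocking request. */
#define SKETCH_NOWAIT (SKETCH_GFP_KSWAPD_RECLAIM)
#define SKETCH_KERNEL (SKETCH_GFP_DIRECT_RECLAIM | SKETCH_GFP_KSWAPD_RECLAIM)

/* A NOFAIL request is only legal when direct reclaim (blocking) is allowed. */
static inline bool sketch_nofail_mask_is_legal(sketch_gfp_t gfp)
{
    if (!(gfp & SKETCH_GFP_NOFAIL))
        return true;
    return (gfp & SKETCH_GFP_DIRECT_RECLAIM) != 0;
}

/* Usage example: blocking + NOFAIL is fine, non-blocking + NOFAIL is not. */
static inline void sketch_nofail_example(void)
{
    assert(sketch_nofail_mask_is_legal(SKETCH_KERNEL | SKETCH_GFP_NOFAIL));
    assert(!sketch_nofail_mask_is_legal(SKETCH_NOWAIT | SKETCH_GFP_NOFAIL));
    assert(sketch_nofail_mask_is_legal(SKETCH_NOWAIT));
}
```

A subsystem outside mm could apply the same kind of check before passing a caller-supplied mask on, which is the use the commit message has in mind.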

@@ -0,0 +1,169 @@
From e7842dda471d377ae8c6aaf9ddb4a73159f505b4 Mon Sep 17 00:00:00 2001
From: Nico Pache <npache@redhat.com>
Date: Sat, 4 Apr 2026 19:30:21 -0600
Subject: [PATCH] mm: warn about illegal __GFP_NOFAIL usage in a more
appropriate location and manner
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
commit 903edea6c53f097f5f0c847fdbbfab0c6c44f241
Author: Barry Song <v-songbaohua@oppo.com>
Date: Sat Aug 31 08:28:23 2024 +1200
mm: warn about illegal __GFP_NOFAIL usage in a more appropriate location and manner
Three points for this change:
1. We should consolidate all warnings in one place. Currently, the
order > 1 warning is in the hotpath, while others are in less
likely scenarios. Moving all warnings to the slowpath will reduce
the overhead for order > 1 and increase the visibility of other
warnings.
2. We currently have two warnings for order: one for order > 1 in
the hotpath and another for order > costly_order in the laziest
path. I suggest standardizing on order > 1 since it's been in
use for a long time.
3. We don't need to check for __GFP_NOWARN in this case. __GFP_NOWARN
is meant to suppress allocation failure reports, but here we're
dealing with bug detection, not allocation failures. So replace
WARN_ON_ONCE_GFP by WARN_ON_ONCE.
[v-songbaohua@oppo.com: also update the doc for __GFP_NOFAIL with order > 1]
Link: https://lkml.kernel.org/r/20240903223935.1697-1-21cnbao@gmail.com
Link: https://lkml.kernel.org/r/20240830202823.21478-4-21cnbao@gmail.com
Signed-off-by: Barry Song <v-songbaohua@oppo.com>
Suggested-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: David Hildenbrand <david@redhat.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Christoph Lameter <cl@linux.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: David Rientjes <rientjes@google.com>
Cc: "Eugenio Pérez" <eperezma@redhat.com>
Cc: Hailong.Liu <hailong.liu@oppo.com>
Cc: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Kees Cook <kees@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Maxime Coquelin <maxime.coquelin@redhat.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Uladzislau Rezki (Sony) <urezki@gmail.com>
Cc: Xie Yongji <xieyongji@bytedance.com>
Cc: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Cc: Yafang Shao <laoar.shao@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
JIRA: https://redhat.atlassian.net/browse/RHEL-148561
Signed-off-by: Nico Pache <npache@redhat.com>
diff --git a/include/linux/gfp_types.h b/include/linux/gfp_types.h
index 373d3871f61e..359ed69b14d9 100644
--- a/include/linux/gfp_types.h
+++ b/include/linux/gfp_types.h
@@ -206,7 +206,8 @@ typedef unsigned int __bitwise gfp_t;
* used only when there is no reasonable failure policy) but it is
* definitely preferable to use the flag rather than opencode endless
* loop around allocator.
- * Using this flag for costly allocations is _highly_ discouraged.
+ * Allocating pages from the buddy with __GFP_NOFAIL and order > 1 is
+ * not supported. Please consider using kvmalloc() instead.
*/
#define __GFP_IO ((__force gfp_t)___GFP_IO)
#define __GFP_FS ((__force gfp_t)___GFP_FS)
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 4e8615398f07..28cb88a3a758 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2914,12 +2914,6 @@ struct page *rmqueue(struct zone *preferred_zone,
{
struct page *page;
- /*
- * We most definitely don't want callers attempting to
- * allocate greater than order-1 page units with __GFP_NOFAIL.
- */
- WARN_ON_ONCE((gfp_flags & __GFP_NOFAIL) && (order > 1));
-
if (likely(pcp_allowed_order(order))) {
page = rmqueue_pcplist(preferred_zone, zone, order,
migratetype, alloc_flags);
@@ -4062,6 +4056,7 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
{
bool can_direct_reclaim = gfp_mask & __GFP_DIRECT_RECLAIM;
bool can_compact = gfp_compaction_allowed(gfp_mask);
+ bool nofail = gfp_mask & __GFP_NOFAIL;
const bool costly_order = order > PAGE_ALLOC_COSTLY_ORDER;
struct page *page = NULL;
unsigned int alloc_flags;
@@ -4074,6 +4069,25 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
unsigned int zonelist_iter_cookie;
int reserve_flags;
+ if (unlikely(nofail)) {
+ /*
+ * We most definitely don't want callers attempting to
+ * allocate greater than order-1 page units with __GFP_NOFAIL.
+ */
+ WARN_ON_ONCE(order > 1);
+ /*
+ * Also we don't support __GFP_NOFAIL without __GFP_DIRECT_RECLAIM,
+ * otherwise, we may result in lockup.
+ */
+ WARN_ON_ONCE(!can_direct_reclaim);
+ /*
+ * PF_MEMALLOC request from this context is rather bizarre
+ * because we cannot reclaim anything and only can loop waiting
+ * for somebody to do a work for us.
+ */
+ WARN_ON_ONCE(current->flags & PF_MEMALLOC);
+ }
+
restart:
compaction_retries = 0;
no_progress_loops = 0;
@@ -4291,29 +4305,15 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
* Make sure that __GFP_NOFAIL request doesn't leak out and make sure
* we always retry
*/
- if (gfp_mask & __GFP_NOFAIL) {
+ if (unlikely(nofail)) {
/*
- * All existing users of the __GFP_NOFAIL are blockable, so warn
- * of any new users that actually require GFP_NOWAIT
+ * Lacking direct_reclaim we can't do anything to reclaim memory,
+ * we disregard these unreasonable nofail requests and still
+ * return NULL
*/
- if (WARN_ON_ONCE_GFP(!can_direct_reclaim, gfp_mask))
+ if (!can_direct_reclaim)
goto fail;
- /*
- * PF_MEMALLOC request from this context is rather bizarre
- * because we cannot reclaim anything and only can loop waiting
- * for somebody to do a work for us
- */
- WARN_ON_ONCE_GFP(current->flags & PF_MEMALLOC, gfp_mask);
-
- /*
- * non failing costly orders are a hard requirement which we
- * are not prepared for much so let's warn about these users
- * so that we can identify them and convert them to something
- * else.
- */
- WARN_ON_ONCE_GFP(costly_order, gfp_mask);
-
/*
* Help non-failing allocations by giving some access to memory
* reserves normally used for high priority non-blocking
--
2.50.1 (Apple Git-155)
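
The consolidated slowpath checks from this patch can be summarised as a pure function of three inputs. This is a userspace sketch with hypothetical names (`sketch_nofail_warnings()`, `SKETCH_WARN_*`); the kernel code uses `WARN_ON_ONCE()` rather than returning a bitmask. Note that the function takes no "nowarn" input, mirroring point 3 of the commit message.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bitmask naming which warning would fire. */
enum sketch_nofail_warning {
    SKETCH_WARN_NONE          = 0,
    SKETCH_WARN_ORDER_GT_1    = 1 << 0,
    SKETCH_WARN_NOT_BLOCKABLE = 1 << 1,
    SKETCH_WARN_PF_MEMALLOC   = 1 << 2,
};

/* Mirrors the three checks placed at the top of the slowpath. */
static inline unsigned int sketch_nofail_warnings(bool nofail, unsigned int order,
                                                  bool can_direct_reclaim,
                                                  bool pf_memalloc)
{
    unsigned int warnings = SKETCH_WARN_NONE;

    if (!nofail)
        return warnings;
    if (order > 1)
        warnings |= SKETCH_WARN_ORDER_GT_1;
    if (!can_direct_reclaim)
        warnings |= SKETCH_WARN_NOT_BLOCKABLE;
    if (pf_memalloc)
        warnings |= SKETCH_WARN_PF_MEMALLOC;
    return warnings;
}

/* Usage example covering the clean case and two offending cases. */
static inline void sketch_warnings_example(void)
{
    assert(sketch_nofail_warnings(true, 0, true, false) == SKETCH_WARN_NONE);
    assert(sketch_nofail_warnings(true, 2, true, false) == SKETCH_WARN_ORDER_GT_1);
    assert(sketch_nofail_warnings(true, 0, false, false) == SKETCH_WARN_NOT_BLOCKABLE);
}
```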

@@ -0,0 +1,85 @@
From 76329fc4d67ac1854f97337545e428e181b1cbe5 Mon Sep 17 00:00:00 2001
From: Nico Pache <npache@redhat.com>
Date: Sat, 4 Apr 2026 19:30:21 -0600
Subject: [PATCH] mm/page_alloc.c: avoid infinite retries caused by cpuset race
commit e05741fb10c38d70bbd7ec12b23c197b6355d519
Author: Tianyang Zhang <zhangtianyang@loongson.cn>
Date: Wed Apr 16 16:24:05 2025 +0800
mm/page_alloc.c: avoid infinite retries caused by cpuset race
__alloc_pages_slowpath has no change detection for ac->nodemask in part
of the retry path, while cpuset can modify it in parallel. For some
processes that set mempolicy as MPOL_BIND, this results in ac->nodemask
changes, and then should_reclaim_retry will judge based on the latest
nodemask and jump to retry, while get_page_from_freelist only
traverses the zonelist from ac->preferred_zoneref, which was selected by
an expired nodemask, and this may cause infinite retries in some cases.
cpu 64:
__alloc_pages_slowpath {
/* ..... */
retry:
/* ac->nodemask = 0x1, ac->preferred->zone->nid = 1 */
if (alloc_flags & ALLOC_KSWAPD)
wake_all_kswapds(order, gfp_mask, ac);
/* cpu 1:
cpuset_write_resmask
update_nodemask
update_nodemasks_hier
update_tasks_nodemask
mpol_rebind_task
mpol_rebind_policy
mpol_rebind_nodemask
// mempolicy->nodes has been modified,
// which ac->nodemask point to
*/
/* ac->nodemask = 0x3, ac->preferred->zone->nid = 1 */
if (should_reclaim_retry(gfp_mask, order, ac, alloc_flags,
did_some_progress > 0, &no_progress_loops))
goto retry;
}
Simultaneously starting multiple cpuset01 instances from LTP can quickly
reproduce this issue on a multi-node server when the maximum memory
pressure is reached and swap is enabled.
Link: https://lkml.kernel.org/r/20250416082405.20988-1-zhangtianyang@loongson.cn
Fixes: c33d6c06f60f ("mm, page_alloc: avoid looking up the first zone in a zonelist twice")
Signed-off-by: Tianyang Zhang <zhangtianyang@loongson.cn>
Reviewed-by: Suren Baghdasaryan <surenb@google.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Brendan Jackman <jackmanb@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Zi Yan <ziy@nvidia.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
JIRA: https://redhat.atlassian.net/browse/RHEL-148561
Signed-off-by: Nico Pache <npache@redhat.com>
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 28cb88a3a758..dc1a7637bf97 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -4193,6 +4193,14 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
}
retry:
+ /*
+ * Deal with possible cpuset update races or zonelist updates to avoid
+ * infinite retries.
+ */
+ if (check_retry_cpuset(cpuset_mems_cookie, ac) ||
+ check_retry_zonelist(zonelist_iter_cookie))
+ goto restart;
+
/* Ensure kswapd doesn't accidentally go to sleep as long as we loop */
if (alloc_flags & ALLOC_KSWAPD)
wake_all_kswapds(order, gfp_mask, ac);
--
2.50.1 (Apple Git-155)
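
The fix relies on a cookie taken at "restart" that is compared again at "retry". A minimal single-threaded sketch of that pattern follows. All names here (`struct sketch_mems`, `sketch_read_begin()`, `sketch_read_retry()`, `sketch_update_nodemask()`) are hypothetical; the kernel helpers seen in the diff are seqcount-based and add memory ordering plus further validity checks that this model leaves out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for cpuset state; seq is bumped on every update. */
struct sketch_mems {
    unsigned int seq;
    unsigned long nodemask;
};

/* Take a cookie at the "restart" label. */
static inline unsigned int sketch_read_begin(const struct sketch_mems *m)
{
    return m->seq;
}

/* At the "retry" label: true means the cookie is stale, so go to restart. */
static inline bool sketch_read_retry(const struct sketch_mems *m,
                                     unsigned int cookie)
{
    return m->seq != cookie;
}

/* Models a concurrent writer rebinding the nodemask. */
static inline void sketch_update_nodemask(struct sketch_mems *m,
                                          unsigned long mask)
{
    m->nodemask = mask;
    m->seq++;
}

/* Usage example following the 0x1 -> 0x3 change from the commit message. */
static inline void sketch_cpuset_example(void)
{
    struct sketch_mems mems = { .seq = 0, .nodemask = 0x1 };
    unsigned int cookie = sketch_read_begin(&mems);

    assert(!sketch_read_retry(&mems, cookie)); /* unchanged: plain retry */
    sketch_update_nodemask(&mems, 0x3);        /* writer widens the mask */
    assert(sketch_read_retry(&mems, cookie));  /* stale: must restart */
    cookie = sketch_read_begin(&mems);         /* restart takes a new cookie */
    assert(!sketch_read_retry(&mems, cookie));
}
```

Without the comparison at "retry", the loop would keep using a preferred zone derived from the old mask, which is the stale state the commit message describes.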

@@ -0,0 +1,96 @@
From 32e483909f993e18474e30c11e0397a82c570b6e Mon Sep 17 00:00:00 2001
From: Nico Pache <npache@redhat.com>
Date: Sat, 4 Apr 2026 19:30:21 -0600
Subject: [PATCH] mm, page_alloc, thp: prevent reclaim for __GFP_THISNODE THP
allocations
commit 9c9828d3ead69416d731b1238802af31760c823e
Author: Vlastimil Babka <vbabka@suse.cz>
Date: Fri Dec 19 17:31:57 2025 +0100
mm, page_alloc, thp: prevent reclaim for __GFP_THISNODE THP allocations
Since commit cc638f329ef6 ("mm, thp: tweak reclaim/compaction effort of
local-only and all-node allocations"), THP page fault allocations have
settled on the following scheme (from the commit log):
1. local node only THP allocation with no reclaim, just compaction.
2. for madvised VMA's or when synchronous compaction is enabled always - THP
allocation from any node with effort determined by global defrag setting
and VMA madvise
3. fallback to base pages on any node
Recent customer reports however revealed we have a gap in step 1 above.
What we have seen is excessive reclaim due to THP page faults on a NUMA
node that's close to its high watermark, while other nodes have plenty of
free memory.
The problem with step 1 is that it promises no reclaim after the
compaction attempt, however reclaim is only avoided for certain compaction
outcomes (deferred, or skipped due to insufficient free base pages), and
not e.g. when compaction is actually performed but fails (we did see
compact_fail vmstat counter increasing).
THP page faults can therefore exhibit a zone_reclaim_mode-like behavior,
which is not the intention.
Thus add a check for __GFP_THISNODE that corresponds to this exact
situation and prevents continuing with reclaim/compaction once the initial
compaction attempt isn't successful in allocating the page.
Note that commit cc638f329ef6 has not introduced this over-reclaim
possibility; it appears to exist in some form since commit 2f0799a0ffc0
("mm, thp: restore node-local hugepage allocations"). Followup commits
b39d0ee2632d ("mm, page_alloc: avoid expensive reclaim when compaction may
not succeed") and cc638f329ef6 have moved in the right direction, but left
the abovementioned gap.
Link: https://lkml.kernel.org/r/20251219-costly-noretry-thisnode-fix-v1-1-e1085a4a0c34@suse.cz
Fixes: 2f0799a0ffc0 ("mm, thp: restore node-local hugepage allocations")
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Pedro Falcato <pfalcato@suse.de>
Acked-by: Zi Yan <ziy@nvidia.com>
Cc: Brendan Jackman <jackmanb@google.com>
Cc: "David Hildenbrand (Red Hat)" <david@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joshua Hahn <joshua.hahnjy@gmail.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
JIRA: https://redhat.atlassian.net/browse/RHEL-148561
Signed-off-by: Nico Pache <npache@redhat.com>
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index dc1a7637bf97..ca0d42a95410 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -4183,6 +4183,20 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
compact_result == COMPACT_DEFERRED)
goto nopage;
+ /*
+ * THP page faults may attempt local node only first,
+ * but are then allowed to only compact, not reclaim,
+ * see alloc_pages_mpol().
+ *
+ * Compaction can fail for other reasons than those
+ * checked above and we don't want such THP allocations
+ * to put reclaim pressure on a single node in a
+ * situation where other nodes might have plenty of
+ * available memory.
+ */
+ if (gfp_mask & __GFP_THISNODE)
+ goto nopage;
+
/*
* Looks like reclaim/compaction is worth trying, but
* sync compaction could be very expensive, so keep
--
2.50.1 (Apple Git-155)

@@ -0,0 +1,122 @@
From 37f7d4a6be45deda800131b7a9ea6d1f3e4d97ab Mon Sep 17 00:00:00 2001
From: Nico Pache <npache@redhat.com>
Date: Sat, 4 Apr 2026 19:30:20 -0600
Subject: [PATCH] mm/page_alloc: ignore the exact initial compaction result
commit 66987218154918a6341a3e3eeeee58110a69e0bb
Author: Vlastimil Babka <vbabka@suse.cz>
Date: Tue Jan 6 12:52:36 2026 +0100
mm/page_alloc: ignore the exact initial compaction result
Patch series "tweaks for __alloc_pages_slowpath()", v3.
This patch (of 3):
For allocations that are of costly order and __GFP_NORETRY (and can
perform compaction) we attempt direct compaction first. If that fails, we
continue with a single round of direct reclaim+compaction (as for other
__GFP_NORETRY allocations, except the compaction is of lower priority),
with two exceptions that fail immediately:
- __GFP_THISNODE is specified, to prevent zone_reclaim_mode-like
behavior for e.g. THP page faults
- compaction failed because it was deferred (i.e. has been failing
recently so further attempts are not done for a while) or skipped,
which means there are insufficient free base pages to defragment to
begin with
Upon closer inspection, the second condition has a somewhat flawed
reasoning. If there are not enough base pages and reclaim could create
them, we instead fail. When there are enough base pages and compaction
has already run and failed, we proceed and hope that reclaim and the
subsequent compaction attempt will succeed. But it's unclear why they
should and whether it will be as inexpensive as intended.
It might therefore make more sense to just fail unconditionally after the
initial compaction attempt. However that would change the semantics of
__GFP_NORETRY to attempt reclaim at least once.
Alternatively we can remove the compaction result checks and proceed with
the single reclaim and (lower priority) compaction attempt, leaving only
the __GFP_THISNODE exception for failing immediately.
Link: https://lkml.kernel.org/r/20260106-thp-thisnode-tweak-v3-0-f5d67c21a193@suse.cz
Link: https://lkml.kernel.org/r/20260106-thp-thisnode-tweak-v3-1-f5d67c21a193@suse.cz
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Brendan Jackman <jackmanb@google.com>
Cc: David Hildenbrand (Red Hat) <david@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Joshua Hahn <joshua.hahnjy@gmail.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Pedro Falcato <pfalcato@suse.de>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
JIRA: https://redhat.atlassian.net/browse/RHEL-148561
Signed-off-by: Nico Pache <npache@redhat.com>
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index ca0d42a95410..3301e934dafa 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -4162,44 +4162,22 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
* includes some THP page fault allocations
*/
if (costly_order && (gfp_mask & __GFP_NORETRY)) {
- /*
- * If allocating entire pageblock(s) and compaction
- * failed because all zones are below low watermarks
- * or is prohibited because it recently failed at this
- * order, fail immediately unless the allocator has
- * requested compaction and reclaim retry.
- *
- * Reclaim is
- * - potentially very expensive because zones are far
- * below their low watermarks or this is part of very
- * bursty high order allocations,
- * - not guaranteed to help because isolate_freepages()
- * may not iterate over freed pages as part of its
- * linear scan, and
- * - unlikely to make entire pageblocks free on its
- * own.
- */
- if (compact_result == COMPACT_SKIPPED ||
- compact_result == COMPACT_DEFERRED)
- goto nopage;
-
/*
* THP page faults may attempt local node only first,
* but are then allowed to only compact, not reclaim,
* see alloc_pages_mpol().
*
- * Compaction can fail for other reasons than those
- * checked above and we don't want such THP allocations
- * to put reclaim pressure on a single node in a
- * situation where other nodes might have plenty of
- * available memory.
+ * Compaction has failed above and we don't want such
+ * THP allocations to put reclaim pressure on a single
+ * node in a situation where other nodes might have
+ * plenty of available memory.
*/
if (gfp_mask & __GFP_THISNODE)
goto nopage;
/*
- * Looks like reclaim/compaction is worth trying, but
- * sync compaction could be very expensive, so keep
+ * Proceed with single round of reclaim/compaction, but
+ * since sync compaction could be very expensive, keep
* using async compaction.
*/
compact_priority = INIT_COMPACT_PRIORITY;
--
2.50.1 (Apple Git-155)
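
The behavioural change in this patch can be sketched as two predicates that answer "does the allocation fail right after the initial compaction attempt?". This is a userspace model with hypothetical names (`sketch_fail_now_old()`, `sketch_fail_now_new()`, `enum sketch_compact_result`); the `costly_noretry` argument stands for "costly order and __GFP_NORETRY", and the "old" predicate models the code as it stood after the preceding __GFP_THISNODE patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, reduced set of compaction outcomes for this sketch. */
enum sketch_compact_result {
    SKETCH_COMPACT_SKIPPED,
    SKETCH_COMPACT_DEFERRED,
    SKETCH_COMPACT_RAN_AND_FAILED,
};

/* Before: skipped/deferred compaction or __GFP_THISNODE fails immediately. */
static inline bool sketch_fail_now_old(bool costly_noretry, bool thisnode,
                                       enum sketch_compact_result result)
{
    if (!costly_noretry)
        return false;
    if (result == SKETCH_COMPACT_SKIPPED || result == SKETCH_COMPACT_DEFERRED)
        return true;
    return thisnode;
}

/* After: only the __GFP_THISNODE exception remains. */
static inline bool sketch_fail_now_new(bool costly_noretry, bool thisnode)
{
    return costly_noretry && thisnode;
}

/* Usage example: the only difference is the skipped/deferred case. */
static inline void sketch_compact_example(void)
{
    assert(sketch_fail_now_old(true, false, SKETCH_COMPACT_SKIPPED));
    assert(!sketch_fail_now_new(true, false));
    assert(sketch_fail_now_old(true, true, SKETCH_COMPACT_RAN_AND_FAILED));
    assert(sketch_fail_now_new(true, true));
}
```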

@@ -0,0 +1,208 @@
From 2490569160937bfa1556b9d2dc07998148eb5f77 Mon Sep 17 00:00:00 2001
From: Nico Pache <npache@redhat.com>
Date: Sat, 4 Apr 2026 19:30:20 -0600
Subject: [PATCH] mm/page_alloc: refactor the initial compaction handling
commit 53a9b4646f67c95df1775aa5f381cb7f42cae957
Author: Vlastimil Babka <vbabka@suse.cz>
Date: Tue Jan 6 12:52:37 2026 +0100
mm/page_alloc: refactor the initial compaction handling
The initial direct compaction done in some cases in
__alloc_pages_slowpath() stands out from the main retry loop of reclaim +
compaction.
We can simplify this by instead skipping the initial reclaim attempt via a
new local variable compact_first, and handle the compact_priority as
necessary to match the original behavior. No functional change intended.
Link: https://lkml.kernel.org/r/20260106-thp-thisnode-tweak-v3-2-f5d67c21a193@suse.cz
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Suggested-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Joshua Hahn <joshua.hahnjy@gmail.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Brendan Jackman <jackmanb@google.com>
Cc: David Hildenbrand (Red Hat) <david@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Pedro Falcato <pfalcato@suse.de>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
JIRA: https://redhat.atlassian.net/browse/RHEL-148561
Signed-off-by: Nico Pache <npache@redhat.com>
diff --git a/include/linux/gfp.h b/include/linux/gfp.h
index 3e4c0c536a3d..ac836590ba3a 100644
--- a/include/linux/gfp.h
+++ b/include/linux/gfp.h
@@ -348,9 +348,15 @@ extern gfp_t gfp_allowed_mask;
/* Returns true if the gfp_mask allows use of ALLOC_NO_WATERMARK */
bool gfp_pfmemalloc_allowed(gfp_t gfp_mask);
+/* A helper for checking if gfp includes all the specified flags */
+static inline bool gfp_has_flags(gfp_t gfp, gfp_t flags)
+{
+ return (gfp & flags) == flags;
+}
+
static inline bool gfp_has_io_fs(gfp_t gfp)
{
- return (gfp & (__GFP_IO | __GFP_FS)) == (__GFP_IO | __GFP_FS);
+ return gfp_has_flags(gfp, __GFP_IO | __GFP_FS);
}
/*
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 3301e934dafa..277ed887ec7a 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -4055,7 +4055,7 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
struct alloc_context *ac)
{
bool can_direct_reclaim = gfp_mask & __GFP_DIRECT_RECLAIM;
- bool can_compact = gfp_compaction_allowed(gfp_mask);
+ bool can_compact = can_direct_reclaim && gfp_compaction_allowed(gfp_mask);
bool nofail = gfp_mask & __GFP_NOFAIL;
const bool costly_order = order > PAGE_ALLOC_COSTLY_ORDER;
struct page *page = NULL;
@@ -4068,6 +4068,7 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
unsigned int cpuset_mems_cookie;
unsigned int zonelist_iter_cookie;
int reserve_flags;
+ bool compact_first = false;
if (unlikely(nofail)) {
/*
@@ -4095,6 +4096,19 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
cpuset_mems_cookie = read_mems_allowed_begin();
zonelist_iter_cookie = zonelist_iter_begin();
+ /*
+ * For costly allocations, try direct compaction first, as it's likely
+ * that we have enough base pages and don't need to reclaim. For non-
+ * movable high-order allocations, do that as well, as compaction will
+ * try prevent permanent fragmentation by migrating from blocks of the
+ * same migratetype.
+ */
+ if (can_compact && (costly_order || (order > 0 &&
+ ac->migratetype != MIGRATE_MOVABLE))) {
+ compact_first = true;
+ compact_priority = INIT_COMPACT_PRIORITY;
+ }
+
/*
* The fast path uses conservative alloc_flags to succeed only until
* kswapd needs to be woken up, and to avoid the cost of setting up
@@ -4137,53 +4151,6 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
if (page)
goto got_pg;
- /*
- * For costly allocations, try direct compaction first, as it's likely
- * that we have enough base pages and don't need to reclaim. For non-
- * movable high-order allocations, do that as well, as compaction will
- * try prevent permanent fragmentation by migrating from blocks of the
- * same migratetype.
- * Don't try this for allocations that are allowed to ignore
- * watermarks, as the ALLOC_NO_WATERMARKS attempt didn't yet happen.
- */
- if (can_direct_reclaim && can_compact &&
- (costly_order ||
- (order > 0 && ac->migratetype != MIGRATE_MOVABLE))
- && !gfp_pfmemalloc_allowed(gfp_mask)) {
- page = __alloc_pages_direct_compact(gfp_mask, order,
- alloc_flags, ac,
- INIT_COMPACT_PRIORITY,
- &compact_result);
- if (page)
- goto got_pg;
-
- /*
- * Checks for costly allocations with __GFP_NORETRY, which
- * includes some THP page fault allocations
- */
- if (costly_order && (gfp_mask & __GFP_NORETRY)) {
- /*
- * THP page faults may attempt local node only first,
- * but are then allowed to only compact, not reclaim,
- * see alloc_pages_mpol().
- *
- * Compaction has failed above and we don't want such
- * THP allocations to put reclaim pressure on a single
- * node in a situation where other nodes might have
- * plenty of available memory.
- */
- if (gfp_mask & __GFP_THISNODE)
- goto nopage;
-
- /*
- * Proceed with single round of reclaim/compaction, but
- * since sync compaction could be very expensive, keep
- * using async compaction.
- */
- compact_priority = INIT_COMPACT_PRIORITY;
- }
- }
-
retry:
/*
* Deal with possible cpuset update races or zonelist updates to avoid
@@ -4227,10 +4194,12 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
goto nopage;
/* Try direct reclaim and then allocating */
- page = __alloc_pages_direct_reclaim(gfp_mask, order, alloc_flags, ac,
- &did_some_progress);
- if (page)
- goto got_pg;
+ if (!compact_first) {
+ page = __alloc_pages_direct_reclaim(gfp_mask, order, alloc_flags,
+ ac, &did_some_progress);
+ if (page)
+ goto got_pg;
+ }
/* Try direct compaction and then allocating */
page = __alloc_pages_direct_compact(gfp_mask, order, alloc_flags, ac,
@@ -4238,6 +4207,33 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
if (page)
goto got_pg;
+ if (compact_first) {
+ /*
+ * THP page faults may attempt local node only first, but are
+ * then allowed to only compact, not reclaim, see
+ * alloc_pages_mpol().
+ *
+ * Compaction has failed above and we don't want such THP
+ * allocations to put reclaim pressure on a single node in a
+ * situation where other nodes might have plenty of available
+ * memory.
+ */
+ if (gfp_has_flags(gfp_mask, __GFP_NORETRY | __GFP_THISNODE))
+ goto nopage;
+
+ /*
+ * For the initial compaction attempt we have lowered its
+ * priority. Restore it for further retries, if those are
+ * allowed. With __GFP_NORETRY there will be a single round of
+ * reclaim and compaction with the lowered priority.
+ */
+ if (!(gfp_mask & __GFP_NORETRY))
+ compact_priority = DEF_COMPACT_PRIORITY;
+
+ compact_first = false;
+ goto retry;
+ }
+
/* Do not loop if specifically requested */
if (gfp_mask & __GFP_NORETRY)
goto nopage;
--
2.50.1 (Apple Git-155)
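
The `gfp_has_flags()` helper added above tests that all requested bits are set, which differs from the more common "any bit set" test. A userspace sketch of the difference follows; the `sketch_*` names and bit values are hypothetical, and only the `(gfp & flags) == flags` expression is taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit values for this sketch; not the kernel's constants. */
typedef uint32_t sketch_gfp_t;
#define SKETCH_GFP_NORETRY  (1u << 0)
#define SKETCH_GFP_THISNODE (1u << 1)

/* Same expression as the patch: true only if every bit in flags is set. */
static inline bool sketch_gfp_has_flags(sketch_gfp_t gfp, sketch_gfp_t flags)
{
    return (gfp & flags) == flags;
}

/* For contrast: true if at least one bit in flags is set. */
static inline bool sketch_gfp_has_any(sketch_gfp_t gfp, sketch_gfp_t flags)
{
    return (gfp & flags) != 0;
}

/* Usage example: a NORETRY-only mask passes "any" but not "all". */
static inline void sketch_has_flags_example(void)
{
    sketch_gfp_t both = SKETCH_GFP_NORETRY | SKETCH_GFP_THISNODE;

    assert(sketch_gfp_has_flags(both, both));
    assert(!sketch_gfp_has_flags(SKETCH_GFP_NORETRY, both));
    assert(sketch_gfp_has_any(SKETCH_GFP_NORETRY, both));
}
```

In the slowpath this matters because the early exit after the first compaction attempt should apply only to requests carrying both __GFP_NORETRY and __GFP_THISNODE.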

@@ -0,0 +1,138 @@
From a17c77996e1aa930c05901e213f1441f0db7a46a Mon Sep 17 00:00:00 2001
From: Nico Pache <npache@redhat.com>
Date: Sat, 4 Apr 2026 19:30:20 -0600
Subject: [PATCH] mm/page_alloc: simplify __alloc_pages_slowpath() flow
commit 2c4c3e29897d43c431b1cf9432fb66977f262ac2
Author: Vlastimil Babka <vbabka@suse.cz>
Date: Tue Jan 6 12:52:38 2026 +0100
mm/page_alloc: simplify __alloc_pages_slowpath() flow
The actions done before entering the main retry loop include waking up
kswapds and an allocation attempt with the precise alloc_flags. Then in
the loop we keep waking up kswapds, and we retry the allocation with flags
potentially further adjusted by being allowed to use reserves (due to e.g.
becoming an OOM killer victim).
We can adjust the retry loop to keep only one instance of waking up
kswapds and allocation attempt. Introduce the can_retry_reserves variable
for retrying once when we become eligible for reserves. It is still
useful not to evaluate reserve_flags immediately for the first allocation
attempt, because it's better to first try to succeed in a non-preferred zone
above the min watermark before allocating immediately from the preferred
zone below min watermark.
Additionally move the cpuset update checks introduced by e05741fb10c3
("mm/page_alloc.c: avoid infinite retries caused by cpuset race") further
down the retry loop. It's enough to do the checks only before reaching
any potentially infinite 'goto retry;' loop.
There should be no meaningful functional changes. The change of exact
moments the retry for reserves and cpuset updates are checked should not
result in different outcomes modulo races with concurrent allocator
activity.
Link: https://lkml.kernel.org/r/20260106-thp-thisnode-tweak-v3-3-f5d67c21a193@suse.cz
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Joshua Hahn <joshua.hahnjy@gmail.com>
Cc: Brendan Jackman <jackmanb@google.com>
Cc: David Hildenbrand (Red Hat) <david@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Pedro Falcato <pfalcato@suse.de>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
JIRA: https://redhat.atlassian.net/browse/RHEL-148561
Signed-off-by: Nico Pache <npache@redhat.com>
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 277ed887ec7a..4c2b622a39cf 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -4069,6 +4069,7 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
unsigned int zonelist_iter_cookie;
int reserve_flags;
bool compact_first = false;
+ bool can_retry_reserves = true;
if (unlikely(nofail)) {
/*
@@ -4140,6 +4141,8 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
goto nopage;
}
+retry:
+ /* Ensure kswapd doesn't accidentally go to sleep as long as we loop */
if (alloc_flags & ALLOC_KSWAPD)
wake_all_kswapds(order, gfp_mask, ac);
@@ -4151,19 +4154,6 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
if (page)
goto got_pg;
-retry:
- /*
- * Deal with possible cpuset update races or zonelist updates to avoid
- * infinite retries.
- */
- if (check_retry_cpuset(cpuset_mems_cookie, ac) ||
- check_retry_zonelist(zonelist_iter_cookie))
- goto restart;
-
- /* Ensure kswapd doesn't accidentally go to sleep as long as we loop */
- if (alloc_flags & ALLOC_KSWAPD)
- wake_all_kswapds(order, gfp_mask, ac);
-
reserve_flags = __gfp_pfmemalloc_flags(gfp_mask);
if (reserve_flags)
alloc_flags = gfp_to_alloc_flags_cma(gfp_mask, reserve_flags) |
@@ -4178,12 +4168,18 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
ac->nodemask = NULL;
ac->preferred_zoneref = first_zones_zonelist(ac->zonelist,
ac->highest_zoneidx, ac->nodemask);
- }
- /* Attempt with potentially adjusted zonelist and alloc_flags */
- page = get_page_from_freelist(gfp_mask, order, alloc_flags, ac);
- if (page)
- goto got_pg;
+ /*
+ * The first time we adjust anything due to being allowed to
+ * ignore memory policies or watermarks, retry immediately. This
+ * allows us to keep the first allocation attempt optimistic so
+ * it can succeed in a zone that is still above watermarks.
+ */
+ if (can_retry_reserves) {
+ can_retry_reserves = false;
+ goto retry;
+ }
+ }
/* Caller is not willing to reclaim, we can't balance anything */
if (!can_direct_reclaim)
@@ -4246,6 +4242,15 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
!(gfp_mask & __GFP_RETRY_MAYFAIL)))
goto nopage;
+ /*
+ * Deal with possible cpuset update races or zonelist updates to avoid
+ * infinite retries. No "goto retry;" can be placed above this check
+ * unless it can execute just once.
+ */
+ if (check_retry_cpuset(cpuset_mems_cookie, ac) ||
+ check_retry_zonelist(zonelist_iter_cookie))
+ goto restart;
+
if (should_reclaim_retry(gfp_mask, order, ac, alloc_flags,
did_some_progress > 0, &no_progress_loops))
goto retry;
--
2.50.1 (Apple Git-155)

@ -0,0 +1,85 @@
From fe93b8af523e8f3cf1e7304d100adf6ed44f6345 Mon Sep 17 00:00:00 2001
From: Nico Pache <npache@redhat.com>
Date: Sat, 28 Mar 2026 16:17:55 -0600
Subject: [PATCH] mm/page_alloc: add vm.thp_thisnode_reclaim sysctl to allow
THP reclaim on local node
Upstream commit cd2e3c32636e ("mm, page_alloc, thp: prevent reclaim for
__GFP_THISNODE THP allocations") prevents __GFP_THISNODE THP allocations
from proceeding into reclaim after compaction failure, to avoid
zone_reclaim_mode-like excessive reclaim on a single NUMA node when other
nodes have plenty of free memory. This was further refined by upstream
commits 66987218154918a6 and 53a9b4646f67 which refactored the check
into gfp_has_flags(gfp_mask, __GFP_NORETRY | __GFP_THISNODE).
While this is the correct default upstream, it can regress workloads on older
releases, and some customers/workloads may benefit from the more aggressive
reclaim behavior. Add a sysctl knob (vm.thp_thisnode_reclaim) to restore the
previous behavior.
The sysctl defaults to 1 to avoid regressions and keep the pre-fix behavior.
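The decision this knob controls can be sketched as a pure function. This is a userspace model under stated assumptions: the enum and function names are hypothetical, and noretry_thisnode stands for the result of the gfp_has_flags(gfp_mask, __GFP_NORETRY | __GFP_THISNODE) test in the patch. It mirrors the branch the patch adds and nothing else in the slowpath.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the compaction results the patch checks. */
enum sketch_compact_result {
	SKETCH_COMPACT_SKIPPED,
	SKETCH_COMPACT_DEFERRED,
	SKETCH_COMPACT_OTHER,
};

/*
 * Returns true when the allocation should fail before entering reclaim.
 * knob models vm.thp_thisnode_reclaim, which the sysctl table limits to 0..1.
 */
static bool sketch_fail_before_reclaim(bool noretry_thisnode, int knob,
				       enum sketch_compact_result result)
{
	assert(knob == 0 || knob == 1);

	if (!noretry_thisnode)
		return false;	/* not a THISNODE THP-style request */
	if (!knob)
		return true;	/* upstream behavior: never reclaim */
	/* Knob set: reclaim is allowed unless compaction was skipped/deferred. */
	return result == SKETCH_COMPACT_SKIPPED ||
	       result == SKETCH_COMPACT_DEFERRED;
}
```

With the knob at its default of 1, only the skipped and deferred compaction results still fail early; writing 0 selects the upstream behavior.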
Upstream-status: RHEL-Only
JIRA: https://redhat.atlassian.net/browse/RHEL-148561
Signed-off-by: Nico Pache <npache@redhat.com>
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 4c2b622a39cf..b26f3d53b751 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -290,6 +290,14 @@ int user_min_free_kbytes = -1;
static int watermark_boost_factor __read_mostly = 15000;
static int watermark_scale_factor = 10;
+/*
+ * RHEL-ONLY: When set to 1, allows reclaim for __GFP_THISNODE THP allocations,
+ * restoring the behavior prior to the fix that prevents zone_reclaim_mode-like
+ * excessive reclaim on a single NUMA node when other nodes have plenty of free
+ * memory.
+ */
+static int thp_thisnode_reclaim __read_mostly = 1;
+
/* movable_zone is the "real" zone pages in ZONE_MOVABLE are taken from */
int movable_zone;
EXPORT_SYMBOL(movable_zone);
@@ -4213,9 +4221,20 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
* allocations to put reclaim pressure on a single node in a
* situation where other nodes might have plenty of available
* memory.
+ *
+ * RHEL-ONLY: vm.thp_thisnode_reclaim can override this to
+ * restore the pre-fix behavior: allow reclaim for THISNODE THP
+ * allocations, but still fail immediately when compaction was
+ * skipped (insufficient free base pages) or deferred (recent
+ * compaction failures at this order).
*/
- if (gfp_has_flags(gfp_mask, __GFP_NORETRY | __GFP_THISNODE))
- goto nopage;
+ if (gfp_has_flags(gfp_mask, __GFP_NORETRY | __GFP_THISNODE)) {
+ if (!thp_thisnode_reclaim)
+ goto nopage;
+ if (compact_result == COMPACT_SKIPPED ||
+ compact_result == COMPACT_DEFERRED)
+ goto nopage;
+ }
/*
* For the initial compaction attempt we have lowered its
@@ -6213,6 +6232,15 @@ static struct ctl_table page_alloc_sysctl_table[] = {
.extra1 = SYSCTL_ZERO,
.extra2 = SYSCTL_ONE_HUNDRED,
},
+ {
+ .procname = "thp_thisnode_reclaim", //RHEL-ONLY
+ .data = &thp_thisnode_reclaim,
+ .maxlen = sizeof(thp_thisnode_reclaim),
+ .mode = 0644,
+ .proc_handler = proc_dointvec_minmax,
+ .extra1 = SYSCTL_ZERO,
+ .extra2 = SYSCTL_ONE,
+ },
#endif
{}
};
--
2.50.1 (Apple Git-155)

@ -0,0 +1,123 @@
From e0c8209f463129749b824ebf8068fd75774dd5d7 Mon Sep 17 00:00:00 2001
From: CKI Backport Bot <cki-ci-bot+cki-gitlab-backport-bot@redhat.com>
Date: Tue, 28 Apr 2026 12:07:13 +0000
Subject: [PATCH] smb: client: fix OOB reads parsing symlink error response
JIRA: https://redhat.atlassian.net/browse/RHEL-171472
CVE: CVE-2026-31613
commit 3df690bba28edec865cf7190be10708ad0ddd67e
Author: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Date: Mon Apr 6 15:49:38 2026 +0200
smb: client: fix OOB reads parsing symlink error response
When a CREATE returns STATUS_STOPPED_ON_SYMLINK, smb2_check_message()
returns success without any length validation, leaving the symlink
parsers as the only defense against an untrusted server.
symlink_data() walks SMB 3.1.1 error contexts with the loop test "p <
end", but reads p->ErrorId at offset 4 and p->ErrorDataLength at offset
0. When the server-controlled ErrorDataLength advances p to within 1-7
bytes of end, the next iteration will read past it. When the matching
context is found, sym->SymLinkErrorTag is read at offset 4 from
p->ErrorContextData with no check that the symlink header itself fits.
smb2_parse_symlink_response() then bounds-checks the substitute name
using SMB2_SYMLINK_STRUCT_SIZE as the offset of PathBuffer from
iov_base. That value is computed as sizeof(smb2_err_rsp) +
sizeof(smb2_symlink_err_rsp), which is correct only when
ErrorContextCount == 0.
With at least one error context the symlink data sits 8 bytes deeper,
and each skipped non-matching context shifts it further by 8 +
ALIGN(ErrorDataLength, 8). The check is too short, allowing the
substitute name read to run past iov_len. The out-of-bound heap bytes
are UTF-16-decoded into the symlink target and returned to userspace via
readlink(2).
Fix this all up by making the loop test require the full context header
to fit, rejecting sym if its header runs past end, and bounding the
substitute name against the actual position of sym->PathBuffer rather
than a fixed offset.
Because sub_offs and sub_len are 16bits, the pointer math will not
overflow here with the new greater-than.
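The loop condition described above can be illustrated with a small userspace sketch. It is not the cifs code: find_context() and rd_le32() are hypothetical helpers, and the only layout facts taken from the description are that ErrorDataLength sits at offset 0, ErrorId at offset 4, and each skipped context advances by 8 + ALIGN(ErrorDataLength, 8). The point is that the full 8-byte header must fit before any field is read.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CTX_HDR_LEN 8u	/* ErrorDataLength (4 bytes) + ErrorId (4 bytes) */

/* Read a little-endian 32-bit value from an unaligned byte pointer. */
static uint32_t rd_le32(const uint8_t *p)
{
	return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
	       ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/*
 * Walk error contexts in buf[0..len) and return the offset of the data of
 * the first context whose id equals want_id, or -1 if none is found or a
 * context would run past the end of the buffer.
 */
static long find_context(const uint8_t *buf, size_t len, uint32_t want_id)
{
	size_t off = 0;

	assert(buf != NULL);

	/* Invariant: off <= len. Require the whole header before reading it. */
	while (len - off >= CTX_HDR_LEN) {
		uint32_t data_len = rd_le32(buf + off);
		uint32_t id = rd_le32(buf + off + 4);
		uint64_t step;

		if (id == want_id)
			return (long)(off + CTX_HDR_LEN);

		/* Header plus data rounded up to 8 bytes, computed in 64 bits. */
		step = CTX_HDR_LEN + (((uint64_t)data_len + 7) & ~(uint64_t)7);
		if (step > len - off)
			return -1;	/* untrusted length points past the end */
		off += (size_t)step;
	}
	return -1;
}
```

A caller would still have to check that the payload header and any name offsets fit between the returned offset and len, which is what the sym and PathBuffer checks in the patch do.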
Cc: Ronnie Sahlberg <ronniesahlberg@gmail.com>
Cc: Shyam Prasad N <sprasad@microsoft.com>
Cc: Tom Talpey <tom@talpey.com>
Cc: Bharath SM <bharathsm@microsoft.com>
Cc: linux-cifs@vger.kernel.org
Cc: samba-technical@lists.samba.org
Cc: stable <stable@kernel.org>
Reviewed-by: Paulo Alcantara (Red Hat) <pc@manguebit.org>
Assisted-by: gregkh_clanker_t1000
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
Signed-off-by: CKI Backport Bot <cki-ci-bot+cki-gitlab-backport-bot@redhat.com>
diff --git a/fs/smb/client/smb2file.c b/fs/smb/client/smb2file.c
index fa9726f53143..4793981e2bb7 100644
--- a/fs/smb/client/smb2file.c
+++ b/fs/smb/client/smb2file.c
@@ -27,10 +27,11 @@ static struct smb2_symlink_err_rsp *symlink_data(const struct kvec *iov)
{
struct smb2_err_rsp *err = iov->iov_base;
struct smb2_symlink_err_rsp *sym = ERR_PTR(-EINVAL);
+ u8 *end = (u8 *)err + iov->iov_len;
u32 len;
if (err->ErrorContextCount) {
- struct smb2_error_context_rsp *p, *end;
+ struct smb2_error_context_rsp *p;
len = (u32)err->ErrorContextCount * (offsetof(struct smb2_error_context_rsp,
ErrorContextData) +
@@ -39,8 +40,7 @@ static struct smb2_symlink_err_rsp *symlink_data(const struct kvec *iov)
return ERR_PTR(-EINVAL);
p = (struct smb2_error_context_rsp *)err->ErrorData;
- end = (struct smb2_error_context_rsp *)((u8 *)err + iov->iov_len);
- do {
+ while ((u8 *)p + sizeof(*p) <= end) {
if (le32_to_cpu(p->ErrorId) == SMB2_ERROR_ID_DEFAULT) {
sym = (struct smb2_symlink_err_rsp *)p->ErrorContextData;
break;
@@ -50,14 +50,16 @@ static struct smb2_symlink_err_rsp *symlink_data(const struct kvec *iov)
len = ALIGN(le32_to_cpu(p->ErrorDataLength), 8);
p = (struct smb2_error_context_rsp *)(p->ErrorContextData + len);
- } while (p < end);
+ }
} else if (le32_to_cpu(err->ByteCount) >= sizeof(*sym) &&
iov->iov_len >= SMB2_SYMLINK_STRUCT_SIZE) {
sym = (struct smb2_symlink_err_rsp *)err->ErrorData;
}
- if (!IS_ERR(sym) && (le32_to_cpu(sym->SymLinkErrorTag) != SYMLINK_ERROR_TAG ||
- le32_to_cpu(sym->ReparseTag) != IO_REPARSE_TAG_SYMLINK))
+ if (!IS_ERR(sym) &&
+ ((u8 *)sym + sizeof(*sym) > end ||
+ le32_to_cpu(sym->SymLinkErrorTag) != SYMLINK_ERROR_TAG ||
+ le32_to_cpu(sym->ReparseTag) != IO_REPARSE_TAG_SYMLINK))
sym = ERR_PTR(-EINVAL);
return sym;
@@ -128,8 +130,10 @@ int smb2_parse_symlink_response(struct cifs_sb_info *cifs_sb, const struct kvec
print_len = le16_to_cpu(sym->PrintNameLength);
print_offs = le16_to_cpu(sym->PrintNameOffset);
- if (iov->iov_len < SMB2_SYMLINK_STRUCT_SIZE + sub_offs + sub_len ||
- iov->iov_len < SMB2_SYMLINK_STRUCT_SIZE + print_offs + print_len)
+ if ((char *)sym->PathBuffer + sub_offs + sub_len >
+ (char *)iov->iov_base + iov->iov_len ||
+ (char *)sym->PathBuffer + print_offs + print_len >
+ (char *)iov->iov_base + iov->iov_len)
return -EINVAL;
return smb2_parse_native_symlink(path,
--
2.50.1 (Apple Git-155)

@ -0,0 +1,46 @@
From b68bb0a260effb5982ab52535a3213ff03b57ed9 Mon Sep 17 00:00:00 2001
From: Vladislav Dronov <vdronov@redhat.com>
Date: Wed, 29 Apr 2026 23:04:01 +0200
Subject: [PATCH] crypto: authenc - Fix sleep in atomic context in decrypt_tail
JIRA: https://issues.redhat.com/browse/RHEL-172166
Upstream Status: merged into the linux.git
commit 66eae850333d639fc278d6f915c6fc01499ea893
Author: Herbert Xu <herbert@gondor.apana.org.au>
Date: Wed Jan 19 17:58:40 2022 +1100
crypto: authenc - Fix sleep in atomic context in decrypt_tail
The function crypto_authenc_decrypt_tail discards its flags
argument and always relies on the flags from the original request
when starting its sub-request.
This is clearly wrong as it may cause the SLEEPABLE flag to be
set when it shouldn't.
Fixes: 92d95ba91772 ("crypto: authenc - Convert to new AEAD interface")
Reported-by: Corentin Labbe <clabbe.montjoie@gmail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Tested-by: Corentin Labbe <clabbe.montjoie@gmail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Assisted-by: Patchpal 0.7.1
Signed-off-by: Vladislav Dronov <vdronov@redhat.com>
diff --git a/crypto/authenc.c b/crypto/authenc.c
index 670bf1a01d00..17f674a7cdff 100644
--- a/crypto/authenc.c
+++ b/crypto/authenc.c
@@ -253,7 +253,7 @@ static int crypto_authenc_decrypt_tail(struct aead_request *req,
dst = scatterwalk_ffwd(areq_ctx->dst, req->dst, req->assoclen);
skcipher_request_set_tfm(skreq, ctx->enc);
- skcipher_request_set_callback(skreq, aead_request_flags(req),
+ skcipher_request_set_callback(skreq, flags,
req->base.complete, req->base.data);
skcipher_request_set_crypt(skreq, src, dst,
req->cryptlen - authsize, req->iv);
--
2.50.1 (Apple Git-155)

@ -0,0 +1,213 @@
From 6cedc3414cbe4e00b4a85ac4381edb18805194d6 Mon Sep 17 00:00:00 2001
From: Vladislav Dronov <vdronov@redhat.com>
Date: Wed, 29 Apr 2026 23:05:43 +0200
Subject: [PATCH] crypto: authenc - Correctly pass EINPROGRESS back up to the
caller
JIRA: https://issues.redhat.com/browse/RHEL-172166
Upstream Status: merged into the linux.git
Conflicts: Missing a large crypto-tree-wide upstream patch 255e48eb1768
("crypto: api - Use data directly in completion function"). To apply:
- Change "void *data" back to "struct crypto_async_request *areq".
- Change "struct aead_request *req = data" back to "struct aead_request
*req = areq->data".
commit 96feb73def02d175850daa0e7c2c90c876681b5c
Author: Herbert Xu <herbert@gondor.apana.org.au>
Date: Wed Sep 24 18:20:17 2025 +0800
crypto: authenc - Correctly pass EINPROGRESS back up to the caller
When authenc is invoked with MAY_BACKLOG, it needs to pass EINPROGRESS
notifications back up to the caller when the underlying algorithm
returns EBUSY synchronously.
However, if the EBUSY comes from the second part of an authenc call,
i.e., it is asynchronous, both the EBUSY and the subsequent EINPROGRESS
notification must not be passed to the caller.
Implement this by passing a mask to the function that starts the
second half of authenc and using it to determine whether EBUSY
and EINPROGRESS should be passed to the caller.
This was a deficiency in the original implementation of authenc
because it was not expected to be used with MAY_BACKLOG.
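The mask mechanism can be modeled in a few lines of userspace C. This is a sketch under assumptions: the SKETCH_* flag values and both function names are hypothetical stand-ins, not the crypto API. It shows the two effects of the mask as they appear in the diff: the sub-request flags are the request flags with the masked bits cleared, and a non-zero mask selects a completion path that swallows EINPROGRESS and EBUSY.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical flag bits standing in for the request flags. */
#define SKETCH_MAY_SLEEP	0x200u
#define SKETCH_MAY_BACKLOG	0x400u

/* Flags handed to the second-half sub-request: request flags minus mask. */
static unsigned int sketch_subreq_flags(unsigned int req_flags,
					unsigned int mask)
{
	return req_flags & ~mask;
}

/*
 * Decide whether a completion code reaches the original caller.
 * mask == 0: second half started synchronously, forward everything.
 * mask != 0: second half started from an async callback, so EBUSY and
 * the later EINPROGRESS notification are consumed here.
 */
static bool sketch_notify_caller(unsigned int mask, int err)
{
	assert(err <= 0);	/* kernel-style codes: 0 or negative errno */

	if (!mask)
		return true;
	return err != -EINPROGRESS && err != -EBUSY;
}
```

In the patch the same choice is made by picking the callback (the "done2" variants go through authenc_request_complete()) rather than by a boolean, but the filtering rule is the same.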
Reported-by: Ingo Franzki <ifranzki@linux.ibm.com>
Reported-by: Mikulas Patocka <mpatocka@redhat.com>
Fixes: 180ce7e81030 ("crypto: authenc - Add EINPROGRESS check")
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Assisted-by: Patchpal AI 0.7.1
Signed-off-by: Vladislav Dronov <vdronov@redhat.com>
diff --git a/crypto/authenc.c b/crypto/authenc.c
index 17f674a7cdff..494c0b6db431 100644
--- a/crypto/authenc.c
+++ b/crypto/authenc.c
@@ -39,7 +39,7 @@ struct authenc_request_ctx {
static void authenc_request_complete(struct aead_request *req, int err)
{
- if (err != -EINPROGRESS)
+ if (err != -EINPROGRESS && err != -EBUSY)
aead_request_complete(req, err);
}
@@ -109,27 +109,42 @@ static int crypto_authenc_setkey(struct crypto_aead *authenc, const u8 *key,
return err;
}
-static void authenc_geniv_ahash_done(struct crypto_async_request *areq, int err)
+static void authenc_geniv_ahash_finish(struct aead_request *req)
{
- struct aead_request *req = areq->data;
struct crypto_aead *authenc = crypto_aead_reqtfm(req);
struct aead_instance *inst = aead_alg_instance(authenc);
struct authenc_instance_ctx *ictx = aead_instance_ctx(inst);
struct authenc_request_ctx *areq_ctx = aead_request_ctx(req);
struct ahash_request *ahreq = (void *)(areq_ctx->tail + ictx->reqoff);
- if (err)
- goto out;
-
scatterwalk_map_and_copy(ahreq->result, req->dst,
req->assoclen + req->cryptlen,
crypto_aead_authsize(authenc), 1);
+}
-out:
+static void authenc_geniv_ahash_done(struct crypto_async_request *areq, int err)
+{
+ struct aead_request *req = areq->data;
+
+ if (!err)
+ authenc_geniv_ahash_finish(req);
aead_request_complete(req, err);
}
-static int crypto_authenc_genicv(struct aead_request *req, unsigned int flags)
+/*
+ * Used when the ahash request was invoked in the async callback context
+ * of the previous skcipher request. Eat any EINPROGRESS notifications.
+ */
+static void authenc_geniv_ahash_done2(struct crypto_async_request *areq, int err)
+{
+ struct aead_request *req = areq->data;
+
+ if (!err)
+ authenc_geniv_ahash_finish(req);
+ authenc_request_complete(req, err);
+}
+
+static int crypto_authenc_genicv(struct aead_request *req, unsigned int mask)
{
struct crypto_aead *authenc = crypto_aead_reqtfm(req);
struct aead_instance *inst = aead_alg_instance(authenc);
@@ -138,6 +153,7 @@ static int crypto_authenc_genicv(struct aead_request *req, unsigned int flags)
struct crypto_ahash *auth = ctx->auth;
struct authenc_request_ctx *areq_ctx = aead_request_ctx(req);
struct ahash_request *ahreq = (void *)(areq_ctx->tail + ictx->reqoff);
+ unsigned int flags = aead_request_flags(req) & ~mask;
u8 *hash = areq_ctx->tail;
int err;
@@ -148,7 +164,8 @@ static int crypto_authenc_genicv(struct aead_request *req, unsigned int flags)
ahash_request_set_crypt(ahreq, req->dst, hash,
req->assoclen + req->cryptlen);
ahash_request_set_callback(ahreq, flags,
- authenc_geniv_ahash_done, req);
+ mask ? authenc_geniv_ahash_done2 :
+ authenc_geniv_ahash_done, req);
err = crypto_ahash_digest(ahreq);
if (err)
@@ -165,12 +182,11 @@ static void crypto_authenc_encrypt_done(struct crypto_async_request *req,
{
struct aead_request *areq = req->data;
- if (err)
- goto out;
-
- err = crypto_authenc_genicv(areq, 0);
-
-out:
+ if (err) {
+ aead_request_complete(areq, err);
+ return;
+ }
+ err = crypto_authenc_genicv(areq, CRYPTO_TFM_REQ_MAY_SLEEP);
authenc_request_complete(areq, err);
}
@@ -223,11 +239,18 @@ static int crypto_authenc_encrypt(struct aead_request *req)
if (err)
return err;
- return crypto_authenc_genicv(req, aead_request_flags(req));
+ return crypto_authenc_genicv(req, 0);
+}
+
+static void authenc_decrypt_tail_done(struct crypto_async_request *areq, int err)
+{
+ struct aead_request *req = areq->data;
+
+ authenc_request_complete(req, err);
}
static int crypto_authenc_decrypt_tail(struct aead_request *req,
- unsigned int flags)
+ unsigned int mask)
{
struct crypto_aead *authenc = crypto_aead_reqtfm(req);
struct aead_instance *inst = aead_alg_instance(authenc);
@@ -238,6 +261,7 @@ static int crypto_authenc_decrypt_tail(struct aead_request *req,
struct skcipher_request *skreq = (void *)(areq_ctx->tail +
ictx->reqoff);
unsigned int authsize = crypto_aead_authsize(authenc);
+ unsigned int flags = aead_request_flags(req) & ~mask;
u8 *ihash = ahreq->result + authsize;
struct scatterlist *src, *dst;
@@ -254,7 +278,9 @@ static int crypto_authenc_decrypt_tail(struct aead_request *req,
skcipher_request_set_tfm(skreq, ctx->enc);
skcipher_request_set_callback(skreq, flags,
- req->base.complete, req->base.data);
+ mask ? authenc_decrypt_tail_done :
+ req->base.complete,
+ mask ? req : req->base.data);
skcipher_request_set_crypt(skreq, src, dst,
req->cryptlen - authsize, req->iv);
@@ -266,12 +292,11 @@ static void authenc_verify_ahash_done(struct crypto_async_request *areq,
{
struct aead_request *req = areq->data;
- if (err)
- goto out;
-
- err = crypto_authenc_decrypt_tail(req, 0);
-
-out:
+ if (err) {
+ aead_request_complete(req, err);
+ return;
+ }
+ err = crypto_authenc_decrypt_tail(req, CRYPTO_TFM_REQ_MAY_SLEEP);
authenc_request_complete(req, err);
}
@@ -301,7 +326,7 @@ static int crypto_authenc_decrypt(struct aead_request *req)
if (err)
return err;
- return crypto_authenc_decrypt_tail(req, aead_request_flags(req));
+ return crypto_authenc_decrypt_tail(req, 0);
}
static int crypto_authenc_init_tfm(struct crypto_aead *tfm)
--
2.50.1 (Apple Git-155)

@ -0,0 +1,60 @@
From 27fdbab4221b375de54bf91919798d88520c6e28 Mon Sep 17 00:00:00 2001
From: Juergen Gross <jgross@suse.com>
Date: Fri, 27 Mar 2026 14:13:38 +0100
Subject: [PATCH] Buffer overflow in drivers/xen/sys-hypervisor.c
The build id returned by HYPERVISOR_xen_version(XENVER_build_id) is
neither NUL terminated nor a string.
The first causes a buffer overflow as sprintf in buildid_show will
read and copy till it finds a NUL.
00000000 f4 91 51 f4 dd 38 9e 9d 65 47 52 eb 10 71 db 50 |..Q..8..eGR..q.P|
00000010 b9 a8 01 42 6f 2e 32 |...Bo.2|
00000017
So use a memcpy instead of sprintf to have the correct value:
00000000 f4 91 51 f4 dd 00 9e 9d 65 47 52 eb 10 71 db 50 |..Q.....eGR..q.P|
00000010 b9 a8 01 42 |...B|
00000014
(the above have a hack to embed a zero inside and check it's
returned correctly).
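The difference between string and binary copying can be shown with a small userspace sketch. copy_build_id() is a hypothetical helper, and returning -1 on overflow is an assumption of this sketch (the patch returns -ENOSPC when the length exceeds PAGE_SIZE). The copy is driven by the reported length, so an embedded zero byte neither truncates the output nor lets the copy run past the source.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Copy a binary id of id_len bytes into out.
 * Returns the number of bytes copied, or -1 if it does not fit.
 */
static long copy_build_id(char *out, size_t out_size,
			  const unsigned char *id, size_t id_len)
{
	assert(out != NULL && id != NULL);

	if (id_len > out_size)
		return -1;	/* refuse instead of overflowing out */

	/* Length-based copy: zero bytes inside the id are ordinary data. */
	memcpy(out, id, id_len);
	return (long)id_len;
}
```

A "%s"-style copy of the same bytes would stop at the first zero byte, or keep reading beyond the id if it contained none, which is the overflow the patch fixes.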
This is XSA-485 / CVE-2026-31786
Fixes: 84b7625728ea ("xen: add sysfs node for hypervisor build id")
Signed-off-by: Frediano Ziglio <frediano.ziglio@citrix.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
diff --git a/drivers/xen/sys-hypervisor.c b/drivers/xen/sys-hypervisor.c
index b1bb01ba82f8..91923242a5ae 100644
--- a/drivers/xen/sys-hypervisor.c
+++ b/drivers/xen/sys-hypervisor.c
@@ -366,6 +366,8 @@ static ssize_t buildid_show(struct hyp_sysfs_attr *attr, char *buffer)
ret = sprintf(buffer, "<denied>");
return ret;
}
+ if (ret > PAGE_SIZE)
+ return -ENOSPC;
buildid = kmalloc(sizeof(*buildid) + ret, GFP_KERNEL);
if (!buildid)
@@ -373,8 +375,10 @@ static ssize_t buildid_show(struct hyp_sysfs_attr *attr, char *buffer)
buildid->len = ret;
ret = HYPERVISOR_xen_version(XENVER_build_id, buildid);
- if (ret > 0)
- ret = sprintf(buffer, "%s", buildid->buf);
+ if (ret > 0) {
+ /* Build id is binary, not a string. */
+ memcpy(buffer, buildid->buf, ret);
+ }
kfree(buildid);
return ret;
--
2.50.1 (Apple Git-155)

@ -0,0 +1,80 @@
From 75537b257b7125983cc0a54f0a2878d28677eb50 Mon Sep 17 00:00:00 2001
From: "Ewan D. Milne" <emilne@redhat.com>
Date: Mon, 18 May 2026 11:14:05 -0400
Subject: [PATCH] nvme: nvme-fc: move tagset removal to nvme_fc_delete_ctrl()
JIRA: https://redhat.atlassian.net/browse/RHEL-171725
Upstream Status: From upstream linux mainline
Now the tagset is removed from nvme_fc_ctrl_free(), which is the ctrl->ref
release handler. Even the admin queue is unquiesced there. This is
definitely wrong because ctrl->ref is grabbed when submitting a
command.
And Marco observed that nvme_fc_ctrl_free() can be called from request
completion code path, and trigger kernel warning since request completes
from softirq context.
Fix the issue by moving tagset removal into nvme_fc_delete_ctrl(),
which is also aligned with nvme-tcp and nvme-rdma.
Patch originally proposed by Ming Lei, then modified to move the tagset
removal down to after nvme_fc_delete_association() after further testing.
Cc: Marco Patalano <mpatalan@redhat.com>
Cc: Ewan Milne <emilne@redhat.com>
Cc: James Smart <james.smart@broadcom.com>
Cc: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Cc: stable@vger.kernel.org
Tested-by: Marco Patalano <mpatalan@redhat.com>
Reviewed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Ewan D. Milne <emilne@redhat.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
(cherry picked from commit ea3442efabd0aa3930c5bab73c3901ef38ef6ac3)
Signed-off-by: Ewan D. Milne <emilne@redhat.com>
diff --git a/drivers/nvme/host/fc.c b/drivers/nvme/host/fc.c
index bd6cbe35dace..3e500e87e30c 100644
--- a/drivers/nvme/host/fc.c
+++ b/drivers/nvme/host/fc.c
@@ -2354,17 +2354,11 @@ nvme_fc_ctrl_free(struct kref *ref)
container_of(ref, struct nvme_fc_ctrl, ref);
unsigned long flags;
- if (ctrl->ctrl.tagset)
- nvme_remove_io_tag_set(&ctrl->ctrl);
-
/* remove from rport list */
spin_lock_irqsave(&ctrl->rport->lock, flags);
list_del(&ctrl->ctrl_list);
spin_unlock_irqrestore(&ctrl->rport->lock, flags);
- nvme_unquiesce_admin_queue(&ctrl->ctrl);
- nvme_remove_admin_tag_set(&ctrl->ctrl);
-
kfree(ctrl->queues);
put_device(ctrl->dev);
@@ -3252,11 +3246,18 @@ nvme_fc_delete_ctrl(struct nvme_ctrl *nctrl)
cancel_work_sync(&ctrl->ioerr_work);
cancel_delayed_work_sync(&ctrl->connect_work);
+
/*
* kill the association on the link side. this will block
* waiting for io to terminate
*/
nvme_fc_delete_association(ctrl);
+
+ if (ctrl->ctrl.tagset)
+ nvme_remove_io_tag_set(&ctrl->ctrl);
+
+ nvme_unquiesce_admin_queue(&ctrl->ctrl);
+ nvme_remove_admin_tag_set(&ctrl->ctrl);
}
static void
--
2.50.1 (Apple Git-155)

@ -0,0 +1,93 @@
From 08fc4018856d6e04b7517e1ea515f06e86a05128 Mon Sep 17 00:00:00 2001
From: "Ewan D. Milne" <emilne@redhat.com>
Date: Mon, 18 May 2026 11:19:16 -0400
Subject: [PATCH] nvme: nvme-fc: Ensure ->ioerr_work is cancelled in
nvme_fc_delete_ctrl()
JIRA: https://redhat.atlassian.net/browse/RHEL-171725
Upstream Status: From upstream linux mainline
nvme_fc_delete_association() waits for pending I/O to complete before
returning, and an error can cause ->ioerr_work to be queued after
cancel_work_sync() had been called. Move the call to cancel_work_sync() to
be after nvme_fc_delete_association() to ensure ->ioerr_work is not running
when the nvme_fc_ctrl object is freed. Otherwise the following can occur:
[ 1135.911754] list_del corruption, ff2d24c8093f31f8->next is NULL
[ 1135.917705] ------------[ cut here ]------------
[ 1135.922336] kernel BUG at lib/list_debug.c:52!
[ 1135.926784] Oops: invalid opcode: 0000 [#1] SMP NOPTI
[ 1135.931851] CPU: 48 UID: 0 PID: 726 Comm: kworker/u449:23 Kdump: loaded Not tainted 6.12.0 #1 PREEMPT(voluntary)
[ 1135.943490] Hardware name: Dell Inc. PowerEdge R660/0HGTK9, BIOS 2.5.4 01/16/2025
[ 1135.950969] Workqueue: 0x0 (nvme-wq)
[ 1135.954673] RIP: 0010:__list_del_entry_valid_or_report.cold+0xf/0x6f
[ 1135.961041] Code: c7 c7 98 68 72 94 e8 26 45 fe ff 0f 0b 48 c7 c7 70 68 72 94 e8 18 45 fe ff 0f 0b 48 89 fe 48 c7 c7 80 69 72 94 e8 07 45 fe ff <0f> 0b 48 89 d1 48 c7 c7 a0 6a 72 94 48 89 c2 e8 f3 44 fe ff 0f 0b
[ 1135.979788] RSP: 0018:ff579b19482d3e50 EFLAGS: 00010046
[ 1135.985015] RAX: 0000000000000033 RBX: ff2d24c8093f31f0 RCX: 0000000000000000
[ 1135.992148] RDX: 0000000000000000 RSI: ff2d24d6bfa1d0c0 RDI: ff2d24d6bfa1d0c0
[ 1135.999278] RBP: ff2d24c8093f31f8 R08: 0000000000000000 R09: ffffffff951e2b08
[ 1136.006413] R10: ffffffff95122ac8 R11: 0000000000000003 R12: ff2d24c78697c100
[ 1136.013546] R13: fffffffffffffff8 R14: 0000000000000000 R15: ff2d24c78697c0c0
[ 1136.020677] FS: 0000000000000000(0000) GS:ff2d24d6bfa00000(0000) knlGS:0000000000000000
[ 1136.028765] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1136.034510] CR2: 00007fd207f90b80 CR3: 000000163ea22003 CR4: 0000000000f73ef0
[ 1136.041641] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1136.048776] DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: 0000000000000400
[ 1136.055910] PKRU: 55555554
[ 1136.058623] Call Trace:
[ 1136.061074] <TASK>
[ 1136.063179] ? show_trace_log_lvl+0x1b0/0x2f0
[ 1136.067540] ? show_trace_log_lvl+0x1b0/0x2f0
[ 1136.071898] ? move_linked_works+0x4a/0xa0
[ 1136.075998] ? __list_del_entry_valid_or_report.cold+0xf/0x6f
[ 1136.081744] ? __die_body.cold+0x8/0x12
[ 1136.085584] ? die+0x2e/0x50
[ 1136.088469] ? do_trap+0xca/0x110
[ 1136.091789] ? do_error_trap+0x65/0x80
[ 1136.095543] ? __list_del_entry_valid_or_report.cold+0xf/0x6f
[ 1136.101289] ? exc_invalid_op+0x50/0x70
[ 1136.105127] ? __list_del_entry_valid_or_report.cold+0xf/0x6f
[ 1136.110874] ? asm_exc_invalid_op+0x1a/0x20
[ 1136.115059] ? __list_del_entry_valid_or_report.cold+0xf/0x6f
[ 1136.120806] move_linked_works+0x4a/0xa0
[ 1136.124733] worker_thread+0x216/0x3a0
[ 1136.128485] ? __pfx_worker_thread+0x10/0x10
[ 1136.132758] kthread+0xfa/0x240
[ 1136.135904] ? __pfx_kthread+0x10/0x10
[ 1136.139657] ret_from_fork+0x31/0x50
[ 1136.143236] ? __pfx_kthread+0x10/0x10
[ 1136.146988] ret_from_fork_asm+0x1a/0x30
[ 1136.150915] </TASK>
Fixes: 19fce0470f05 ("nvme-fc: avoid calling _nvme_fc_abort_outstanding_ios from interrupt context")
Cc: stable@vger.kernel.org
Tested-by: Marco Patalano <mpatalan@redhat.com>
Reviewed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Ewan D. Milne <emilne@redhat.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
(cherry picked from commit 0a2c5495b6d1ecb0fa18ef6631450f391a888256)
Signed-off-by: Ewan D. Milne <emilne@redhat.com>
diff --git a/drivers/nvme/host/fc.c b/drivers/nvme/host/fc.c
index 3e500e87e30c..6cc7d11ad5c0 100644
--- a/drivers/nvme/host/fc.c
+++ b/drivers/nvme/host/fc.c
@@ -3244,7 +3244,6 @@ nvme_fc_delete_ctrl(struct nvme_ctrl *nctrl)
{
struct nvme_fc_ctrl *ctrl = to_fc_ctrl(nctrl);
- cancel_work_sync(&ctrl->ioerr_work);
cancel_delayed_work_sync(&ctrl->connect_work);
/*
@@ -3252,6 +3251,7 @@ nvme_fc_delete_ctrl(struct nvme_ctrl *nctrl)
* waiting for io to terminate
*/
nvme_fc_delete_association(ctrl);
+ cancel_work_sync(&ctrl->ioerr_work);
if (ctrl->ctrl.tagset)
nvme_remove_io_tag_set(&ctrl->ctrl);
--
2.50.1 (Apple Git-155)

@ -0,0 +1,58 @@
From e88ced2e3c3091122785c0a2dd822b61d1839d58 Mon Sep 17 00:00:00 2001
From: Mete Durlu <mdurlu@redhat.com>
Date: Fri, 27 Mar 2026 13:14:31 +0100
Subject: [PATCH] s390/dasd: Fix gendisk parent after copy pair swap
JIRA: https://issues.redhat.com/browse/RHEL-161530
commit c943bfc6afb8d0e781b9b7406f36caa8bbf95cb9
Author: Stefan Haberland <sth@linux.ibm.com>
Date: Wed Nov 26 17:06:31 2025 +0100
s390/dasd: Fix gendisk parent after copy pair swap
After a copy pair swap the block device's "device" symlink points to
the secondary CCW device, but the gendisk's parent remained the
primary, leaving /sys/block/<dasdx> under the wrong parent.
Move the gendisk to the secondary's device with device_move(), keeping
the sysfs topology consistent after the swap.
Fixes: 413862caad6f ("s390/dasd: add copy pair swap capability")
Cc: stable@vger.kernel.org #6.1
Reviewed-by: Jan Hoeppner <hoeppner@linux.ibm.com>
Signed-off-by: Stefan Haberland <sth@linux.ibm.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Stefan Haberland <sth@linux.ibm.com>
Signed-off-by: Mete Durlu <mdurlu@redhat.com>
diff --git a/drivers/s390/block/dasd_eckd.c b/drivers/s390/block/dasd_eckd.c
index c60424afaf04..7e8679d7c686 100644
--- a/drivers/s390/block/dasd_eckd.c
+++ b/drivers/s390/block/dasd_eckd.c
@@ -6148,6 +6148,7 @@ static int dasd_eckd_copy_pair_swap(struct dasd_device *device, char *prim_busid
struct dasd_copy_relation *copy;
struct dasd_block *block;
struct gendisk *gdp;
+ int rc;
copy = device->copy;
if (!copy)
@@ -6182,6 +6183,13 @@ static int dasd_eckd_copy_pair_swap(struct dasd_device *device, char *prim_busid
/* swap blocklayer device link */
gdp = block->gdp;
dasd_add_link_to_gendisk(gdp, secondary);
+ rc = device_move(disk_to_dev(gdp), &secondary->cdev->dev, DPM_ORDER_NONE);
+ if (rc) {
+ dev_err(&primary->cdev->dev,
+ "copy_pair_swap: moving blockdevice parent %s->%s failed (%d)\n",
+ dev_name(&primary->cdev->dev),
+ dev_name(&secondary->cdev->dev), rc);
+ }
/* re-enable device */
dasd_device_remove_stop_bits(primary, DASD_STOPPED_PPRC);
--
2.50.1 (Apple Git-155)

@ -0,0 +1,54 @@
From 02a659e928f9ef15fc673384e95def0b088c9684 Mon Sep 17 00:00:00 2001
From: Mete Durlu <mdurlu@redhat.com>
Date: Fri, 27 Mar 2026 13:14:33 +0100
Subject: [PATCH] s390/dasd: Move quiesce state with pprc swap
JIRA: https://issues.redhat.com/browse/RHEL-161530
commit 40e9cd4ae8ec43b107ed2bff422a8fa39dcf4e4b
Author: Stefan Haberland <sth@linux.ibm.com>
Date: Tue Mar 10 15:23:29 2026 +0100
s390/dasd: Move quiesce state with pprc swap
Quiesce and resume is a mechanism to suspend operations on DASD devices.
In the context of a controlled copy pair swap operation, the quiesce
operation is usually issued before the actual swap and a resume
afterwards.
During the swap operation, the underlying device is exchanged. Therefore,
the quiesce flag must be moved to the secondary device to ensure a
consistent quiesce state after the swap.
The secondary device itself cannot be suspended separately because there
is no separate block device representation for it.
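Moving the quiesce state can be sketched with plain bit operations. This is a userspace model: struct sketch_dev, the SKETCH_* bit values and sketch_move_quiesce() are hypothetical and only mimic the stop-bit handling shown in the diff. The flag is transferred only when the primary actually carries it, and other stop bits are left alone.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stop bits; the values are arbitrary for this sketch. */
#define SKETCH_STOPPED_QUIESCE	0x01u
#define SKETCH_STOPPED_PPRC	0x02u

struct sketch_dev {
	unsigned int stopped;	/* bitmask of reasons the device is stopped */
};

/* Move the quiesce bit from primary to secondary if the primary has it. */
static void sketch_move_quiesce(struct sketch_dev *primary,
				struct sketch_dev *secondary)
{
	assert(primary != NULL && secondary != NULL);

	if (primary->stopped & SKETCH_STOPPED_QUIESCE) {
		secondary->stopped |= SKETCH_STOPPED_QUIESCE;
		primary->stopped &= ~SKETCH_STOPPED_QUIESCE;
	}
}
```

After the move, a later resume issued against the swapped device clears the bit on the device that is now active, so the quiesce state stays consistent across the swap.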
Fixes: 413862caad6f ("s390/dasd: add copy pair swap capability")
Cc: stable@vger.kernel.org #6.1
Reviewed-by: Jan Hoeppner <hoeppner@linux.ibm.com>
Signed-off-by: Stefan Haberland <sth@linux.ibm.com>
Link: https://patch.msgid.link/20260310142330.4080106-2-sth@linux.ibm.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Stefan Haberland <sth@linux.ibm.com>
Signed-off-by: Mete Durlu <mdurlu@redhat.com>
diff --git a/drivers/s390/block/dasd_eckd.c b/drivers/s390/block/dasd_eckd.c
index 7e8679d7c686..f53791c9cbe7 100644
--- a/drivers/s390/block/dasd_eckd.c
+++ b/drivers/s390/block/dasd_eckd.c
@@ -6191,6 +6191,11 @@ static int dasd_eckd_copy_pair_swap(struct dasd_device *device, char *prim_busid
dev_name(&secondary->cdev->dev), rc);
}
+ if (primary->stopped & DASD_STOPPED_QUIESCE) {
+ dasd_device_set_stop_bits(secondary, DASD_STOPPED_QUIESCE);
+ dasd_device_remove_stop_bits(primary, DASD_STOPPED_QUIESCE);
+ }
+
/* re-enable device */
dasd_device_remove_stop_bits(primary, DASD_STOPPED_PPRC);
dasd_device_remove_stop_bits(secondary, DASD_STOPPED_PPRC);
--
2.50.1 (Apple Git-155)
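
The quiesce hand-over in the patch above can be illustrated outside the kernel. What follows is a minimal userspace sketch, not the driver code: `struct model_device`, the `MODEL_STOPPED_*` values, and all `model_*` helpers are hypothetical stand-ins for `struct dasd_device`, the `DASD_STOPPED_*` flags, and `dasd_device_set_stop_bits()` / `dasd_device_remove_stop_bits()`. The bit values are assumed for illustration only.

```c
/* Hypothetical bit values; the real DASD_STOPPED_* constants are defined by the driver. */
#define MODEL_STOPPED_QUIESCE 0x02u
#define MODEL_STOPPED_PPRC    0x20u

/* Hypothetical stand-in for struct dasd_device; only the stop bits are modelled. */
struct model_device {
	unsigned int stopped;
};

/* Models dasd_device_set_stop_bits(): set the given bits. */
void model_set_stop_bits(struct model_device *dev, unsigned int bits)
{
	dev->stopped |= bits;
}

/* Models dasd_device_remove_stop_bits(): clear the given bits. */
void model_remove_stop_bits(struct model_device *dev, unsigned int bits)
{
	dev->stopped &= ~bits;
}

/*
 * Models the tail of the swap: if the primary was quiesced, the quiesce
 * bit moves to the secondary before both devices are re-enabled.
 */
void model_swap_tail(struct model_device *primary,
		     struct model_device *secondary)
{
	if (primary->stopped & MODEL_STOPPED_QUIESCE) {
		model_set_stop_bits(secondary, MODEL_STOPPED_QUIESCE);
		model_remove_stop_bits(primary, MODEL_STOPPED_QUIESCE);
	}

	/* re-enable device */
	model_remove_stop_bits(primary, MODEL_STOPPED_PPRC);
	model_remove_stop_bits(secondary, MODEL_STOPPED_PPRC);
}
```

In this model, a primary that enters the swap quiesced leaves it with no stop bits set, while the secondary keeps only the quiesce bit, so a later resume has to be issued against the device that is now active.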


@@ -0,0 +1,83 @@
From 85945142a2a0c5d6a104b9d86eab6648a023765d Mon Sep 17 00:00:00 2001
From: Mete Durlu <mdurlu@redhat.com>
Date: Fri, 27 Mar 2026 13:14:35 +0100
Subject: [PATCH] s390/dasd: Copy detected format information to secondary
device
JIRA: https://issues.redhat.com/browse/RHEL-161530
commit 4c527c7e030672efd788d0806d7a68972a7ba3c1
Author: Stefan Haberland <sth@linux.ibm.com>
Date: Tue Mar 10 15:23:30 2026 +0100
s390/dasd: Copy detected format information to secondary device
During online processing for a DASD device an IO operation is started to
determine the format of the device. CDL format contains specifically
sized blocks at the beginning of the disk.
For a PPRC secondary device no real IO operation is possible therefore
this IO request can not be started and this step is skipped for online
processing of secondary devices. This is generally fine since the
secondary is a copy of the primary device.
In case of an additional partition detection that is run after a swap
operation the format information is needed to properly drive partition
detection IO.
Currently the information is not passed leading to IO errors during
partition detection and a wrongly detected partition table which in turn
might lead to data corruption on the disk with the wrong partition table.
Fix by passing the format information from primary to secondary device.
Fixes: 413862caad6f ("s390/dasd: add copy pair swap capability")
Cc: stable@vger.kernel.org #6.1
Reviewed-by: Jan Hoeppner <hoeppner@linux.ibm.com>
Acked-by: Eduard Shishkin <edward6@linux.ibm.com>
Signed-off-by: Stefan Haberland <sth@linux.ibm.com>
Link: https://patch.msgid.link/20260310142330.4080106-3-sth@linux.ibm.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Stefan Haberland <sth@linux.ibm.com>
Signed-off-by: Mete Durlu <mdurlu@redhat.com>
diff --git a/drivers/s390/block/dasd_eckd.c b/drivers/s390/block/dasd_eckd.c
index f53791c9cbe7..54d6d29477e4 100644
--- a/drivers/s390/block/dasd_eckd.c
+++ b/drivers/s390/block/dasd_eckd.c
@@ -6144,6 +6144,7 @@ static void copy_pair_set_active(struct dasd_copy_relation *copy, char *new_busi
static int dasd_eckd_copy_pair_swap(struct dasd_device *device, char *prim_busid,
char *sec_busid)
{
+ struct dasd_eckd_private *prim_priv, *sec_priv;
struct dasd_device *primary, *secondary;
struct dasd_copy_relation *copy;
struct dasd_block *block;
@@ -6164,6 +6165,9 @@ static int dasd_eckd_copy_pair_swap(struct dasd_device *device, char *prim_busid
if (!secondary)
return DASD_COPYPAIRSWAP_SECONDARY;
+ prim_priv = primary->private;
+ sec_priv = secondary->private;
+
/*
* usually the device should be quiesced for swap
* for paranoia stop device and requeue requests again
@@ -6196,6 +6200,13 @@ static int dasd_eckd_copy_pair_swap(struct dasd_device *device, char *prim_busid
dasd_device_remove_stop_bits(primary, DASD_STOPPED_QUIESCE);
}
+ /*
+ * The secondary device never got through format detection, but since it
+ * is a copy of the primary device, the format is exactly the same;
+ * therefore, the detected layout can simply be copied.
+ */
+ sec_priv->uses_cdl = prim_priv->uses_cdl;
+
/* re-enable device */
dasd_device_remove_stop_bits(primary, DASD_STOPPED_PPRC);
dasd_device_remove_stop_bits(secondary, DASD_STOPPED_PPRC);
--
2.50.1 (Apple Git-155)
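
The format hand-over in the patch above amounts to a single field copy through each device's private pointer. The sketch below models that in plain userspace C under stated assumptions: `struct model_eckd_private`, `struct model_dasd_device`, its `private_data` member, and `model_copy_format()` are hypothetical names standing in for `struct dasd_eckd_private`, `struct dasd_device`, its `private` member, and the inline assignment in `dasd_eckd_copy_pair_swap()`. Only `uses_cdl` is taken from the patch.

```c
/* Hypothetical stand-in for struct dasd_eckd_private; only uses_cdl is modelled. */
struct model_eckd_private {
	int uses_cdl;	/* nonzero if format detection found CDL layout */
};

/* Hypothetical stand-in for struct dasd_device with its opaque private pointer. */
struct model_dasd_device {
	void *private_data;
};

/*
 * The secondary never ran format detection. Since it is a copy of the
 * primary, the layout detected for the primary is copied over.
 */
void model_copy_format(struct model_dasd_device *primary,
		       struct model_dasd_device *secondary)
{
	struct model_eckd_private *prim_priv = primary->private_data;
	struct model_eckd_private *sec_priv = secondary->private_data;

	sec_priv->uses_cdl = prim_priv->uses_cdl;
}
```

The copy is one-directional: the primary's detected value is left untouched, and whatever the secondary held before (typically an undetected default) is overwritten.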


@@ -176,13 +176,13 @@ Summary: The Linux kernel
# define buildid .local
%define specversion 5.14.0
%define patchversion 5.14
-%define pkgrelease 687.5.1
+%define pkgrelease 687.13.1
%define kversion 5
%define tarfile_release 5.14.0-687.5.1.el9_8
# This is needed to do merge window version magic
%define patchlevel 14
# This allows pkg_release to have configurable %%{?dist} tag
-%define specrelease 687.12.1%{?buildid}%{?dist}
+%define specrelease 687.13.1%{?buildid}%{?dist}
# This defines the kabi tarball version
%define kabiversion 5.14.0-687.5.1.el9_8
@@ -1130,6 +1130,23 @@ Patch1251: 1251-netfilter-xt-tcpmss-check-remaining-length-before-reading-op.pat
Patch1252: 1252-dm-thin-fix-metadata-refcount-underflow.patch
Patch11111: ppc64le-kvm-support.patch
+Patch1253: 1253-mm-document-gfp-nofail-must-be-blockable.patch
+Patch1254: 1254-mm-warn-about-illegal-gfp-nofail-usage-in-a-more-appropriate.patch
+Patch1255: 1255-mm-page-alloc-c-avoid-infinite-retries-caused-by-cpuset-race.patch
+Patch1256: 1256-mm-page-alloc-thp-prevent-reclaim-for-gfp-thisnode-thp-alloc.patch
+Patch1257: 1257-mm-page-alloc-ignore-the-exact-initial-compaction-result.patch
+Patch1258: 1258-mm-page-alloc-refactor-the-initial-compaction-handling.patch
+Patch1259: 1259-mm-page-alloc-simplify-alloc-pages-slowpath-flow.patch
+Patch1260: 1260-mm-page-alloc-add-vm-thp-thisnode-reclaim-sysctl-to-allow-th.patch
+Patch1261: 1261-smb-client-fix-oob-reads-parsing-symlink-error-response.patch
+Patch1262: 1262-crypto-authenc-fix-sleep-in-atomic-context-in-decrypt-tail.patch
+Patch1263: 1263-crypto-authenc-correctly-pass-einprogress-back-up-to-the-cal.patch
+Patch1264: 1264-buffer-overflow-in-drivers-xen-sys-hypervisor-c.patch
+Patch1265: 1265-nvme-nvme-fc-move-tagset-removal-to-nvme-fc-delete-ctrl.patch
+Patch1266: 1266-nvme-nvme-fc-ensure-ioerr-work-is-cancelled-in-nvme-fc-delet.patch
+Patch1267: 1267-s390-dasd-fix-gendisk-parent-after-copy-pair-swap.patch
+Patch1268: 1268-s390-dasd-move-quiesce-state-with-pprc-swap.patch
+Patch1269: 1269-s390-dasd-copy-detected-format-information-to-secondary-devi.patch
# END OF PATCH DEFINITIONS
%description
@@ -2027,6 +2044,23 @@ ApplyPatch 1249-bluetooth-sco-fix-race-conditions-in-sco-sock-connect.patch
ApplyPatch 1250-wifi-brcmfmac-validate-bsscfg-indices-in-if-events.patch
ApplyPatch 1251-netfilter-xt-tcpmss-check-remaining-length-before-reading-op.patch
ApplyPatch 1252-dm-thin-fix-metadata-refcount-underflow.patch
+ApplyPatch 1253-mm-document-gfp-nofail-must-be-blockable.patch
+ApplyPatch 1254-mm-warn-about-illegal-gfp-nofail-usage-in-a-more-appropriate.patch
+ApplyPatch 1255-mm-page-alloc-c-avoid-infinite-retries-caused-by-cpuset-race.patch
+ApplyPatch 1256-mm-page-alloc-thp-prevent-reclaim-for-gfp-thisnode-thp-alloc.patch
+ApplyPatch 1257-mm-page-alloc-ignore-the-exact-initial-compaction-result.patch
+ApplyPatch 1258-mm-page-alloc-refactor-the-initial-compaction-handling.patch
+ApplyPatch 1259-mm-page-alloc-simplify-alloc-pages-slowpath-flow.patch
+ApplyPatch 1260-mm-page-alloc-add-vm-thp-thisnode-reclaim-sysctl-to-allow-th.patch
+ApplyPatch 1261-smb-client-fix-oob-reads-parsing-symlink-error-response.patch
+ApplyPatch 1262-crypto-authenc-fix-sleep-in-atomic-context-in-decrypt-tail.patch
+ApplyPatch 1263-crypto-authenc-correctly-pass-einprogress-back-up-to-the-cal.patch
+ApplyPatch 1264-buffer-overflow-in-drivers-xen-sys-hypervisor-c.patch
+ApplyPatch 1265-nvme-nvme-fc-move-tagset-removal-to-nvme-fc-delete-ctrl.patch
+ApplyPatch 1266-nvme-nvme-fc-ensure-ioerr-work-is-cancelled-in-nvme-fc-delet.patch
+ApplyPatch 1267-s390-dasd-fix-gendisk-parent-after-copy-pair-swap.patch
+ApplyPatch 1268-s390-dasd-move-quiesce-state-with-pprc-swap.patch
+ApplyPatch 1269-s390-dasd-copy-detected-format-information-to-secondary-devi.patch
# END OF PATCH APPLICATIONS
# Any further pre-build tree manipulations happen here.
@@ -4101,6 +4135,30 @@ fi
#
#
%changelog
+* Wed Jun 11 2026 Andrew Lukoshko <alukoshko@almalinux.org> - 5.14.0-687.13.1
+- Recreate RHEL 5.14.0-687.13.1 from CentOS Stream 9 and upstream stable backports (1253-1269)
+- RHEL changelog for 687.13.1 follows:
+* Tue Jun 02 2026 CKI KWF Bot <cki-ci-bot+kwf-gitlab-com@redhat.com> [5.14.0-687.13.1.el9_8]
+- smb: client: reject userspace cifs.spnego descriptions (Paulo Alcantara) [RHEL-178944] {CVE-2026-46243}
+- s390/dasd: Copy detected format information to secondary device (Ramesh Chhetri) [RHEL-176472]
+- s390/dasd: Move quiesce state with pprc swap (Ramesh Chhetri) [RHEL-176472]
+- s390/dasd: Fix gendisk parent after copy pair swap (Ramesh Chhetri) [RHEL-176472]
+- nvme: nvme-fc: Ensure ->ioerr_work is cancelled in nvme_fc_delete_ctrl() (Ewan D. Milne) [RHEL-171745]
+- nvme: nvme-fc: move tagset removal to nvme_fc_delete_ctrl() (Ewan D. Milne) [RHEL-171745]
+- Buffer overflow in drivers/xen/sys-hypervisor.c (Vitaly Kuznetsov) [RHEL-172510] {CVE-2026-31786}
+- crypto: authenc - Correctly pass EINPROGRESS back up to the caller (Vladislav Dronov) [RHEL-172167]
+- crypto: authenc - Fix sleep in atomic context in decrypt_tail (Vladislav Dronov) [RHEL-172167]
+- smb: client: fix OOB reads parsing symlink error response (CKI Backport Bot) [RHEL-171471] {CVE-2026-31613}
+- mm/page_alloc: add vm.thp_thisnode_reclaim sysctl to allow THP reclaim on local node (Nico Pache) [RHEL-164778]
+- mm/page_alloc: simplify __alloc_pages_slowpath() flow (Nico Pache) [RHEL-164778]
+- mm/page_alloc: refactor the initial compaction handling (Nico Pache) [RHEL-164778]
+- mm/page_alloc: ignore the exact initial compaction result (Nico Pache) [RHEL-164778]
+- mm, page_alloc, thp: prevent reclaim for __GFP_THISNODE THP allocations (Nico Pache) [RHEL-164778]
+- mm/page_alloc.c: avoid infinite retries caused by cpuset race (Nico Pache) [RHEL-164778]
+- mm: warn about illegal __GFP_NOFAIL usage in a more appropriate location and manner (Nico Pache) [RHEL-164778]
+- mm: document __GFP_NOFAIL must be blockable (Nico Pache) [RHEL-164778]
* Sun Jun 07 2026 Andrew Lukoshko <alukoshko@almalinux.org> - 5.14.0-687.12.1
- Recreate RHEL 5.14.0-687.12.1 from CentOS Stream 9 and upstream stable
backports (SOURCES/1198-1252)