Recreate RHEL 5.14.0-687.13.1 from CS9/upstream backports
Add the RHEL 687.13.1 backports (1253-1269) from centos-stream-9 and upstream stable, on top of 687.12.1. RHEL now ships the smb cifs.spnego fix (CVE-2026-46243) too. Bump pkgrelease and specrelease to 687.13.1.
parent cd9793b5a4
commit 60fd7c7780
105
SOURCES/1253-mm-document-gfp-nofail-must-be-blockable.patch
Normal file
@ -0,0 +1,105 @@
From dbb0b8ec49fcd597c406f3b17f28b588e96cfa14 Mon Sep 17 00:00:00 2001
From: Nico Pache <npache@redhat.com>
Date: Sat, 4 Apr 2026 19:30:21 -0600
Subject: [PATCH] mm: document __GFP_NOFAIL must be blockable
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

commit 17d75422604f0b92869aa17cb44f60958212f033
Author: Barry Song <v-songbaohua@oppo.com>
Date: Sat Aug 31 08:28:22 2024 +1200

mm: document __GFP_NOFAIL must be blockable

Non-blocking allocation with __GFP_NOFAIL is not supported and may still
result in NULL pointers (if we don't return NULL, we result in busy-loop
within non-sleepable contexts):

static inline struct page *
__alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
struct alloc_context *ac)
{
...
/*
* Make sure that __GFP_NOFAIL request doesn't leak out and make sure
* we always retry
*/
if (gfp_mask & __GFP_NOFAIL) {
/*
* All existing users of the __GFP_NOFAIL are blockable, so warn
* of any new users that actually require GFP_NOWAIT
*/
if (WARN_ON_ONCE_GFP(!can_direct_reclaim, gfp_mask))
goto fail;
...
}
...
fail:
warn_alloc(gfp_mask, ac->nodemask,
"page allocation failure: order:%u", order);
got_pg:
return page;
}

Highlight this in the documentation of __GFP_NOFAIL so that non-mm
subsystems can reject any illegal usage of __GFP_NOFAIL with GFP_ATOMIC,
GFP_NOWAIT, etc.

Link: https://lkml.kernel.org/r/20240830202823.21478-3-21cnbao@gmail.com
Signed-off-by: Barry Song <v-songbaohua@oppo.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Davidlohr Bueso <dave@stgolabs.net>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: David Rientjes <rientjes@google.com>
Cc: "Eugenio Pérez" <eperezma@redhat.com>
Cc: Hailong.Liu <hailong.liu@oppo.com>
Cc: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Kees Cook <kees@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Maxime Coquelin <maxime.coquelin@redhat.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Uladzislau Rezki (Sony) <urezki@gmail.com>
Cc: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Xie Yongji <xieyongji@bytedance.com>
Cc: Yafang Shao <laoar.shao@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

JIRA: https://redhat.atlassian.net/browse/RHEL-148561
Signed-off-by: Nico Pache <npache@redhat.com>

diff --git a/include/linux/gfp_types.h b/include/linux/gfp_types.h
index 6583a58670c5..373d3871f61e 100644
--- a/include/linux/gfp_types.h
+++ b/include/linux/gfp_types.h
@@ -168,7 +168,8 @@ typedef unsigned int __bitwise gfp_t;
* the caller still has to check for failures) while costly requests try to be
* not disruptive and back off even without invoking the OOM killer.
* The following three modifiers might be used to override some of these
- * implicit rules
+ * implicit rules. Please note that all of them must be used along with
+ * %__GFP_DIRECT_RECLAIM flag.
*
* %__GFP_NORETRY: The VM implementation will try only very lightweight
* memory direct reclaim to get some memory under memory pressure (thus
@@ -199,6 +200,8 @@ typedef unsigned int __bitwise gfp_t;
* cannot handle allocation failures. The allocation could block
* indefinitely but will never return with failure. Testing for
* failure is pointless.
+ * It _must_ be blockable and used together with __GFP_DIRECT_RECLAIM.
+ * It should _never_ be used in non-sleepable contexts.
* New users should be evaluated carefully (and the flag should be
* used only when there is no reasonable failure policy) but it is
* definitely preferable to use the flag rather than opencode endless
--
2.50.1 (Apple Git-155)
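
Reviewer note: the rule documented by patch 1253 can be sketched in user space as below. This is only an illustration under stated assumptions, not kernel code: the `MOCK_GFP_*` bit values and the helper `mock_gfp_nofail_is_legal()` are hypothetical names invented for this note, and only the relationship between the flags is taken from the commit message.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for kernel GFP bits; the values are arbitrary. */
typedef unsigned int mock_gfp_t;
#define MOCK_GFP_DIRECT_RECLAIM 0x1u
#define MOCK_GFP_KSWAPD_RECLAIM 0x2u
#define MOCK_GFP_NOFAIL         0x4u

/*
 * Models the documented rule: a NOFAIL request must be blockable,
 * i.e. it must also carry the direct-reclaim bit.
 */
static bool mock_gfp_nofail_is_legal(mock_gfp_t gfp)
{
	if (!(gfp & MOCK_GFP_NOFAIL))
		return true;	/* the rule only constrains NOFAIL requests */
	return (gfp & MOCK_GFP_DIRECT_RECLAIM) != 0;
}
```

A subsystem-side check along these lines would reject a NOFAIL request that only wakes kswapd (the GFP_NOWAIT-style case named in the commit message) while leaving ordinary requests alone.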
@ -0,0 +1,169 @@
From e7842dda471d377ae8c6aaf9ddb4a73159f505b4 Mon Sep 17 00:00:00 2001
From: Nico Pache <npache@redhat.com>
Date: Sat, 4 Apr 2026 19:30:21 -0600
Subject: [PATCH] mm: warn about illegal __GFP_NOFAIL usage in a more
 appropriate location and manner
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

commit 903edea6c53f097f5f0c847fdbbfab0c6c44f241
Author: Barry Song <v-songbaohua@oppo.com>
Date: Sat Aug 31 08:28:23 2024 +1200

mm: warn about illegal __GFP_NOFAIL usage in a more appropriate location and manner

Three points for this change:

1. We should consolidate all warnings in one place. Currently, the
order > 1 warning is in the hotpath, while others are in less
likely scenarios. Moving all warnings to the slowpath will reduce
the overhead for order > 1 and increase the visibility of other
warnings.

2. We currently have two warnings for order: one for order > 1 in
the hotpath and another for order > costly_order in the laziest
path. I suggest standardizing on order > 1 since it's been in
use for a long time.

3. We don't need to check for __GFP_NOWARN in this case. __GFP_NOWARN
is meant to suppress allocation failure reports, but here we're
dealing with bug detection, not allocation failures. So replace
WARN_ON_ONCE_GFP by WARN_ON_ONCE.

[v-songbaohua@oppo.com: also update the doc for __GFP_NOFAIL with order > 1]
Link: https://lkml.kernel.org/r/20240903223935.1697-1-21cnbao@gmail.com
Link: https://lkml.kernel.org/r/20240830202823.21478-4-21cnbao@gmail.com
Signed-off-by: Barry Song <v-songbaohua@oppo.com>
Suggested-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: David Hildenbrand <david@redhat.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Christoph Lameter <cl@linux.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: David Rientjes <rientjes@google.com>
Cc: "Eugenio Pérez" <eperezma@redhat.com>
Cc: Hailong.Liu <hailong.liu@oppo.com>
Cc: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Kees Cook <kees@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Maxime Coquelin <maxime.coquelin@redhat.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Uladzislau Rezki (Sony) <urezki@gmail.com>
Cc: Xie Yongji <xieyongji@bytedance.com>
Cc: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Cc: Yafang Shao <laoar.shao@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

JIRA: https://redhat.atlassian.net/browse/RHEL-148561
Signed-off-by: Nico Pache <npache@redhat.com>

diff --git a/include/linux/gfp_types.h b/include/linux/gfp_types.h
index 373d3871f61e..359ed69b14d9 100644
--- a/include/linux/gfp_types.h
+++ b/include/linux/gfp_types.h
@@ -206,7 +206,8 @@ typedef unsigned int __bitwise gfp_t;
* used only when there is no reasonable failure policy) but it is
* definitely preferable to use the flag rather than opencode endless
* loop around allocator.
- * Using this flag for costly allocations is _highly_ discouraged.
+ * Allocating pages from the buddy with __GFP_NOFAIL and order > 1 is
+ * not supported. Please consider using kvmalloc() instead.
*/
#define __GFP_IO ((__force gfp_t)___GFP_IO)
#define __GFP_FS ((__force gfp_t)___GFP_FS)
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 4e8615398f07..28cb88a3a758 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2914,12 +2914,6 @@ struct page *rmqueue(struct zone *preferred_zone,
{
struct page *page;

- /*
- * We most definitely don't want callers attempting to
- * allocate greater than order-1 page units with __GFP_NOFAIL.
- */
- WARN_ON_ONCE((gfp_flags & __GFP_NOFAIL) && (order > 1));
-
if (likely(pcp_allowed_order(order))) {
page = rmqueue_pcplist(preferred_zone, zone, order,
migratetype, alloc_flags);
@@ -4062,6 +4056,7 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
{
bool can_direct_reclaim = gfp_mask & __GFP_DIRECT_RECLAIM;
bool can_compact = gfp_compaction_allowed(gfp_mask);
+ bool nofail = gfp_mask & __GFP_NOFAIL;
const bool costly_order = order > PAGE_ALLOC_COSTLY_ORDER;
struct page *page = NULL;
unsigned int alloc_flags;
@@ -4074,6 +4069,25 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
unsigned int zonelist_iter_cookie;
int reserve_flags;

+ if (unlikely(nofail)) {
+ /*
+ * We most definitely don't want callers attempting to
+ * allocate greater than order-1 page units with __GFP_NOFAIL.
+ */
+ WARN_ON_ONCE(order > 1);
+ /*
+ * Also we don't support __GFP_NOFAIL without __GFP_DIRECT_RECLAIM,
+ * otherwise, we may result in lockup.
+ */
+ WARN_ON_ONCE(!can_direct_reclaim);
+ /*
+ * PF_MEMALLOC request from this context is rather bizarre
+ * because we cannot reclaim anything and only can loop waiting
+ * for somebody to do a work for us.
+ */
+ WARN_ON_ONCE(current->flags & PF_MEMALLOC);
+ }
+
restart:
compaction_retries = 0;
no_progress_loops = 0;
@@ -4291,29 +4305,15 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
* Make sure that __GFP_NOFAIL request doesn't leak out and make sure
* we always retry
*/
- if (gfp_mask & __GFP_NOFAIL) {
+ if (unlikely(nofail)) {
/*
- * All existing users of the __GFP_NOFAIL are blockable, so warn
- * of any new users that actually require GFP_NOWAIT
+ * Lacking direct_reclaim we can't do anything to reclaim memory,
+ * we disregard these unreasonable nofail requests and still
+ * return NULL
*/
- if (WARN_ON_ONCE_GFP(!can_direct_reclaim, gfp_mask))
+ if (!can_direct_reclaim)
goto fail;

- /*
- * PF_MEMALLOC request from this context is rather bizarre
- * because we cannot reclaim anything and only can loop waiting
- * for somebody to do a work for us
- */
- WARN_ON_ONCE_GFP(current->flags & PF_MEMALLOC, gfp_mask);
-
- /*
- * non failing costly orders are a hard requirement which we
- * are not prepared for much so let's warn about these users
- * so that we can identify them and convert them to something
- * else.
- */
- WARN_ON_ONCE_GFP(costly_order, gfp_mask);
-
/*
* Help non-failing allocations by giving some access to memory
* reserves normally used for high priority non-blocking
--
2.50.1 (Apple Git-155)
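
Reviewer note: the three checks that patch 1254 consolidates at the top of the slowpath can be modelled in user space as below. This is a rough sketch under stated assumptions: the function `mock_nofail_warnings()` and the `MOCK_WARN_*` bits are hypothetical, and the model reports which conditions would trip rather than reproducing the fire-once behaviour of `WARN_ON_ONCE`.

```c
#include <stdbool.h>

/* Hypothetical result bits, one per consolidated warning. */
#define MOCK_WARN_ORDER      0x1u	/* order > 1 with NOFAIL */
#define MOCK_WARN_NO_RECLAIM 0x2u	/* NOFAIL without direct reclaim */
#define MOCK_WARN_MEMALLOC   0x4u	/* NOFAIL from a PF_MEMALLOC context */

/*
 * Returns the set of conditions that would warn for this request.
 * Requests without NOFAIL never warn here, whatever their other properties.
 */
static unsigned int mock_nofail_warnings(bool nofail, unsigned int order,
					 bool can_direct_reclaim,
					 bool pf_memalloc)
{
	unsigned int warnings = 0;

	if (!nofail)
		return 0;
	if (order > 1)
		warnings |= MOCK_WARN_ORDER;
	if (!can_direct_reclaim)
		warnings |= MOCK_WARN_NO_RECLAIM;
	if (pf_memalloc)
		warnings |= MOCK_WARN_MEMALLOC;
	return warnings;
}
```

Note that the order threshold is a fixed `order > 1`, matching point 2 of the commit message, rather than the costly-order threshold used by the removed warning.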
@ -0,0 +1,85 @@
From 76329fc4d67ac1854f97337545e428e181b1cbe5 Mon Sep 17 00:00:00 2001
From: Nico Pache <npache@redhat.com>
Date: Sat, 4 Apr 2026 19:30:21 -0600
Subject: [PATCH] mm/page_alloc.c: avoid infinite retries caused by cpuset race

commit e05741fb10c38d70bbd7ec12b23c197b6355d519
Author: Tianyang Zhang <zhangtianyang@loongson.cn>
Date: Wed Apr 16 16:24:05 2025 +0800

mm/page_alloc.c: avoid infinite retries caused by cpuset race

__alloc_pages_slowpath has no change detection for ac->nodemask in the
part of retry path, while cpuset can modify it in parallel. For some
processes that set mempolicy as MPOL_BIND, this results ac->nodemask
changes, and then the should_reclaim_retry will judge based on the latest
nodemask and jump to retry, while the get_page_from_freelist only
traverses the zonelist from ac->preferred_zoneref, which selected by a
expired nodemask and may cause infinite retries in some cases

cpu 64:
__alloc_pages_slowpath {
/* ..... */
retry:
/* ac->nodemask = 0x1, ac->preferred->zone->nid = 1 */
if (alloc_flags & ALLOC_KSWAPD)
wake_all_kswapds(order, gfp_mask, ac);
/* cpu 1:
cpuset_write_resmask
update_nodemask
update_nodemasks_hier
update_tasks_nodemask
mpol_rebind_task
mpol_rebind_policy
mpol_rebind_nodemask
// mempolicy->nodes has been modified,
// which ac->nodemask point to

*/
/* ac->nodemask = 0x3, ac->preferred->zone->nid = 1 */
if (should_reclaim_retry(gfp_mask, order, ac, alloc_flags,
did_some_progress > 0, &no_progress_loops))
goto retry;
}

Simultaneously starting multiple cpuset01 from LTP can quickly reproduce
this issue on a multi node server when the maximum memory pressure is
reached and the swap is enabled

Link: https://lkml.kernel.org/r/20250416082405.20988-1-zhangtianyang@loongson.cn
Fixes: c33d6c06f60f ("mm, page_alloc: avoid looking up the first zone in a zonelist twice")
Signed-off-by: Tianyang Zhang <zhangtianyang@loongson.cn>
Reviewed-by: Suren Baghdasaryan <surenb@google.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Brendan Jackman <jackmanb@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Zi Yan <ziy@nvidia.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

JIRA: https://redhat.atlassian.net/browse/RHEL-148561
Signed-off-by: Nico Pache <npache@redhat.com>

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 28cb88a3a758..dc1a7637bf97 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -4193,6 +4193,14 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
}

retry:
+ /*
+ * Deal with possible cpuset update races or zonelist updates to avoid
+ * infinite retries.
+ */
+ if (check_retry_cpuset(cpuset_mems_cookie, ac) ||
+ check_retry_zonelist(zonelist_iter_cookie))
+ goto restart;
+
/* Ensure kswapd doesn't accidentally go to sleep as long as we loop */
if (alloc_flags & ALLOC_KSWAPD)
wake_all_kswapds(order, gfp_mask, ac);
--
2.50.1 (Apple Git-155)
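
Reviewer note: the fix in patch 1255 relies on comparing a cookie taken at `restart:` against the current state each time the code reaches `retry:`. The sketch below is a simplified, single-threaded model of that idea only. The `mock_*` names and the plain counter are assumptions made for illustration; the kernel helpers named in the diff do more than compare a counter and handle concurrency, which this model ignores.

```c
#include <stdbool.h>

/* Hypothetical model of a nodemask guarded by a change counter. */
struct mock_nodemask {
	unsigned int seq;	/* bumped on every update */
	unsigned long bits;	/* allowed nodes, one bit per node */
};

/* Take a cookie before starting work that depends on the mask. */
static unsigned int mock_read_begin(const struct mock_nodemask *m)
{
	return m->seq;
}

/* True if the mask changed since the cookie was taken: restart needed. */
static bool mock_check_retry(const struct mock_nodemask *m, unsigned int cookie)
{
	return m->seq != cookie;
}

/* What a concurrent rebind would do in this model. */
static void mock_update(struct mock_nodemask *m, unsigned long bits)
{
	m->bits = bits;
	m->seq++;
}
```

In the scenario from the commit message, the mask goes from 0x1 to 0x3 between two passes through `retry:`. With the added check, that change is noticed and the loop restarts with a fresh cookie instead of retrying against a stale preferred zone.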
@ -0,0 +1,96 @@
From 32e483909f993e18474e30c11e0397a82c570b6e Mon Sep 17 00:00:00 2001
From: Nico Pache <npache@redhat.com>
Date: Sat, 4 Apr 2026 19:30:21 -0600
Subject: [PATCH] mm, page_alloc, thp: prevent reclaim for __GFP_THISNODE THP
 allocations

commit 9c9828d3ead69416d731b1238802af31760c823e
Author: Vlastimil Babka <vbabka@suse.cz>
Date: Fri Dec 19 17:31:57 2025 +0100

mm, page_alloc, thp: prevent reclaim for __GFP_THISNODE THP allocations

Since commit cc638f329ef6 ("mm, thp: tweak reclaim/compaction effort of
local-only and all-node allocations"), THP page fault allocations have
settled on the following scheme (from the commit log):

1. local node only THP allocation with no reclaim, just compaction.
2. for madvised VMA's or when synchronous compaction is enabled always - THP
allocation from any node with effort determined by global defrag setting
and VMA madvise
3. fallback to base pages on any node

Recent customer reports however revealed we have a gap in step 1 above.
What we have seen is excessive reclaim due to THP page faults on a NUMA
node that's close to its high watermark, while other nodes have plenty of
free memory.

The problem with step 1 is that it promises no reclaim after the
compaction attempt, however reclaim is only avoided for certain compaction
outcomes (deferred, or skipped due to insufficient free base pages), and
not e.g. when compaction is actually performed but fails (we did see
compact_fail vmstat counter increasing).

THP page faults can therefore exhibit a zone_reclaim_mode-like behavior,
which is not the intention.

Thus add a check for __GFP_THISNODE that corresponds to this exact
situation and prevents continuing with reclaim/compaction once the initial
compaction attempt isn't successful in allocating the page.

Note that commit cc638f329ef6 has not introduced this over-reclaim
possibility; it appears to exist in some form since commit 2f0799a0ffc0
("mm, thp: restore node-local hugepage allocations"). Followup commits
b39d0ee2632d ("mm, page_alloc: avoid expensive reclaim when compaction may
not succeed") and cc638f329ef6 have moved in the right direction, but left
the abovementioned gap.

Link: https://lkml.kernel.org/r/20251219-costly-noretry-thisnode-fix-v1-1-e1085a4a0c34@suse.cz
Fixes: 2f0799a0ffc0 ("mm, thp: restore node-local hugepage allocations")
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Pedro Falcato <pfalcato@suse.de>
Acked-by: Zi Yan <ziy@nvidia.com>
Cc: Brendan Jackman <jackmanb@google.com>
Cc: "David Hildenbrand (Red Hat)" <david@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joshua Hahn <joshua.hahnjy@gmail.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

JIRA: https://redhat.atlassian.net/browse/RHEL-148561
Signed-off-by: Nico Pache <npache@redhat.com>

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index dc1a7637bf97..ca0d42a95410 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -4183,6 +4183,20 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
compact_result == COMPACT_DEFERRED)
goto nopage;

+ /*
+ * THP page faults may attempt local node only first,
+ * but are then allowed to only compact, not reclaim,
+ * see alloc_pages_mpol().
+ *
+ * Compaction can fail for other reasons than those
+ * checked above and we don't want such THP allocations
+ * to put reclaim pressure on a single node in a
+ * situation where other nodes might have plenty of
+ * available memory.
+ */
+ if (gfp_mask & __GFP_THISNODE)
+ goto nopage;
+
/*
* Looks like reclaim/compaction is worth trying, but
* sync compaction could be very expensive, so keep
--
2.50.1 (Apple Git-155)
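
Reviewer note: the decision made after the initial compaction attempt, as it stands once patch 1256 is applied, can be sketched as below. This is an illustrative user-space model under stated assumptions: `mock_skip_reclaim()` and the `MOCK_COMPACT_*` values are hypothetical, and the model assumes the initial compaction attempt has already failed to produce a page.

```c
#include <stdbool.h>

/* Hypothetical subset of compaction outcomes relevant to this check. */
enum mock_compact_result {
	MOCK_COMPACT_SKIPPED,	/* not enough free base pages to try */
	MOCK_COMPACT_DEFERRED,	/* recently failed, so not attempted */
	MOCK_COMPACT_FAILED	/* attempted, but no page was produced */
};

/*
 * Returns true when the allocation should give up without reclaim
 * (the "goto nopage" cases in the hunk above).
 */
static bool mock_skip_reclaim(bool costly_order, bool noretry, bool thisnode,
			      enum mock_compact_result result)
{
	if (!(costly_order && noretry))
		return false;	/* the special case only covers costly NORETRY */
	if (result == MOCK_COMPACT_SKIPPED || result == MOCK_COMPACT_DEFERRED)
		return true;	/* pre-existing checks */
	return thisnode;	/* check added by this patch */
}
```

The last line is the gap described in the commit message: before this patch, a node-local request whose compaction ran and failed fell through to reclaim on that single node.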
@ -0,0 +1,122 @@
From 37f7d4a6be45deda800131b7a9ea6d1f3e4d97ab Mon Sep 17 00:00:00 2001
From: Nico Pache <npache@redhat.com>
Date: Sat, 4 Apr 2026 19:30:20 -0600
Subject: [PATCH] mm/page_alloc: ignore the exact initial compaction result

commit 66987218154918a6341a3e3eeeee58110a69e0bb
Author: Vlastimil Babka <vbabka@suse.cz>
Date: Tue Jan 6 12:52:36 2026 +0100

mm/page_alloc: ignore the exact initial compaction result

Patch series "tweaks for __alloc_pages_slowpath()", v3.

This patch (of 3):

For allocations that are of costly order and __GFP_NORETRY (and can
perform compaction) we attempt direct compaction first. If that fails, we
continue with a single round of direct reclaim+compaction (as for other
__GFP_NORETRY allocations, except the compaction is of lower priority),
with two exceptions that fail immediately:

- __GFP_THISNODE is specified, to prevent zone_reclaim_mode-like
behavior for e.g. THP page faults

- compaction failed because it was deferred (i.e. has been failing
recently so further attempts are not done for a while) or skipped,
which means there are insufficient free base pages to defragment to
begin with

Upon closer inspection, the second condition has a somewhat flawed
reasoning. If there are not enough base pages and reclaim could create
them, we instead fail. When there are enough base pages and compaction
has already ran and failed, we proceed and hope that reclaim and the
subsequent compaction attempt will succeed. But it's unclear why they
should and whether it will be as inexpensive as intended.

It might make therefore more sense to just fail unconditionally after the
initial compaction attempt. However that would change the semantics of
__GFP_NORETRY to attempt reclaim at least once.

Alternatively we can remove the compaction result checks and proceed with
the single reclaim and (lower priority) compaction attempt, leaving only
the __GFP_THISNODE exception for failing immediately.

Link: https://lkml.kernel.org/r/20260106-thp-thisnode-tweak-v3-0-f5d67c21a193@suse.cz
Link: https://lkml.kernel.org/r/20260106-thp-thisnode-tweak-v3-1-f5d67c21a193@suse.cz
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Brendan Jackman <jackmanb@google.com>
Cc: David Hildenbrand (Red Hat) <david@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Joshua Hahn <joshua.hahnjy@gmail.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Pedro Falcato <pfalcato@suse.de>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

JIRA: https://redhat.atlassian.net/browse/RHEL-148561
Signed-off-by: Nico Pache <npache@redhat.com>

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index ca0d42a95410..3301e934dafa 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -4162,44 +4162,22 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
* includes some THP page fault allocations
*/
if (costly_order && (gfp_mask & __GFP_NORETRY)) {
- /*
- * If allocating entire pageblock(s) and compaction
- * failed because all zones are below low watermarks
- * or is prohibited because it recently failed at this
- * order, fail immediately unless the allocator has
- * requested compaction and reclaim retry.
- *
- * Reclaim is
- * - potentially very expensive because zones are far
- * below their low watermarks or this is part of very
- * bursty high order allocations,
- * - not guaranteed to help because isolate_freepages()
- * may not iterate over freed pages as part of its
- * linear scan, and
- * - unlikely to make entire pageblocks free on its
- * own.
- */
- if (compact_result == COMPACT_SKIPPED ||
- compact_result == COMPACT_DEFERRED)
- goto nopage;
-
/*
* THP page faults may attempt local node only first,
* but are then allowed to only compact, not reclaim,
* see alloc_pages_mpol().
*
- * Compaction can fail for other reasons than those
- * checked above and we don't want such THP allocations
- * to put reclaim pressure on a single node in a
- * situation where other nodes might have plenty of
- * available memory.
+ * Compaction has failed above and we don't want such
+ * THP allocations to put reclaim pressure on a single
+ * node in a situation where other nodes might have
+ * plenty of available memory.
*/
if (gfp_mask & __GFP_THISNODE)
goto nopage;

/*
- * Looks like reclaim/compaction is worth trying, but
- * sync compaction could be very expensive, so keep
+ * Proceed with single round of reclaim/compaction, but
+ * since sync compaction could be very expensive, keep
* using async compaction.
*/
compact_priority = INIT_COMPACT_PRIORITY;
--
2.50.1 (Apple Git-155)
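
Reviewer note: the behavioural difference introduced by patch 1257 is limited to two compaction outcomes, which the side-by-side model below tries to make visible. This is a sketch under stated assumptions: both functions and the `MOCK_RES_*` values are hypothetical, and the model only covers requests that are costly-order, `__GFP_NORETRY`, and whose initial compaction attempt produced no page.

```c
#include <stdbool.h>

/* Hypothetical subset of compaction outcomes. */
enum mock_result {
	MOCK_RES_SKIPPED,
	MOCK_RES_DEFERRED,
	MOCK_RES_FAILED
};

/* Before the patch: the compaction result and THISNODE both matter. */
static bool mock_fail_before(bool thisnode, enum mock_result r)
{
	return r == MOCK_RES_SKIPPED || r == MOCK_RES_DEFERRED || thisnode;
}

/* After the patch: only THISNODE causes an immediate failure. */
static bool mock_fail_after(bool thisnode, enum mock_result r)
{
	(void)r;	/* the exact compaction result is now ignored */
	return thisnode;
}
```

The two functions disagree only for skipped or deferred compaction without `__GFP_THISNODE`: those requests used to fail immediately and now go on to the single round of reclaim and lower-priority compaction described in the commit message.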
@ -0,0 +1,208 @@
From 2490569160937bfa1556b9d2dc07998148eb5f77 Mon Sep 17 00:00:00 2001
From: Nico Pache <npache@redhat.com>
Date: Sat, 4 Apr 2026 19:30:20 -0600
Subject: [PATCH] mm/page_alloc: refactor the initial compaction handling

commit 53a9b4646f67c95df1775aa5f381cb7f42cae957
Author: Vlastimil Babka <vbabka@suse.cz>
Date: Tue Jan 6 12:52:37 2026 +0100

mm/page_alloc: refactor the initial compaction handling

The initial direct compaction done in some cases in
__alloc_pages_slowpath() stands out from the main retry loop of reclaim +
compaction.

We can simplify this by instead skipping the initial reclaim attempt via a
new local variable compact_first, and handle the compact_prority as
necessary to match the original behavior. No functional change intended.

Link: https://lkml.kernel.org/r/20260106-thp-thisnode-tweak-v3-2-f5d67c21a193@suse.cz
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Suggested-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Joshua Hahn <joshua.hahnjy@gmail.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Brendan Jackman <jackmanb@google.com>
Cc: David Hildenbrand (Red Hat) <david@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Pedro Falcato <pfalcato@suse.de>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

JIRA: https://redhat.atlassian.net/browse/RHEL-148561
Signed-off-by: Nico Pache <npache@redhat.com>

diff --git a/include/linux/gfp.h b/include/linux/gfp.h
index 3e4c0c536a3d..ac836590ba3a 100644
--- a/include/linux/gfp.h
+++ b/include/linux/gfp.h
@@ -348,9 +348,15 @@ extern gfp_t gfp_allowed_mask;
/* Returns true if the gfp_mask allows use of ALLOC_NO_WATERMARK */
bool gfp_pfmemalloc_allowed(gfp_t gfp_mask);

+/* A helper for checking if gfp includes all the specified flags */
+static inline bool gfp_has_flags(gfp_t gfp, gfp_t flags)
+{
+ return (gfp & flags) == flags;
+}
+
static inline bool gfp_has_io_fs(gfp_t gfp)
{
- return (gfp & (__GFP_IO | __GFP_FS)) == (__GFP_IO | __GFP_FS);
+ return gfp_has_flags(gfp, __GFP_IO | __GFP_FS);
}

/*
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 3301e934dafa..277ed887ec7a 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -4055,7 +4055,7 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
struct alloc_context *ac)
{
bool can_direct_reclaim = gfp_mask & __GFP_DIRECT_RECLAIM;
- bool can_compact = gfp_compaction_allowed(gfp_mask);
+ bool can_compact = can_direct_reclaim && gfp_compaction_allowed(gfp_mask);
bool nofail = gfp_mask & __GFP_NOFAIL;
const bool costly_order = order > PAGE_ALLOC_COSTLY_ORDER;
struct page *page = NULL;
@@ -4068,6 +4068,7 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
unsigned int cpuset_mems_cookie;
unsigned int zonelist_iter_cookie;
int reserve_flags;
+ bool compact_first = false;

if (unlikely(nofail)) {
/*
@@ -4095,6 +4096,19 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
cpuset_mems_cookie = read_mems_allowed_begin();
zonelist_iter_cookie = zonelist_iter_begin();

+ /*
+ * For costly allocations, try direct compaction first, as it's likely
+ * that we have enough base pages and don't need to reclaim. For non-
+ * movable high-order allocations, do that as well, as compaction will
+ * try prevent permanent fragmentation by migrating from blocks of the
+ * same migratetype.
+ */
+ if (can_compact && (costly_order || (order > 0 &&
+ ac->migratetype != MIGRATE_MOVABLE))) {
+ compact_first = true;
+ compact_priority = INIT_COMPACT_PRIORITY;
+ }
+
/*
* The fast path uses conservative alloc_flags to succeed only until
* kswapd needs to be woken up, and to avoid the cost of setting up
@@ -4137,53 +4151,6 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
if (page)
goto got_pg;

- /*
- * For costly allocations, try direct compaction first, as it's likely
- * that we have enough base pages and don't need to reclaim. For non-
- * movable high-order allocations, do that as well, as compaction will
- * try prevent permanent fragmentation by migrating from blocks of the
- * same migratetype.
- * Don't try this for allocations that are allowed to ignore
- * watermarks, as the ALLOC_NO_WATERMARKS attempt didn't yet happen.
- */
- if (can_direct_reclaim && can_compact &&
- (costly_order ||
- (order > 0 && ac->migratetype != MIGRATE_MOVABLE))
- && !gfp_pfmemalloc_allowed(gfp_mask)) {
- page = __alloc_pages_direct_compact(gfp_mask, order,
- alloc_flags, ac,
- INIT_COMPACT_PRIORITY,
- &compact_result);
- if (page)
- goto got_pg;
-
- /*
- * Checks for costly allocations with __GFP_NORETRY, which
- * includes some THP page fault allocations
- */
- if (costly_order && (gfp_mask & __GFP_NORETRY)) {
- /*
- * THP page faults may attempt local node only first,
- * but are then allowed to only compact, not reclaim,
- * see alloc_pages_mpol().
- *
- * Compaction has failed above and we don't want such
- * THP allocations to put reclaim pressure on a single
- * node in a situation where other nodes might have
- * plenty of available memory.
- */
- if (gfp_mask & __GFP_THISNODE)
- goto nopage;
-
- /*
- * Proceed with single round of reclaim/compaction, but
- * since sync compaction could be very expensive, keep
- * using async compaction.
- */
- compact_priority = INIT_COMPACT_PRIORITY;
- }
- }
-
retry:
/*
* Deal with possible cpuset update races or zonelist updates to avoid
@@ -4227,10 +4194,12 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
goto nopage;

/* Try direct reclaim and then allocating */
- page = __alloc_pages_direct_reclaim(gfp_mask, order, alloc_flags, ac,
- &did_some_progress);
- if (page)
- goto got_pg;
+ if (!compact_first) {
+ page = __alloc_pages_direct_reclaim(gfp_mask, order, alloc_flags,
+ ac, &did_some_progress);
+ if (page)
+ goto got_pg;
+ }

/* Try direct compaction and then allocating */
page = __alloc_pages_direct_compact(gfp_mask, order, alloc_flags, ac,
@@ -4238,6 +4207,33 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
if (page)
goto got_pg;

+ if (compact_first) {
+ /*
+ * THP page faults may attempt local node only first, but are
+ * then allowed to only compact, not reclaim, see
+ * alloc_pages_mpol().
+ *
+ * Compaction has failed above and we don't want such THP
|
||||
+ * allocations to put reclaim pressure on a single node in a
|
||||
+ * situation where other nodes might have plenty of available
|
||||
+ * memory.
|
||||
+ */
|
||||
+ if (gfp_has_flags(gfp_mask, __GFP_NORETRY | __GFP_THISNODE))
|
||||
+ goto nopage;
|
||||
+
|
||||
+ /*
|
||||
+ * For the initial compaction attempt we have lowered its
|
||||
+ * priority. Restore it for further retries, if those are
|
||||
+ * allowed. With __GFP_NORETRY there will be a single round of
|
||||
+ * reclaim and compaction with the lowered priority.
|
||||
+ */
|
||||
+ if (!(gfp_mask & __GFP_NORETRY))
|
||||
+ compact_priority = DEF_COMPACT_PRIORITY;
|
||||
+
|
||||
+ compact_first = false;
|
||||
+ goto retry;
|
||||
+ }
|
||||
+
|
||||
/* Do not loop if specifically requested */
|
||||
if (gfp_mask & __GFP_NORETRY)
|
||||
goto nopage;
|
||||
--
|
||||
2.50.1 (Apple Git-155)
|
||||
|
||||
@ -0,0 +1,138 @@
|
||||
From a17c77996e1aa930c05901e213f1441f0db7a46a Mon Sep 17 00:00:00 2001
|
||||
From: Nico Pache <npache@redhat.com>
|
||||
Date: Sat, 4 Apr 2026 19:30:20 -0600
|
||||
Subject: [PATCH] mm/page_alloc: simplify __alloc_pages_slowpath() flow
|
||||
|
||||
commit 2c4c3e29897d43c431b1cf9432fb66977f262ac2
|
||||
Author: Vlastimil Babka <vbabka@suse.cz>
|
||||
Date: Tue Jan 6 12:52:38 2026 +0100
|
||||
|
||||
mm/page_alloc: simplify __alloc_pages_slowpath() flow
|
||||
|
||||
The actions done before entering the main retry loop include waking up
|
||||
kswapds and an allocation attempt with the precise alloc_flags. Then in
|
||||
the loop we keep waking up kswapds, and we retry the allocation with flags
|
||||
potentially further adjusted by being allowed to use reserves (due to e.g.
|
||||
becoming an OOM killer victim).
|
||||
|
||||
We can adjust the retry loop to keep only one instance of waking up
|
||||
kswapds and allocation attempt. Introduce the can_retry_reserves variable
|
||||
for retrying once when we become eligible for reserves. It is still
|
||||
useful not to evaluate reserve_flags immediately for the first allocation
|
||||
attempt, because it's better to first try to succeed in a non-preferred zone
|
||||
above the min watermark before allocating immediately from the preferred
|
||||
zone below min watermark.
|
||||
|
||||
Additionally move the cpuset update checks introduced by e05741fb10c3
|
||||
("mm/page_alloc.c: avoid infinite retries caused by cpuset race") further
|
||||
down the retry loop. It's enough to do the checks only before reaching
|
||||
any potentially infinite 'goto retry;' loop.
|
||||
|
||||
There should be no meaningful functional changes. The change of exact
|
||||
moments the retry for reserves and cpuset updates are checked should not
|
||||
result in different outcomes modulo races with concurrent allocator
|
||||
activity.
|
||||
|
||||
Link: https://lkml.kernel.org/r/20260106-thp-thisnode-tweak-v3-3-f5d67c21a193@suse.cz
|
||||
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
|
||||
Acked-by: Michal Hocko <mhocko@suse.com>
|
||||
Cc: Johannes Weiner <hannes@cmpxchg.org>
|
||||
Cc: Joshua Hahn <joshua.hahnjy@gmail.com>
|
||||
Cc: Brendan Jackman <jackmanb@google.com>
|
||||
Cc: David Hildenbrand (Red Hat) <david@kernel.org>
|
||||
Cc: David Rientjes <rientjes@google.com>
|
||||
Cc: Liam Howlett <liam.howlett@oracle.com>
|
||||
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
|
||||
Cc: Mike Rapoport <rppt@kernel.org>
|
||||
Cc: Pedro Falcato <pfalcato@suse.de>
|
||||
Cc: Suren Baghdasaryan <surenb@google.com>
|
||||
Cc: Zi Yan <ziy@nvidia.com>
|
||||
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
||||
|
||||
JIRA: https://redhat.atlassian.net/browse/RHEL-148561
|
||||
Signed-off-by: Nico Pache <npache@redhat.com>
|
||||
|
||||
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
|
||||
index 277ed887ec7a..4c2b622a39cf 100644
|
||||
--- a/mm/page_alloc.c
|
||||
+++ b/mm/page_alloc.c
|
||||
@@ -4069,6 +4069,7 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
|
||||
unsigned int zonelist_iter_cookie;
|
||||
int reserve_flags;
|
||||
bool compact_first = false;
|
||||
+ bool can_retry_reserves = true;
|
||||
|
||||
if (unlikely(nofail)) {
|
||||
/*
|
||||
@@ -4140,6 +4141,8 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
|
||||
goto nopage;
|
||||
}
|
||||
|
||||
+retry:
|
||||
+ /* Ensure kswapd doesn't accidentally go to sleep as long as we loop */
|
||||
if (alloc_flags & ALLOC_KSWAPD)
|
||||
wake_all_kswapds(order, gfp_mask, ac);
|
||||
|
||||
@@ -4151,19 +4154,6 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
|
||||
if (page)
|
||||
goto got_pg;
|
||||
|
||||
-retry:
|
||||
- /*
|
||||
- * Deal with possible cpuset update races or zonelist updates to avoid
|
||||
- * infinite retries.
|
||||
- */
|
||||
- if (check_retry_cpuset(cpuset_mems_cookie, ac) ||
|
||||
- check_retry_zonelist(zonelist_iter_cookie))
|
||||
- goto restart;
|
||||
-
|
||||
- /* Ensure kswapd doesn't accidentally go to sleep as long as we loop */
|
||||
- if (alloc_flags & ALLOC_KSWAPD)
|
||||
- wake_all_kswapds(order, gfp_mask, ac);
|
||||
-
|
||||
reserve_flags = __gfp_pfmemalloc_flags(gfp_mask);
|
||||
if (reserve_flags)
|
||||
alloc_flags = gfp_to_alloc_flags_cma(gfp_mask, reserve_flags) |
|
||||
@@ -4178,12 +4168,18 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
|
||||
ac->nodemask = NULL;
|
||||
ac->preferred_zoneref = first_zones_zonelist(ac->zonelist,
|
||||
ac->highest_zoneidx, ac->nodemask);
|
||||
- }
|
||||
|
||||
- /* Attempt with potentially adjusted zonelist and alloc_flags */
|
||||
- page = get_page_from_freelist(gfp_mask, order, alloc_flags, ac);
|
||||
- if (page)
|
||||
- goto got_pg;
|
||||
+ /*
|
||||
+ * The first time we adjust anything due to being allowed to
|
||||
+ * ignore memory policies or watermarks, retry immediately. This
|
||||
+ * allows us to keep the first allocation attempt optimistic so
|
||||
+ * it can succeed in a zone that is still above watermarks.
|
||||
+ */
|
||||
+ if (can_retry_reserves) {
|
||||
+ can_retry_reserves = false;
|
||||
+ goto retry;
|
||||
+ }
|
||||
+ }
|
||||
|
||||
/* Caller is not willing to reclaim, we can't balance anything */
|
||||
if (!can_direct_reclaim)
|
||||
@@ -4246,6 +4242,15 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
|
||||
!(gfp_mask & __GFP_RETRY_MAYFAIL)))
|
||||
goto nopage;
|
||||
|
||||
+ /*
|
||||
+ * Deal with possible cpuset update races or zonelist updates to avoid
|
||||
+ * infinite retries. No "goto retry;" can be placed above this check
|
||||
+ * unless it can execute just once.
|
||||
+ */
|
||||
+ if (check_retry_cpuset(cpuset_mems_cookie, ac) ||
|
||||
+ check_retry_zonelist(zonelist_iter_cookie))
|
||||
+ goto restart;
|
||||
+
|
||||
if (should_reclaim_retry(gfp_mask, order, ac, alloc_flags,
|
||||
did_some_progress > 0, &no_progress_loops))
|
||||
goto retry;
|
||||
--
|
||||
2.50.1 (Apple Git-155)
|
||||
|
||||
@ -0,0 +1,85 @@
|
||||
From fe93b8af523e8f3cf1e7304d100adf6ed44f6345 Mon Sep 17 00:00:00 2001
|
||||
From: Nico Pache <npache@redhat.com>
|
||||
Date: Sat, 28 Mar 2026 16:17:55 -0600
|
||||
Subject: [PATCH] mm/page_alloc: add vm.thp_thisnode_reclaim sysctl to allow
|
||||
THP reclaim on local node
|
||||
|
||||
Upstream commit cd2e3c32636e ("mm, page_alloc, thp: prevent reclaim for
|
||||
__GFP_THISNODE THP allocations") prevents __GFP_THISNODE THP allocations
|
||||
from proceeding into reclaim after compaction failure, to avoid
|
||||
zone_reclaim_mode-like excessive reclaim on a single NUMA node when other
|
||||
nodes have plenty of free memory. This was further refined by upstream
|
||||
commits 66987218154918a6 and 53a9b4646f67 which refactored the check
|
||||
into gfp_has_flags(gfp_mask, __GFP_NORETRY | __GFP_THISNODE).
|
||||
While this is the correct default, add a sysctl knob (vm.thp_thisnode_reclaim)
|
||||
to restore the previous behavior. This prevents workloads from regressing
|
||||
relative to older releases and serves customers/workloads that may benefit from
|
||||
the more aggressive reclaim behavior.
|
||||
|
||||
The sysctl defaults to 1 to avoid regressions and keep the pre-fix behavior.
|
||||
|
||||
Upstream-status: RHEL-Only
|
||||
JIRA: https://redhat.atlassian.net/browse/RHEL-148561
|
||||
Signed-off-by: Nico Pache <npache@redhat.com>
|
||||
|
||||
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
|
||||
index 4c2b622a39cf..b26f3d53b751 100644
|
||||
--- a/mm/page_alloc.c
|
||||
+++ b/mm/page_alloc.c
|
||||
@@ -290,6 +290,14 @@ int user_min_free_kbytes = -1;
|
||||
static int watermark_boost_factor __read_mostly = 15000;
|
||||
static int watermark_scale_factor = 10;
|
||||
|
||||
+/*
|
||||
+ * RHEL-ONLY: When set to 1, allows reclaim for __GFP_THISNODE THP allocations,
|
||||
+ * restoring the behavior prior to the fix that prevents zone_reclaim_mode-like
|
||||
+ * excessive reclaim on a single NUMA node when other nodes have plenty of free
|
||||
+ * memory.
|
||||
+ */
|
||||
+static int thp_thisnode_reclaim __read_mostly = 1;
|
||||
+
|
||||
/* movable_zone is the "real" zone pages in ZONE_MOVABLE are taken from */
|
||||
int movable_zone;
|
||||
EXPORT_SYMBOL(movable_zone);
|
||||
@@ -4213,9 +4221,20 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
|
||||
* allocations to put reclaim pressure on a single node in a
|
||||
* situation where other nodes might have plenty of available
|
||||
* memory.
|
||||
+ *
|
||||
+ * RHEL-ONLY: vm.thp_thisnode_reclaim can override this to
|
||||
+ * restore the pre-fix behavior: allow reclaim for THISNODE THP
|
||||
+ * allocations, but still fail immediately when compaction was
|
||||
+ * skipped (insufficient free base pages) or deferred (recent
|
||||
+ * compaction failures at this order).
|
||||
*/
|
||||
- if (gfp_has_flags(gfp_mask, __GFP_NORETRY | __GFP_THISNODE))
|
||||
- goto nopage;
|
||||
+ if (gfp_has_flags(gfp_mask, __GFP_NORETRY | __GFP_THISNODE)) {
|
||||
+ if (!thp_thisnode_reclaim)
|
||||
+ goto nopage;
|
||||
+ if (compact_result == COMPACT_SKIPPED ||
|
||||
+ compact_result == COMPACT_DEFERRED)
|
||||
+ goto nopage;
|
||||
+ }
|
||||
|
||||
/*
|
||||
* For the initial compaction attempt we have lowered its
|
||||
@@ -6213,6 +6232,15 @@ static struct ctl_table page_alloc_sysctl_table[] = {
|
||||
.extra1 = SYSCTL_ZERO,
|
||||
.extra2 = SYSCTL_ONE_HUNDRED,
|
||||
},
|
||||
+ {
|
||||
+ .procname = "thp_thisnode_reclaim", //RHEL-ONLY
|
||||
+ .data = &thp_thisnode_reclaim,
|
||||
+ .maxlen = sizeof(thp_thisnode_reclaim),
|
||||
+ .mode = 0644,
|
||||
+ .proc_handler = proc_dointvec_minmax,
|
||||
+ .extra1 = SYSCTL_ZERO,
|
||||
+ .extra2 = SYSCTL_ONE,
|
||||
+ },
|
||||
#endif
|
||||
{}
|
||||
};
|
||||
--
|
||||
2.50.1 (Apple Git-155)
|
||||
|
||||
@ -0,0 +1,123 @@
|
||||
From e0c8209f463129749b824ebf8068fd75774dd5d7 Mon Sep 17 00:00:00 2001
|
||||
From: CKI Backport Bot <cki-ci-bot+cki-gitlab-backport-bot@redhat.com>
|
||||
Date: Tue, 28 Apr 2026 12:07:13 +0000
|
||||
Subject: [PATCH] smb: client: fix OOB reads parsing symlink error response
|
||||
|
||||
JIRA: https://redhat.atlassian.net/browse/RHEL-171472
|
||||
CVE: CVE-2026-31613
|
||||
|
||||
commit 3df690bba28edec865cf7190be10708ad0ddd67e
|
||||
Author: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
||||
Date: Mon Apr 6 15:49:38 2026 +0200
|
||||
|
||||
smb: client: fix OOB reads parsing symlink error response
|
||||
|
||||
When a CREATE returns STATUS_STOPPED_ON_SYMLINK, smb2_check_message()
|
||||
returns success without any length validation, leaving the symlink
|
||||
parsers as the only defense against an untrusted server.
|
||||
|
||||
symlink_data() walks SMB 3.1.1 error contexts with the loop test "p <
|
||||
end", but reads p->ErrorId at offset 4 and p->ErrorDataLength at offset
|
||||
0. When the server-controlled ErrorDataLength advances p to within 1-7
|
||||
bytes of end, the next iteration will read past it. When the matching
|
||||
context is found, sym->SymLinkErrorTag is read at offset 4 from
|
||||
p->ErrorContextData with no check that the symlink header itself fits.
|
||||
|
||||
smb2_parse_symlink_response() then bounds-checks the substitute name
|
||||
using SMB2_SYMLINK_STRUCT_SIZE as the offset of PathBuffer from
|
||||
iov_base. That value is computed as sizeof(smb2_err_rsp) +
|
||||
sizeof(smb2_symlink_err_rsp), which is correct only when
|
||||
ErrorContextCount == 0.
|
||||
|
||||
With at least one error context the symlink data sits 8 bytes deeper,
|
||||
and each skipped non-matching context shifts it further by 8 +
|
||||
ALIGN(ErrorDataLength, 8). The check is too short, allowing the
|
||||
substitute name read to run past iov_len. The out-of-bound heap bytes
|
||||
are UTF-16-decoded into the symlink target and returned to userspace via
|
||||
readlink(2).
|
||||
|
||||
Fix this all up by making the loop test require the full context header
|
||||
to fit, rejecting sym if its header runs past end, and bound the
|
||||
substitute name against the actual position of sym->PathBuffer rather
|
||||
than a fixed offset.
|
||||
|
||||
Because sub_offs and sub_len are 16bits, the pointer math will not
|
||||
overflow here with the new greater-than.
|
||||
|
||||
Cc: Ronnie Sahlberg <ronniesahlberg@gmail.com>
|
||||
Cc: Shyam Prasad N <sprasad@microsoft.com>
|
||||
Cc: Tom Talpey <tom@talpey.com>
|
||||
Cc: Bharath SM <bharathsm@microsoft.com>
|
||||
Cc: linux-cifs@vger.kernel.org
|
||||
Cc: samba-technical@lists.samba.org
|
||||
Cc: stable <stable@kernel.org>
|
||||
Reviewed-by: Paulo Alcantara (Red Hat) <pc@manguebit.org>
|
||||
Assisted-by: gregkh_clanker_t1000
|
||||
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
||||
Signed-off-by: Steve French <stfrench@microsoft.com>
|
||||
|
||||
Signed-off-by: CKI Backport Bot <cki-ci-bot+cki-gitlab-backport-bot@redhat.com>
|
||||
|
||||
diff --git a/fs/smb/client/smb2file.c b/fs/smb/client/smb2file.c
|
||||
index fa9726f53143..4793981e2bb7 100644
|
||||
--- a/fs/smb/client/smb2file.c
|
||||
+++ b/fs/smb/client/smb2file.c
|
||||
@@ -27,10 +27,11 @@ static struct smb2_symlink_err_rsp *symlink_data(const struct kvec *iov)
|
||||
{
|
||||
struct smb2_err_rsp *err = iov->iov_base;
|
||||
struct smb2_symlink_err_rsp *sym = ERR_PTR(-EINVAL);
|
||||
+ u8 *end = (u8 *)err + iov->iov_len;
|
||||
u32 len;
|
||||
|
||||
if (err->ErrorContextCount) {
|
||||
- struct smb2_error_context_rsp *p, *end;
|
||||
+ struct smb2_error_context_rsp *p;
|
||||
|
||||
len = (u32)err->ErrorContextCount * (offsetof(struct smb2_error_context_rsp,
|
||||
ErrorContextData) +
|
||||
@@ -39,8 +40,7 @@ static struct smb2_symlink_err_rsp *symlink_data(const struct kvec *iov)
|
||||
return ERR_PTR(-EINVAL);
|
||||
|
||||
p = (struct smb2_error_context_rsp *)err->ErrorData;
|
||||
- end = (struct smb2_error_context_rsp *)((u8 *)err + iov->iov_len);
|
||||
- do {
|
||||
+ while ((u8 *)p + sizeof(*p) <= end) {
|
||||
if (le32_to_cpu(p->ErrorId) == SMB2_ERROR_ID_DEFAULT) {
|
||||
sym = (struct smb2_symlink_err_rsp *)p->ErrorContextData;
|
||||
break;
|
||||
@@ -50,14 +50,16 @@ static struct smb2_symlink_err_rsp *symlink_data(const struct kvec *iov)
|
||||
|
||||
len = ALIGN(le32_to_cpu(p->ErrorDataLength), 8);
|
||||
p = (struct smb2_error_context_rsp *)(p->ErrorContextData + len);
|
||||
- } while (p < end);
|
||||
+ }
|
||||
} else if (le32_to_cpu(err->ByteCount) >= sizeof(*sym) &&
|
||||
iov->iov_len >= SMB2_SYMLINK_STRUCT_SIZE) {
|
||||
sym = (struct smb2_symlink_err_rsp *)err->ErrorData;
|
||||
}
|
||||
|
||||
- if (!IS_ERR(sym) && (le32_to_cpu(sym->SymLinkErrorTag) != SYMLINK_ERROR_TAG ||
|
||||
- le32_to_cpu(sym->ReparseTag) != IO_REPARSE_TAG_SYMLINK))
|
||||
+ if (!IS_ERR(sym) &&
|
||||
+ ((u8 *)sym + sizeof(*sym) > end ||
|
||||
+ le32_to_cpu(sym->SymLinkErrorTag) != SYMLINK_ERROR_TAG ||
|
||||
+ le32_to_cpu(sym->ReparseTag) != IO_REPARSE_TAG_SYMLINK))
|
||||
sym = ERR_PTR(-EINVAL);
|
||||
|
||||
return sym;
|
||||
@@ -128,8 +130,10 @@ int smb2_parse_symlink_response(struct cifs_sb_info *cifs_sb, const struct kvec
|
||||
print_len = le16_to_cpu(sym->PrintNameLength);
|
||||
print_offs = le16_to_cpu(sym->PrintNameOffset);
|
||||
|
||||
- if (iov->iov_len < SMB2_SYMLINK_STRUCT_SIZE + sub_offs + sub_len ||
|
||||
- iov->iov_len < SMB2_SYMLINK_STRUCT_SIZE + print_offs + print_len)
|
||||
+ if ((char *)sym->PathBuffer + sub_offs + sub_len >
|
||||
+ (char *)iov->iov_base + iov->iov_len ||
|
||||
+ (char *)sym->PathBuffer + print_offs + print_len >
|
||||
+ (char *)iov->iov_base + iov->iov_len)
|
||||
return -EINVAL;
|
||||
|
||||
return smb2_parse_native_symlink(path,
|
||||
--
|
||||
2.50.1 (Apple Git-155)
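The bounds-checking idea in the patch above (walk only while a full header fits, then confirm the payload fits before using it) can be illustrated with a small userspace sketch. It does not use the SMB structures: struct ctx_hdr, find_ctx() and demo_search() are hypothetical, fields are read in host byte order for simplicity, and the 8-byte header only loosely mirrors the layout the commit message describes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical 8-byte context header: length at offset 0, id at offset 4. */
struct ctx_hdr {
	uint32_t data_len;  /* payload length following the header */
	uint32_t id;        /* context identifier */
};

/*
 * Return the offset of the payload of the first context whose id matches,
 * or -1 if no such context fits entirely inside buf[0..len).
 */
static long find_ctx(const uint8_t *buf, size_t len, uint32_t want_id,
		     size_t payload_min)
{
	size_t off = 0;

	assert(buf != NULL);

	/* Continue only while a complete header fits before the end. */
	while (len - off >= sizeof(struct ctx_hdr)) {
		struct ctx_hdr h;
		uint64_t skip;

		memcpy(&h, buf + off, sizeof(h));
		off += sizeof(h);

		if (h.id == want_id) {
			/* Reject a match whose payload would run past the end. */
			if (len - off < payload_min)
				return -1;
			return (long)off;
		}

		/* Skip the payload, rounded up to 8 bytes, without overflow. */
		skip = ((uint64_t)h.data_len + 7) & ~(uint64_t)7;
		if (skip > len - off)
			return -1;
		off += (size_t)skip;
	}
	return -1;
}

/* Build a two-context sample and search it with a given visible length. */
static long demo_search(size_t visible_len)
{
	uint8_t buf[32] = {0};
	struct ctx_hdr a = { .data_len = 4, .id = 1 };
	struct ctx_hdr b = { .data_len = 8, .id = 0 };

	assert(visible_len <= sizeof(buf));
	memcpy(buf, &a, sizeof(a));       /* header A at 0, payload at 8..15 */
	memcpy(buf + 16, &b, sizeof(b));  /* header B at 16, payload at 24..31 */
	return find_ctx(buf, visible_len, 0, 8);
}
```

With the full 32 bytes visible the sketch finds the second context; when the buffer is truncated so that either the second header or its payload no longer fits, it returns -1 instead of reading past the end.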
|
||||
|
||||
@ -0,0 +1,46 @@
|
||||
From b68bb0a260effb5982ab52535a3213ff03b57ed9 Mon Sep 17 00:00:00 2001
|
||||
From: Vladislav Dronov <vdronov@redhat.com>
|
||||
Date: Wed, 29 Apr 2026 23:04:01 +0200
|
||||
Subject: [PATCH] crypto: authenc - Fix sleep in atomic context in decrypt_tail
|
||||
|
||||
JIRA: https://issues.redhat.com/browse/RHEL-172166
|
||||
Upstream Status: merged into the linux.git
|
||||
|
||||
commit 66eae850333d639fc278d6f915c6fc01499ea893
|
||||
Author: Herbert Xu <herbert@gondor.apana.org.au>
|
||||
Date: Wed Jan 19 17:58:40 2022 +1100
|
||||
|
||||
crypto: authenc - Fix sleep in atomic context in decrypt_tail
|
||||
|
||||
The function crypto_authenc_decrypt_tail discards its flags
|
||||
argument and always relies on the flags from the original request
|
||||
when starting its sub-request.
|
||||
|
||||
This is clearly wrong as it may cause the SLEEPABLE flag to be
|
||||
set when it shouldn't.
|
||||
|
||||
Fixes: 92d95ba91772 ("crypto: authenc - Convert to new AEAD interface")
|
||||
Reported-by: Corentin Labbe <clabbe.montjoie@gmail.com>
|
||||
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
|
||||
Tested-by: Corentin Labbe <clabbe.montjoie@gmail.com>
|
||||
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
|
||||
|
||||
Assisted-by: Patchpal 0.7.1
|
||||
Signed-off-by: Vladislav Dronov <vdronov@redhat.com>
|
||||
|
||||
diff --git a/crypto/authenc.c b/crypto/authenc.c
|
||||
index 670bf1a01d00..17f674a7cdff 100644
|
||||
--- a/crypto/authenc.c
|
||||
+++ b/crypto/authenc.c
|
||||
@@ -253,7 +253,7 @@ static int crypto_authenc_decrypt_tail(struct aead_request *req,
|
||||
dst = scatterwalk_ffwd(areq_ctx->dst, req->dst, req->assoclen);
|
||||
|
||||
skcipher_request_set_tfm(skreq, ctx->enc);
|
||||
- skcipher_request_set_callback(skreq, aead_request_flags(req),
|
||||
+ skcipher_request_set_callback(skreq, flags,
|
||||
req->base.complete, req->base.data);
|
||||
skcipher_request_set_crypt(skreq, src, dst,
|
||||
req->cryptlen - authsize, req->iv);
|
||||
--
|
||||
2.50.1 (Apple Git-155)
|
||||
|
||||
@ -0,0 +1,213 @@
|
||||
From 6cedc3414cbe4e00b4a85ac4381edb18805194d6 Mon Sep 17 00:00:00 2001
|
||||
From: Vladislav Dronov <vdronov@redhat.com>
|
||||
Date: Wed, 29 Apr 2026 23:05:43 +0200
|
||||
Subject: [PATCH] crypto: authenc - Correctly pass EINPROGRESS back up to the
|
||||
caller
|
||||
|
||||
JIRA: https://issues.redhat.com/browse/RHEL-172166
|
||||
Upstream Status: merged into the linux.git
|
||||
|
||||
Conflicts: Missing a large crypto-tree-wide upstream patch 255e48eb1768
|
||||
("crypto: api - Use data directly in completion function"). To apply:
|
||||
- Change "void *data" back to "struct crypto_async_request *areq".
|
||||
- Change "struct aead_request *req = data" back to "struct aead_request
|
||||
*req = areq->data".
|
||||
|
||||
commit 96feb73def02d175850daa0e7c2c90c876681b5c
|
||||
Author: Herbert Xu <herbert@gondor.apana.org.au>
|
||||
Date: Wed Sep 24 18:20:17 2025 +0800
|
||||
|
||||
crypto: authenc - Correctly pass EINPROGRESS back up to the caller
|
||||
|
||||
When authenc is invoked with MAY_BACKLOG, it needs to pass EINPROGRESS
|
||||
notifications back up to the caller when the underlying algorithm
|
||||
returns EBUSY synchronously.
|
||||
|
||||
However, if the EBUSY comes from the second part of an authenc call,
|
||||
i.e., it is asynchronous, both the EBUSY and the subsequent EINPROGRESS
|
||||
notification must not be passed to the caller.
|
||||
|
||||
Implement this by passing a mask to the function that starts the
|
||||
second half of authenc and using it to determine whether EBUSY
|
||||
and EINPROGRESS should be passed to the caller.
|
||||
|
||||
This was a deficiency in the original implementation of authenc
|
||||
because it was not expected to be used with MAY_BACKLOG.
|
||||
|
||||
Reported-by: Ingo Franzki <ifranzki@linux.ibm.com>
|
||||
Reported-by: Mikulas Patocka <mpatocka@redhat.com>
|
||||
Fixes: 180ce7e81030 ("crypto: authenc - Add EINPROGRESS check")
|
||||
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
|
||||
|
||||
Assisted-by: Patchpal AI 0.7.1
|
||||
Signed-off-by: Vladislav Dronov <vdronov@redhat.com>
|
||||
|
||||
diff --git a/crypto/authenc.c b/crypto/authenc.c
|
||||
index 17f674a7cdff..494c0b6db431 100644
|
||||
--- a/crypto/authenc.c
|
||||
+++ b/crypto/authenc.c
|
||||
@@ -39,7 +39,7 @@ struct authenc_request_ctx {
|
||||
|
||||
static void authenc_request_complete(struct aead_request *req, int err)
|
||||
{
|
||||
- if (err != -EINPROGRESS)
|
||||
+ if (err != -EINPROGRESS && err != -EBUSY)
|
||||
aead_request_complete(req, err);
|
||||
}
|
||||
|
||||
@@ -109,27 +109,42 @@ static int crypto_authenc_setkey(struct crypto_aead *authenc, const u8 *key,
|
||||
return err;
|
||||
}
|
||||
|
||||
-static void authenc_geniv_ahash_done(struct crypto_async_request *areq, int err)
|
||||
+static void authenc_geniv_ahash_finish(struct aead_request *req)
|
||||
{
|
||||
- struct aead_request *req = areq->data;
|
||||
struct crypto_aead *authenc = crypto_aead_reqtfm(req);
|
||||
struct aead_instance *inst = aead_alg_instance(authenc);
|
||||
struct authenc_instance_ctx *ictx = aead_instance_ctx(inst);
|
||||
struct authenc_request_ctx *areq_ctx = aead_request_ctx(req);
|
||||
struct ahash_request *ahreq = (void *)(areq_ctx->tail + ictx->reqoff);
|
||||
|
||||
- if (err)
|
||||
- goto out;
|
||||
-
|
||||
scatterwalk_map_and_copy(ahreq->result, req->dst,
|
||||
req->assoclen + req->cryptlen,
|
||||
crypto_aead_authsize(authenc), 1);
|
||||
+}
|
||||
|
||||
-out:
|
||||
+static void authenc_geniv_ahash_done(struct crypto_async_request *areq, int err)
|
||||
+{
|
||||
+ struct aead_request *req = areq->data;
|
||||
+
|
||||
+ if (!err)
|
||||
+ authenc_geniv_ahash_finish(req);
|
||||
aead_request_complete(req, err);
|
||||
}
|
||||
|
||||
-static int crypto_authenc_genicv(struct aead_request *req, unsigned int flags)
|
||||
+/*
|
||||
+ * Used when the ahash request was invoked in the async callback context
|
||||
+ * of the previous skcipher request. Eat any EINPROGRESS notifications.
|
||||
+ */
|
||||
+static void authenc_geniv_ahash_done2(struct crypto_async_request *areq, int err)
|
||||
+{
|
||||
+ struct aead_request *req = areq->data;
|
||||
+
|
||||
+ if (!err)
|
||||
+ authenc_geniv_ahash_finish(req);
|
||||
+ authenc_request_complete(req, err);
|
||||
+}
|
||||
+
|
||||
+static int crypto_authenc_genicv(struct aead_request *req, unsigned int mask)
|
||||
{
|
||||
struct crypto_aead *authenc = crypto_aead_reqtfm(req);
|
||||
struct aead_instance *inst = aead_alg_instance(authenc);
|
||||
@@ -138,6 +153,7 @@ static int crypto_authenc_genicv(struct aead_request *req, unsigned int flags)
|
||||
struct crypto_ahash *auth = ctx->auth;
|
||||
struct authenc_request_ctx *areq_ctx = aead_request_ctx(req);
|
||||
struct ahash_request *ahreq = (void *)(areq_ctx->tail + ictx->reqoff);
|
||||
+ unsigned int flags = aead_request_flags(req) & ~mask;
|
||||
u8 *hash = areq_ctx->tail;
|
||||
int err;
|
||||
|
||||
@@ -148,7 +164,8 @@ static int crypto_authenc_genicv(struct aead_request *req, unsigned int flags)
|
||||
ahash_request_set_crypt(ahreq, req->dst, hash,
|
||||
req->assoclen + req->cryptlen);
|
||||
ahash_request_set_callback(ahreq, flags,
|
||||
- authenc_geniv_ahash_done, req);
|
||||
+ mask ? authenc_geniv_ahash_done2 :
|
||||
+ authenc_geniv_ahash_done, req);
|
||||
|
||||
err = crypto_ahash_digest(ahreq);
|
||||
if (err)
|
||||
@@ -165,12 +182,11 @@ static void crypto_authenc_encrypt_done(struct crypto_async_request *req,
|
||||
{
|
||||
struct aead_request *areq = req->data;
|
||||
|
||||
- if (err)
|
||||
- goto out;
|
||||
-
|
||||
- err = crypto_authenc_genicv(areq, 0);
|
||||
-
|
||||
-out:
|
||||
+ if (err) {
|
||||
+ aead_request_complete(areq, err);
|
||||
+ return;
|
||||
+ }
|
||||
+ err = crypto_authenc_genicv(areq, CRYPTO_TFM_REQ_MAY_SLEEP);
|
||||
authenc_request_complete(areq, err);
|
||||
}
|
||||
|
||||
@@ -223,11 +239,18 @@ static int crypto_authenc_encrypt(struct aead_request *req)
|
||||
if (err)
|
||||
return err;
|
||||
|
||||
- return crypto_authenc_genicv(req, aead_request_flags(req));
|
||||
+ return crypto_authenc_genicv(req, 0);
|
||||
+}
|
||||
+
|
||||
+static void authenc_decrypt_tail_done(struct crypto_async_request *areq, int err)
|
||||
+{
|
||||
+ struct aead_request *req = areq->data;
|
||||
+
|
||||
+ authenc_request_complete(req, err);
|
||||
}
|
||||
|
||||
static int crypto_authenc_decrypt_tail(struct aead_request *req,
|
||||
- unsigned int flags)
|
||||
+ unsigned int mask)
|
||||
{
|
||||
struct crypto_aead *authenc = crypto_aead_reqtfm(req);
|
||||
struct aead_instance *inst = aead_alg_instance(authenc);
|
||||
@@ -238,6 +261,7 @@ static int crypto_authenc_decrypt_tail(struct aead_request *req,
|
||||
struct skcipher_request *skreq = (void *)(areq_ctx->tail +
|
||||
ictx->reqoff);
|
||||
unsigned int authsize = crypto_aead_authsize(authenc);
|
||||
+ unsigned int flags = aead_request_flags(req) & ~mask;
|
||||
u8 *ihash = ahreq->result + authsize;
|
||||
struct scatterlist *src, *dst;
|
||||
|
||||
@@ -254,7 +278,9 @@ static int crypto_authenc_decrypt_tail(struct aead_request *req,
|
||||
|
||||
skcipher_request_set_tfm(skreq, ctx->enc);
|
||||
skcipher_request_set_callback(skreq, flags,
|
||||
- req->base.complete, req->base.data);
|
||||
+ mask ? authenc_decrypt_tail_done :
|
||||
+ req->base.complete,
|
||||
+ mask ? req : req->base.data);
|
||||
skcipher_request_set_crypt(skreq, src, dst,
|
||||
req->cryptlen - authsize, req->iv);
|
||||
|
||||
@@ -266,12 +292,11 @@ static void authenc_verify_ahash_done(struct crypto_async_request *areq,
|
||||
{
|
||||
struct aead_request *req = areq->data;
|
||||
|
||||
- if (err)
|
||||
- goto out;
|
||||
-
|
||||
- err = crypto_authenc_decrypt_tail(req, 0);
|
||||
-
|
||||
-out:
|
||||
+ if (err) {
|
||||
+ aead_request_complete(req, err);
|
||||
+ return;
|
||||
+ }
|
||||
+ err = crypto_authenc_decrypt_tail(req, CRYPTO_TFM_REQ_MAY_SLEEP);
|
||||
authenc_request_complete(req, err);
|
||||
}
|
||||
|
||||
@@ -301,7 +326,7 @@ static int crypto_authenc_decrypt(struct aead_request *req)
|
||||
if (err)
|
||||
return err;
|
||||
|
||||
- return crypto_authenc_decrypt_tail(req, aead_request_flags(req));
|
||||
+ return crypto_authenc_decrypt_tail(req, 0);
|
||||
}
|
||||
|
||||
static int crypto_authenc_init_tfm(struct crypto_aead *tfm)
|
||||
--
|
||||
2.50.1 (Apple Git-155)
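The mask handling described in the patch above can be modelled outside the crypto API. In this rough sketch the flag values, enum done_kind, struct sub_setup, setup_subrequest() and should_complete() are all hypothetical stand-ins; only the two rules are taken from the patch: sub-request flags are the request flags with the mask cleared, and a non-zero mask selects a completion path that swallows -EINPROGRESS and -EBUSY.

```c
#include <assert.h>
#include <errno.h>

/* Placeholder flag values for illustration, not the kernel's definitions. */
#define REQ_MAY_BACKLOG 0x1u
#define REQ_MAY_SLEEP   0x2u

/* Which completion handler the sub-request would be given. */
enum done_kind {
	DONE_FORWARD,   /* first half: pass notifications straight up */
	DONE_FILTERED   /* second half from async context: filter them */
};

struct sub_setup {
	unsigned int flags;
	enum done_kind done;
};

/* Derive the sub-request setup from the caller's flags and the mask. */
static struct sub_setup setup_subrequest(unsigned int req_flags,
					 unsigned int mask)
{
	struct sub_setup s;

	s.flags = req_flags & ~mask;                  /* drop masked flags */
	s.done = mask ? DONE_FILTERED : DONE_FORWARD; /* pick the handler */
	return s;
}

/* Filtered path: only real results reach the original caller. */
static int should_complete(int err)
{
	return err != -EINPROGRESS && err != -EBUSY;
}
```

Passing a mask of 0 from the synchronous entry points and a MAY_SLEEP-style mask from the async callbacks keeps a single helper for both cases, which appears to be the design choice the patch makes.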
|
||||
|
||||
@ -0,0 +1,60 @@
|
||||
From 27fdbab4221b375de54bf91919798d88520c6e28 Mon Sep 17 00:00:00 2001
|
||||
From: Juergen Gross <jgross@suse.com>
|
||||
Date: Fri, 27 Mar 2026 14:13:38 +0100
|
||||
Subject: [PATCH] Buffer overflow in drivers/xen/sys-hypervisor.c
|
||||
|
||||
The build id returned by HYPERVISOR_xen_version(XENVER_build_id) is
|
||||
neither NUL terminated nor a string.
|
||||
|
||||
The first causes a buffer overflow as sprintf in buildid_show will
|
||||
read and copy till it finds a NUL.
|
||||
|
||||
00000000 f4 91 51 f4 dd 38 9e 9d 65 47 52 eb 10 71 db 50 |..Q..8..eGR..q.P|
|
||||
00000010 b9 a8 01 42 6f 2e 32 |...Bo.2|
|
||||
00000017
|
||||
|
||||
So use a memcpy instead of sprintf to have the correct value:
|
||||
|
||||
00000000 f4 91 51 f4 dd 00 9e 9d 65 47 52 eb 10 71 db 50 |..Q.....eGR..q.P|
|
||||
00000010 b9 a8 01 42 |...B|
|
||||
00000014
|
||||
|
||||
(the above have a hack to embed a zero inside and check it's
|
||||
returned correctly).
|
||||
|
||||
This is XSA-485 / CVE-2026-31786
|
||||
|
||||
Fixes: 84b7625728ea ("xen: add sysfs node for hypervisor build id")
|
||||
Signed-off-by: Frediano Ziglio <frediano.ziglio@citrix.com>
|
||||
Reviewed-by: Juergen Gross <jgross@suse.com>
|
||||
Signed-off-by: Juergen Gross <jgross@suse.com>
|
||||
|
||||
diff --git a/drivers/xen/sys-hypervisor.c b/drivers/xen/sys-hypervisor.c
|
||||
index b1bb01ba82f8..91923242a5ae 100644
|
||||
--- a/drivers/xen/sys-hypervisor.c
|
||||
+++ b/drivers/xen/sys-hypervisor.c
|
||||
@@ -366,6 +366,8 @@ static ssize_t buildid_show(struct hyp_sysfs_attr *attr, char *buffer)
|
||||
ret = sprintf(buffer, "<denied>");
|
||||
return ret;
|
||||
}
|
||||
+ if (ret > PAGE_SIZE)
|
||||
+ return -ENOSPC;
|
||||
|
||||
buildid = kmalloc(sizeof(*buildid) + ret, GFP_KERNEL);
|
||||
if (!buildid)
|
||||
@@ -373,8 +375,10 @@ static ssize_t buildid_show(struct hyp_sysfs_attr *attr, char *buffer)
|
||||
|
||||
buildid->len = ret;
|
||||
ret = HYPERVISOR_xen_version(XENVER_build_id, buildid);
|
||||
- if (ret > 0)
|
||||
- ret = sprintf(buffer, "%s", buildid->buf);
|
||||
+ if (ret > 0) {
|
||||
+ /* Build id is binary, not a string. */
|
||||
+ memcpy(buffer, buildid->buf, ret);
|
||||
+ }
|
||||
kfree(buildid);
|
||||
|
||||
return ret;
|
||||
--
|
||||
2.50.1 (Apple Git-155)
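The difference between string-style and length-based copying that motivates the patch above can be shown with a short userspace sketch. copy_binary_id() and demo_copy() are hypothetical helpers; the sample bytes reuse the start of the build id shown in the commit message, with an explicit terminator appended only so that the string-style contrast is well defined.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Copy id_len bytes of a binary id into out (capacity out_cap).
 * Returns the number of bytes copied, or -1 if it would not fit.
 */
static long copy_binary_id(char *out, size_t out_cap,
			   const unsigned char *id, size_t id_len)
{
	assert(out != NULL && id != NULL);
	if (id_len > out_cap)
		return -1;
	memcpy(out, id, id_len);  /* length-based: embedded zeros are kept */
	return (long)id_len;
}

/* Return the length produced by each copying style for the same input. */
static long demo_copy(int as_string)
{
	/* 8 payload bytes with a zero at index 5, then a demo-only terminator. */
	static const unsigned char id[9] = {
		0xf4, 0x91, 0x51, 0xf4, 0xdd, 0x00, 0x9e, 0x9d, 0x00
	};
	char out[16];

	if (as_string)  /* "%s" stops at the first zero byte */
		return snprintf(out, sizeof(out), "%s", (const char *)id);
	return copy_binary_id(out, sizeof(out), id, 8);
}
```

The string-style copy reports 5 bytes because it stops at the embedded zero, while the length-based copy returns all 8. On real hypervisor data with no terminator at all, the string-style read would instead continue past the end of the buffer.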
|
||||
|
||||
@ -0,0 +1,80 @@
|
||||
From 75537b257b7125983cc0a54f0a2878d28677eb50 Mon Sep 17 00:00:00 2001
|
||||
From: "Ewan D. Milne" <emilne@redhat.com>
|
||||
Date: Mon, 18 May 2026 11:14:05 -0400
|
||||
Subject: [PATCH] nvme: nvme-fc: move tagset removal to nvme_fc_delete_ctrl()
|
||||
|
||||
JIRA: https://redhat.atlassian.net/browse/RHEL-171725
|
||||
Upstream Status: From upstream linux mainline
|
||||
|
||||
Now target is removed from nvme_fc_ctrl_free() which is the ctrl->ref
|
||||
release handler. And even admin queue is unquiesced there, this way
|
||||
is definitely wrong because the ctrl->ref is grabbed when submitting
|
||||
command.
|
||||
|
||||
And Marco observed that nvme_fc_ctrl_free() can be called from request
|
||||
completion code path, and trigger kernel warning since request completes
|
||||
from softirq context.
|
||||
|
||||
Fix the issue by moveing target removal into nvme_fc_delete_ctrl(),
|
||||
which is also aligned with nvme-tcp and nvme-rdma.
|
||||
|
||||
Patch originally proposed by Ming Lei, then modified to move the tagset
|
||||
removal down to after nvme_fc_delete_association() after further testing.
|
||||
|
||||
Cc: Marco Patalano <mpatalan@redhat.com>
|
||||
Cc: Ewan Milne <emilne@redhat.com>
|
||||
Cc: James Smart <james.smart@broadcom.com>
|
||||
Cc: Sagi Grimberg <sagi@grimberg.me>
|
||||
Signed-off-by: Ming Lei <ming.lei@redhat.com>
|
||||
Cc: stable@vger.kernel.org
|
||||
Tested-by: Marco Patalano <mpatalan@redhat.com>
|
||||
Reviewed-by: Justin Tee <justin.tee@broadcom.com>
|
||||
Signed-off-by: Ewan D. Milne <emilne@redhat.com>
|
||||
Signed-off-by: Keith Busch <kbusch@kernel.org>
|
||||
(cherry picked from commit ea3442efabd0aa3930c5bab73c3901ef38ef6ac3)
|
||||
Signed-off-by: Ewan D. Milne <emilne@redhat.com>
|
||||
|
||||
diff --git a/drivers/nvme/host/fc.c b/drivers/nvme/host/fc.c
|
||||
index bd6cbe35dace..3e500e87e30c 100644
|
||||
--- a/drivers/nvme/host/fc.c
|
||||
+++ b/drivers/nvme/host/fc.c
|
||||
@@ -2354,17 +2354,11 @@ nvme_fc_ctrl_free(struct kref *ref)
|
||||
container_of(ref, struct nvme_fc_ctrl, ref);
|
||||
unsigned long flags;
|
||||
|
||||
- if (ctrl->ctrl.tagset)
|
||||
- nvme_remove_io_tag_set(&ctrl->ctrl);
|
||||
-
|
||||
/* remove from rport list */
|
||||
spin_lock_irqsave(&ctrl->rport->lock, flags);
|
||||
list_del(&ctrl->ctrl_list);
|
||||
spin_unlock_irqrestore(&ctrl->rport->lock, flags);
|
||||
|
||||
- nvme_unquiesce_admin_queue(&ctrl->ctrl);
|
||||
- nvme_remove_admin_tag_set(&ctrl->ctrl);
|
||||
-
|
||||
kfree(ctrl->queues);
|
||||
|
||||
put_device(ctrl->dev);
|
||||
@@ -3252,11 +3246,18 @@ nvme_fc_delete_ctrl(struct nvme_ctrl *nctrl)
|
||||
|
||||
cancel_work_sync(&ctrl->ioerr_work);
|
||||
cancel_delayed_work_sync(&ctrl->connect_work);
|
||||
+
|
||||
/*
|
||||
* kill the association on the link side. this will block
|
||||
* waiting for io to terminate
|
||||
*/
|
||||
nvme_fc_delete_association(ctrl);
|
||||
+
|
||||
+ if (ctrl->ctrl.tagset)
|
||||
+ nvme_remove_io_tag_set(&ctrl->ctrl);
|
||||
+
|
||||
+ nvme_unquiesce_admin_queue(&ctrl->ctrl);
|
||||
+ nvme_remove_admin_tag_set(&ctrl->ctrl);
|
||||
}
|
||||
|
||||
static void
|
||||
--
|
||||
2.50.1 (Apple Git-155)
|
||||
|
||||
@@ -0,0 +1,93 @@
From 08fc4018856d6e04b7517e1ea515f06e86a05128 Mon Sep 17 00:00:00 2001
From: "Ewan D. Milne" <emilne@redhat.com>
Date: Mon, 18 May 2026 11:19:16 -0400
Subject: [PATCH] nvme: nvme-fc: Ensure ->ioerr_work is cancelled in
 nvme_fc_delete_ctrl()

JIRA: https://redhat.atlassian.net/browse/RHEL-171725
Upstream Status: From upstream linux mainline

nvme_fc_delete_assocation() waits for pending I/O to complete before
returning, and an error can cause ->ioerr_work to be queued after
cancel_work_sync() had been called. Move the call to cancel_work_sync() to
be after nvme_fc_delete_association() to ensure ->ioerr_work is not running
when the nvme_fc_ctrl object is freed. Otherwise the following can occur:

[ 1135.911754] list_del corruption, ff2d24c8093f31f8->next is NULL
[ 1135.917705] ------------[ cut here ]------------
[ 1135.922336] kernel BUG at lib/list_debug.c:52!
[ 1135.926784] Oops: invalid opcode: 0000 [#1] SMP NOPTI
[ 1135.931851] CPU: 48 UID: 0 PID: 726 Comm: kworker/u449:23 Kdump: loaded Not tainted 6.12.0 #1 PREEMPT(voluntary)
[ 1135.943490] Hardware name: Dell Inc. PowerEdge R660/0HGTK9, BIOS 2.5.4 01/16/2025
[ 1135.950969] Workqueue: 0x0 (nvme-wq)
[ 1135.954673] RIP: 0010:__list_del_entry_valid_or_report.cold+0xf/0x6f
[ 1135.961041] Code: c7 c7 98 68 72 94 e8 26 45 fe ff 0f 0b 48 c7 c7 70 68 72 94 e8 18 45 fe ff 0f 0b 48 89 fe 48 c7 c7 80 69 72 94 e8 07 45 fe ff <0f> 0b 48 89 d1 48 c7 c7 a0 6a 72 94 48 89 c2 e8 f3 44 fe ff 0f 0b
[ 1135.979788] RSP: 0018:ff579b19482d3e50 EFLAGS: 00010046
[ 1135.985015] RAX: 0000000000000033 RBX: ff2d24c8093f31f0 RCX: 0000000000000000
[ 1135.992148] RDX: 0000000000000000 RSI: ff2d24d6bfa1d0c0 RDI: ff2d24d6bfa1d0c0
[ 1135.999278] RBP: ff2d24c8093f31f8 R08: 0000000000000000 R09: ffffffff951e2b08
[ 1136.006413] R10: ffffffff95122ac8 R11: 0000000000000003 R12: ff2d24c78697c100
[ 1136.013546] R13: fffffffffffffff8 R14: 0000000000000000 R15: ff2d24c78697c0c0
[ 1136.020677] FS: 0000000000000000(0000) GS:ff2d24d6bfa00000(0000) knlGS:0000000000000000
[ 1136.028765] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1136.034510] CR2: 00007fd207f90b80 CR3: 000000163ea22003 CR4: 0000000000f73ef0
[ 1136.041641] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1136.048776] DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: 0000000000000400
[ 1136.055910] PKRU: 55555554
[ 1136.058623] Call Trace:
[ 1136.061074] <TASK>
[ 1136.063179] ? show_trace_log_lvl+0x1b0/0x2f0
[ 1136.067540] ? show_trace_log_lvl+0x1b0/0x2f0
[ 1136.071898] ? move_linked_works+0x4a/0xa0
[ 1136.075998] ? __list_del_entry_valid_or_report.cold+0xf/0x6f
[ 1136.081744] ? __die_body.cold+0x8/0x12
[ 1136.085584] ? die+0x2e/0x50
[ 1136.088469] ? do_trap+0xca/0x110
[ 1136.091789] ? do_error_trap+0x65/0x80
[ 1136.095543] ? __list_del_entry_valid_or_report.cold+0xf/0x6f
[ 1136.101289] ? exc_invalid_op+0x50/0x70
[ 1136.105127] ? __list_del_entry_valid_or_report.cold+0xf/0x6f
[ 1136.110874] ? asm_exc_invalid_op+0x1a/0x20
[ 1136.115059] ? __list_del_entry_valid_or_report.cold+0xf/0x6f
[ 1136.120806] move_linked_works+0x4a/0xa0
[ 1136.124733] worker_thread+0x216/0x3a0
[ 1136.128485] ? __pfx_worker_thread+0x10/0x10
[ 1136.132758] kthread+0xfa/0x240
[ 1136.135904] ? __pfx_kthread+0x10/0x10
[ 1136.139657] ret_from_fork+0x31/0x50
[ 1136.143236] ? __pfx_kthread+0x10/0x10
[ 1136.146988] ret_from_fork_asm+0x1a/0x30
[ 1136.150915] </TASK>

Fixes: 19fce0470f05 ("nvme-fc: avoid calling _nvme_fc_abort_outstanding_ios from interrupt context")
Cc: stable@vger.kernel.org
Tested-by: Marco Patalano <mpatalan@redhat.com>
Reviewed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Ewan D. Milne <emilne@redhat.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
(cherry picked from commit 0a2c5495b6d1ecb0fa18ef6631450f391a888256)
Signed-off-by: Ewan D. Milne <emilne@redhat.com>

diff --git a/drivers/nvme/host/fc.c b/drivers/nvme/host/fc.c
index 3e500e87e30c..6cc7d11ad5c0 100644
--- a/drivers/nvme/host/fc.c
+++ b/drivers/nvme/host/fc.c
@@ -3244,7 +3244,6 @@ nvme_fc_delete_ctrl(struct nvme_ctrl *nctrl)
{
struct nvme_fc_ctrl *ctrl = to_fc_ctrl(nctrl);

- cancel_work_sync(&ctrl->ioerr_work);
cancel_delayed_work_sync(&ctrl->connect_work);

/*
@@ -3252,6 +3251,7 @@ nvme_fc_delete_ctrl(struct nvme_ctrl *nctrl)
* waiting for io to terminate
*/
nvme_fc_delete_association(ctrl);
+ cancel_work_sync(&ctrl->ioerr_work);

if (ctrl->ctrl.tagset)
nvme_remove_io_tag_set(&ctrl->ctrl);
--
2.50.1 (Apple Git-155)
@@ -0,0 +1,58 @@
From e88ced2e3c3091122785c0a2dd822b61d1839d58 Mon Sep 17 00:00:00 2001
From: Mete Durlu <mdurlu@redhat.com>
Date: Fri, 27 Mar 2026 13:14:31 +0100
Subject: [PATCH] s390/dasd: Fix gendisk parent after copy pair swap

JIRA: https://issues.redhat.com/browse/RHEL-161530

commit c943bfc6afb8d0e781b9b7406f36caa8bbf95cb9
Author: Stefan Haberland <sth@linux.ibm.com>
Date: Wed Nov 26 17:06:31 2025 +0100

s390/dasd: Fix gendisk parent after copy pair swap

After a copy pair swap the block device's "device" symlink points to
the secondary CCW device, but the gendisk's parent remained the
primary, leaving /sys/block/<dasdx> under the wrong parent.

Move the gendisk to the secondary's device with device_move(), keeping
the sysfs topology consistent after the swap.

Fixes: 413862caad6f ("s390/dasd: add copy pair swap capability")
Cc: stable@vger.kernel.org #6.1
Reviewed-by: Jan Hoeppner <hoeppner@linux.ibm.com>
Signed-off-by: Stefan Haberland <sth@linux.ibm.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

Signed-off-by: Stefan Haberland <sth@linux.ibm.com>
Signed-off-by: Mete Durlu <mdurlu@redhat.com>

diff --git a/drivers/s390/block/dasd_eckd.c b/drivers/s390/block/dasd_eckd.c
index c60424afaf04..7e8679d7c686 100644
--- a/drivers/s390/block/dasd_eckd.c
+++ b/drivers/s390/block/dasd_eckd.c
@@ -6148,6 +6148,7 @@ static int dasd_eckd_copy_pair_swap(struct dasd_device *device, char *prim_busid
struct dasd_copy_relation *copy;
struct dasd_block *block;
struct gendisk *gdp;
+ int rc;

copy = device->copy;
if (!copy)
@@ -6182,6 +6183,13 @@ static int dasd_eckd_copy_pair_swap(struct dasd_device *device, char *prim_busid
/* swap blocklayer device link */
gdp = block->gdp;
dasd_add_link_to_gendisk(gdp, secondary);
+ rc = device_move(disk_to_dev(gdp), &secondary->cdev->dev, DPM_ORDER_NONE);
+ if (rc) {
+ dev_err(&primary->cdev->dev,
+ "copy_pair_swap: moving blockdevice parent %s->%s failed (%d)\n",
+ dev_name(&primary->cdev->dev),
+ dev_name(&secondary->cdev->dev), rc);
+ }

/* re-enable device */
dasd_device_remove_stop_bits(primary, DASD_STOPPED_PPRC);
--
2.50.1 (Apple Git-155)
@@ -0,0 +1,54 @@
From 02a659e928f9ef15fc673384e95def0b088c9684 Mon Sep 17 00:00:00 2001
From: Mete Durlu <mdurlu@redhat.com>
Date: Fri, 27 Mar 2026 13:14:33 +0100
Subject: [PATCH] s390/dasd: Move quiesce state with pprc swap

JIRA: https://issues.redhat.com/browse/RHEL-161530

commit 40e9cd4ae8ec43b107ed2bff422a8fa39dcf4e4b
Author: Stefan Haberland <sth@linux.ibm.com>
Date: Tue Mar 10 15:23:29 2026 +0100

s390/dasd: Move quiesce state with pprc swap

Quiesce and resume is a mechanism to suspend operations on DASD devices.
In the context of a controlled copy pair swap operation, the quiesce
operation is usually issued before the actual swap and a resume
afterwards.

During the swap operation, the underlying device is exchanged. Therefore,
the quiesce flag must be moved to the secondary device to ensure a
consistent quiesce state after the swap.

The secondary device itself cannot be suspended separately because there
is no separate block device representation for it.

Fixes: 413862caad6f ("s390/dasd: add copy pair swap capability")
Cc: stable@vger.kernel.org #6.1
Reviewed-by: Jan Hoeppner <hoeppner@linux.ibm.com>
Signed-off-by: Stefan Haberland <sth@linux.ibm.com>
Link: https://patch.msgid.link/20260310142330.4080106-2-sth@linux.ibm.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

Signed-off-by: Stefan Haberland <sth@linux.ibm.com>
Signed-off-by: Mete Durlu <mdurlu@redhat.com>

diff --git a/drivers/s390/block/dasd_eckd.c b/drivers/s390/block/dasd_eckd.c
index 7e8679d7c686..f53791c9cbe7 100644
--- a/drivers/s390/block/dasd_eckd.c
+++ b/drivers/s390/block/dasd_eckd.c
@@ -6191,6 +6191,11 @@ static int dasd_eckd_copy_pair_swap(struct dasd_device *device, char *prim_busid
dev_name(&secondary->cdev->dev), rc);
}

+ if (primary->stopped & DASD_STOPPED_QUIESCE) {
+ dasd_device_set_stop_bits(secondary, DASD_STOPPED_QUIESCE);
+ dasd_device_remove_stop_bits(primary, DASD_STOPPED_QUIESCE);
+ }
+
/* re-enable device */
dasd_device_remove_stop_bits(primary, DASD_STOPPED_PPRC);
dasd_device_remove_stop_bits(secondary, DASD_STOPPED_PPRC);
--
2.50.1 (Apple Git-155)
@@ -0,0 +1,83 @@
From 85945142a2a0c5d6a104b9d86eab6648a023765d Mon Sep 17 00:00:00 2001
From: Mete Durlu <mdurlu@redhat.com>
Date: Fri, 27 Mar 2026 13:14:35 +0100
Subject: [PATCH] s390/dasd: Copy detected format information to secondary
 device

JIRA: https://issues.redhat.com/browse/RHEL-161530

commit 4c527c7e030672efd788d0806d7a68972a7ba3c1
Author: Stefan Haberland <sth@linux.ibm.com>
Date: Tue Mar 10 15:23:30 2026 +0100

s390/dasd: Copy detected format information to secondary device

During online processing for a DASD device an IO operation is started to
determine the format of the device. CDL format contains specifically
sized blocks at the beginning of the disk.

For a PPRC secondary device no real IO operation is possible therefore
this IO request can not be started and this step is skipped for online
processing of secondary devices. This is generally fine since the
secondary is a copy of the primary device.

In case of an additional partition detection that is run after a swap
operation the format information is needed to properly drive partition
detection IO.

Currently the information is not passed leading to IO errors during
partition detection and a wrongly detected partition table which in turn
might lead to data corruption on the disk with the wrong partition table.

Fix by passing the format information from primary to secondary device.

Fixes: 413862caad6f ("s390/dasd: add copy pair swap capability")
Cc: stable@vger.kernel.org #6.1
Reviewed-by: Jan Hoeppner <hoeppner@linux.ibm.com>
Acked-by: Eduard Shishkin <edward6@linux.ibm.com>
Signed-off-by: Stefan Haberland <sth@linux.ibm.com>
Link: https://patch.msgid.link/20260310142330.4080106-3-sth@linux.ibm.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

Signed-off-by: Stefan Haberland <sth@linux.ibm.com>
Signed-off-by: Mete Durlu <mdurlu@redhat.com>

diff --git a/drivers/s390/block/dasd_eckd.c b/drivers/s390/block/dasd_eckd.c
index f53791c9cbe7..54d6d29477e4 100644
--- a/drivers/s390/block/dasd_eckd.c
+++ b/drivers/s390/block/dasd_eckd.c
@@ -6144,6 +6144,7 @@ static void copy_pair_set_active(struct dasd_copy_relation *copy, char *new_busi
static int dasd_eckd_copy_pair_swap(struct dasd_device *device, char *prim_busid,
char *sec_busid)
{
+ struct dasd_eckd_private *prim_priv, *sec_priv;
struct dasd_device *primary, *secondary;
struct dasd_copy_relation *copy;
struct dasd_block *block;
@@ -6164,6 +6165,9 @@ static int dasd_eckd_copy_pair_swap(struct dasd_device *device, char *prim_busid
if (!secondary)
return DASD_COPYPAIRSWAP_SECONDARY;

+ prim_priv = primary->private;
+ sec_priv = secondary->private;
+
/*
* usually the device should be quiesced for swap
* for paranoia stop device and requeue requests again
@@ -6196,6 +6200,13 @@ static int dasd_eckd_copy_pair_swap(struct dasd_device *device, char *prim_busid
dasd_device_remove_stop_bits(primary, DASD_STOPPED_QUIESCE);
}

+ /*
+ * The secondary device never got through format detection, but since it
+ * is a copy of the primary device, the format is exactly the same;
+ * therefore, the detected layout can simply be copied.
+ */
+ sec_priv->uses_cdl = prim_priv->uses_cdl;
+
/* re-enable device */
dasd_device_remove_stop_bits(primary, DASD_STOPPED_PPRC);
dasd_device_remove_stop_bits(secondary, DASD_STOPPED_PPRC);
--
2.50.1 (Apple Git-155)
@@ -176,13 +176,13 @@ Summary: The Linux kernel
 # define buildid .local
 %define specversion 5.14.0
 %define patchversion 5.14
-%define pkgrelease 687.5.1
+%define pkgrelease 687.13.1
 %define kversion 5
 %define tarfile_release 5.14.0-687.5.1.el9_8
 # This is needed to do merge window version magic
 %define patchlevel 14
 # This allows pkg_release to have configurable %%{?dist} tag
-%define specrelease 687.12.1%{?buildid}%{?dist}
+%define specrelease 687.13.1%{?buildid}%{?dist}
 # This defines the kabi tarball version
 %define kabiversion 5.14.0-687.5.1.el9_8

@@ -1130,6 +1130,23 @@ Patch1251: 1251-netfilter-xt-tcpmss-check-remaining-length-before-reading-op.pat
 Patch1252: 1252-dm-thin-fix-metadata-refcount-underflow.patch
 Patch11111: ppc64le-kvm-support.patch

+Patch1253: 1253-mm-document-gfp-nofail-must-be-blockable.patch
+Patch1254: 1254-mm-warn-about-illegal-gfp-nofail-usage-in-a-more-appropriate.patch
+Patch1255: 1255-mm-page-alloc-c-avoid-infinite-retries-caused-by-cpuset-race.patch
+Patch1256: 1256-mm-page-alloc-thp-prevent-reclaim-for-gfp-thisnode-thp-alloc.patch
+Patch1257: 1257-mm-page-alloc-ignore-the-exact-initial-compaction-result.patch
+Patch1258: 1258-mm-page-alloc-refactor-the-initial-compaction-handling.patch
+Patch1259: 1259-mm-page-alloc-simplify-alloc-pages-slowpath-flow.patch
+Patch1260: 1260-mm-page-alloc-add-vm-thp-thisnode-reclaim-sysctl-to-allow-th.patch
+Patch1261: 1261-smb-client-fix-oob-reads-parsing-symlink-error-response.patch
+Patch1262: 1262-crypto-authenc-fix-sleep-in-atomic-context-in-decrypt-tail.patch
+Patch1263: 1263-crypto-authenc-correctly-pass-einprogress-back-up-to-the-cal.patch
+Patch1264: 1264-buffer-overflow-in-drivers-xen-sys-hypervisor-c.patch
+Patch1265: 1265-nvme-nvme-fc-move-tagset-removal-to-nvme-fc-delete-ctrl.patch
+Patch1266: 1266-nvme-nvme-fc-ensure-ioerr-work-is-cancelled-in-nvme-fc-delet.patch
+Patch1267: 1267-s390-dasd-fix-gendisk-parent-after-copy-pair-swap.patch
+Patch1268: 1268-s390-dasd-move-quiesce-state-with-pprc-swap.patch
+Patch1269: 1269-s390-dasd-copy-detected-format-information-to-secondary-devi.patch
 # END OF PATCH DEFINITIONS

 %description
@@ -2027,6 +2044,23 @@ ApplyPatch 1249-bluetooth-sco-fix-race-conditions-in-sco-sock-connect.patch
 ApplyPatch 1250-wifi-brcmfmac-validate-bsscfg-indices-in-if-events.patch
 ApplyPatch 1251-netfilter-xt-tcpmss-check-remaining-length-before-reading-op.patch
 ApplyPatch 1252-dm-thin-fix-metadata-refcount-underflow.patch
+ApplyPatch 1253-mm-document-gfp-nofail-must-be-blockable.patch
+ApplyPatch 1254-mm-warn-about-illegal-gfp-nofail-usage-in-a-more-appropriate.patch
+ApplyPatch 1255-mm-page-alloc-c-avoid-infinite-retries-caused-by-cpuset-race.patch
+ApplyPatch 1256-mm-page-alloc-thp-prevent-reclaim-for-gfp-thisnode-thp-alloc.patch
+ApplyPatch 1257-mm-page-alloc-ignore-the-exact-initial-compaction-result.patch
+ApplyPatch 1258-mm-page-alloc-refactor-the-initial-compaction-handling.patch
+ApplyPatch 1259-mm-page-alloc-simplify-alloc-pages-slowpath-flow.patch
+ApplyPatch 1260-mm-page-alloc-add-vm-thp-thisnode-reclaim-sysctl-to-allow-th.patch
+ApplyPatch 1261-smb-client-fix-oob-reads-parsing-symlink-error-response.patch
+ApplyPatch 1262-crypto-authenc-fix-sleep-in-atomic-context-in-decrypt-tail.patch
+ApplyPatch 1263-crypto-authenc-correctly-pass-einprogress-back-up-to-the-cal.patch
+ApplyPatch 1264-buffer-overflow-in-drivers-xen-sys-hypervisor-c.patch
+ApplyPatch 1265-nvme-nvme-fc-move-tagset-removal-to-nvme-fc-delete-ctrl.patch
+ApplyPatch 1266-nvme-nvme-fc-ensure-ioerr-work-is-cancelled-in-nvme-fc-delet.patch
+ApplyPatch 1267-s390-dasd-fix-gendisk-parent-after-copy-pair-swap.patch
+ApplyPatch 1268-s390-dasd-move-quiesce-state-with-pprc-swap.patch
+ApplyPatch 1269-s390-dasd-copy-detected-format-information-to-secondary-devi.patch
 # END OF PATCH APPLICATIONS

 # Any further pre-build tree manipulations happen here.
@@ -4101,6 +4135,30 @@ fi
 #
 #
 %changelog
+* Wed Jun 11 2026 Andrew Lukoshko <alukoshko@almalinux.org> - 5.14.0-687.13.1
+- Recreate RHEL 5.14.0-687.13.1 from CentOS Stream 9 and upstream stable backports (1253-1269)
+- RHEL changelog for 687.13.1 follows:
+
+* Tue Jun 02 2026 CKI KWF Bot <cki-ci-bot+kwf-gitlab-com@redhat.com> [5.14.0-687.13.1.el9_8]
+- smb: client: reject userspace cifs.spnego descriptions (Paulo Alcantara) [RHEL-178944] {CVE-2026-46243}
+- s390/dasd: Copy detected format information to secondary device (Ramesh Chhetri) [RHEL-176472]
+- s390/dasd: Move quiesce state with pprc swap (Ramesh Chhetri) [RHEL-176472]
+- s390/dasd: Fix gendisk parent after copy pair swap (Ramesh Chhetri) [RHEL-176472]
+- nvme: nvme-fc: Ensure ->ioerr_work is cancelled in nvme_fc_delete_ctrl() (Ewan D. Milne) [RHEL-171745]
+- nvme: nvme-fc: move tagset removal to nvme_fc_delete_ctrl() (Ewan D. Milne) [RHEL-171745]
+- Buffer overflow in drivers/xen/sys-hypervisor.c (Vitaly Kuznetsov) [RHEL-172510] {CVE-2026-31786}
+- crypto: authenc - Correctly pass EINPROGRESS back up to the caller (Vladislav Dronov) [RHEL-172167]
+- crypto: authenc - Fix sleep in atomic context in decrypt_tail (Vladislav Dronov) [RHEL-172167]
+- smb: client: fix OOB reads parsing symlink error response (CKI Backport Bot) [RHEL-171471] {CVE-2026-31613}
+- mm/page_alloc: add vm.thp_thisnode_reclaim sysctl to allow THP reclaim on local node (Nico Pache) [RHEL-164778]
+- mm/page_alloc: simplify __alloc_pages_slowpath() flow (Nico Pache) [RHEL-164778]
+- mm/page_alloc: refactor the initial compaction handling (Nico Pache) [RHEL-164778]
+- mm/page_alloc: ignore the exact initial compaction result (Nico Pache) [RHEL-164778]
+- mm, page_alloc, thp: prevent reclaim for __GFP_THISNODE THP allocations (Nico Pache) [RHEL-164778]
+- mm/page_alloc.c: avoid infinite retries caused by cpuset race (Nico Pache) [RHEL-164778]
+- mm: warn about illegal __GFP_NOFAIL usage in a more appropriate location and manner (Nico Pache) [RHEL-164778]
+- mm: document __GFP_NOFAIL must be blockable (Nico Pache) [RHEL-164778]
+
 * Sun Jun 07 2026 Andrew Lukoshko <alukoshko@almalinux.org> - 5.14.0-687.12.1
 - Recreate RHEL 5.14.0-687.12.1 from CentOS Stream 9 and upstream stable
 backports (SOURCES/1198-1252)