From 60fd7c77801bf5b28af6808fe712b04f33697af6 Mon Sep 17 00:00:00 2001 From: Andrew Lukoshko Date: Wed, 10 Jun 2026 22:43:20 +0000 Subject: [PATCH] Recreate RHEL 5.14.0-687.13.1 from CS9/upstream backports Add the RHEL 687.13.1 backports (1253-1269) from centos-stream-9 and upstream stable, on top of 687.12.1. RHEL now ships the smb cifs.spnego fix (CVE-2026-46243) too. Bump pkgrelease and specrelease to 687.13.1. --- ...ocument-gfp-nofail-must-be-blockable.patch | 105 +++++++++ ...p-nofail-usage-in-a-more-appropriate.patch | 169 ++++++++++++++ ...finite-retries-caused-by-cpuset-race.patch | 85 +++++++ ...t-reclaim-for-gfp-thisnode-thp-alloc.patch | 96 ++++++++ ...-the-exact-initial-compaction-result.patch | 122 ++++++++++ ...ctor-the-initial-compaction-handling.patch | 208 +++++++++++++++++ ...c-simplify-alloc-pages-slowpath-flow.patch | 138 ++++++++++++ ...-thisnode-reclaim-sysctl-to-allow-th.patch | 85 +++++++ ...reads-parsing-symlink-error-response.patch | 123 ++++++++++ ...ep-in-atomic-context-in-decrypt-tail.patch | 46 ++++ ...-pass-einprogress-back-up-to-the-cal.patch | 213 ++++++++++++++++++ ...flow-in-drivers-xen-sys-hypervisor-c.patch | 60 +++++ ...agset-removal-to-nvme-fc-delete-ctrl.patch | 80 +++++++ ...r-work-is-cancelled-in-nvme-fc-delet.patch | 93 ++++++++ ...-gendisk-parent-after-copy-pair-swap.patch | 58 +++++ ...sd-move-quiesce-state-with-pprc-swap.patch | 54 +++++ ...format-information-to-secondary-devi.patch | 83 +++++++ SPECS/kernel.spec | 62 ++++- 18 files changed, 1878 insertions(+), 2 deletions(-) create mode 100644 SOURCES/1253-mm-document-gfp-nofail-must-be-blockable.patch create mode 100644 SOURCES/1254-mm-warn-about-illegal-gfp-nofail-usage-in-a-more-appropriate.patch create mode 100644 SOURCES/1255-mm-page-alloc-c-avoid-infinite-retries-caused-by-cpuset-race.patch create mode 100644 SOURCES/1256-mm-page-alloc-thp-prevent-reclaim-for-gfp-thisnode-thp-alloc.patch create mode 100644 
SOURCES/1257-mm-page-alloc-ignore-the-exact-initial-compaction-result.patch create mode 100644 SOURCES/1258-mm-page-alloc-refactor-the-initial-compaction-handling.patch create mode 100644 SOURCES/1259-mm-page-alloc-simplify-alloc-pages-slowpath-flow.patch create mode 100644 SOURCES/1260-mm-page-alloc-add-vm-thp-thisnode-reclaim-sysctl-to-allow-th.patch create mode 100644 SOURCES/1261-smb-client-fix-oob-reads-parsing-symlink-error-response.patch create mode 100644 SOURCES/1262-crypto-authenc-fix-sleep-in-atomic-context-in-decrypt-tail.patch create mode 100644 SOURCES/1263-crypto-authenc-correctly-pass-einprogress-back-up-to-the-cal.patch create mode 100644 SOURCES/1264-buffer-overflow-in-drivers-xen-sys-hypervisor-c.patch create mode 100644 SOURCES/1265-nvme-nvme-fc-move-tagset-removal-to-nvme-fc-delete-ctrl.patch create mode 100644 SOURCES/1266-nvme-nvme-fc-ensure-ioerr-work-is-cancelled-in-nvme-fc-delet.patch create mode 100644 SOURCES/1267-s390-dasd-fix-gendisk-parent-after-copy-pair-swap.patch create mode 100644 SOURCES/1268-s390-dasd-move-quiesce-state-with-pprc-swap.patch create mode 100644 SOURCES/1269-s390-dasd-copy-detected-format-information-to-secondary-devi.patch diff --git a/SOURCES/1253-mm-document-gfp-nofail-must-be-blockable.patch b/SOURCES/1253-mm-document-gfp-nofail-must-be-blockable.patch new file mode 100644 index 000000000..5bc8b3f1c --- /dev/null +++ b/SOURCES/1253-mm-document-gfp-nofail-must-be-blockable.patch @@ -0,0 +1,105 @@ +From dbb0b8ec49fcd597c406f3b17f28b588e96cfa14 Mon Sep 17 00:00:00 2001 +From: Nico Pache +Date: Sat, 4 Apr 2026 19:30:21 -0600 +Subject: [PATCH] mm: document __GFP_NOFAIL must be blockable +MIME-Version: 1.0 +Content-Type: text/plain; charset=UTF-8 +Content-Transfer-Encoding: 8bit + +commit 17d75422604f0b92869aa17cb44f60958212f033 +Author: Barry Song +Date: Sat Aug 31 08:28:22 2024 +1200 + + mm: document __GFP_NOFAIL must be blockable + + Non-blocking allocation with __GFP_NOFAIL is not supported and may still + result 
in NULL pointers (if we don't return NULL, we result in busy-loop + within non-sleepable contexts): + + static inline struct page * + __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order, + struct alloc_context *ac) + { + ... + /* + * Make sure that __GFP_NOFAIL request doesn't leak out and make sure + * we always retry + */ + if (gfp_mask & __GFP_NOFAIL) { + /* + * All existing users of the __GFP_NOFAIL are blockable, so warn + * of any new users that actually require GFP_NOWAIT + */ + if (WARN_ON_ONCE_GFP(!can_direct_reclaim, gfp_mask)) + goto fail; + ... + } + ... + fail: + warn_alloc(gfp_mask, ac->nodemask, + "page allocation failure: order:%u", order); + got_pg: + return page; + } + + Highlight this in the documentation of __GFP_NOFAIL so that non-mm + subsystems can reject any illegal usage of __GFP_NOFAIL with GFP_ATOMIC, + GFP_NOWAIT, etc. + + Link: https://lkml.kernel.org/r/20240830202823.21478-3-21cnbao@gmail.com + Signed-off-by: Barry Song + Acked-by: Michal Hocko + Reviewed-by: Christoph Hellwig + Acked-by: Vlastimil Babka + Acked-by: Davidlohr Bueso + Acked-by: David Hildenbrand + Cc: Christoph Lameter + Cc: David Rientjes + Cc: "Eugenio Pérez" + Cc: Hailong.Liu + Cc: Hyeonggon Yoo <42.hyeyoo@gmail.com> + Cc: Jason Wang + Cc: Joonsoo Kim + Cc: Kees Cook + Cc: Linus Torvalds + Cc: Lorenzo Stoakes + Cc: Maxime Coquelin + Cc: "Michael S. 
Tsirkin" + Cc: Pekka Enberg + Cc: Roman Gushchin + Cc: Uladzislau Rezki (Sony) + Cc: Xuan Zhuo + Cc: Christoph Hellwig + Cc: Xie Yongji + Cc: Yafang Shao + Signed-off-by: Andrew Morton + +JIRA: https://redhat.atlassian.net/browse/RHEL-148561 +Signed-off-by: Nico Pache + +diff --git a/include/linux/gfp_types.h b/include/linux/gfp_types.h +index 6583a58670c5..373d3871f61e 100644 +--- a/include/linux/gfp_types.h ++++ b/include/linux/gfp_types.h +@@ -168,7 +168,8 @@ typedef unsigned int __bitwise gfp_t; + * the caller still has to check for failures) while costly requests try to be + * not disruptive and back off even without invoking the OOM killer. + * The following three modifiers might be used to override some of these +- * implicit rules ++ * implicit rules. Please note that all of them must be used along with ++ * %__GFP_DIRECT_RECLAIM flag. + * + * %__GFP_NORETRY: The VM implementation will try only very lightweight + * memory direct reclaim to get some memory under memory pressure (thus +@@ -199,6 +200,8 @@ typedef unsigned int __bitwise gfp_t; + * cannot handle allocation failures. The allocation could block + * indefinitely but will never return with failure. Testing for + * failure is pointless. ++ * It _must_ be blockable and used together with __GFP_DIRECT_RECLAIM. ++ * It should _never_ be used in non-sleepable contexts. 
+ * New users should be evaluated carefully (and the flag should be + * used only when there is no reasonable failure policy) but it is + * definitely preferable to use the flag rather than opencode endless +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1254-mm-warn-about-illegal-gfp-nofail-usage-in-a-more-appropriate.patch b/SOURCES/1254-mm-warn-about-illegal-gfp-nofail-usage-in-a-more-appropriate.patch new file mode 100644 index 000000000..3afe64b82 --- /dev/null +++ b/SOURCES/1254-mm-warn-about-illegal-gfp-nofail-usage-in-a-more-appropriate.patch @@ -0,0 +1,169 @@ +From e7842dda471d377ae8c6aaf9ddb4a73159f505b4 Mon Sep 17 00:00:00 2001 +From: Nico Pache +Date: Sat, 4 Apr 2026 19:30:21 -0600 +Subject: [PATCH] mm: warn about illegal __GFP_NOFAIL usage in a more + appropriate location and manner +MIME-Version: 1.0 +Content-Type: text/plain; charset=UTF-8 +Content-Transfer-Encoding: 8bit + +commit 903edea6c53f097f5f0c847fdbbfab0c6c44f241 +Author: Barry Song +Date: Sat Aug 31 08:28:23 2024 +1200 + + mm: warn about illegal __GFP_NOFAIL usage in a more appropriate location and manner + + Three points for this change: + + 1. We should consolidate all warnings in one place. Currently, the + order > 1 warning is in the hotpath, while others are in less + likely scenarios. Moving all warnings to the slowpath will reduce + the overhead for order > 1 and increase the visibility of other + warnings. + + 2. We currently have two warnings for order: one for order > 1 in + the hotpath and another for order > costly_order in the laziest + path. I suggest standardizing on order > 1 since it's been in + use for a long time. + + 3. We don't need to check for __GFP_NOWARN in this case. __GFP_NOWARN + is meant to suppress allocation failure reports, but here we're + dealing with bug detection, not allocation failures. So replace + WARN_ON_ONCE_GFP by WARN_ON_ONCE. 
+ + [v-songbaohua@oppo.com: also update the doc for __GFP_NOFAIL with order > 1] + Link: https://lkml.kernel.org/r/20240903223935.1697-1-21cnbao@gmail.com + Link: https://lkml.kernel.org/r/20240830202823.21478-4-21cnbao@gmail.com + Signed-off-by: Barry Song + Suggested-by: Vlastimil Babka + Reviewed-by: Vlastimil Babka + Acked-by: David Hildenbrand + Acked-by: Michal Hocko + Cc: Christoph Hellwig + Cc: Christoph Lameter + Cc: Davidlohr Bueso + Cc: David Rientjes + Cc: "Eugenio Pérez" + Cc: Hailong.Liu + Cc: Hyeonggon Yoo <42.hyeyoo@gmail.com> + Cc: Jason Wang + Cc: Joonsoo Kim + Cc: Kees Cook + Cc: Linus Torvalds + Cc: Lorenzo Stoakes + Cc: Maxime Coquelin + Cc: "Michael S. Tsirkin" + Cc: Pekka Enberg + Cc: Roman Gushchin + Cc: Uladzislau Rezki (Sony) + Cc: Xie Yongji + Cc: Xuan Zhuo + Cc: Yafang Shao + Signed-off-by: Andrew Morton + +JIRA: https://redhat.atlassian.net/browse/RHEL-148561 +Signed-off-by: Nico Pache + +diff --git a/include/linux/gfp_types.h b/include/linux/gfp_types.h +index 373d3871f61e..359ed69b14d9 100644 +--- a/include/linux/gfp_types.h ++++ b/include/linux/gfp_types.h +@@ -206,7 +206,8 @@ typedef unsigned int __bitwise gfp_t; + * used only when there is no reasonable failure policy) but it is + * definitely preferable to use the flag rather than opencode endless + * loop around allocator. +- * Using this flag for costly allocations is _highly_ discouraged. ++ * Allocating pages from the buddy with __GFP_NOFAIL and order > 1 is ++ * not supported. Please consider using kvmalloc() instead. + */ + #define __GFP_IO ((__force gfp_t)___GFP_IO) + #define __GFP_FS ((__force gfp_t)___GFP_FS) +diff --git a/mm/page_alloc.c b/mm/page_alloc.c +index 4e8615398f07..28cb88a3a758 100644 +--- a/mm/page_alloc.c ++++ b/mm/page_alloc.c +@@ -2914,12 +2914,6 @@ struct page *rmqueue(struct zone *preferred_zone, + { + struct page *page; + +- /* +- * We most definitely don't want callers attempting to +- * allocate greater than order-1 page units with __GFP_NOFAIL. 
+- */ +- WARN_ON_ONCE((gfp_flags & __GFP_NOFAIL) && (order > 1)); +- + if (likely(pcp_allowed_order(order))) { + page = rmqueue_pcplist(preferred_zone, zone, order, + migratetype, alloc_flags); +@@ -4062,6 +4056,7 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order, + { + bool can_direct_reclaim = gfp_mask & __GFP_DIRECT_RECLAIM; + bool can_compact = gfp_compaction_allowed(gfp_mask); ++ bool nofail = gfp_mask & __GFP_NOFAIL; + const bool costly_order = order > PAGE_ALLOC_COSTLY_ORDER; + struct page *page = NULL; + unsigned int alloc_flags; +@@ -4074,6 +4069,25 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order, + unsigned int zonelist_iter_cookie; + int reserve_flags; + ++ if (unlikely(nofail)) { ++ /* ++ * We most definitely don't want callers attempting to ++ * allocate greater than order-1 page units with __GFP_NOFAIL. ++ */ ++ WARN_ON_ONCE(order > 1); ++ /* ++ * Also we don't support __GFP_NOFAIL without __GFP_DIRECT_RECLAIM, ++ * otherwise, we may result in lockup. ++ */ ++ WARN_ON_ONCE(!can_direct_reclaim); ++ /* ++ * PF_MEMALLOC request from this context is rather bizarre ++ * because we cannot reclaim anything and only can loop waiting ++ * for somebody to do a work for us. 
++ */ ++ WARN_ON_ONCE(current->flags & PF_MEMALLOC); ++ } ++ + restart: + compaction_retries = 0; + no_progress_loops = 0; +@@ -4291,29 +4305,15 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order, + * Make sure that __GFP_NOFAIL request doesn't leak out and make sure + * we always retry + */ +- if (gfp_mask & __GFP_NOFAIL) { ++ if (unlikely(nofail)) { + /* +- * All existing users of the __GFP_NOFAIL are blockable, so warn +- * of any new users that actually require GFP_NOWAIT ++ * Lacking direct_reclaim we can't do anything to reclaim memory, ++ * we disregard these unreasonable nofail requests and still ++ * return NULL + */ +- if (WARN_ON_ONCE_GFP(!can_direct_reclaim, gfp_mask)) ++ if (!can_direct_reclaim) + goto fail; + +- /* +- * PF_MEMALLOC request from this context is rather bizarre +- * because we cannot reclaim anything and only can loop waiting +- * for somebody to do a work for us +- */ +- WARN_ON_ONCE_GFP(current->flags & PF_MEMALLOC, gfp_mask); +- +- /* +- * non failing costly orders are a hard requirement which we +- * are not prepared for much so let's warn about these users +- * so that we can identify them and convert them to something +- * else. 
+- */ +- WARN_ON_ONCE_GFP(costly_order, gfp_mask); +- + /* + * Help non-failing allocations by giving some access to memory + * reserves normally used for high priority non-blocking +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1255-mm-page-alloc-c-avoid-infinite-retries-caused-by-cpuset-race.patch b/SOURCES/1255-mm-page-alloc-c-avoid-infinite-retries-caused-by-cpuset-race.patch new file mode 100644 index 000000000..f5727e433 --- /dev/null +++ b/SOURCES/1255-mm-page-alloc-c-avoid-infinite-retries-caused-by-cpuset-race.patch @@ -0,0 +1,85 @@ +From 76329fc4d67ac1854f97337545e428e181b1cbe5 Mon Sep 17 00:00:00 2001 +From: Nico Pache +Date: Sat, 4 Apr 2026 19:30:21 -0600 +Subject: [PATCH] mm/page_alloc.c: avoid infinite retries caused by cpuset race + +commit e05741fb10c38d70bbd7ec12b23c197b6355d519 +Author: Tianyang Zhang +Date: Wed Apr 16 16:24:05 2025 +0800 + + mm/page_alloc.c: avoid infinite retries caused by cpuset race + + __alloc_pages_slowpath has no change detection for ac->nodemask in the + part of retry path, while cpuset can modify it in parallel. For some + processes that set mempolicy as MPOL_BIND, this results ac->nodemask + changes, and then the should_reclaim_retry will judge based on the latest + nodemask and jump to retry, while the get_page_from_freelist only + traverses the zonelist from ac->preferred_zoneref, which selected by a + expired nodemask and may cause infinite retries in some cases + + cpu 64: + __alloc_pages_slowpath { + /* ..... 
*/ + retry: + /* ac->nodemask = 0x1, ac->preferred->zone->nid = 1 */ + if (alloc_flags & ALLOC_KSWAPD) + wake_all_kswapds(order, gfp_mask, ac); + /* cpu 1: + cpuset_write_resmask + update_nodemask + update_nodemasks_hier + update_tasks_nodemask + mpol_rebind_task + mpol_rebind_policy + mpol_rebind_nodemask + // mempolicy->nodes has been modified, + // which ac->nodemask point to + + */ + /* ac->nodemask = 0x3, ac->preferred->zone->nid = 1 */ + if (should_reclaim_retry(gfp_mask, order, ac, alloc_flags, + did_some_progress > 0, &no_progress_loops)) + goto retry; + } + + Simultaneously starting multiple cpuset01 from LTP can quickly reproduce + this issue on a multi node server when the maximum memory pressure is + reached and the swap is enabled + + Link: https://lkml.kernel.org/r/20250416082405.20988-1-zhangtianyang@loongson.cn + Fixes: c33d6c06f60f ("mm, page_alloc: avoid looking up the first zone in a zonelist twice") + Signed-off-by: Tianyang Zhang + Reviewed-by: Suren Baghdasaryan + Reviewed-by: Vlastimil Babka + Cc: Michal Hocko + Cc: Brendan Jackman + Cc: Johannes Weiner + Cc: Zi Yan + Cc: + Signed-off-by: Andrew Morton + +JIRA: https://redhat.atlassian.net/browse/RHEL-148561 +Signed-off-by: Nico Pache + +diff --git a/mm/page_alloc.c b/mm/page_alloc.c +index 28cb88a3a758..dc1a7637bf97 100644 +--- a/mm/page_alloc.c ++++ b/mm/page_alloc.c +@@ -4193,6 +4193,14 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order, + } + + retry: ++ /* ++ * Deal with possible cpuset update races or zonelist updates to avoid ++ * infinite retries. 
++ */ ++ if (check_retry_cpuset(cpuset_mems_cookie, ac) || ++ check_retry_zonelist(zonelist_iter_cookie)) ++ goto restart; ++ + /* Ensure kswapd doesn't accidentally go to sleep as long as we loop */ + if (alloc_flags & ALLOC_KSWAPD) + wake_all_kswapds(order, gfp_mask, ac); +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1256-mm-page-alloc-thp-prevent-reclaim-for-gfp-thisnode-thp-alloc.patch b/SOURCES/1256-mm-page-alloc-thp-prevent-reclaim-for-gfp-thisnode-thp-alloc.patch new file mode 100644 index 000000000..1c87e292d --- /dev/null +++ b/SOURCES/1256-mm-page-alloc-thp-prevent-reclaim-for-gfp-thisnode-thp-alloc.patch @@ -0,0 +1,96 @@ +From 32e483909f993e18474e30c11e0397a82c570b6e Mon Sep 17 00:00:00 2001 +From: Nico Pache +Date: Sat, 4 Apr 2026 19:30:21 -0600 +Subject: [PATCH] mm, page_alloc, thp: prevent reclaim for __GFP_THISNODE THP + allocations + +commit 9c9828d3ead69416d731b1238802af31760c823e +Author: Vlastimil Babka +Date: Fri Dec 19 17:31:57 2025 +0100 + + mm, page_alloc, thp: prevent reclaim for __GFP_THISNODE THP allocations + + Since commit cc638f329ef6 ("mm, thp: tweak reclaim/compaction effort of + local-only and all-node allocations"), THP page fault allocations have + settled on the following scheme (from the commit log): + + 1. local node only THP allocation with no reclaim, just compaction. + 2. for madvised VMA's or when synchronous compaction is enabled always - THP + allocation from any node with effort determined by global defrag setting + and VMA madvise + 3. fallback to base pages on any node + + Recent customer reports however revealed we have a gap in step 1 above. + What we have seen is excessive reclaim due to THP page faults on a NUMA + node that's close to its high watermark, while other nodes have plenty of + free memory. 
+ + The problem with step 1 is that it promises no reclaim after the + compaction attempt, however reclaim is only avoided for certain compaction + outcomes (deferred, or skipped due to insufficient free base pages), and + not e.g. when compaction is actually performed but fails (we did see + compact_fail vmstat counter increasing). + + THP page faults can therefore exhibit a zone_reclaim_mode-like behavior, + which is not the intention. + + Thus add a check for __GFP_THISNODE that corresponds to this exact + situation and prevents continuing with reclaim/compaction once the initial + compaction attempt isn't successful in allocating the page. + + Note that commit cc638f329ef6 has not introduced this over-reclaim + possibility; it appears to exist in some form since commit 2f0799a0ffc0 + ("mm, thp: restore node-local hugepage allocations"). Followup commits + b39d0ee2632d ("mm, page_alloc: avoid expensive reclaim when compaction may + not succeed") and cc638f329ef6 have moved in the right direction, but left + the abovementioned gap. 
+ + Link: https://lkml.kernel.org/r/20251219-costly-noretry-thisnode-fix-v1-1-e1085a4a0c34@suse.cz + Fixes: 2f0799a0ffc0 ("mm, thp: restore node-local hugepage allocations") + Signed-off-by: Vlastimil Babka + Acked-by: Michal Hocko + Acked-by: Johannes Weiner + Acked-by: Pedro Falcato + Acked-by: Zi Yan + Cc: Brendan Jackman + Cc: "David Hildenbrand (Red Hat)" + Cc: David Rientjes + Cc: Joshua Hahn + Cc: Liam Howlett + Cc: Lorenzo Stoakes + Cc: Mike Rapoport + Cc: Suren Baghdasaryan + Cc: + Signed-off-by: Andrew Morton + +JIRA: https://redhat.atlassian.net/browse/RHEL-148561 +Signed-off-by: Nico Pache + +diff --git a/mm/page_alloc.c b/mm/page_alloc.c +index dc1a7637bf97..ca0d42a95410 100644 +--- a/mm/page_alloc.c ++++ b/mm/page_alloc.c +@@ -4183,6 +4183,20 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order, + compact_result == COMPACT_DEFERRED) + goto nopage; + ++ /* ++ * THP page faults may attempt local node only first, ++ * but are then allowed to only compact, not reclaim, ++ * see alloc_pages_mpol(). ++ * ++ * Compaction can fail for other reasons than those ++ * checked above and we don't want such THP allocations ++ * to put reclaim pressure on a single node in a ++ * situation where other nodes might have plenty of ++ * available memory. 
++ */ ++ if (gfp_mask & __GFP_THISNODE) ++ goto nopage; ++ + /* + * Looks like reclaim/compaction is worth trying, but + * sync compaction could be very expensive, so keep +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1257-mm-page-alloc-ignore-the-exact-initial-compaction-result.patch b/SOURCES/1257-mm-page-alloc-ignore-the-exact-initial-compaction-result.patch new file mode 100644 index 000000000..597829b32 --- /dev/null +++ b/SOURCES/1257-mm-page-alloc-ignore-the-exact-initial-compaction-result.patch @@ -0,0 +1,122 @@ +From 37f7d4a6be45deda800131b7a9ea6d1f3e4d97ab Mon Sep 17 00:00:00 2001 +From: Nico Pache +Date: Sat, 4 Apr 2026 19:30:20 -0600 +Subject: [PATCH] mm/page_alloc: ignore the exact initial compaction result + +commit 66987218154918a6341a3e3eeeee58110a69e0bb +Author: Vlastimil Babka +Date: Tue Jan 6 12:52:36 2026 +0100 + + mm/page_alloc: ignore the exact initial compaction result + + Patch series "tweaks for __alloc_pages_slowpath()", v3. + + This patch (of 3): + + For allocations that are of costly order and __GFP_NORETRY (and can + perform compaction) we attempt direct compaction first. If that fails, we + continue with a single round of direct reclaim+compaction (as for other + __GFP_NORETRY allocations, except the compaction is of lower priority), + with two exceptions that fail immediately: + + - __GFP_THISNODE is specified, to prevent zone_reclaim_mode-like + behavior for e.g. THP page faults + + - compaction failed because it was deferred (i.e. has been failing + recently so further attempts are not done for a while) or skipped, + which means there are insufficient free base pages to defragment to + begin with + + Upon closer inspection, the second condition has a somewhat flawed + reasoning. If there are not enough base pages and reclaim could create + them, we instead fail. When there are enough base pages and compaction + has already ran and failed, we proceed and hope that reclaim and the + subsequent compaction attempt will succeed. 
But it's unclear why they + should and whether it will be as inexpensive as intended. + + It might make therefore more sense to just fail unconditionally after the + initial compaction attempt. However that would change the semantics of + __GFP_NORETRY to attempt reclaim at least once. + + Alternatively we can remove the compaction result checks and proceed with + the single reclaim and (lower priority) compaction attempt, leaving only + the __GFP_THISNODE exception for failing immediately. + + Link: https://lkml.kernel.org/r/20260106-thp-thisnode-tweak-v3-0-f5d67c21a193@suse.cz + Link: https://lkml.kernel.org/r/20260106-thp-thisnode-tweak-v3-1-f5d67c21a193@suse.cz + Signed-off-by: Vlastimil Babka + Acked-by: Michal Hocko + Cc: Brendan Jackman + Cc: David Hildenbrand (Red Hat) + Cc: David Rientjes + Cc: Johannes Weiner + Cc: Joshua Hahn + Cc: Liam Howlett + Cc: Lorenzo Stoakes + Cc: Mike Rapoport + Cc: Pedro Falcato + Cc: Suren Baghdasaryan + Cc: Zi Yan + Signed-off-by: Andrew Morton + +JIRA: https://redhat.atlassian.net/browse/RHEL-148561 +Signed-off-by: Nico Pache + +diff --git a/mm/page_alloc.c b/mm/page_alloc.c +index ca0d42a95410..3301e934dafa 100644 +--- a/mm/page_alloc.c ++++ b/mm/page_alloc.c +@@ -4162,44 +4162,22 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order, + * includes some THP page fault allocations + */ + if (costly_order && (gfp_mask & __GFP_NORETRY)) { +- /* +- * If allocating entire pageblock(s) and compaction +- * failed because all zones are below low watermarks +- * or is prohibited because it recently failed at this +- * order, fail immediately unless the allocator has +- * requested compaction and reclaim retry. 
+- * +- * Reclaim is +- * - potentially very expensive because zones are far +- * below their low watermarks or this is part of very +- * bursty high order allocations, +- * - not guaranteed to help because isolate_freepages() +- * may not iterate over freed pages as part of its +- * linear scan, and +- * - unlikely to make entire pageblocks free on its +- * own. +- */ +- if (compact_result == COMPACT_SKIPPED || +- compact_result == COMPACT_DEFERRED) +- goto nopage; +- + /* + * THP page faults may attempt local node only first, + * but are then allowed to only compact, not reclaim, + * see alloc_pages_mpol(). + * +- * Compaction can fail for other reasons than those +- * checked above and we don't want such THP allocations +- * to put reclaim pressure on a single node in a +- * situation where other nodes might have plenty of +- * available memory. ++ * Compaction has failed above and we don't want such ++ * THP allocations to put reclaim pressure on a single ++ * node in a situation where other nodes might have ++ * plenty of available memory. + */ + if (gfp_mask & __GFP_THISNODE) + goto nopage; + + /* +- * Looks like reclaim/compaction is worth trying, but +- * sync compaction could be very expensive, so keep ++ * Proceed with single round of reclaim/compaction, but ++ * since sync compaction could be very expensive, keep + * using async compaction. 
+ */ + compact_priority = INIT_COMPACT_PRIORITY; +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1258-mm-page-alloc-refactor-the-initial-compaction-handling.patch b/SOURCES/1258-mm-page-alloc-refactor-the-initial-compaction-handling.patch new file mode 100644 index 000000000..4ddb0e093 --- /dev/null +++ b/SOURCES/1258-mm-page-alloc-refactor-the-initial-compaction-handling.patch @@ -0,0 +1,208 @@ +From 2490569160937bfa1556b9d2dc07998148eb5f77 Mon Sep 17 00:00:00 2001 +From: Nico Pache +Date: Sat, 4 Apr 2026 19:30:20 -0600 +Subject: [PATCH] mm/page_alloc: refactor the initial compaction handling + +commit 53a9b4646f67c95df1775aa5f381cb7f42cae957 +Author: Vlastimil Babka +Date: Tue Jan 6 12:52:37 2026 +0100 + + mm/page_alloc: refactor the initial compaction handling + + The initial direct compaction done in some cases in + __alloc_pages_slowpath() stands out from the main retry loop of reclaim + + compaction. + + We can simplify this by instead skipping the initial reclaim attempt via a + new local variable compact_first, and handle the compact_prority as + necessary to match the original behavior. No functional change intended. 
+ + Link: https://lkml.kernel.org/r/20260106-thp-thisnode-tweak-v3-2-f5d67c21a193@suse.cz + Signed-off-by: Vlastimil Babka + Suggested-by: Johannes Weiner + Reviewed-by: Joshua Hahn + Acked-by: Michal Hocko + Cc: Brendan Jackman + Cc: David Hildenbrand (Red Hat) + Cc: David Rientjes + Cc: Liam Howlett + Cc: Lorenzo Stoakes + Cc: Mike Rapoport + Cc: Pedro Falcato + Cc: Suren Baghdasaryan + Cc: Zi Yan + Signed-off-by: Andrew Morton + +JIRA: https://redhat.atlassian.net/browse/RHEL-148561 +Signed-off-by: Nico Pache + +diff --git a/include/linux/gfp.h b/include/linux/gfp.h +index 3e4c0c536a3d..ac836590ba3a 100644 +--- a/include/linux/gfp.h ++++ b/include/linux/gfp.h +@@ -348,9 +348,15 @@ extern gfp_t gfp_allowed_mask; + /* Returns true if the gfp_mask allows use of ALLOC_NO_WATERMARK */ + bool gfp_pfmemalloc_allowed(gfp_t gfp_mask); + ++/* A helper for checking if gfp includes all the specified flags */ ++static inline bool gfp_has_flags(gfp_t gfp, gfp_t flags) ++{ ++ return (gfp & flags) == flags; ++} ++ + static inline bool gfp_has_io_fs(gfp_t gfp) + { +- return (gfp & (__GFP_IO | __GFP_FS)) == (__GFP_IO | __GFP_FS); ++ return gfp_has_flags(gfp, __GFP_IO | __GFP_FS); + } + + /* +diff --git a/mm/page_alloc.c b/mm/page_alloc.c +index 3301e934dafa..277ed887ec7a 100644 +--- a/mm/page_alloc.c ++++ b/mm/page_alloc.c +@@ -4055,7 +4055,7 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order, + struct alloc_context *ac) + { + bool can_direct_reclaim = gfp_mask & __GFP_DIRECT_RECLAIM; +- bool can_compact = gfp_compaction_allowed(gfp_mask); ++ bool can_compact = can_direct_reclaim && gfp_compaction_allowed(gfp_mask); + bool nofail = gfp_mask & __GFP_NOFAIL; + const bool costly_order = order > PAGE_ALLOC_COSTLY_ORDER; + struct page *page = NULL; +@@ -4068,6 +4068,7 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order, + unsigned int cpuset_mems_cookie; + unsigned int zonelist_iter_cookie; + int reserve_flags; ++ bool compact_first = false; + + if 
(unlikely(nofail)) { + /* +@@ -4095,6 +4096,19 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order, + cpuset_mems_cookie = read_mems_allowed_begin(); + zonelist_iter_cookie = zonelist_iter_begin(); + ++ /* ++ * For costly allocations, try direct compaction first, as it's likely ++ * that we have enough base pages and don't need to reclaim. For non- ++ * movable high-order allocations, do that as well, as compaction will ++ * try prevent permanent fragmentation by migrating from blocks of the ++ * same migratetype. ++ */ ++ if (can_compact && (costly_order || (order > 0 && ++ ac->migratetype != MIGRATE_MOVABLE))) { ++ compact_first = true; ++ compact_priority = INIT_COMPACT_PRIORITY; ++ } ++ + /* + * The fast path uses conservative alloc_flags to succeed only until + * kswapd needs to be woken up, and to avoid the cost of setting up +@@ -4137,53 +4151,6 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order, + if (page) + goto got_pg; + +- /* +- * For costly allocations, try direct compaction first, as it's likely +- * that we have enough base pages and don't need to reclaim. For non- +- * movable high-order allocations, do that as well, as compaction will +- * try prevent permanent fragmentation by migrating from blocks of the +- * same migratetype. +- * Don't try this for allocations that are allowed to ignore +- * watermarks, as the ALLOC_NO_WATERMARKS attempt didn't yet happen. 
+- */ +- if (can_direct_reclaim && can_compact && +- (costly_order || +- (order > 0 && ac->migratetype != MIGRATE_MOVABLE)) +- && !gfp_pfmemalloc_allowed(gfp_mask)) { +- page = __alloc_pages_direct_compact(gfp_mask, order, +- alloc_flags, ac, +- INIT_COMPACT_PRIORITY, +- &compact_result); +- if (page) +- goto got_pg; +- +- /* +- * Checks for costly allocations with __GFP_NORETRY, which +- * includes some THP page fault allocations +- */ +- if (costly_order && (gfp_mask & __GFP_NORETRY)) { +- /* +- * THP page faults may attempt local node only first, +- * but are then allowed to only compact, not reclaim, +- * see alloc_pages_mpol(). +- * +- * Compaction has failed above and we don't want such +- * THP allocations to put reclaim pressure on a single +- * node in a situation where other nodes might have +- * plenty of available memory. +- */ +- if (gfp_mask & __GFP_THISNODE) +- goto nopage; +- +- /* +- * Proceed with single round of reclaim/compaction, but +- * since sync compaction could be very expensive, keep +- * using async compaction. 
+- */ +- compact_priority = INIT_COMPACT_PRIORITY; +- } +- } +- + retry: + /* + * Deal with possible cpuset update races or zonelist updates to avoid +@@ -4227,10 +4194,12 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order, + goto nopage; + + /* Try direct reclaim and then allocating */ +- page = __alloc_pages_direct_reclaim(gfp_mask, order, alloc_flags, ac, +- &did_some_progress); +- if (page) +- goto got_pg; ++ if (!compact_first) { ++ page = __alloc_pages_direct_reclaim(gfp_mask, order, alloc_flags, ++ ac, &did_some_progress); ++ if (page) ++ goto got_pg; ++ } + + /* Try direct compaction and then allocating */ + page = __alloc_pages_direct_compact(gfp_mask, order, alloc_flags, ac, +@@ -4238,6 +4207,33 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order, + if (page) + goto got_pg; + ++ if (compact_first) { ++ /* ++ * THP page faults may attempt local node only first, but are ++ * then allowed to only compact, not reclaim, see ++ * alloc_pages_mpol(). ++ * ++ * Compaction has failed above and we don't want such THP ++ * allocations to put reclaim pressure on a single node in a ++ * situation where other nodes might have plenty of available ++ * memory. ++ */ ++ if (gfp_has_flags(gfp_mask, __GFP_NORETRY | __GFP_THISNODE)) ++ goto nopage; ++ ++ /* ++ * For the initial compaction attempt we have lowered its ++ * priority. Restore it for further retries, if those are ++ * allowed. With __GFP_NORETRY there will be a single round of ++ * reclaim and compaction with the lowered priority. 
++ */ ++ if (!(gfp_mask & __GFP_NORETRY)) ++ compact_priority = DEF_COMPACT_PRIORITY; ++ ++ compact_first = false; ++ goto retry; ++ } ++ + /* Do not loop if specifically requested */ + if (gfp_mask & __GFP_NORETRY) + goto nopage; +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1259-mm-page-alloc-simplify-alloc-pages-slowpath-flow.patch b/SOURCES/1259-mm-page-alloc-simplify-alloc-pages-slowpath-flow.patch new file mode 100644 index 000000000..31295d697 --- /dev/null +++ b/SOURCES/1259-mm-page-alloc-simplify-alloc-pages-slowpath-flow.patch @@ -0,0 +1,138 @@ +From a17c77996e1aa930c05901e213f1441f0db7a46a Mon Sep 17 00:00:00 2001 +From: Nico Pache +Date: Sat, 4 Apr 2026 19:30:20 -0600 +Subject: [PATCH] mm/page_alloc: simplify __alloc_pages_slowpath() flow + +commit 2c4c3e29897d43c431b1cf9432fb66977f262ac2 +Author: Vlastimil Babka +Date: Tue Jan 6 12:52:38 2026 +0100 + + mm/page_alloc: simplify __alloc_pages_slowpath() flow + + The actions done before entering the main retry loop include waking up + kswapds and an allocation attempt with the precise alloc_flags. Then in + the loop we keep waking up kswapds, and we retry the allocation with flags + potentially further adjusted by being allowed to use reserves (due to e.g. + becoming an OOM killer victim). + + We can adjust the retry loop to keep only one instance of waking up + kswapds and allocation attempt. Introduce the can_retry_reserves variable + for retrying once when we become eligible for reserves. It is still + useful not to evaluate reserve_flags immediately for the first allocation + attempt, because it's better to first try succeed in a non-preferred zone + above the min watermark before allocating immediately from the preferred + zone below min watermark. + + Additionally move the cpuset update checks introduced by e05741fb10c3 + ("mm/page_alloc.c: avoid infinite retries caused by cpuset race") further + down the retry loop. 
It's enough to do the checks only before reaching + any potentially infinite 'goto retry;' loop. + + There should be no meaningful functional changes. The change of exact + moments the retry for reserves and cpuset updates are checked should not + result in different outomes modulo races with concurrent allocator + activity. + + Link: https://lkml.kernel.org/r/20260106-thp-thisnode-tweak-v3-3-f5d67c21a193@suse.cz + Signed-off-by: Vlastimil Babka + Acked-by: Michal Hocko + Cc: Johannes Weiner + Cc: Joshua Hahn + Cc: Brendan Jackman + Cc: David Hildenbrand (Red Hat) + Cc: David Rientjes + Cc: Liam Howlett + Cc: Lorenzo Stoakes + Cc: Mike Rapoport + Cc: Pedro Falcato + Cc: Suren Baghdasaryan + Cc: Zi Yan + Signed-off-by: Andrew Morton + +JIRA: https://redhat.atlassian.net/browse/RHEL-148561 +Signed-off-by: Nico Pache + +diff --git a/mm/page_alloc.c b/mm/page_alloc.c +index 277ed887ec7a..4c2b622a39cf 100644 +--- a/mm/page_alloc.c ++++ b/mm/page_alloc.c +@@ -4069,6 +4069,7 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order, + unsigned int zonelist_iter_cookie; + int reserve_flags; + bool compact_first = false; ++ bool can_retry_reserves = true; + + if (unlikely(nofail)) { + /* +@@ -4140,6 +4141,8 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order, + goto nopage; + } + ++retry: ++ /* Ensure kswapd doesn't accidentally go to sleep as long as we loop */ + if (alloc_flags & ALLOC_KSWAPD) + wake_all_kswapds(order, gfp_mask, ac); + +@@ -4151,19 +4154,6 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order, + if (page) + goto got_pg; + +-retry: +- /* +- * Deal with possible cpuset update races or zonelist updates to avoid +- * infinite retries. 
+- */ +- if (check_retry_cpuset(cpuset_mems_cookie, ac) || +- check_retry_zonelist(zonelist_iter_cookie)) +- goto restart; +- +- /* Ensure kswapd doesn't accidentally go to sleep as long as we loop */ +- if (alloc_flags & ALLOC_KSWAPD) +- wake_all_kswapds(order, gfp_mask, ac); +- + reserve_flags = __gfp_pfmemalloc_flags(gfp_mask); + if (reserve_flags) + alloc_flags = gfp_to_alloc_flags_cma(gfp_mask, reserve_flags) | +@@ -4178,12 +4168,18 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order, + ac->nodemask = NULL; + ac->preferred_zoneref = first_zones_zonelist(ac->zonelist, + ac->highest_zoneidx, ac->nodemask); +- } + +- /* Attempt with potentially adjusted zonelist and alloc_flags */ +- page = get_page_from_freelist(gfp_mask, order, alloc_flags, ac); +- if (page) +- goto got_pg; ++ /* ++ * The first time we adjust anything due to being allowed to ++ * ignore memory policies or watermarks, retry immediately. This ++ * allows us to keep the first allocation attempt optimistic so ++ * it can succeed in a zone that is still above watermarks. ++ */ ++ if (can_retry_reserves) { ++ can_retry_reserves = false; ++ goto retry; ++ } ++ } + + /* Caller is not willing to reclaim, we can't balance anything */ + if (!can_direct_reclaim) +@@ -4246,6 +4242,15 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order, + !(gfp_mask & __GFP_RETRY_MAYFAIL))) + goto nopage; + ++ /* ++ * Deal with possible cpuset update races or zonelist updates to avoid ++ * infinite retries. No "goto retry;" can be placed above this check ++ * unless it can execute just once. 
++ */ ++ if (check_retry_cpuset(cpuset_mems_cookie, ac) || ++ check_retry_zonelist(zonelist_iter_cookie)) ++ goto restart; ++ + if (should_reclaim_retry(gfp_mask, order, ac, alloc_flags, + did_some_progress > 0, &no_progress_loops)) + goto retry; +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1260-mm-page-alloc-add-vm-thp-thisnode-reclaim-sysctl-to-allow-th.patch b/SOURCES/1260-mm-page-alloc-add-vm-thp-thisnode-reclaim-sysctl-to-allow-th.patch new file mode 100644 index 000000000..a96d094e9 --- /dev/null +++ b/SOURCES/1260-mm-page-alloc-add-vm-thp-thisnode-reclaim-sysctl-to-allow-th.patch @@ -0,0 +1,85 @@ +From fe93b8af523e8f3cf1e7304d100adf6ed44f6345 Mon Sep 17 00:00:00 2001 +From: Nico Pache +Date: Sat, 28 Mar 2026 16:17:55 -0600 +Subject: [PATCH] mm/page_alloc: add vm.thp_thisnode_reclaim sysctl to allow + THP reclaim on local node + +Upstream commit cd2e3c32636e ("mm, page_alloc, thp: prevent reclaim for +__GFP_THISNODE THP allocations") prevents __GFP_THISNODE THP allocations +from proceeding into reclaim after compaction failure, to avoid +zone_reclaim_mode-like excessive reclaim on a single NUMA node when other +nodes have plenty of free memory. This was further refined by upstream +commits 66987218154918a6 and 53a9b4646f67 which refactored the check +into gfp_has_flags(gfp_mask, __GFP_NORETRY | __GFP_THISNODE). +While this is the correct default, to prevent workloads regressing on older +releases, or for customers/workloads that may benefit from the more aggressive +reclaim behavior. Add a sysctl knob (vm.thp_thisnode_reclaim) to restore the +previous behavior. + +The sysctl defaults to 1 to avoid regressions and keep the pre-fix behavior. 
+ +Upstream-status: RHEL-Only +JIRA: https://redhat.atlassian.net/browse/RHEL-148561 +Signed-off-by: Nico Pache + +diff --git a/mm/page_alloc.c b/mm/page_alloc.c +index 4c2b622a39cf..b26f3d53b751 100644 +--- a/mm/page_alloc.c ++++ b/mm/page_alloc.c +@@ -290,6 +290,14 @@ int user_min_free_kbytes = -1; + static int watermark_boost_factor __read_mostly = 15000; + static int watermark_scale_factor = 10; + ++/* ++ * RHEL-ONLY: When set to 1, allows reclaim for __GFP_THISNODE THP allocations, ++ * restoring the behavior prior to the fix that prevents zone_reclaim_mode-like ++ * excessive reclaim on a single NUMA node when other nodes have plenty of free ++ * memory. ++ */ ++static int thp_thisnode_reclaim __read_mostly = 1; ++ + /* movable_zone is the "real" zone pages in ZONE_MOVABLE are taken from */ + int movable_zone; + EXPORT_SYMBOL(movable_zone); +@@ -4213,9 +4221,20 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order, + * allocations to put reclaim pressure on a single node in a + * situation where other nodes might have plenty of available + * memory. ++ * ++ * RHEL-ONLY: vm.thp_thisnode_reclaim can override this to ++ * restore the pre-fix behavior: allow reclaim for THISNODE THP ++ * allocations, but still fail immediately when compaction was ++ * skipped (insufficient free base pages) or deferred (recent ++ * compaction failures at this order). 
+ */ +- if (gfp_has_flags(gfp_mask, __GFP_NORETRY | __GFP_THISNODE)) +- goto nopage; ++ if (gfp_has_flags(gfp_mask, __GFP_NORETRY | __GFP_THISNODE)) { ++ if (!thp_thisnode_reclaim) ++ goto nopage; ++ if (compact_result == COMPACT_SKIPPED || ++ compact_result == COMPACT_DEFERRED) ++ goto nopage; ++ } + + /* + * For the initial compaction attempt we have lowered its +@@ -6213,6 +6232,15 @@ static struct ctl_table page_alloc_sysctl_table[] = { + .extra1 = SYSCTL_ZERO, + .extra2 = SYSCTL_ONE_HUNDRED, + }, ++ { ++ .procname = "thp_thisnode_reclaim", //RHEL-ONLY ++ .data = &thp_thisnode_reclaim, ++ .maxlen = sizeof(thp_thisnode_reclaim), ++ .mode = 0644, ++ .proc_handler = proc_dointvec_minmax, ++ .extra1 = SYSCTL_ZERO, ++ .extra2 = SYSCTL_ONE, ++ }, + #endif + {} + }; +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1261-smb-client-fix-oob-reads-parsing-symlink-error-response.patch b/SOURCES/1261-smb-client-fix-oob-reads-parsing-symlink-error-response.patch new file mode 100644 index 000000000..bd370dd31 --- /dev/null +++ b/SOURCES/1261-smb-client-fix-oob-reads-parsing-symlink-error-response.patch @@ -0,0 +1,123 @@ +From e0c8209f463129749b824ebf8068fd75774dd5d7 Mon Sep 17 00:00:00 2001 +From: CKI Backport Bot +Date: Tue, 28 Apr 2026 12:07:13 +0000 +Subject: [PATCH] smb: client: fix OOB reads parsing symlink error response + +JIRA: https://redhat.atlassian.net/browse/RHEL-171472 +CVE: CVE-2026-31613 + +commit 3df690bba28edec865cf7190be10708ad0ddd67e +Author: Greg Kroah-Hartman +Date: Mon Apr 6 15:49:38 2026 +0200 + + smb: client: fix OOB reads parsing symlink error response + + When a CREATE returns STATUS_STOPPED_ON_SYMLINK, smb2_check_message() + returns success without any length validation, leaving the symlink + parsers as the only defense against an untrusted server. + + symlink_data() walks SMB 3.1.1 error contexts with the loop test "p < + end", but reads p->ErrorId at offset 4 and p->ErrorDataLength at offset + 0. 
When the server-controlled ErrorDataLength advances p to within 1-7 + bytes of end, the next iteration will read past it. When the matching + context is found, sym->SymLinkErrorTag is read at offset 4 from + p->ErrorContextData with no check that the symlink header itself fits. + + smb2_parse_symlink_response() then bounds-checks the substitute name + using SMB2_SYMLINK_STRUCT_SIZE as the offset of PathBuffer from + iov_base. That value is computed as sizeof(smb2_err_rsp) + + sizeof(smb2_symlink_err_rsp), which is correct only when + ErrorContextCount == 0. + + With at least one error context the symlink data sits 8 bytes deeper, + and each skipped non-matching context shifts it further by 8 + + ALIGN(ErrorDataLength, 8). The check is too short, allowing the + substitute name read to run past iov_len. The out-of-bound heap bytes + are UTF-16-decoded into the symlink target and returned to userspace via + readlink(2). + + Fix this all up by making the loops test require the full context header + to fit, rejecting sym if its header runs past end, and bound the + substitute name against the actual position of sym->PathBuffer rather + than a fixed offset. + + Because sub_offs and sub_len are 16bits, the pointer math will not + overflow here with the new greater-than. 
+ + Cc: Ronnie Sahlberg + Cc: Shyam Prasad N + Cc: Tom Talpey + Cc: Bharath SM + Cc: linux-cifs@vger.kernel.org + Cc: samba-technical@lists.samba.org + Cc: stable + Reviewed-by: Paulo Alcantara (Red Hat) + Assisted-by: gregkh_clanker_t1000 + Signed-off-by: Greg Kroah-Hartman + Signed-off-by: Steve French + +Signed-off-by: CKI Backport Bot + +diff --git a/fs/smb/client/smb2file.c b/fs/smb/client/smb2file.c +index fa9726f53143..4793981e2bb7 100644 +--- a/fs/smb/client/smb2file.c ++++ b/fs/smb/client/smb2file.c +@@ -27,10 +27,11 @@ static struct smb2_symlink_err_rsp *symlink_data(const struct kvec *iov) + { + struct smb2_err_rsp *err = iov->iov_base; + struct smb2_symlink_err_rsp *sym = ERR_PTR(-EINVAL); ++ u8 *end = (u8 *)err + iov->iov_len; + u32 len; + + if (err->ErrorContextCount) { +- struct smb2_error_context_rsp *p, *end; ++ struct smb2_error_context_rsp *p; + + len = (u32)err->ErrorContextCount * (offsetof(struct smb2_error_context_rsp, + ErrorContextData) + +@@ -39,8 +40,7 @@ static struct smb2_symlink_err_rsp *symlink_data(const struct kvec *iov) + return ERR_PTR(-EINVAL); + + p = (struct smb2_error_context_rsp *)err->ErrorData; +- end = (struct smb2_error_context_rsp *)((u8 *)err + iov->iov_len); +- do { ++ while ((u8 *)p + sizeof(*p) <= end) { + if (le32_to_cpu(p->ErrorId) == SMB2_ERROR_ID_DEFAULT) { + sym = (struct smb2_symlink_err_rsp *)p->ErrorContextData; + break; +@@ -50,14 +50,16 @@ static struct smb2_symlink_err_rsp *symlink_data(const struct kvec *iov) + + len = ALIGN(le32_to_cpu(p->ErrorDataLength), 8); + p = (struct smb2_error_context_rsp *)(p->ErrorContextData + len); +- } while (p < end); ++ } + } else if (le32_to_cpu(err->ByteCount) >= sizeof(*sym) && + iov->iov_len >= SMB2_SYMLINK_STRUCT_SIZE) { + sym = (struct smb2_symlink_err_rsp *)err->ErrorData; + } + +- if (!IS_ERR(sym) && (le32_to_cpu(sym->SymLinkErrorTag) != SYMLINK_ERROR_TAG || +- le32_to_cpu(sym->ReparseTag) != IO_REPARSE_TAG_SYMLINK)) ++ if (!IS_ERR(sym) && ++ ((u8 *)sym + 
sizeof(*sym) > end || ++ le32_to_cpu(sym->SymLinkErrorTag) != SYMLINK_ERROR_TAG || ++ le32_to_cpu(sym->ReparseTag) != IO_REPARSE_TAG_SYMLINK)) + sym = ERR_PTR(-EINVAL); + + return sym; +@@ -128,8 +130,10 @@ int smb2_parse_symlink_response(struct cifs_sb_info *cifs_sb, const struct kvec + print_len = le16_to_cpu(sym->PrintNameLength); + print_offs = le16_to_cpu(sym->PrintNameOffset); + +- if (iov->iov_len < SMB2_SYMLINK_STRUCT_SIZE + sub_offs + sub_len || +- iov->iov_len < SMB2_SYMLINK_STRUCT_SIZE + print_offs + print_len) ++ if ((char *)sym->PathBuffer + sub_offs + sub_len > ++ (char *)iov->iov_base + iov->iov_len || ++ (char *)sym->PathBuffer + print_offs + print_len > ++ (char *)iov->iov_base + iov->iov_len) + return -EINVAL; + + return smb2_parse_native_symlink(path, +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1262-crypto-authenc-fix-sleep-in-atomic-context-in-decrypt-tail.patch b/SOURCES/1262-crypto-authenc-fix-sleep-in-atomic-context-in-decrypt-tail.patch new file mode 100644 index 000000000..e2a658e8e --- /dev/null +++ b/SOURCES/1262-crypto-authenc-fix-sleep-in-atomic-context-in-decrypt-tail.patch @@ -0,0 +1,46 @@ +From b68bb0a260effb5982ab52535a3213ff03b57ed9 Mon Sep 17 00:00:00 2001 +From: Vladislav Dronov +Date: Wed, 29 Apr 2026 23:04:01 +0200 +Subject: [PATCH] crypto: authenc - Fix sleep in atomic context in decrypt_tail + +JIRA: https://issues.redhat.com/browse/RHEL-172166 +Upstream Status: merged into the linux.git + +commit 66eae850333d639fc278d6f915c6fc01499ea893 +Author: Herbert Xu +Date: Wed Jan 19 17:58:40 2022 +1100 + + crypto: authenc - Fix sleep in atomic context in decrypt_tail + + The function crypto_authenc_decrypt_tail discards its flags + argument and always relies on the flags from the original request + when starting its sub-request. + + This is clearly wrong as it may cause the SLEEPABLE flag to be + set when it shouldn't. 
+ + Fixes: 92d95ba91772 ("crypto: authenc - Convert to new AEAD interface") + Reported-by: Corentin Labbe + Signed-off-by: Herbert Xu + Tested-by: Corentin Labbe + Signed-off-by: Herbert Xu + +Assisted-by: Patchpal 0.7.1 +Signed-off-by: Vladislav Dronov + +diff --git a/crypto/authenc.c b/crypto/authenc.c +index 670bf1a01d00..17f674a7cdff 100644 +--- a/crypto/authenc.c ++++ b/crypto/authenc.c +@@ -253,7 +253,7 @@ static int crypto_authenc_decrypt_tail(struct aead_request *req, + dst = scatterwalk_ffwd(areq_ctx->dst, req->dst, req->assoclen); + + skcipher_request_set_tfm(skreq, ctx->enc); +- skcipher_request_set_callback(skreq, aead_request_flags(req), ++ skcipher_request_set_callback(skreq, flags, + req->base.complete, req->base.data); + skcipher_request_set_crypt(skreq, src, dst, + req->cryptlen - authsize, req->iv); +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1263-crypto-authenc-correctly-pass-einprogress-back-up-to-the-cal.patch b/SOURCES/1263-crypto-authenc-correctly-pass-einprogress-back-up-to-the-cal.patch new file mode 100644 index 000000000..875886abe --- /dev/null +++ b/SOURCES/1263-crypto-authenc-correctly-pass-einprogress-back-up-to-the-cal.patch @@ -0,0 +1,213 @@ +From 6cedc3414cbe4e00b4a85ac4381edb18805194d6 Mon Sep 17 00:00:00 2001 +From: Vladislav Dronov +Date: Wed, 29 Apr 2026 23:05:43 +0200 +Subject: [PATCH] crypto: authenc - Correctly pass EINPROGRESS back up to the + caller + +JIRA: https://issues.redhat.com/browse/RHEL-172166 +Upstream Status: merged into the linux.git + +Conflicts: Missing a large crypto-tree-wide upstream patch 255e48eb1768 +("crypto: api - Use data directly in completion function"). To apply: + - Change "void *data" back to "struct crypto_async_request *areq". + - Changle "struct aead_request *req = data" back to "struct aead_request + *req = areq->data". 
+ +commit 96feb73def02d175850daa0e7c2c90c876681b5c +Author: Herbert Xu +Date: Wed Sep 24 18:20:17 2025 +0800 + + crypto: authenc - Correctly pass EINPROGRESS back up to the caller + + When authenc is invoked with MAY_BACKLOG, it needs to pass EINPROGRESS + notifications back up to the caller when the underlying algorithm + returns EBUSY synchronously. + + However, if the EBUSY comes from the second part of an authenc call, + i.e., it is asynchronous, both the EBUSY and the subsequent EINPROGRESS + notification must not be passed to the caller. + + Implement this by passing a mask to the function that starts the + second half of authenc and using it to determine whether EBUSY + and EINPROGRESS should be passed to the caller. + + This was a deficiency in the original implementation of authenc + because it was not expected to be used with MAY_BACKLOG. + + Reported-by: Ingo Franzki + Reported-by: Mikulas Patocka + Fixes: 180ce7e81030 ("crypto: authenc - Add EINPROGRESS check") + Signed-off-by: Herbert Xu + +Assisted-by: Patchpal AI 0.7.1 +Signed-off-by: Vladislav Dronov + +diff --git a/crypto/authenc.c b/crypto/authenc.c +index 17f674a7cdff..494c0b6db431 100644 +--- a/crypto/authenc.c ++++ b/crypto/authenc.c +@@ -39,7 +39,7 @@ struct authenc_request_ctx { + + static void authenc_request_complete(struct aead_request *req, int err) + { +- if (err != -EINPROGRESS) ++ if (err != -EINPROGRESS && err != -EBUSY) + aead_request_complete(req, err); + } + +@@ -109,27 +109,42 @@ static int crypto_authenc_setkey(struct crypto_aead *authenc, const u8 *key, + return err; + } + +-static void authenc_geniv_ahash_done(struct crypto_async_request *areq, int err) ++static void authenc_geniv_ahash_finish(struct aead_request *req) + { +- struct aead_request *req = areq->data; + struct crypto_aead *authenc = crypto_aead_reqtfm(req); + struct aead_instance *inst = aead_alg_instance(authenc); + struct authenc_instance_ctx *ictx = aead_instance_ctx(inst); + struct authenc_request_ctx *areq_ctx 
= aead_request_ctx(req); + struct ahash_request *ahreq = (void *)(areq_ctx->tail + ictx->reqoff); + +- if (err) +- goto out; +- + scatterwalk_map_and_copy(ahreq->result, req->dst, + req->assoclen + req->cryptlen, + crypto_aead_authsize(authenc), 1); ++} + +-out: ++static void authenc_geniv_ahash_done(struct crypto_async_request *areq, int err) ++{ ++ struct aead_request *req = areq->data; ++ ++ if (!err) ++ authenc_geniv_ahash_finish(req); + aead_request_complete(req, err); + } + +-static int crypto_authenc_genicv(struct aead_request *req, unsigned int flags) ++/* ++ * Used when the ahash request was invoked in the async callback context ++ * of the previous skcipher request. Eat any EINPROGRESS notifications. ++ */ ++static void authenc_geniv_ahash_done2(struct crypto_async_request *areq, int err) ++{ ++ struct aead_request *req = areq->data; ++ ++ if (!err) ++ authenc_geniv_ahash_finish(req); ++ authenc_request_complete(req, err); ++} ++ ++static int crypto_authenc_genicv(struct aead_request *req, unsigned int mask) + { + struct crypto_aead *authenc = crypto_aead_reqtfm(req); + struct aead_instance *inst = aead_alg_instance(authenc); +@@ -138,6 +153,7 @@ static int crypto_authenc_genicv(struct aead_request *req, unsigned int flags) + struct crypto_ahash *auth = ctx->auth; + struct authenc_request_ctx *areq_ctx = aead_request_ctx(req); + struct ahash_request *ahreq = (void *)(areq_ctx->tail + ictx->reqoff); ++ unsigned int flags = aead_request_flags(req) & ~mask; + u8 *hash = areq_ctx->tail; + int err; + +@@ -148,7 +164,8 @@ static int crypto_authenc_genicv(struct aead_request *req, unsigned int flags) + ahash_request_set_crypt(ahreq, req->dst, hash, + req->assoclen + req->cryptlen); + ahash_request_set_callback(ahreq, flags, +- authenc_geniv_ahash_done, req); ++ mask ? 
authenc_geniv_ahash_done2 : ++ authenc_geniv_ahash_done, req); + + err = crypto_ahash_digest(ahreq); + if (err) +@@ -165,12 +182,11 @@ static void crypto_authenc_encrypt_done(struct crypto_async_request *req, + { + struct aead_request *areq = req->data; + +- if (err) +- goto out; +- +- err = crypto_authenc_genicv(areq, 0); +- +-out: ++ if (err) { ++ aead_request_complete(areq, err); ++ return; ++ } ++ err = crypto_authenc_genicv(areq, CRYPTO_TFM_REQ_MAY_SLEEP); + authenc_request_complete(areq, err); + } + +@@ -223,11 +239,18 @@ static int crypto_authenc_encrypt(struct aead_request *req) + if (err) + return err; + +- return crypto_authenc_genicv(req, aead_request_flags(req)); ++ return crypto_authenc_genicv(req, 0); ++} ++ ++static void authenc_decrypt_tail_done(struct crypto_async_request *areq, int err) ++{ ++ struct aead_request *req = areq->data; ++ ++ authenc_request_complete(req, err); + } + + static int crypto_authenc_decrypt_tail(struct aead_request *req, +- unsigned int flags) ++ unsigned int mask) + { + struct crypto_aead *authenc = crypto_aead_reqtfm(req); + struct aead_instance *inst = aead_alg_instance(authenc); +@@ -238,6 +261,7 @@ static int crypto_authenc_decrypt_tail(struct aead_request *req, + struct skcipher_request *skreq = (void *)(areq_ctx->tail + + ictx->reqoff); + unsigned int authsize = crypto_aead_authsize(authenc); ++ unsigned int flags = aead_request_flags(req) & ~mask; + u8 *ihash = ahreq->result + authsize; + struct scatterlist *src, *dst; + +@@ -254,7 +278,9 @@ static int crypto_authenc_decrypt_tail(struct aead_request *req, + + skcipher_request_set_tfm(skreq, ctx->enc); + skcipher_request_set_callback(skreq, flags, +- req->base.complete, req->base.data); ++ mask ? authenc_decrypt_tail_done : ++ req->base.complete, ++ mask ? 
req : req->base.data); + skcipher_request_set_crypt(skreq, src, dst, + req->cryptlen - authsize, req->iv); + +@@ -266,12 +292,11 @@ static void authenc_verify_ahash_done(struct crypto_async_request *areq, + { + struct aead_request *req = areq->data; + +- if (err) +- goto out; +- +- err = crypto_authenc_decrypt_tail(req, 0); +- +-out: ++ if (err) { ++ aead_request_complete(req, err); ++ return; ++ } ++ err = crypto_authenc_decrypt_tail(req, CRYPTO_TFM_REQ_MAY_SLEEP); + authenc_request_complete(req, err); + } + +@@ -301,7 +326,7 @@ static int crypto_authenc_decrypt(struct aead_request *req) + if (err) + return err; + +- return crypto_authenc_decrypt_tail(req, aead_request_flags(req)); ++ return crypto_authenc_decrypt_tail(req, 0); + } + + static int crypto_authenc_init_tfm(struct crypto_aead *tfm) +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1264-buffer-overflow-in-drivers-xen-sys-hypervisor-c.patch b/SOURCES/1264-buffer-overflow-in-drivers-xen-sys-hypervisor-c.patch new file mode 100644 index 000000000..409d615e0 --- /dev/null +++ b/SOURCES/1264-buffer-overflow-in-drivers-xen-sys-hypervisor-c.patch @@ -0,0 +1,60 @@ +From 27fdbab4221b375de54bf91919798d88520c6e28 Mon Sep 17 00:00:00 2001 +From: Juergen Gross +Date: Fri, 27 Mar 2026 14:13:38 +0100 +Subject: [PATCH] Buffer overflow in drivers/xen/sys-hypervisor.c + +The build id returned by HYPERVISOR_xen_version(XENVER_build_id) is +neither NUL terminated nor a string. + +The first causes a buffer overflow as sprintf in buildid_show will +read and copy till it finds a NUL. + +00000000 f4 91 51 f4 dd 38 9e 9d 65 47 52 eb 10 71 db 50 |..Q..8..eGR..q.P| +00000010 b9 a8 01 42 6f 2e 32 |...Bo.2| +00000017 + +So use a memcpy instead of sprintf to have the correct value: + +00000000 f4 91 51 f4 dd 00 9e 9d 65 47 52 eb 10 71 db 50 |..Q.....eGR..q.P| +00000010 b9 a8 01 42 |...B| +00000014 + +(the above have a hack to embed a zero inside and check it's +returned correctly). 
+ +This is XSA-485 / CVE-2026-31786 + +Fixes: 84b7625728ea ("xen: add sysfs node for hypervisor build id") +Signed-off-by: Frediano Ziglio +Reviewed-by: Juergen Gross +Signed-off-by: Juergen Gross + +diff --git a/drivers/xen/sys-hypervisor.c b/drivers/xen/sys-hypervisor.c +index b1bb01ba82f8..91923242a5ae 100644 +--- a/drivers/xen/sys-hypervisor.c ++++ b/drivers/xen/sys-hypervisor.c +@@ -366,6 +366,8 @@ static ssize_t buildid_show(struct hyp_sysfs_attr *attr, char *buffer) + ret = sprintf(buffer, ""); + return ret; + } ++ if (ret > PAGE_SIZE) ++ return -ENOSPC; + + buildid = kmalloc(sizeof(*buildid) + ret, GFP_KERNEL); + if (!buildid) +@@ -373,8 +375,10 @@ static ssize_t buildid_show(struct hyp_sysfs_attr *attr, char *buffer) + + buildid->len = ret; + ret = HYPERVISOR_xen_version(XENVER_build_id, buildid); +- if (ret > 0) +- ret = sprintf(buffer, "%s", buildid->buf); ++ if (ret > 0) { ++ /* Build id is binary, not a string. */ ++ memcpy(buffer, buildid->buf, ret); ++ } + kfree(buildid); + + return ret; +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1265-nvme-nvme-fc-move-tagset-removal-to-nvme-fc-delete-ctrl.patch b/SOURCES/1265-nvme-nvme-fc-move-tagset-removal-to-nvme-fc-delete-ctrl.patch new file mode 100644 index 000000000..a1a782c6b --- /dev/null +++ b/SOURCES/1265-nvme-nvme-fc-move-tagset-removal-to-nvme-fc-delete-ctrl.patch @@ -0,0 +1,80 @@ +From 75537b257b7125983cc0a54f0a2878d28677eb50 Mon Sep 17 00:00:00 2001 +From: "Ewan D. Milne" +Date: Mon, 18 May 2026 11:14:05 -0400 +Subject: [PATCH] nvme: nvme-fc: move tagset removal to nvme_fc_delete_ctrl() + +JIRA: https://redhat.atlassian.net/browse/RHEL-171725 +Upstream Status: From upstream linux mainline + +Now target is removed from nvme_fc_ctrl_free() which is the ctrl->ref +release handler. And even admin queue is unquiesced there, this way +is definitely wrong because the ctr->ref is grabbed when submitting +command. 
+ +And Marco observed that nvme_fc_ctrl_free() can be called from request +completion code path, and trigger kernel warning since request completes +from softirq context. + +Fix the issue by moveing target removal into nvme_fc_delete_ctrl(), +which is also aligned with nvme-tcp and nvme-rdma. + +Patch originally proposed by Ming Lei, then modified to move the tagset +removal down to after nvme_fc_delete_association() after further testing. + +Cc: Marco Patalano +Cc: Ewan Milne +Cc: James Smart +Cc: Sagi Grimberg +Signed-off-by: Ming Lei +Cc: stable@vger.kernel.org +Tested-by: Marco Patalano +Reviewed-by: Justin Tee +Signed-off-by: Ewan D. Milne +Signed-off-by: Keith Busch +(cherry picked from commit ea3442efabd0aa3930c5bab73c3901ef38ef6ac3) +Signed-off-by: Ewan D. Milne + +diff --git a/drivers/nvme/host/fc.c b/drivers/nvme/host/fc.c +index bd6cbe35dace..3e500e87e30c 100644 +--- a/drivers/nvme/host/fc.c ++++ b/drivers/nvme/host/fc.c +@@ -2354,17 +2354,11 @@ nvme_fc_ctrl_free(struct kref *ref) + container_of(ref, struct nvme_fc_ctrl, ref); + unsigned long flags; + +- if (ctrl->ctrl.tagset) +- nvme_remove_io_tag_set(&ctrl->ctrl); +- + /* remove from rport list */ + spin_lock_irqsave(&ctrl->rport->lock, flags); + list_del(&ctrl->ctrl_list); + spin_unlock_irqrestore(&ctrl->rport->lock, flags); + +- nvme_unquiesce_admin_queue(&ctrl->ctrl); +- nvme_remove_admin_tag_set(&ctrl->ctrl); +- + kfree(ctrl->queues); + + put_device(ctrl->dev); +@@ -3252,11 +3246,18 @@ nvme_fc_delete_ctrl(struct nvme_ctrl *nctrl) + + cancel_work_sync(&ctrl->ioerr_work); + cancel_delayed_work_sync(&ctrl->connect_work); ++ + /* + * kill the association on the link side. 
this will block + * waiting for io to terminate + */ + nvme_fc_delete_association(ctrl); ++ ++ if (ctrl->ctrl.tagset) ++ nvme_remove_io_tag_set(&ctrl->ctrl); ++ ++ nvme_unquiesce_admin_queue(&ctrl->ctrl); ++ nvme_remove_admin_tag_set(&ctrl->ctrl); + } + + static void +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1266-nvme-nvme-fc-ensure-ioerr-work-is-cancelled-in-nvme-fc-delet.patch b/SOURCES/1266-nvme-nvme-fc-ensure-ioerr-work-is-cancelled-in-nvme-fc-delet.patch new file mode 100644 index 000000000..3f4a76f4c --- /dev/null +++ b/SOURCES/1266-nvme-nvme-fc-ensure-ioerr-work-is-cancelled-in-nvme-fc-delet.patch @@ -0,0 +1,93 @@ +From 08fc4018856d6e04b7517e1ea515f06e86a05128 Mon Sep 17 00:00:00 2001 +From: "Ewan D. Milne" +Date: Mon, 18 May 2026 11:19:16 -0400 +Subject: [PATCH] nvme: nvme-fc: Ensure ->ioerr_work is cancelled in + nvme_fc_delete_ctrl() + +JIRA: https://redhat.atlassian.net/browse/RHEL-171725 +Upstream Status: From upstream linux mainline + +nvme_fc_delete_assocation() waits for pending I/O to complete before +returning, and an error can cause ->ioerr_work to be queued after +cancel_work_sync() had been called. Move the call to cancel_work_sync() to +be after nvme_fc_delete_association() to ensure ->ioerr_work is not running +when the nvme_fc_ctrl object is freed. Otherwise the following can occur: + +[ 1135.911754] list_del corruption, ff2d24c8093f31f8->next is NULL +[ 1135.917705] ------------[ cut here ]------------ +[ 1135.922336] kernel BUG at lib/list_debug.c:52! +[ 1135.926784] Oops: invalid opcode: 0000 [#1] SMP NOPTI +[ 1135.931851] CPU: 48 UID: 0 PID: 726 Comm: kworker/u449:23 Kdump: loaded Not tainted 6.12.0 #1 PREEMPT(voluntary) +[ 1135.943490] Hardware name: Dell Inc. 
PowerEdge R660/0HGTK9, BIOS 2.5.4 01/16/2025 +[ 1135.950969] Workqueue: 0x0 (nvme-wq) +[ 1135.954673] RIP: 0010:__list_del_entry_valid_or_report.cold+0xf/0x6f +[ 1135.961041] Code: c7 c7 98 68 72 94 e8 26 45 fe ff 0f 0b 48 c7 c7 70 68 72 94 e8 18 45 fe ff 0f 0b 48 89 fe 48 c7 c7 80 69 72 94 e8 07 45 fe ff <0f> 0b 48 89 d1 48 c7 c7 a0 6a 72 94 48 89 c2 e8 f3 44 fe ff 0f 0b +[ 1135.979788] RSP: 0018:ff579b19482d3e50 EFLAGS: 00010046 +[ 1135.985015] RAX: 0000000000000033 RBX: ff2d24c8093f31f0 RCX: 0000000000000000 +[ 1135.992148] RDX: 0000000000000000 RSI: ff2d24d6bfa1d0c0 RDI: ff2d24d6bfa1d0c0 +[ 1135.999278] RBP: ff2d24c8093f31f8 R08: 0000000000000000 R09: ffffffff951e2b08 +[ 1136.006413] R10: ffffffff95122ac8 R11: 0000000000000003 R12: ff2d24c78697c100 +[ 1136.013546] R13: fffffffffffffff8 R14: 0000000000000000 R15: ff2d24c78697c0c0 +[ 1136.020677] FS: 0000000000000000(0000) GS:ff2d24d6bfa00000(0000) knlGS:0000000000000000 +[ 1136.028765] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 +[ 1136.034510] CR2: 00007fd207f90b80 CR3: 000000163ea22003 CR4: 0000000000f73ef0 +[ 1136.041641] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 +[ 1136.048776] DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: 0000000000000400 +[ 1136.055910] PKRU: 55555554 +[ 1136.058623] Call Trace: +[ 1136.061074] +[ 1136.063179] ? show_trace_log_lvl+0x1b0/0x2f0 +[ 1136.067540] ? show_trace_log_lvl+0x1b0/0x2f0 +[ 1136.071898] ? move_linked_works+0x4a/0xa0 +[ 1136.075998] ? __list_del_entry_valid_or_report.cold+0xf/0x6f +[ 1136.081744] ? __die_body.cold+0x8/0x12 +[ 1136.085584] ? die+0x2e/0x50 +[ 1136.088469] ? do_trap+0xca/0x110 +[ 1136.091789] ? do_error_trap+0x65/0x80 +[ 1136.095543] ? __list_del_entry_valid_or_report.cold+0xf/0x6f +[ 1136.101289] ? exc_invalid_op+0x50/0x70 +[ 1136.105127] ? __list_del_entry_valid_or_report.cold+0xf/0x6f +[ 1136.110874] ? asm_exc_invalid_op+0x1a/0x20 +[ 1136.115059] ? 
__list_del_entry_valid_or_report.cold+0xf/0x6f +[ 1136.120806] move_linked_works+0x4a/0xa0 +[ 1136.124733] worker_thread+0x216/0x3a0 +[ 1136.128485] ? __pfx_worker_thread+0x10/0x10 +[ 1136.132758] kthread+0xfa/0x240 +[ 1136.135904] ? __pfx_kthread+0x10/0x10 +[ 1136.139657] ret_from_fork+0x31/0x50 +[ 1136.143236] ? __pfx_kthread+0x10/0x10 +[ 1136.146988] ret_from_fork_asm+0x1a/0x30 +[ 1136.150915] + +Fixes: 19fce0470f05 ("nvme-fc: avoid calling _nvme_fc_abort_outstanding_ios from interrupt context") +Cc: stable@vger.kernel.org +Tested-by: Marco Patalano +Reviewed-by: Justin Tee +Signed-off-by: Ewan D. Milne +Signed-off-by: Keith Busch +(cherry picked from commit 0a2c5495b6d1ecb0fa18ef6631450f391a888256) +Signed-off-by: Ewan D. Milne + +diff --git a/drivers/nvme/host/fc.c b/drivers/nvme/host/fc.c +index 3e500e87e30c..6cc7d11ad5c0 100644 +--- a/drivers/nvme/host/fc.c ++++ b/drivers/nvme/host/fc.c +@@ -3244,7 +3244,6 @@ nvme_fc_delete_ctrl(struct nvme_ctrl *nctrl) + { + struct nvme_fc_ctrl *ctrl = to_fc_ctrl(nctrl); + +- cancel_work_sync(&ctrl->ioerr_work); + cancel_delayed_work_sync(&ctrl->connect_work); + + /* +@@ -3252,6 +3251,7 @@ nvme_fc_delete_ctrl(struct nvme_ctrl *nctrl) + * waiting for io to terminate + */ + nvme_fc_delete_association(ctrl); ++ cancel_work_sync(&ctrl->ioerr_work); + + if (ctrl->ctrl.tagset) + nvme_remove_io_tag_set(&ctrl->ctrl); +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1267-s390-dasd-fix-gendisk-parent-after-copy-pair-swap.patch b/SOURCES/1267-s390-dasd-fix-gendisk-parent-after-copy-pair-swap.patch new file mode 100644 index 000000000..14a740bec --- /dev/null +++ b/SOURCES/1267-s390-dasd-fix-gendisk-parent-after-copy-pair-swap.patch @@ -0,0 +1,58 @@ +From e88ced2e3c3091122785c0a2dd822b61d1839d58 Mon Sep 17 00:00:00 2001 +From: Mete Durlu +Date: Fri, 27 Mar 2026 13:14:31 +0100 +Subject: [PATCH] s390/dasd: Fix gendisk parent after copy pair swap + +JIRA: https://issues.redhat.com/browse/RHEL-161530 + +commit 
c943bfc6afb8d0e781b9b7406f36caa8bbf95cb9 +Author: Stefan Haberland +Date: Wed Nov 26 17:06:31 2025 +0100 + + s390/dasd: Fix gendisk parent after copy pair swap + + After a copy pair swap the block device's "device" symlink points to + the secondary CCW device, but the gendisk's parent remained the + primary, leaving /sys/block/ under the wrong parent. + + Move the gendisk to the secondary's device with device_move(), keeping + the sysfs topology consistent after the swap. + + Fixes: 413862caad6f ("s390/dasd: add copy pair swap capability") + Cc: stable@vger.kernel.org #6.1 + Reviewed-by: Jan Hoeppner + Signed-off-by: Stefan Haberland + Signed-off-by: Jens Axboe + +Signed-off-by: Stefan Haberland +Signed-off-by: Mete Durlu + +diff --git a/drivers/s390/block/dasd_eckd.c b/drivers/s390/block/dasd_eckd.c +index c60424afaf04..7e8679d7c686 100644 +--- a/drivers/s390/block/dasd_eckd.c ++++ b/drivers/s390/block/dasd_eckd.c +@@ -6148,6 +6148,7 @@ static int dasd_eckd_copy_pair_swap(struct dasd_device *device, char *prim_busid + struct dasd_copy_relation *copy; + struct dasd_block *block; + struct gendisk *gdp; ++ int rc; + + copy = device->copy; + if (!copy) +@@ -6182,6 +6183,13 @@ static int dasd_eckd_copy_pair_swap(struct dasd_device *device, char *prim_busid + /* swap blocklayer device link */ + gdp = block->gdp; + dasd_add_link_to_gendisk(gdp, secondary); ++ rc = device_move(disk_to_dev(gdp), &secondary->cdev->dev, DPM_ORDER_NONE); ++ if (rc) { ++ dev_err(&primary->cdev->dev, ++ "copy_pair_swap: moving blockdevice parent %s->%s failed (%d)\n", ++ dev_name(&primary->cdev->dev), ++ dev_name(&secondary->cdev->dev), rc); ++ } + + /* re-enable device */ + dasd_device_remove_stop_bits(primary, DASD_STOPPED_PPRC); +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1268-s390-dasd-move-quiesce-state-with-pprc-swap.patch b/SOURCES/1268-s390-dasd-move-quiesce-state-with-pprc-swap.patch new file mode 100644 index 000000000..d8a2ea712 --- /dev/null +++ 
b/SOURCES/1268-s390-dasd-move-quiesce-state-with-pprc-swap.patch @@ -0,0 +1,54 @@ +From 02a659e928f9ef15fc673384e95def0b088c9684 Mon Sep 17 00:00:00 2001 +From: Mete Durlu +Date: Fri, 27 Mar 2026 13:14:33 +0100 +Subject: [PATCH] s390/dasd: Move quiesce state with pprc swap + +JIRA: https://issues.redhat.com/browse/RHEL-161530 + +commit 40e9cd4ae8ec43b107ed2bff422a8fa39dcf4e4b +Author: Stefan Haberland +Date: Tue Mar 10 15:23:29 2026 +0100 + + s390/dasd: Move quiesce state with pprc swap + + Quiesce and resume is a mechanism to suspend operations on DASD devices. + In the context of a controlled copy pair swap operation, the quiesce + operation is usually issued before the actual swap and a resume + afterwards. + + During the swap operation, the underlying device is exchanged. Therefore, + the quiesce flag must be moved to the secondary device to ensure a + consistent quiesce state after the swap. + + The secondary device itself cannot be suspended separately because there + is no separate block device representation for it. 
+ + Fixes: 413862caad6f ("s390/dasd: add copy pair swap capability") + Cc: stable@vger.kernel.org #6.1 + Reviewed-by: Jan Hoeppner + Signed-off-by: Stefan Haberland + Link: https://patch.msgid.link/20260310142330.4080106-2-sth@linux.ibm.com + Signed-off-by: Jens Axboe + +Signed-off-by: Stefan Haberland +Signed-off-by: Mete Durlu + +diff --git a/drivers/s390/block/dasd_eckd.c b/drivers/s390/block/dasd_eckd.c +index 7e8679d7c686..f53791c9cbe7 100644 +--- a/drivers/s390/block/dasd_eckd.c ++++ b/drivers/s390/block/dasd_eckd.c +@@ -6191,6 +6191,11 @@ static int dasd_eckd_copy_pair_swap(struct dasd_device *device, char *prim_busid + dev_name(&secondary->cdev->dev), rc); + } + ++ if (primary->stopped & DASD_STOPPED_QUIESCE) { ++ dasd_device_set_stop_bits(secondary, DASD_STOPPED_QUIESCE); ++ dasd_device_remove_stop_bits(primary, DASD_STOPPED_QUIESCE); ++ } ++ + /* re-enable device */ + dasd_device_remove_stop_bits(primary, DASD_STOPPED_PPRC); + dasd_device_remove_stop_bits(secondary, DASD_STOPPED_PPRC); +-- +2.50.1 (Apple Git-155) + diff --git a/SOURCES/1269-s390-dasd-copy-detected-format-information-to-secondary-devi.patch b/SOURCES/1269-s390-dasd-copy-detected-format-information-to-secondary-devi.patch new file mode 100644 index 000000000..711f4a4a6 --- /dev/null +++ b/SOURCES/1269-s390-dasd-copy-detected-format-information-to-secondary-devi.patch @@ -0,0 +1,83 @@ +From 85945142a2a0c5d6a104b9d86eab6648a023765d Mon Sep 17 00:00:00 2001 +From: Mete Durlu +Date: Fri, 27 Mar 2026 13:14:35 +0100 +Subject: [PATCH] s390/dasd: Copy detected format information to secondary + device + +JIRA: https://issues.redhat.com/browse/RHEL-161530 + +commit 4c527c7e030672efd788d0806d7a68972a7ba3c1 +Author: Stefan Haberland +Date: Tue Mar 10 15:23:30 2026 +0100 + + s390/dasd: Copy detected format information to secondary device + + During online processing for a DASD device an IO operation is started to + determine the format of the device. 
CDL format contains specifically + sized blocks at the beginning of the disk. + + For a PPRC secondary device no real IO operation is possible therefore + this IO request can not be started and this step is skipped for online + processing of secondary devices. This is generally fine since the + secondary is a copy of the primary device. + + In case of an additional partition detection that is run after a swap + operation the format information is needed to properly drive partition + detection IO. + + Currently the information is not passed leading to IO errors during + partition detection and a wrongly detected partition table which in turn + might lead to data corruption on the disk with the wrong partition table. + + Fix by passing the format information from primary to secondary device. + + Fixes: 413862caad6f ("s390/dasd: add copy pair swap capability") + Cc: stable@vger.kernel.org #6.1 + Reviewed-by: Jan Hoeppner + Acked-by: Eduard Shishkin + Signed-off-by: Stefan Haberland + Link: https://patch.msgid.link/20260310142330.4080106-3-sth@linux.ibm.com + Signed-off-by: Jens Axboe + +Signed-off-by: Stefan Haberland +Signed-off-by: Mete Durlu + +diff --git a/drivers/s390/block/dasd_eckd.c b/drivers/s390/block/dasd_eckd.c +index f53791c9cbe7..54d6d29477e4 100644 +--- a/drivers/s390/block/dasd_eckd.c ++++ b/drivers/s390/block/dasd_eckd.c +@@ -6144,6 +6144,7 @@ static void copy_pair_set_active(struct dasd_copy_relation *copy, char *new_busi + static int dasd_eckd_copy_pair_swap(struct dasd_device *device, char *prim_busid, + char *sec_busid) + { ++ struct dasd_eckd_private *prim_priv, *sec_priv; + struct dasd_device *primary, *secondary; + struct dasd_copy_relation *copy; + struct dasd_block *block; +@@ -6164,6 +6165,9 @@ static int dasd_eckd_copy_pair_swap(struct dasd_device *device, char *prim_busid + if (!secondary) + return DASD_COPYPAIRSWAP_SECONDARY; + ++ prim_priv = primary->private; ++ sec_priv = secondary->private; ++ + /* + * usually the device should be 
quiesced for swap + * for paranoia stop device and requeue requests again +@@ -6196,6 +6200,13 @@ static int dasd_eckd_copy_pair_swap(struct dasd_device *device, char *prim_busid + dasd_device_remove_stop_bits(primary, DASD_STOPPED_QUIESCE); + } + ++ /* ++ * The secondary device never got through format detection, but since it ++ * is a copy of the primary device, the format is exactly the same; ++ * therefore, the detected layout can simply be copied. ++ */ ++ sec_priv->uses_cdl = prim_priv->uses_cdl; ++ + /* re-enable device */ + dasd_device_remove_stop_bits(primary, DASD_STOPPED_PPRC); + dasd_device_remove_stop_bits(secondary, DASD_STOPPED_PPRC); +-- +2.50.1 (Apple Git-155) + diff --git a/SPECS/kernel.spec b/SPECS/kernel.spec index f10134998..a4ee22b0d 100644 --- a/SPECS/kernel.spec +++ b/SPECS/kernel.spec @@ -176,13 +176,13 @@ Summary: The Linux kernel # define buildid .local %define specversion 5.14.0 %define patchversion 5.14 -%define pkgrelease 687.5.1 +%define pkgrelease 687.13.1 %define kversion 5 %define tarfile_release 5.14.0-687.5.1.el9_8 # This is needed to do merge window version magic %define patchlevel 14 # This allows pkg_release to have configurable %%{?dist} tag -%define specrelease 687.12.1%{?buildid}%{?dist} +%define specrelease 687.13.1%{?buildid}%{?dist} # This defines the kabi tarball version %define kabiversion 5.14.0-687.5.1.el9_8 @@ -1130,6 +1130,23 @@ Patch1251: 1251-netfilter-xt-tcpmss-check-remaining-length-before-reading-op.pat Patch1252: 1252-dm-thin-fix-metadata-refcount-underflow.patch Patch11111: ppc64le-kvm-support.patch +Patch1253: 1253-mm-document-gfp-nofail-must-be-blockable.patch +Patch1254: 1254-mm-warn-about-illegal-gfp-nofail-usage-in-a-more-appropriate.patch +Patch1255: 1255-mm-page-alloc-c-avoid-infinite-retries-caused-by-cpuset-race.patch +Patch1256: 1256-mm-page-alloc-thp-prevent-reclaim-for-gfp-thisnode-thp-alloc.patch +Patch1257: 1257-mm-page-alloc-ignore-the-exact-initial-compaction-result.patch +Patch1258: 
1258-mm-page-alloc-refactor-the-initial-compaction-handling.patch +Patch1259: 1259-mm-page-alloc-simplify-alloc-pages-slowpath-flow.patch +Patch1260: 1260-mm-page-alloc-add-vm-thp-thisnode-reclaim-sysctl-to-allow-th.patch +Patch1261: 1261-smb-client-fix-oob-reads-parsing-symlink-error-response.patch +Patch1262: 1262-crypto-authenc-fix-sleep-in-atomic-context-in-decrypt-tail.patch +Patch1263: 1263-crypto-authenc-correctly-pass-einprogress-back-up-to-the-cal.patch +Patch1264: 1264-buffer-overflow-in-drivers-xen-sys-hypervisor-c.patch +Patch1265: 1265-nvme-nvme-fc-move-tagset-removal-to-nvme-fc-delete-ctrl.patch +Patch1266: 1266-nvme-nvme-fc-ensure-ioerr-work-is-cancelled-in-nvme-fc-delet.patch +Patch1267: 1267-s390-dasd-fix-gendisk-parent-after-copy-pair-swap.patch +Patch1268: 1268-s390-dasd-move-quiesce-state-with-pprc-swap.patch +Patch1269: 1269-s390-dasd-copy-detected-format-information-to-secondary-devi.patch # END OF PATCH DEFINITIONS %description @@ -2027,6 +2044,23 @@ ApplyPatch 1249-bluetooth-sco-fix-race-conditions-in-sco-sock-connect.patch ApplyPatch 1250-wifi-brcmfmac-validate-bsscfg-indices-in-if-events.patch ApplyPatch 1251-netfilter-xt-tcpmss-check-remaining-length-before-reading-op.patch ApplyPatch 1252-dm-thin-fix-metadata-refcount-underflow.patch +ApplyPatch 1253-mm-document-gfp-nofail-must-be-blockable.patch +ApplyPatch 1254-mm-warn-about-illegal-gfp-nofail-usage-in-a-more-appropriate.patch +ApplyPatch 1255-mm-page-alloc-c-avoid-infinite-retries-caused-by-cpuset-race.patch +ApplyPatch 1256-mm-page-alloc-thp-prevent-reclaim-for-gfp-thisnode-thp-alloc.patch +ApplyPatch 1257-mm-page-alloc-ignore-the-exact-initial-compaction-result.patch +ApplyPatch 1258-mm-page-alloc-refactor-the-initial-compaction-handling.patch +ApplyPatch 1259-mm-page-alloc-simplify-alloc-pages-slowpath-flow.patch +ApplyPatch 1260-mm-page-alloc-add-vm-thp-thisnode-reclaim-sysctl-to-allow-th.patch +ApplyPatch 1261-smb-client-fix-oob-reads-parsing-symlink-error-response.patch 
+ApplyPatch 1262-crypto-authenc-fix-sleep-in-atomic-context-in-decrypt-tail.patch +ApplyPatch 1263-crypto-authenc-correctly-pass-einprogress-back-up-to-the-cal.patch +ApplyPatch 1264-buffer-overflow-in-drivers-xen-sys-hypervisor-c.patch +ApplyPatch 1265-nvme-nvme-fc-move-tagset-removal-to-nvme-fc-delete-ctrl.patch +ApplyPatch 1266-nvme-nvme-fc-ensure-ioerr-work-is-cancelled-in-nvme-fc-delet.patch +ApplyPatch 1267-s390-dasd-fix-gendisk-parent-after-copy-pair-swap.patch +ApplyPatch 1268-s390-dasd-move-quiesce-state-with-pprc-swap.patch +ApplyPatch 1269-s390-dasd-copy-detected-format-information-to-secondary-devi.patch # END OF PATCH APPLICATIONS # Any further pre-build tree manipulations happen here. @@ -4101,6 +4135,30 @@ fi # # %changelog +* Thu Jun 11 2026 Andrew Lukoshko - 5.14.0-687.13.1 +- Recreate RHEL 5.14.0-687.13.1 from CentOS Stream 9 and upstream stable backports (1253-1269) +- RHEL changelog for 687.13.1 follows: + +* Tue Jun 02 2026 CKI KWF Bot [5.14.0-687.13.1.el9_8] +- smb: client: reject userspace cifs.spnego descriptions (Paulo Alcantara) [RHEL-178944] {CVE-2026-46243} +- s390/dasd: Copy detected format information to secondary device (Ramesh Chhetri) [RHEL-176472] +- s390/dasd: Move quiesce state with pprc swap (Ramesh Chhetri) [RHEL-176472] +- s390/dasd: Fix gendisk parent after copy pair swap (Ramesh Chhetri) [RHEL-176472] +- nvme: nvme-fc: Ensure ->ioerr_work is cancelled in nvme_fc_delete_ctrl() (Ewan D. Milne) [RHEL-171745] +- nvme: nvme-fc: move tagset removal to nvme_fc_delete_ctrl() (Ewan D.
Milne) [RHEL-171745] +- Buffer overflow in drivers/xen/sys-hypervisor.c (Vitaly Kuznetsov) [RHEL-172510] {CVE-2026-31786} +- crypto: authenc - Correctly pass EINPROGRESS back up to the caller (Vladislav Dronov) [RHEL-172167] +- crypto: authenc - Fix sleep in atomic context in decrypt_tail (Vladislav Dronov) [RHEL-172167] +- smb: client: fix OOB reads parsing symlink error response (CKI Backport Bot) [RHEL-171471] {CVE-2026-31613} +- mm/page_alloc: add vm.thp_thisnode_reclaim sysctl to allow THP reclaim on local node (Nico Pache) [RHEL-164778] +- mm/page_alloc: simplify __alloc_pages_slowpath() flow (Nico Pache) [RHEL-164778] +- mm/page_alloc: refactor the initial compaction handling (Nico Pache) [RHEL-164778] +- mm/page_alloc: ignore the exact initial compaction result (Nico Pache) [RHEL-164778] +- mm, page_alloc, thp: prevent reclaim for __GFP_THISNODE THP allocations (Nico Pache) [RHEL-164778] +- mm/page_alloc.c: avoid infinite retries caused by cpuset race (Nico Pache) [RHEL-164778] +- mm: warn about illegal __GFP_NOFAIL usage in a more appropriate location and manner (Nico Pache) [RHEL-164778] +- mm: document __GFP_NOFAIL must be blockable (Nico Pache) [RHEL-164778] + * Sun Jun 07 2026 Andrew Lukoshko - 5.14.0-687.12.1 - Recreate RHEL 5.14.0-687.12.1 from CentOS Stream 9 and upstream stable backports (SOURCES/1198-1252)