From 0c5ea60367085f9ccb735f4c05ee9e35faebf640 Mon Sep 17 00:00:00 2001
From: Kamal Heib <kheib@redhat.com>
Date: Sun, 19 Apr 2026 18:24:30 -0400
Subject: [PATCH] net/mlx5: pagealloc: Fix reclaim race during command
 interface teardown

JIRA: https://redhat.atlassian.net/browse/RHEL-169055

commit 79a0e32b32ac4e4f9e4bb22be97f371c8c116c88
Author: Shay Drory <shayd@nvidia.com>
Date: Mon Sep 29 00:02:08 2025 +0300

net/mlx5: pagealloc: Fix reclaim race during command interface teardown

The reclaim_pages_cmd() function sends a command to the firmware to
reclaim pages if the command interface is active.

A race condition can occur if the command interface goes down (e.g., due
to a PCI error) while the mlx5_cmd_do() call is in flight. In this
case, mlx5_cmd_do() will return an error. The original code would
propagate this error immediately, bypassing the software-based page
reclamation logic that is supposed to run when the command interface is
down.

Fix this by checking whether mlx5_cmd_do() returns -ENXIO, which
indicates that the command interface is down. If this is the case, fall
through to the software reclamation path. If the command failed for any
other reason, or finished successfully, return as before.

Fixes: b898ce7bccf1 ("net/mlx5: cmdif, Avoid skipping reclaim pages if FW is not accessible")
Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Signed-off-by: Kamal Heib <kheib@redhat.com>

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/pagealloc.c b/drivers/net/ethernet/mellanox/mlx5/core/pagealloc.c
index 9bc9bd83c232..cd68c4b2c0bf 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/pagealloc.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/pagealloc.c
@@ -489,9 +489,12 @@ static int reclaim_pages_cmd(struct mlx5_core_dev *dev,
 	u32 func_id;
 	u32 npages;
 	u32 i = 0;
+	int err;
 
-	if (!mlx5_cmd_is_down(dev))
-		return mlx5_cmd_do(dev, in, in_size, out, out_size);
+	err = mlx5_cmd_do(dev, in, in_size, out, out_size);
+	/* If FW is gone (-ENXIO), proceed to forceful reclaim */
+	if (err != -ENXIO)
+		return err;
 
 	/* No hard feelings, we want our pages back! */
 	npages = MLX5_GET(manage_pages_in, in, input_num_entries);
-- 
2.50.1 (Apple Git-155)
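
Reviewer note (not part of the patch): the race described in the commit
message can be illustrated with a small user-space model. This is only a
sketch under stated assumptions. struct fake_dev, fake_cmd_do(),
fake_sw_reclaim(), reclaim_old() and reclaim_new() are hypothetical
stand-ins invented for illustration and are not mlx5 APIs. The only
behavior taken from the patch is that the command call reports -ENXIO when
the command interface is down.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the device state; not struct mlx5_core_dev. */
struct fake_dev {
	bool cmdif_down;       /* command interface is already marked down */
	bool dies_in_flight;   /* interface goes down while a command runs */
	unsigned int fw_pages; /* pages currently held by the "firmware" */
};

/* Give back up to npages pages, never going below zero. */
static void fake_release(struct fake_dev *dev, unsigned int npages)
{
	dev->fw_pages -= (npages < dev->fw_pages) ? npages : dev->fw_pages;
}

/* Hypothetical model of the command call: -ENXIO once the interface is down. */
static int fake_cmd_do(struct fake_dev *dev, unsigned int npages)
{
	if (dev->dies_in_flight)
		dev->cmdif_down = true; /* e.g. a PCI error hits mid-command */
	if (dev->cmdif_down)
		return -ENXIO;
	fake_release(dev, npages);
	return 0;
}

/* Hypothetical model of the software-based (forceful) reclaim path. */
static int fake_sw_reclaim(struct fake_dev *dev, unsigned int npages)
{
	fake_release(dev, npages);
	return 0;
}

/* Pre-patch shape: check the state first, then trust the command result. */
static int reclaim_old(struct fake_dev *dev, unsigned int npages)
{
	if (!dev->cmdif_down)
		return fake_cmd_do(dev, npages); /* -ENXIO escapes to the caller */
	return fake_sw_reclaim(dev, npages);
}

/* Post-patch shape: always issue the command, fall through on -ENXIO. */
static int reclaim_new(struct fake_dev *dev, unsigned int npages)
{
	int err = fake_cmd_do(dev, npages);

	if (err != -ENXIO)
		return err; /* success or an unrelated failure: return as before */
	return fake_sw_reclaim(dev, npages);
}
```

In this model, a device whose interface dies mid-command keeps all of its
pages under reclaim_old() and returns -ENXIO, while reclaim_new() falls
through to the software path and returns them. Removing the separate
up-front state check is what closes the window between the check and the
command.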