- Resolves: RHEL-23437, call_home command diag_nvme fails on nvmf drive

This commit is contained in:
Than Ngo 2024-02-02 14:45:06 +01:00
parent e86f2df64a
commit 550d9fa530
2 changed files with 78 additions and 1 deletions

View File

@ -0,0 +1,71 @@
commit db0c6d7974d7f8909878384d77ec02457759d6df
Author: Nilay Shroff <nilay@linux.ibm.com>
Date: Tue Jan 16 13:55:03 2024 +0530
diags/diag_nvme: call_home command fails on nvmf drive
The diag_nvme command needs to retrieve the VPD log page from NVMe for
filling in the product data while generating the call-home event.
However, call-home feature is supported for directly attached NVMe
module. In the current diag_nvme implementation, if user doesn't
provide NVMe device name for diagnostics then it(diag_nvme) loops
through each NVMe moudle (directly connected to the system/LPAR as
well as discovered over fabrics) and attempt retrieving the SMART log
page as well as VPD page. Unfortunately, diag_nvme fails to retrieve
the VPD page for NVMe connected over fabrics and that causes the
diag_nvme to print "not-so-nice" failure messages on console.
Henec fixed the diag_nvme code so that for call-home event reporting,
it skips the NVMe which is connected over fabrics and prints a
"nice-message" informing the user that it's skipping diagnosting for
NVMe module connected over fabrics. In a nutshell, with this fix now
diag_nvme would only diagnose the NVMe module which is directtly
attached (over PCIe) to the system.
Signed-off-by: Nilay Shroff <nilay@linux.ibm.com>
diff --git a/diags/diag_nvme.c b/diags/diag_nvme.c
index c1c0a20..e86786c 100644
--- a/diags/diag_nvme.c
+++ b/diags/diag_nvme.c
@@ -375,9 +375,40 @@ static int diagnose_nvme(char *device_name, struct notify *notify, char *file_pa
char endurance_s[sizeof(vpd.endurance) + 1], capacity_s[sizeof(vpd.capacity)+1];
uint64_t event_id;
uint8_t severity;
+ FILE *fp;
+ char tr_file_path[PATH_MAX];
uint32_t raw_data_len = 0;
unsigned char *raw_data = NULL;
+ /*
+ * Skip diag test if NVMe is connected over fabric
+ */
+ snprintf(tr_file_path, sizeof(tr_file_path),
+ NVME_SYS_PATH"/%s/%s", device_name, "transport");
+ fp = fopen(tr_file_path, "r");
+ if (fp) {
+ char buf[12];
+ int n = fread(buf, 1, sizeof(buf), fp);
+
+ if (n) {
+ /*
+ * If NVMe transport is anything but pcie then skip the diag test
+ */
+ if (strncmp(buf, "pcie", 4) != 0) {
+ fprintf(stdout, "Skipping diagnostics for nvmf : %s\n",
+ device_name);
+ fclose(fp);
+ return 0;
+ }
+ }
+ fclose(fp);
+ } else {
+ fprintf(stderr, "Skipping diagnostics for %s:\n"
+ "Unable to find the nvme transport type\n",
+ device_name);
+ return -1;
+ }
+
tmp_rc = regex_controller(controller_name, device_name);
if (tmp_rc != 0)
return -1;

View File

@ -3,7 +3,7 @@
Name: ppc64-diag
Version: 2.7.9
Release: 2%{?dist}
Release: 3%{?dist}
Summary: PowerLinux Platform Diagnostics
URL: https://github.com/power-ras/ppc64-diag
Group: System Environment/Base
@ -46,6 +46,8 @@ Patch10: ppc64-diag-2.7.9-handle_multiple_platform_dumps.patch
Patch11: ppc64-diag-2.7.9-moving-trim_trail_space-function.patch
Patch12: ppc64-diag-2.7.9-utilize-trim_trail_space-in-event_fru_callout.patch
Patch13: ppc64-diag-2.7.9-enable-light-path-diagnostics-for-RTAS-events.patch
# call_home command "diag_nvme" fails on nvmf drive(nvmf/lpfc/Power10)
Patch14: ppc64-diag-2.7.9-call_home-fail-on-nvmf-device
%description
This package contains various diagnostic tools for PowerLinux.
@ -181,6 +183,10 @@ if [ "$1" = "0" ]; then # last uninstall
fi
%changelog
* Wed Jan 31 2024 Than Ngo <than@redhat.com> - 2.7.9-3
- call_home command "diag_nvme" fails on nvmf drive
Resolves: RHEL-23437
* Sun Dec 10 2023 Than Ngo <than@redhat.com> - 2.7.9-2
- Enable light path diagnostics for RTAS events
- Handle multiple platform dumps