commit 00416008b8ce018dd149182bf54a650eb95f9309
Author: Mahesh Salgaonkar <mahesh@linux.ibm.com>
Date:   Fri Sep 19 22:49:44 2025 +0530

external/opal-prd: Fix opal-prd service shutdown on memory errors

Whenever a memory error is reported, opal-prd tries to fork a child
process and delegate the memory offlining work to it. The child is
supposed to exit after handling the memory error. However, instead of
delegating the task to the child, the main process handles the memory
error itself and then exits. This sends the opal-prd service into a
stop/restart loop that eventually hits the systemd restart limit,
leaving the service unavailable.

    opal-prd[49096]: MEM: Memory error: range 0000000eeb445700-0000000eeb445700, type: correctable
    opal-prd[49096]: MEM: Offlined 0000000eeb445700,0000000eeb455700, type correctable: No such file or directory
    systemd[1]: opal-prd.service: Service RestartSec=100ms expired, scheduling restart.
    systemd[1]: opal-prd.service: Scheduled restart job, restart counter is at 7.
    systemd[1]: opal-prd.service: Start request repeated too quickly.
    systemd[1]: opal-prd.service: Failed with result 'start-limit-hit'.
    systemd[1]: Failed to start OPAL PRD daemon

On success, fork() returns the PID of the child process (pid > 0) in
the parent and 0 in the child. Instead of invoking the memory worker
when pid == 0, the code invokes it when pid > 0, which is the parent
process itself.

    pid = fork();
    if (pid > 0)
        exit(memory_error_worker(sysfsfile, typestr, i_start_addr,
                                 i_endAddr));

The above logic causes the parent process to exit after handling the
memory error. Fix this by changing the if condition to (pid == 0).

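As a minimal sketch of the fork() convention described above (not the
opal-prd code itself), the child branch is selected with pid == 0 and
only the child terminates. The names spawn_worker, wait_for_worker and
example_worker are hypothetical stand-ins. The sketch uses _exit() in the
child where the patch uses exit(), and it reaps the child with waitpid()
purely so the result can be observed, whereas the comment in the patched
code says the offlining is expected to happen in the background.

```c
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-in for memory_error_worker(); returns an exit status. */
int example_worker(void)
{
	return 3;
}

/*
 * Run worker() in a child process and return the child's PID to the caller.
 * Returns -1 if fork() fails. Only the child ever terminates here; the
 * parent (the daemon in the commit's scenario) always returns.
 */
pid_t spawn_worker(int (*worker)(void))
{
	pid_t pid = fork();

	if (pid == 0)		/* child: do the work, then terminate */
		_exit(worker());

	return pid;		/* parent: child's PID, or -1 on error */
}

/*
 * Reap the child and return its exit status, or -1 if waitpid() fails or
 * the child did not exit normally.
 */
int wait_for_worker(pid_t pid)
{
	int status;

	if (waitpid(pid, &status, 0) != pid)
		return -1;

	return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Calling spawn_worker(example_worker) from the parent returns a positive
PID, and wait_for_worker() on that PID returns 3, the value the
hypothetical worker hands to _exit(). With the inverted test (pid > 0),
it would be the caller that ran the worker and terminated.
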
Fixes: 8cbd0de88d16 ("opal-prd: Have a worker process handle page offlining")
Signed-off-by: Mahesh Salgaonkar <mahesh@linux.ibm.com>
Signed-off-by: Reza Arbab <arbab@linux.ibm.com>

diff --git a/external/opal-prd/opal-prd.c b/external/opal-prd/opal-prd.c
index 1c610da4c..da947c827 100644
--- a/external/opal-prd/opal-prd.c
+++ b/external/opal-prd/opal-prd.c
@@ -755,9 +755,13 @@ int hservice_memory_error(uint64_t i_start_addr, uint64_t i_endAddr,
 	/*
 	 * HBRT expects the memory offlining process to happen in the background
 	 * after the notification is delivered.
+	 *
+	 * fork() return value:
+	 * On success, the PID of the child process is returned in the parent,
+	 * and 0 is returned in the child.
 	 */
 	pid = fork();
-	if (pid > 0)
+	if (pid == 0)
 		exit(memory_error_worker(sysfsfile, typestr, i_start_addr, i_endAddr));
 
 	if (pid < 0) {