From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
From: Benjamin Marzinski
Date: Tue, 29 Mar 2022 22:22:10 -0500
Subject: [PATCH] multipathd: Don't keep starting TUR threads, if they always hang.

If a tur thread hangs, multipathd was simply creating a new thread, and
assuming that the old thread would get cleaned up eventually. I have
seen a case recently where there were 26000 multipathd threads on a
system, all stuck trying to send TUR commands to path devices. The root
cause of the issue was a scsi kernel issue, but it shows that the way
multipathd currently deals with stuck threads could use some refinement.

Now, when one tur thread hangs, multipathd will act as it did before.
If a second one in a row hangs, multipathd will instead wait for it to
complete before starting another thread. Once the thread completes, the
count is reset.

Signed-off-by: Benjamin Marzinski
Reviewed-by: Martin Wilck
holders) > 1) {
+			/* The thread has been cancelled but hasn't quit. */
+			if (ct->nr_timeouts == MAX_NR_TIMEOUTS) {
+				condlog(2, "%d:%d : waiting for stalled tur thread to finish",
+					major(ct->devt), minor(ct->devt));
+				ct->nr_timeouts++;
+			}
 			/*
-			 * The thread has been cancelled but hasn't quit.
+			 * Don't start new threads until the last one has
+			 * finished.
+			 */
+			if (ct->nr_timeouts > MAX_NR_TIMEOUTS) {
+				c->msgid = MSG_TUR_TIMEOUT;
+				return PATH_TIMEOUT;
+			}
+			ct->nr_timeouts++;
+			/*
+			 * Start a new thread while the old one is stalled.
 			 * We have to prevent it from interfering with the new
 			 * thread. We create a new context and leave the old
 			 * one with the stale thread, hoping it will clean up
@@ -376,13 +393,15 @@ int libcheck_check(struct checker * c)
 			 */
 			if (libcheck_init(c) != 0)
 				return PATH_UNCHECKED;
+			((struct tur_checker_context *)c->context)->nr_timeouts = ct->nr_timeouts;
 
 			if (!uatomic_sub_return(&ct->holders, 1))
 				/* It did terminate, eventually */
 				cleanup_context(ct);
 
 			ct = c->context;
-		}
+		} else
+			ct->nr_timeouts = 0;
 		/* Start new TUR checker */
 		pthread_mutex_lock(&ct->lock);
 		tur_status = ct->state = PATH_PENDING;
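
The counting rule described in the commit message can be sketched in isolation as follows. This is a minimal single-threaded model, not the multipathd code: `struct tur_sketch`, `sketch_check()`, `enum check_result`, and the `threads_started` counter are hypothetical names introduced for illustration, and `MAX_NR_TIMEOUTS` is assumed to be 1 because its definition is not visible in the hunks above. The real patch also copies `nr_timeouts` into a freshly allocated context, which this sketch collapses into a single struct.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumption: the real definition is not shown in the patch excerpt. */
#define MAX_NR_TIMEOUTS 1

/* Hypothetical result codes standing in for PATH_PENDING / PATH_TIMEOUT. */
enum check_result { CHECK_STARTED, CHECK_TIMEOUT };

/* Hypothetical, reduced stand-in for struct tur_checker_context. */
struct tur_sketch {
	unsigned int nr_timeouts;     /* consecutive stalled threads seen */
	unsigned int threads_started; /* bookkeeping for this sketch only */
};

/*
 * Model one checker invocation when no thread is currently tracked.
 * old_thread_stalled corresponds to "holders > 1" in the patch: the
 * previous thread was cancelled but has not exited yet.
 */
enum check_result sketch_check(struct tur_sketch *ctx, bool old_thread_stalled)
{
	assert(ctx != NULL);

	if (old_thread_stalled) {
		/* The patch logs its "waiting for stalled tur thread" message here, once. */
		if (ctx->nr_timeouts == MAX_NR_TIMEOUTS)
			ctx->nr_timeouts++;
		/* Too many stalls in a row: report a timeout, start nothing. */
		if (ctx->nr_timeouts > MAX_NR_TIMEOUTS)
			return CHECK_TIMEOUT;
		/* First stall: count it and fall through to start a new thread. */
		ctx->nr_timeouts++;
	} else {
		/* The previous thread finished, so the streak is over. */
		ctx->nr_timeouts = 0;
	}

	ctx->threads_started++;
	return CHECK_STARTED;
}
```

Under the assumed limit of 1, calling `sketch_check()` with `old_thread_stalled` set starts a replacement thread the first time and returns `CHECK_TIMEOUT` on every later call, until a call with `old_thread_stalled` cleared resets the count. That bounds the number of outstanding threads per path, which is the point of the change.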