From 028bd6aab181104fe68166c8ec9c0485e12f9376 Mon Sep 17 00:00:00 2001
From: Damien Ciabrini
Date: Fri, 18 Sep 2020 18:34:22 +0200
Subject: [PATCH] galera: recover from joining a non existing cluster

galera being a M/S resource, the resource agent decides
when and how to promote a resource based on the current
state of the galera cluster. If there's no cluster,
a resource is promoted as the bootstrap node. Otherwise
it is promoted as a joiner node.

There can be some time between the moment when a node is
promoted and when the promote operation effectively takes
place. So if a node is promoted for joining a cluster and
all the running galera nodes are stopped before the promote
operation starts, the joining node won't be able to join the
cluster, and it can't bootstrap a new one either because it
doesn't have the most recent copy of the DB.

In that case, do not make the promotion fail, and force
a demotion instead. This ensures that a normal bootstrap
election will take place eventually, without blocking
the joining node due to a failed promotion.
---
 heartbeat/galera | 13 ++++++++++---
 1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/heartbeat/galera b/heartbeat/galera
index 74f11d8c5..d2f4faa86 100755
--- a/heartbeat/galera
+++ b/heartbeat/galera
@@ -727,9 +727,16 @@ galera_promote()
             ocf_log info "Node <${NODENAME}> is bootstrapping the cluster"
             extra_opts="--wsrep-cluster-address=gcomm://"
         else
-            ocf_exit_reason "Failure, Attempted to promote Master instance of $OCF_RESOURCE_INSTANCE before bootstrap node has been detected."
-            clear_last_commit
-            return $OCF_ERR_GENERIC
+            # We are being promoted without having the bootstrap
+            # attribute in the CIB, which means we are supposed to
+            # join a cluster; however if we end up here, there is no
+            # Master remaining right now, which means there is no
+            # cluster to join anymore. So force a demotion, and
+            # let the RA decide later which node should be the next
+            # bootstrap node.
+            ocf_log warn "There is no running cluster to join, demoting ourself"
+            clear_master_score
+            return $OCF_SUCCESS
         fi
     fi
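
The three outcomes described in the commit message (join, bootstrap, or demote) can be sketched as a standalone shell function. This is only an illustration under stated assumptions, not the agent's code: `promote_decision`, `DECISION`, `MASTER_SCORE_CLEARED` and both arguments are hypothetical names, `clear_master_score` is a stub standing in for the helper called in the patch, and the outer "is a Master still running?" check is assumed from the surrounding `else`/`fi` nesting rather than shown in the hunk.

```shell
#!/bin/sh
# Hypothetical sketch of the promote-time decision; not the real galera agent.
OCF_SUCCESS=0

MASTER_SCORE_CLEARED=no
DECISION=""

# Stub standing in for the agent helper of the same name used in the patch.
clear_master_score() {
    MASTER_SCORE_CLEARED=yes
}

# $1: "yes" if this node holds the bootstrap attribute (assumed input)
# $2: number of galera Masters still running when promote executes (assumed input)
promote_decision() {
    bootstrap_attr_set=$1
    master_count=$2

    if [ "$master_count" -gt 0 ]; then
        # A cluster exists: join it.
        DECISION=join
        return $OCF_SUCCESS
    fi

    if [ "$bootstrap_attr_set" = "yes" ]; then
        # No cluster, and this node was elected: bootstrap a new one.
        DECISION=bootstrap
        return $OCF_SUCCESS
    fi

    # No cluster to join and not elected as bootstrap node. Rather than
    # failing the promotion, give up the master score so that a later
    # bootstrap election can pick the right node.
    clear_master_score
    DECISION=demote
    return $OCF_SUCCESS
}

# The race described above: promoted as a joiner, but every Master
# stopped before the promote operation ran.
promote_decision no 0
echo "decision=$DECISION master_score_cleared=$MASTER_SCORE_CLEARED"
```

Returning success in the last branch mirrors the intent stated in the commit message: the promote operation is not reported as failed, so the node is not blocked from taking part in the next election.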