Master node failed, restarting discovery – How to solve this Elasticsearch error

Opster Team

Aug-23, Version: 7.3-7.15

Briefly, this message is logged when a node loses contact with the elected master node and restarts its discovery process. Discovery is how Elasticsearch finds the other nodes in the cluster and elects a new master. The master can fail or become unreachable because of network issues, configuration errors, or resource limitations. To resolve this, check the network connectivity between nodes, verify the discovery settings in the Elasticsearch configuration file, and make sure the master node has sufficient resources (CPU, memory, disk space) to perform its tasks. The Elasticsearch logs usually give more detail about the cause.
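A first check for the network side is whether each node can open a TCP connection to the other nodes' transport port. The Java sketch below is a hypothetical helper, not part of Elasticsearch. It assumes the default transport port 9300 (the port shown in the example log line) and takes the node addresses as command-line arguments. A successful connection only shows that the port is reachable; it says nothing about the discovery settings or the health of the cluster.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

public class TransportPortCheck {

    /**
     * Returns true if a TCP connection to host:port can be opened within timeoutMillis.
     * This only proves the port is reachable; it does not speak the Elasticsearch transport protocol.
     */
    static boolean isReachable(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Connection refused, timed out, or the host name could not be resolved.
            return false;
        }
    }

    public static void main(String[] args) {
        // Hypothetical node list: pass the addresses of your own nodes as arguments.
        // 9300 is assumed to be the transport port (the Elasticsearch default).
        String[] nodes = args.length > 0 ? args : new String[] {"127.0.0.1"};
        for (String host : nodes) {
            boolean ok = isReachable(host, 9300, 2000);
            System.out.println(host + ":9300 -> " + (ok ? "reachable" : "NOT reachable"));
        }
    }
}
```

Running it from each node in turn, with the addresses of the other nodes as arguments, quickly shows whether a firewall or routing problem is cutting nodes off from each other.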


Overview

An Elasticsearch cluster needs an elected master node that is up and running to function properly. When the master node goes down or is disconnected from the cluster network, the remaining nodes restart the discovery process to find, or elect, a new master node.

This event is recorded at INFO level in the Elasticsearch logs of the nodes that lost their master, in a line like the following:

 [2020-03-21T12:33:12,969][INFO ][o.e.c.c.Coordinator ] [node-1] master node [{node-2}{8OghL2LnR1KudCiKhdGNjg}{elLEWn7GQje6YO4vo_-5UA}{172.31.36.118}{172.31.36.118:9300}{dilm}{ml.machine_memory=2088439808, ml.max_open_jobs=20, xpack.installed=true}] failed, restarting discovery 
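When several nodes log this event, it helps to extract which node reported it, which master it lost, and that master's transport address. The sketch below assumes the plain-text log layout shown in the example line above; the class and method names are hypothetical, and other log formats (for example JSON logs) would need a different parser.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MasterFailureLogParser {

    // Assumes the plain-text layout of the example line:
    // [timestamp][INFO ][logger] [reporting-node] master node [{name}{id}...{host:port}...] failed, restarting discovery
    // The address is taken as the first {...} group that ends in ":<port>".
    private static final Pattern MASTER_FAILED = Pattern.compile(
            "^\\s*\\[([^\\]]+)\\]\\[INFO\\s*\\]\\[[^\\]]*\\]\\s*\\[([^\\]]+)\\]"
            + " master node \\[\\{([^}]*)\\}.*?\\{([^{}]+:\\d+)\\}.*failed, restarting discovery");

    /** Returns {timestamp, reportingNode, failedMaster, transportAddress}, or null if the line does not match. */
    static String[] parse(String line) {
        Matcher m = MASTER_FAILED.matcher(line);
        if (!m.find()) {
            return null;
        }
        return new String[] {m.group(1), m.group(2), m.group(3), m.group(4)};
    }

    public static void main(String[] args) {
        String line = "[2020-03-21T12:33:12,969][INFO ][o.e.c.c.Coordinator ] [node-1] master node "
                + "[{node-2}{8OghL2LnR1KudCiKhdGNjg}{elLEWn7GQje6YO4vo_-5UA}{172.31.36.118}"
                + "{172.31.36.118:9300}{dilm}{ml.machine_memory=2088439808, ml.max_open_jobs=20, xpack.installed=true}]"
                + " failed, restarting discovery";
        String[] f = parse(line);
        System.out.println("node " + f[1] + " lost master " + f[2] + " at " + f[3] + " (" + f[0] + ")");
    }
}
```

Grouping the parsed lines by failed master and timestamp shows whether every node lost the same master at the same moment (pointing at the master itself) or only some nodes did (pointing at the network between them).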


If there are other master-eligible nodes available in the cluster and the cluster coordination algorithm can elect a new master node from among them, then you’re all set. But if no new master is found and the node repeatedly logs the WARN message shown below, check your cluster configuration and the error messages in the cluster logs to troubleshoot the issue further.

master not discovered yet:
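To tell a short failover apart from a node that cannot find a master at all, you can count how many of these WARN lines a node has written since its last "restarting discovery" message. The Java sketch below does this with plain substring matching; the class name, the sample lines, and the threshold of two warnings are illustrative assumptions, not values defined by Elasticsearch.

```java
import java.util.List;

public class DiscoveryStallDetector {

    /**
     * Counts WARN "master not discovered" lines written after the most recent
     * "failed, restarting discovery" line. A count that keeps growing suggests
     * the node cannot find or elect a new master.
     */
    static int warningsSinceLastMasterFailure(List<String> lines) {
        int count = 0;
        for (String line : lines) {
            if (line.contains("failed, restarting discovery")) {
                count = 0; // a new discovery round started, so reset the counter
            } else if (line.contains("[WARN") && line.contains("master not discovered")) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // Hypothetical, shortened sample lines; read real ones from the node's log file.
        List<String> lines = List.of(
                "[INFO ][o.e.c.c.Coordinator ] [node-1] master node [{node-2}...] failed, restarting discovery",
                "[WARN ][...] [node-1] master not discovered yet ...",
                "[WARN ][...] [node-1] master not discovered yet ...");
        int n = warningsSinceLastMasterFailure(lines);
        // Hypothetical threshold: tune it to how long you are willing to wait for an election.
        System.out.println(n >= 2 ? "discovery appears stuck (" + n + " warnings)" : "ok");
    }
}
```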

Some tips for when a new master is not discovered:

  1. If the old master node is down, try to bring it back up.
  2. If the old master node is disconnected from the network, reconnect it.
  3. Try to add more master-eligible nodes. Elasticsearch elects the new master by a vote among the master-eligible nodes, and an election succeeds only when a majority (quorum) of the voting master-eligible nodes is available.
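The voting rule in tip 3 comes down to simple majority arithmetic. The sketch below is plain Java, not Elasticsearch code; it assumes an election needs a strict majority of the voting master-eligible nodes and shows how many of them can be lost while a new master can still be elected.

```java
public class MasterQuorum {

    /** Smallest strict majority of the voting master-eligible nodes. */
    static int quorum(int votingNodes) {
        if (votingNodes < 1) {
            throw new IllegalArgumentException("need at least one voting master-eligible node");
        }
        return votingNodes / 2 + 1;
    }

    /** How many voting master-eligible nodes can be lost while an election can still succeed. */
    static int toleratedFailures(int votingNodes) {
        return votingNodes - quorum(votingNodes);
    }

    public static void main(String[] args) {
        for (int n = 1; n <= 5; n++) {
            System.out.println(n + " voting nodes: quorum " + quorum(n)
                    + ", can lose " + toleratedFailures(n));
        }
    }
}
```

By this arithmetic, a cluster with two master-eligible nodes still cannot tolerate losing an arbitrary one of them; three is the smallest number that tolerates one failure, and five tolerates two.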

Log Context

The log message “master node [{}] failed, restarting discovery” is emitted by the class Coordinator.java.
We extracted the following from the Elasticsearch source code for those seeking in-depth context:

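    private void onLeaderFailure(Exception e) {
        synchronized (mutex) {
            // Only a node that was not already a CANDIDATE (i.e. it had a known
            // master) logs which master it lost.
            if (mode != Mode.CANDIDATE) {
                assert lastKnownLeader.isPresent();
                logger.info(new ParameterizedMessage("master node [{}] failed, restarting discovery", lastKnownLeader.get()), e);
            }
            // Switch to CANDIDATE mode, which restarts discovery so that a new
            // master can be found or elected.
            becomeCandidate("onLeaderFailure");
        }
    }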
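To see the control flow of `onLeaderFailure` in isolation, here is a stripped-down, hypothetical model (not Elasticsearch code). It keeps only the mode check, the log message, and the switch to `CANDIDATE`, and it shows why the INFO line appears once per lost master: a node that is already a candidate does not log it again.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

public class LeaderFailureModel {

    enum Mode { CANDIDATE, LEADER, FOLLOWER }

    private final Object mutex = new Object();
    private Mode mode = Mode.FOLLOWER;                        // start as a follower of a master
    private final Optional<String> lastKnownLeader = Optional.of("node-2");
    final List<String> log = new ArrayList<>();               // stands in for logger.info

    void onLeaderFailure() {
        synchronized (mutex) {
            if (mode != Mode.CANDIDATE) {
                log.add("master node [" + lastKnownLeader.orElse("?") + "] failed, restarting discovery");
            }
            becomeCandidate();
        }
    }

    private void becomeCandidate() {
        // The real Coordinator does much more here, including reactivating peer
        // discovery; this model only records the mode change.
        mode = Mode.CANDIDATE;
    }

    Mode mode() {
        synchronized (mutex) {
            return mode;
        }
    }

    public static void main(String[] args) {
        LeaderFailureModel node = new LeaderFailureModel();
        node.onLeaderFailure(); // follower loses its master: logs once
        node.onLeaderFailure(); // already a candidate: nothing new is logged
        System.out.println(node.log + " mode=" + node.mode());
    }
}
```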

 
