Elasticsearch Red Status

By Opster Team

Updated: Jan 28, 2024

4 min read


Overview

A red status indicates that one or more indices have primary shards that are not allocated. The causes may be similar to those described in Status Yellow, but a red status is a clear sign that something is not right with the cluster.

What it means

A red status indicates that not only has the primary shard been lost, but also that a replica has not been promoted to primary in its place. However, just as with yellow status, you should not panic and start firing off commands without finding out what is happening first, because Elasticsearch has mechanisms in place which may recover the situation automatically.
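Before doing anything else, it is worth confirming the overall status and identifying which indices are affected. The two requests below use only standard APIs, with no assumptions about your setup:

GET _cluster/health

GET _cat/indices?v&health=red

The first call reports the cluster status and the number of unassigned shards; the second lists only the indices that are currently red.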

Why it occurs

There can be several reasons why a red status may occur:

1. There are no replicas available to promote

This may happen because you only have one node, or because by design you bravely specified number_of_replicas: 0. In that case, corruption of data or the loss of a node may result in your cluster becoming red without passing through the yellow stage.
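If you can spare the disk space, keeping at least one replica protects against this. A minimal sketch, assuming a hypothetical index named my_index:

GET /my_index/_settings

PUT /my_index/_settings
{
  "index": {
    "number_of_replicas": 1
  }
}

The first request shows the current replica count; the second raises it to one, after which Elasticsearch will start allocating replica shards on other nodes.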

2. Node crashes

If more than one node becomes overwhelmed or stops operating for any reason, for instance due to “out of memory” errors, then the first symptom will probably be the cluster status turning yellow or red as shards fail to sync.
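A quick way to spot an overloaded or recently restarted node is to compare heap usage, RAM usage and uptime across the cluster. The columns below are standard _cat/nodes columns:

GET _cat/nodes?v&h=name,node.role,master,heap.percent,ram.percent,uptime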

3. Networking issues

If nodes are not able to reach each other reliably, then the nodes will lose contact with one another and shards will get out of sync resulting in a red or yellow status. You may be able to detect this situation by finding repeated messages in the logs about nodes leaving or rejoining the cluster.

4. Disk space issues

Insufficient disk space may prevent Elasticsearch from allocating a shard to a node. Typically this will happen when disk utilization goes above the threshold defined by the setting below:

cluster.routing.allocation.disk.watermark.low

Here the solution requires deleting indices, increasing disk size, or adding a new node to the cluster. Of course, you can also temporarily increase the watermark to keep things running while you decide what to do, but simply putting the decision off until later is not the best course of action.

PUT _cluster/settings
{
  "transient": {
    "cluster.routing.allocation.disk.watermark.low": "90%",
    "cluster.info.update.interval": "1m"
  }
}
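Before deciding between deleting data, growing the disks or adding a node, it helps to see how full each node actually is. The _cat allocation API gives a per-node breakdown of shard count and disk usage:

GET _cat/allocation?v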

You can also get:

cannot allocate because allocation is not permitted to any of the nodes

Typically this happens when a node's disk utilization goes above the flood stage watermark, which places a write block on the indices with shards on that node. As above, you must delete data or add a new node. You can buy time with:

PUT _cluster/settings
{
  "transient": {
    "cluster.routing.allocation.disk.watermark.flood_stage": "97%",
    "cluster.info.update.interval": "1m"
  }
}
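Once disk space has been freed, the write block may need to be removed by hand (recent Elasticsearch versions clear it automatically when utilization drops back below the high watermark, older versions do not). A sketch, assuming a hypothetical index named my_index:

PUT /my_index/_settings
{
  "index.blocks.read_only_allow_delete": null
}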

5. Node allocation awareness

Sometimes there may be specific issues with the allocation rules that have been created on the cluster which prevent the cluster from allocating shards. For example, it is possible to create rules that require that a shard’s replicas be spread over a specific set of nodes (“allocation awareness”), such as AWS availability zones or different host machines in a Kubernetes setup. On occasion, these rules may conflict with other rules (such as disk space) and prevent shards from being allocated.
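For illustration, allocation awareness is typically configured with a cluster setting along these lines, where the attribute name zone is an assumption and must match an attribute (e.g. node.attr.zone) defined on your nodes:

PUT _cluster/settings
{
  "persistent": {
    "cluster.routing.allocation.awareness.attributes": "zone"
  }
}

Combined with forced awareness (cluster.routing.allocation.awareness.force.*) or with other constraints such as disk watermarks, these rules can leave shards unassigned when no node satisfies them.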

Find the cause of non-allocation:

You can use the cluster allocation explain API:

GET /_cluster/allocation/explain

Running the above command returns an explanation of the allocation status of the first unassigned shard found:

{
  "index" : "my_index",
  "shard" : 0,
  "primary" : false,
  "current_state" : "unassigned",
  "unassigned_info" : {
    "reason" : "NODE_LEFT",
    "at" : "2017-01-04T18:53:59.498Z",
    "details" : "node_left[G92ZwuuaRY-9n8_tc-IzEg]",
    "last_allocation_status" : "no_attempt"
  },
  "can_allocate" : "allocation_delayed",
  "allocate_explanation" : "cannot allocate because the cluster is still waiting 59.8s for the departed node holding a replica to rejoin, despite being allowed to allocate the shard to at least one other node",
  "configured_delay" : "1m",                      
  "configured_delay_in_millis" : 60000,
  "remaining_delay" : "59.8s",                    
  "remaining_delay_in_millis" : 59824,
  "node_allocation_decisions" : [
    {
      "node_id" : "pmnHu_ooQWCPEFobZGbpWw",
      "node_name" : "node_t2",
      "transport_address" : "127.0.0.1:9402",
      "node_decision" : "yes"
    },
    {
      "node_id" : "3sULLVJrRneSg0EfBB-2Ew",
      "node_name" : "node_t0",
      "transport_address" : "127.0.0.1:9400",
      "node_decision" : "no",
      "store" : {                                 
        "matching_size" : "4.2kb",
        "matching_size_in_bytes" : 4325
      },
      "deciders" : [
        {
          "decider" : "same_shard",
          "decision" : "NO",
          "explanation" : "the shard cannot be allocated to the same node on which a copy of the shard already exists [[my_index][0], node[3sULLVJrRneSg0EfBB-2Ew], [P], s[STARTED], a[id=eV9P8BN1QPqRc3B4PLx6cg]]"
        }
      ]
    }
  ]
}

The above API returns:

“unassigned_info” => The reason why the shard became unassigned.

“node_allocation_decisions” => A list of explanations for each node, indicating whether it could potentially receive the shard.

“deciders” => The decision of each decider and the explanation behind that decision.
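Two further points about this API are useful in practice. First, you can ask about a specific shard rather than the first unassigned one Elasticsearch happens to find. A sketch, assuming a hypothetical index named my_index:

GET _cluster/allocation/explain
{
  "index": "my_index",
  "shard": 0,
  "primary": true
}

Second, the allocation_delayed and configured_delay fields in the sample output come from index.unassigned.node_left.delayed_timeout (one minute by default), the time Elasticsearch waits before reallocating the replicas of a node that has left the cluster. If you expect the node back shortly, you can lengthen this delay, for example:

PUT /my_index/_settings
{
  "settings": {
    "index.unassigned.node_left.delayed_timeout": "5m"
  }
}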

How to recover a lost primary shard

A lost primary shard should usually be recovered automatically by promoting its replica. However, the cluster allocation explain API may indicate that this is not possible:

{
  "index" : "test",
  "shard" : 0,
  "primary" : true,
  "current_state" : "unassigned",
  "unassigned_info" : {
    "reason" : "NODE_LEFT",
    "at" : "2017-01-04T18:03:28.464Z",
    "details" : "node_left[OIWe8UhhThCK0V5XfmdrmQ]",
    "last_allocation_status" : "no_valid_shard_copy"
  },
  "can_allocate" : "no_valid_shard_copy",
  "allocate_explanation" : "cannot allocate because a previous copy of the primary shard existed but can no longer be found on the nodes in the cluster"
}

In this case, the following options are available:

1. Wait for the node to come back online

If the lost node went down or was restarted, it may just be a matter of time before it rejoins the cluster and the shard becomes available again.

2. Restore a snapshot

It is generally preferable to restore a snapshot in a known state (e.g. from 30 minutes ago) than to try to recover corrupted data in an unknown state.
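A sketch of restoring a single index from a snapshot, assuming a hypothetical repository named my_repository and snapshot named snapshot_1; note that an open index with the same name must be closed or deleted before it can be restored over:

POST /_snapshot/my_repository/snapshot_1/_restore
{
  "indices": "test"
}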

3. Restore from corrupted shard

As a last resort, if there is no possibility of recovering the node and no snapshot is available, then it may be possible to promote a stale shard copy. However, this means that data will be lost, and in the event that the lost node recovers, the data will be overwritten with the stale data. The command to restore is:

POST /_cluster/reroute
{
  "commands": [
    {
      "allocate_stale_primary": {
        "index": "test",
        "shard": 0,
        "node": "es01",
        "accept_data_loss": true
      }
    }
  ]
}
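Once the command has been accepted, you can confirm that the shard has been assigned and the index has left red status:

GET _cat/shards/test?v

GET _cluster/health/test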