Quick links
- What are dangling indices?
- When and why you should use the dangling API
- How to list and restore dangling indices in Elasticsearch
- Summary and important notes
What are dangling indices?
Dangling indices occur when a node that has indices stored locally joins a cluster whose metadata does not contain those indices. In other words, if a node that was previously part of one cluster joins a different cluster, the indices stored on that node are marked as dangling indices.
Some version notes on the dangling indices API:
- The dangling indices API (_dangling) was introduced in Elasticsearch 7.9.
- Automatic import of dangling indices was deprecated in 7.9 and is disabled by default.
- Automatic import was removed entirely in Elasticsearch 8.0; from then on, dangling indices can only be imported manually through the API.
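On 7.9.x clusters, the old automatic-import behavior can still be re-enabled (not recommended) through the deprecated setting below; this sketch assumes a standard elasticsearch.yml and is shown only to illustrate what was removed in 8.0:

```yaml
# elasticsearch.yml
# Deprecated in 7.9, removed in 8.0. Defaults to false.
gateway.auto_import_dangling_indices: true
```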
When and why you should use the dangling API
The cluster metadata describes how to read the data stored on the data nodes. If all the master nodes lose their metadata, nothing stored on the data nodes in the cluster can be read. In this situation, you should create a new cluster and connect the existing data nodes to it. Once all the data nodes have joined the new cluster, the old indices become dangling indices, and you can use the dangling indices API to recover their data.
In this article, we will cover how to list and restore dangling indices.
How to list and restore dangling indices in Elasticsearch
Here’s how to list and restore dangling indices in Elasticsearch:
- Run the dangling indices API and copy the index-uuid from the response.
- Restore the dangling index.
- (Optional) Use the automated script to restore all dangling indices.
- Check cluster status.
- If you see unassigned shards, check the allocation explain API.
- Restore the corrupted shards.
- Check the index status.
Step 1. Run the dangling indices API and copy the index-uuid from the response
GET /_dangling
{
  "dangling_indices" : [
    {
      "index_name" : "<index_name>",
      "index_uuid" : "D_FDBv2NSZGZzrliLc_4AA",
      "creation_date_millis" : 1666289487991,
      "node_ids" : [
        "5vKcjNSjTtmXbUIY98oRSw",
        "-prq1HnZSDmjtHn4fp3skQ",
        "jl9HozrpQNqYqctvI40ISw",
        "P_tR9BvITiqz7SOPrrBiAg",
        "mH6iK9xsSsqo9dBwLNM06A",
        "gVEFddF6SNqRfao6te-TMA",
        "2xeISkW1QRiGJXQ8b9bRtQ",
        "wxLLkuzgTeyY9aHlXRIBNg"
      ]
    }
  ]
}
Step 2. Restore the dangling index
POST /_dangling/<index-uuid>?accept_data_loss=true
{ "acknowledged" : true }
Note the accept_data_loss field (required, Boolean): it must be set to true to import a dangling index. Elasticsearch cannot know where the dangling index data came from or determine which shard copies are fresh and which are stale, so it cannot guarantee that the imported data represents the latest state of the index from when it was last part of a cluster.
Step 3. (Optional) Use the automated script to restore all dangling indices
It’s not possible to use a wildcard (*) with the _dangling API. If you have hundreds of dangling indices, you can use the following bash script to detect and restore all of them.
$ curl -XGET -ks "https://localhost:9200/_dangling?pretty" -u elastic:<password> | awk -F'"' '/"index_uuid"/{print $4}' > index_uuid.txt
$ for uuid in $(cat index_uuid.txt); do
    echo "$uuid"
    curl -XPOST -ks "https://localhost:9200/_dangling/$uuid?accept_data_loss=true" -u elastic:<password>
  done
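The same loop can be sketched in Python using only the standard library. This is a minimal sketch, not a drop-in tool: the host, the Authorization header, and the extract_uuids helper are placeholders, and TLS verification is disabled to mirror curl’s -k flag.

```python
import json
import ssl
import urllib.request

def extract_uuids(response_body: str) -> list:
    """Pull every index_uuid out of a GET /_dangling JSON response."""
    doc = json.loads(response_body)
    return [idx["index_uuid"] for idx in doc.get("dangling_indices", [])]

def restore_all(host: str, auth_header: str) -> None:
    """List dangling indices, then POST each UUID back with accept_data_loss=true."""
    # Mirrors `curl -k`: skip certificate verification for self-signed certs.
    ctx = ssl.create_default_context()
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE

    req = urllib.request.Request(f"{host}/_dangling",
                                 headers={"Authorization": auth_header})
    with urllib.request.urlopen(req, context=ctx) as resp:
        uuids = extract_uuids(resp.read().decode())

    for uuid in uuids:
        post = urllib.request.Request(
            f"{host}/_dangling/{uuid}?accept_data_loss=true",
            method="POST",
            headers={"Authorization": auth_header},
        )
        with urllib.request.urlopen(post, context=ctx) as resp:
            print(uuid, resp.status)
```

Parsing the JSON response instead of grepping for "index_uuid" avoids the whitespace-sensitivity of the awk pattern in the bash version.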
Step 4. Check cluster status
Some indices may be corrupted, so before going further we should check the cluster status.
curl -ks https://localhost:9200/_cat/health?v -u elastic:<password>
epoch      timestamp cluster        status node.total node.data shards pri relo init unassign pending_tasks max_task_wait_time active_shards_percent
1672058540 12:42:20  <cluster_name> red    41         38        969    332 0    0   2        0             -                  99.8%
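Because _cat output is whitespace-delimited with a header row (when ?v is set), it is easy to post-process. A small sketch that pairs the header with each data row, so a script can branch on the status column; parse_cat is a hypothetical helper name:

```python
def parse_cat(output: str) -> list:
    """Parse `_cat/...?v` output: the first line is the header,
    each following line is one row. Returns a list of dicts."""
    lines = [line for line in output.splitlines() if line.strip()]
    header = lines[0].split()
    return [dict(zip(header, row.split())) for row in lines[1:]]
```

For example, parse_cat(health_output)[0]["status"] would yield "red" for the cluster above, which is the cue to move on to the allocation explain API.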
Step 5. If you see unassigned shards, check the allocation explain API
curl -ks https://localhost:9200/_cluster/allocation/explain -u elastic:<password>
{
  "note" : "No shard was specified in the explain API request, so this response explains a randomly chosen unassigned shard. There may be other unassigned shards in this cluster which cannot be assigned for different reasons. It may not be possible to assign this shard until one of the other shards is assigned correctly. To explain the allocation of other shards (whether assigned or unassigned) you must specify the target shard in the request to this API.",
  "index" : "<index_name>",
  "shard" : 2,
  "primary" : true,
  "current_state" : "unassigned",
  "unassigned_info" : {
    "reason" : "DANGLING_INDEX_IMPORTED",
    "at" : "2022-12-26T12:27:33.255Z",
    "last_allocation_status" : "no_valid_shard_copy"
  },
  "can_allocate" : "no_valid_shard_copy",
  "allocate_explanation" : "Elasticsearch can't allocate this shard because there are no copies of its data in the cluster. Elasticsearch will allocate this shard when a node holding a good copy of its data joins the cluster. If no such node is available, restore this index from a recent snapshot.",
  "node_allocation_decisions" : [
    {
      "node_id" : "<node_id>",
      "node_name" : "<node_name>",
      "transport_address" : "10.6.11.242:9300",
      "node_attributes" : {
        "aws_availability_zone" : "us-east-1a",
        "xpack.installed" : "true",
        "transform.node" : "true"
      },
      "node_decision" : "no",
      "store" : {
        "found" : false
      }
    }
  ]
}
Step 6. Restore the corrupted shards
There were 5 primary shards for the index in question, and one of them, primary shard 2, was corrupted. We will first try to allocate that specific shard with the allocate_stale_primary command, which attempts to recover the data in shard 2 from a stale on-disk copy.
curl -ks -XPOST "https://localhost:9200/_cluster/reroute" -u elastic:<password> -H "Content-Type: application/json" -d'
{
  "commands": [
    {
      "allocate_stale_primary": {
        "index": "<index_name>",
        "shard": 2,
        "node": "<node_name>",
        "accept_data_loss": true
      }
    }
  ]
}'
{"error":{"root_cause":[{"type":"illegal_argument_exception","reason":"No data for shard [2] of index [<index_name>] found on any node"}],"type":"illegal_argument_exception","reason":"No data for shard [2] of index [<index_name>] found on any node"},"status":400}
As the response above shows, the data in shard 2 is unfortunately lost, and it’s not possible to recover it. The remaining option is to run the reroute API with the allocate_empty_primary command and accept_data_loss: true.
This command allocates shard 2 as an empty shard. The purpose of the operation is to recover the rest of the data in the index, since an index with a missing primary shard cannot be used at all.
curl -ks -XPOST "https://localhost:9200/_cluster/reroute" -u elastic:<password> -H "Content-Type: application/json" -d'
{
  "commands": [
    {
      "allocate_empty_primary": {
        "index": "<index_name>",
        "shard": 2,
        "node": "<node_name>",
        "accept_data_loss": true
      }
    }
  ]
}'
Step 7. Check the index status
curl -ks https://localhost:9200/_cat/indices/<index_name>?v -u elastic:<password>
health status index        uuid                   pri rep docs.count docs.deleted store.size pri.store.size
green  open   <index_name> D_FDBv2NSZGZzrliLc_4AA 5   1   1259593649 0            1008.9gb   504.5gb
The index health is GREEN. Repeat the same procedure for every index with unassigned shards, then return to step 5 and run the allocation explain API again to confirm nothing is left unassigned.
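When repeating steps 6–7 for many shards, it helps to generate the reroute bodies programmatically. A minimal sketch, assuming the stale-first strategy from step 6: try allocate_stale_primary, and only after it fails with "no data ... found on any node" fall back to allocate_empty_primary (the reroute_body name is ours, not an Elasticsearch API):

```python
import json

def reroute_body(index: str, shard: int, node: str, empty: bool = False) -> str:
    """Build a _cluster/reroute request body. With empty=False it uses
    allocate_stale_primary; with empty=True it falls back to
    allocate_empty_primary, which discards the shard's data."""
    command = "allocate_empty_primary" if empty else "allocate_stale_primary"
    return json.dumps({
        "commands": [
            {
                command: {
                    "index": index,
                    "shard": shard,
                    "node": node,
                    # Both commands require explicit consent to data loss.
                    "accept_data_loss": True,
                }
            }
        ]
    })
```

The caller decides when to set empty=True, typically after the stale attempt returns the 400 error shown in step 6.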
Summary and important notes
- In this article, we covered what a dangling index is, how dangling indices arise, and how to import them.
- Remember that the dangling API cannot offer any guarantees as to whether the imported data truly represents the latest state of the data when the index was still part of the cluster.
- Before running the dangling API, make sure that all the data nodes are connected to the new cluster.