How to Recover OpenSearch Dangling Indices

By Opster Expert Team

Updated: Oct 17, 2023


Opster’s customers have run into dangling indices on several occasions. The Opster team created the procedure below, which details how to recover data from old nodes, for the customers who needed it. This process and additional data recovery processes are covered by Opster AutoOps.

Definition

What are dangling indices in OpenSearch?

Dangling indices occur when a node that has several indices stored locally joins a cluster and those local indices do not exist in the cluster metadata. In other words, if a node that previously belonged to one cluster joins a new cluster, the indices on that node are marked as dangling indices.

When and why you should use the dangling API

The cluster metadata describes how to read the data stored on the data nodes. If all the master nodes lose their metadata, nothing stored on the data nodes in the cluster can be read. In this situation, you should create a new cluster and connect your existing data nodes to it. Once all the data nodes have joined the new cluster, the old indices become dangling indices. You can then use the dangling indices API to recover the data from them.
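Before importing anything, it can help to confirm that every data node has joined the new cluster. The following is a minimal sketch, assuming the `_cat/nodes` API is called with the `name` and `node.role` columns and that a `d` in the abbreviated role string marks a data node. The function name `count_data_nodes` and the sample node names are illustrative, not part of the original procedure.

```shell
# Count data nodes in `_cat/nodes?h=name,node.role` output read from stdin.
# Assumption: column 2 is the abbreviated role string and contains "d" for data nodes.
count_data_nodes() {
  awk '$2 ~ /d/ { n++ } END { print n + 0 }'
}

# Offline sample (hypothetical node names) standing in for a live response.
sample_nodes='data-node-1 dir
data-node-2 dir
manager-1 m'

printf '%s\n' "$sample_nodes" | count_data_nodes

# Against a live cluster (not run here):
# curl -ks "https://localhost:9200/_cat/nodes?h=name,node.role" -u 'opensearch:<password>' | count_data_nodes
```

Comparing the printed count with the number of data nodes you expect gives a quick sanity check before moving on.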

In this article, we will cover how to list and restore dangling indices.

How to list and restore dangling indices in OpenSearch

Step 1. Run the dangling indices API and copy the index_uuid from the response

GET /_dangling
{
  "index_name" : "<index_name>",
  "index_uuid" : "D_FDBv2NSZGZzrliLc_4AA",
  "creation_date_millis" : 1666289487991,
  "node_ids" : [
    "5vKcjNSjTtmXbUIY98oRSw",
    "-prq1HnZSDmjtHn4fp3skQ",
    "jl9HozrpQNqYqctvI40ISw",
    "P_tR9BvITiqz7SOPrrBiAg",
    "mH6iK9xsSsqo9dBwLNM06A",
    "gVEFddF6SNqRfao6te-TMA",
    "2xeISkW1QRiGJXQ8b9bRtQ",
    "wxLLkuzgTeyY9aHlXRIBNg"
  ]
}

Step 2. Restore the dangling index

POST /_dangling/<index_uuid>?accept_data_loss=true
{ "acknowledged" : true }

Note the accept_data_loss query parameter (required, Boolean). It must be set to true in order to import a dangling index. OpenSearch cannot know where the dangling index data came from or determine which shard copies are fresh and which are stale. Therefore, it cannot guarantee that the imported data represents the latest state of the index when it was last in the cluster.

Step 3. (Optional) Use the automated script to restore all dangling indices

It’s not possible to use a wildcard (*) with the _dangling API. If you have hundreds of dangling indices, you can use the following bash script to list and restore them.

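# Collect every index_uuid from the response (works for compact or pretty-printed JSON).
$ curl -XGET -ks "https://localhost:9200/_dangling" -u 'opensearch:<password>' | grep -oE '"index_uuid" *: *"[^"]+"' | cut -d'"' -f4 > index_uuid.txt
# Import each dangling index by UUID.
$ while read -r uuid; do echo "$uuid"; curl -XPOST -ks "https://localhost:9200/_dangling/${uuid}?accept_data_loss=true" -u 'opensearch:<password>'; done < index_uuid.txt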
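Because every import requires accept_data_loss=true, it may be worth previewing the requests before sending any of them. The sketch below is a dry run: it pulls the index_uuid values out of a `/_dangling` response and prints the import request for each one without contacting the cluster. The function names `extract_uuids` and `preview_imports` are hypothetical, the second UUID in the sample is made up, and the `dangling_indices` wrapper key in the sample is an assumption about the full response shape (the extraction does not depend on it).

```shell
# Extract every index_uuid value from a /_dangling response read from stdin.
# Handles compact and pretty-printed JSON; assumes UUIDs contain no double quotes.
extract_uuids() {
  grep -oE '"index_uuid"[[:space:]]*:[[:space:]]*"[^"]+"' | cut -d'"' -f4
}

# Print the import request for each UUID instead of sending it (dry run).
preview_imports() {
  while IFS= read -r uuid; do
    printf 'POST /_dangling/%s?accept_data_loss=true\n' "$uuid"
  done
}

# Offline sample shaped like the response in step 1 (second UUID is made up).
sample='{"dangling_indices":[{"index_name":"a","index_uuid":"D_FDBv2NSZGZzrliLc_4AA"},{"index_name":"b","index_uuid":"zzzzzzzzzzzzzzzzzzzzzz"}]}'

printf '%s\n' "$sample" | extract_uuids | preview_imports
```

Once the printed list looks right, the same UUIDs can be fed to the real POST requests.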

Step 4. Check cluster status

Some of the imported indices may have missing or corrupted shards, so check the cluster health once the import has finished.

curl -ks "https://localhost:9200/_cat/health?v" -u 'opensearch:<password>'
epoch      timestamp cluster        status node.total node.data shards pri relo init unassign pending_tasks max_task_wait_time active_shards_percent
1672058540 12:42:20  <cluster_name> red            41        38    969 332    0    0        2             0                  -                 99.8%
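When the check is scripted, the two values that matter here are the status and the unassigned shard count. A minimal sketch, assuming the health API is called as `_cat/health?h=status,unassign` so that it returns only those two columns; the function name `health_summary` is hypothetical.

```shell
# Read one line of `_cat/health?h=status,unassign` output from stdin and
# report whether any shards still need attention.
health_summary() {
  read -r status unassigned
  if [ "$status" = "green" ] && [ "$unassigned" -eq 0 ]; then
    echo "ok"
  else
    echo "check: status=$status unassigned=$unassigned"
  fi
}

# Offline sample matching the red cluster with 2 unassigned shards shown above.
echo "red 2" | health_summary

# Against a live cluster (not run here):
# curl -ks "https://localhost:9200/_cat/health?h=status,unassign" -u 'opensearch:<password>' | health_summary
```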

Step 5. If you see unassigned shards, check the allocation explain API

curl -ks "https://localhost:9200/_cluster/allocation/explain" -u 'opensearch:<password>'
"note" : "No shard was specified in the explain API request, so this response explains a randomly chosen unassigned shard. There may be other unassigned shards in this cluster which cannot be assigned for different reasons. It may not be possible to assign this shard until one of the other shards is assigned correctly. To explain the allocation of other shards (whether assigned or unassigned) you must specify the target shard in the request to this API.",
  "index" : "<index_name>",
  "shard" : 2,
  "primary" : true,
  "current_state" : "unassigned",
  "unassigned_info" : {
    "reason" : "DANGLING_INDEX_IMPORTED",
    "at" : "2022-12-26T12:27:33.255Z",
    "last_allocation_status" : "no_valid_shard_copy"
  },
  "can_allocate" : "no_valid_shard_copy",
  "allocate_explanation" : "OpenSearch can't allocate this shard because there are no copies of its data in the cluster. OpenSearch will allocate this shard when a node holding a good copy of its data joins the cluster. If no such node is available, restore this index from a recent snapshot.",
  "node_allocation_decisions" : [
    {
      "node_id" : "<node_id>",
      "node_name" : "<node_name>",
      "transport_address" : "10.6.11.242:9300",
      "node_attributes" : {
        "aws_availability_zone" : "us-east-1a",
        "xpack.installed" : "true",
        "transform.node" : "true"
      },
      "node_decision" : "no",
      "store" : {
        "found" : false
      }
    }
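As the note in the response says, a request without a target explains only one randomly chosen unassigned shard. To see all of them at once, one option is the `_cat/shards` API. The sketch below filters its output down to unassigned primaries, assuming the columns were requested as `h=index,shard,prirep,state,unassigned.reason`. The function name `unassigned_primaries` and the sample index names are hypothetical.

```shell
# Keep only unassigned primary shards from `_cat/shards` output read from stdin.
# Assumes columns: index, shard, prirep, state, unassigned.reason (in that order).
unassigned_primaries() {
  awk '$3 == "p" && $4 == "UNASSIGNED" { print $1, $2, $5 }'
}

# Offline sample (hypothetical index names) standing in for a live response.
sample_shards='logs-a 0 p STARTED
logs-a 2 p UNASSIGNED DANGLING_INDEX_IMPORTED
logs-a 2 r UNASSIGNED DANGLING_INDEX_IMPORTED
logs-b 1 p STARTED'

printf '%s\n' "$sample_shards" | unassigned_primaries

# Against a live cluster (not run here):
# curl -ks "https://localhost:9200/_cat/shards?h=index,shard,prirep,state,unassigned.reason" -u 'opensearch:<password>' | unassigned_primaries
```

Each printed row names an index and shard number that can then be passed to the allocation explain API as a specific target.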

Step 6. Restore the corrupted shards

There were 5 primary shards for the index in question. One of them, shard 2, could not be assigned.

First, we’ll try to restore it with the reroute API’s allocate_stale_primary command, which attempts to allocate shard 2 as a primary from a stale copy on the given node.

curl -ks -XPOST "https://localhost:9200/_cluster/reroute" -u 'opensearch:<password>' -H "Content-Type: application/json" -d'
{
  "commands": [
    {
      "allocate_stale_primary": {
        "index": "<index_name>",
        "shard": 2,
        "node": "<node_name>",
        "accept_data_loss": true
      }
    }
  ]
}'
{"error":{"root_cause":[{"type":"illegal_argument_exception","reason":"No data for shard [2] of index [<index_name>] found on any node"}],"type":"illegal_argument_exception","reason":"No data for shard [2] of index [<index_name>] found on any node"},"status":400}

As the response above shows, no node holds a copy of shard 2, so its data cannot be recovered. The remaining option is to run the reroute API with the allocate_empty_primary command and accept_data_loss set to true.

This command allocates shard 2 as an empty primary. The purpose of this operation is to partially recover the data in the index, since an index cannot be used while one of its primary shards is missing.

curl -ks -XPOST "https://localhost:9200/_cluster/reroute" -u 'opensearch:<password>' -H "Content-Type: application/json" -d'
{
  "commands": [
    {
      "allocate_empty_primary": {
        "index": "<index_name>",
        "shard": 2,
        "node": "<node_name>",
        "accept_data_loss": true
      }
    }
  ]
}'
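If several shards need the same treatment, typing the request body by hand for each one is error-prone. The helper below is a rough sketch that prints a reroute body for one shard; the function name `reroute_body` and the example index and node names are hypothetical. It does no JSON escaping, so it assumes index and node names contain no double quotes or backslashes.

```shell
# Print a _cluster/reroute request body for a single shard.
# $1: allocate_stale_primary or allocate_empty_primary
# $2: index name   $3: shard number   $4: node name
reroute_body() {
  printf '{"commands":[{"%s":{"index":"%s","shard":%s,"node":"%s","accept_data_loss":true}}]}\n' \
    "$1" "$2" "$3" "$4"
}

# Offline example with hypothetical index and node names.
reroute_body allocate_empty_primary my-index 2 node-1

# Sending it to a live cluster (not run here); `-d @-` makes curl read the body from stdin:
# reroute_body allocate_empty_primary my-index 2 node-1 |
#   curl -ks -XPOST "https://localhost:9200/_cluster/reroute" -u 'opensearch:<password>' \
#     -H "Content-Type: application/json" -d @-
```

Keeping the command name as the first argument makes it easy to try allocate_stale_primary first and fall back to allocate_empty_primary only when no copy of the shard exists.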

Step 7. Check the index status

curl -ks "https://localhost:9200/_cat/indices/<index_name>?v" -u 'opensearch:<password>'
health status index        uuid                   pri rep docs.count docs.deleted store.size pri.store.size
green  open   <index_name> D_FDBv2NSZGZzrliLc_4AA   5   1 1259593649            0   1008.9gb        504.5gb

The index health is now green. Repeat the same procedure for every index that still has unassigned shards, returning to step 5 each time to run the allocation explain API again.

Summary and important notes

  • In this article, we learned what a dangling index is, how it is formed and how to import it.
  • Remember that the dangling API cannot offer any guarantees as to whether the imported data truly represents the latest state of the data when the index was still part of the cluster.
  • Before running the dangling API, make sure that all the data nodes are connected to the new cluster.