Excessive Replicas on Elasticsearch Data Nodes

By Opster Team

Updated: Mar 10, 2024


What does this mean?

Replicas are copies of primary shards that provide redundancy and can improve search performance. When some indices carry more replicas than necessary, the extra copies on the Elasticsearch data nodes lead to negative consequences, such as reduced indexing performance and increased storage costs.

Why does this occur?

This occurs when the Elasticsearch cluster has been configured with a higher number of replicas than required for optimal performance. This can happen for various reasons, such as:

  1. Default settings: Elasticsearch sets the default number of replicas to 1, which may not be suitable for all use cases (see the example after this list).
  2. Misconfiguration: The number of replicas may have been set too high during cluster configuration.
  3. Changes in cluster size or usage patterns: The optimal number of replicas may change as the cluster grows or as the usage patterns evolve.
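
For example, if no replica count is specified at index creation, Elasticsearch applies the default of 1. A quick way to confirm this is to create an index without any explicit settings and inspect it; the index name below is just a placeholder:

# Create an index without specifying replica settings (hypothetical index name)
PUT /my-test-index

# Inspect its settings; "number_of_replicas" will show the default of "1"
GET /my-test-index/_settings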

Possible impact and consequences of excessive replicas  

The presence of excessive replicas on data nodes can lead to the following consequences:

  1. Reduced performance: Additional replicas consume more resources, such as CPU, memory, and disk space, which can negatively impact the overall performance of the cluster.
  2. Increased storage costs: Each replica is a full copy of its primary shard, so every additional replica multiplies the storage footprint of an index and drives up costs (the sketch after this list shows how to estimate this overhead).
  3. Lower indexing throughput: The process of indexing new data can be slowed down due to the increased resource usage by replicas.
  4. Cluster instability: The additional resource usage can lead to instability in the cluster, causing issues such as node failures or slow response times.
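
To get a feel for how much of this overhead comes from replicas, you can compare each index's total store size with the size of its primaries alone. The request below uses standard `_cat/indices` columns; the exact column selection is only a suggestion:

# store.size includes all replica copies, while pri.store.size counts primaries only,
# so the difference is the space consumed by replicas
GET /_cat/indices?v&h=index,pri,rep,pri.store.size,store.size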

How to resolve 

To resolve the issue of excessive replicas on data nodes, follow these steps:

1. Analyze the current replica configuration: Use the following command to check the current number of replicas for each index (the `rep` column in the output shows the configured replica count):

GET /_cat/indices?v
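
You can also narrow the output to the relevant columns and sort by replica count to surface the most heavily replicated indices first. The `h` and `s` parameters are standard cat API options; the chosen columns are just one reasonable selection:

# List indices sorted by replica count, highest first
GET /_cat/indices?v&h=index,pri,rep,store.size&s=rep:desc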

2. Determine the optimal number of replicas: Based on your cluster size, usage patterns, and performance requirements, decide on the appropriate number of replicas for each index. As a rough guide, an index occupies its primary data size multiplied by one plus its replica count, so 100 GB of primary data with two replicas consumes about 300 GB across the cluster.

3. Reduce the number of replicas: Update the number of replicas for each index using the following command:

PUT /<index_name>/_settings
{
  "index" : {
    "number_of_replicas" : <new_replica_count>
  }
}

Replace `<index_name>` with the name of the index and `<new_replica_count>` with the desired number of replicas.
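
For instance, assuming a hypothetical index named `my-logs-2024.03` that should keep a single replica, the request looks like the first example below. The same settings endpoint also accepts wildcard patterns, so a whole family of indices can be updated in one call:

# Reduce a single index to one replica (hypothetical index name)
PUT /my-logs-2024.03/_settings
{
  "index" : {
    "number_of_replicas" : 1
  }
}

# Apply the same setting to every index matching a pattern (hypothetical pattern)
PUT /my-logs-*/_settings
{
  "index" : {
    "number_of_replicas" : 1
  }
}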

As a general recommendation, consider reducing the number of replicas to 1 for indices on hot and warm data nodes. On cold and frozen nodes, there is generally no need for replicas at all, since those indices are backed by searchable snapshots and lost shards can be restored from the snapshot repository. However, be aware that this may increase search latency, since fewer shard copies will be available to serve search requests.

Reducing the number of replicas is a tradeoff: it lowers storage and indexing costs at the expense of redundancy and search throughput, and that tradeoff will not make sense for every workload. It is essential to consider the specific requirements of your Elasticsearch cluster when making this decision.
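
If the hot, warm, cold, and frozen tiers mentioned above are managed with an index lifecycle management (ILM) policy, the replica reduction can be automated rather than applied by hand. The following is a minimal sketch; the policy name, rollover thresholds, and timing are hypothetical, and the `allocate` action's `number_of_replicas` option performs the actual reduction when indices move to the warm phase:

PUT _ilm/policy/my-logs-policy
{
  "policy": {
    "phases": {
      "hot": {
        "actions": {
          "rollover": {
            "max_age": "7d",
            "max_primary_shard_size": "50gb"
          }
        }
      },
      "warm": {
        "min_age": "7d",
        "actions": {
          "allocate": {
            "number_of_replicas": 1
          }
        }
      }
    }
  }
}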

4. Monitor the cluster: After updating the replica configuration, monitor the cluster’s performance and stability to ensure that the changes have had the desired effect. Adjust the number of replicas as needed based on your observations.
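
A couple of quick checks can confirm that the change took effect and that the cluster remains healthy; the column selection below is only a suggestion:

# Overall cluster health should stay green once the replica changes are applied
GET /_cluster/health

# Verify that the expected number of replica shards ("r" rows) exist and are STARTED
GET /_cat/shards?v&h=index,shard,prirep,state,node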

Conclusion 

By following this guide, you can resolve the issue of excessive replicas on Elasticsearch data nodes, leading to improved performance, reduced storage costs, and enhanced cluster stability.