What does this mean?
There are recommended values for the cluster concurrent rebalance setting, which caps the maximum number of shards the cluster can move at any one time to rebalance the distribution of shards across its nodes. If the setting is lower than recommended, the cluster can relocate only a few shards at a time, which can hamper its ability to allocate shards evenly and maintain a healthy state.
Why does this occur?
This occurs when the cluster concurrent rebalance setting (`cluster.routing.allocation.cluster_concurrent_rebalance`) is not configured optimally. The setting determines the maximum number of shards that can be relocated simultaneously to balance the distribution of shards across the nodes in the cluster. If it is set too low, the cluster may struggle to move shards off a node whose disk is filling up and onto nodes with more free space, leading to disk space pressure and degraded cluster health.
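Before changing anything, it can help to check what the cluster is currently using. The sketch below uses Kibana Dev Tools console syntax with the standard `filter_path` response-filtering parameter; `include_defaults` makes the built-in default (2) visible even if the setting has never been overridden:

```
# Check the current (or default) value of the rebalance concurrency setting.
# include_defaults=true returns the built-in default when no override is set,
# and filter_path trims the response down to just this one setting.
GET /_cluster/settings?include_defaults=true&filter_path=**.cluster_concurrent_rebalance
```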
Possible impacts and consequences of cluster concurrent rebalance issues
The impact of this can be significant. If the cluster is unable to rebalance shards away from a node whose disk is close to full, some nodes may become unable to allocate shards. This can turn the cluster yellow or red and prevent new data from being written to certain indices. In the worst case, the disk keeps filling up until it reaches the disk flood stage threshold, at which point Elasticsearch marks the affected indices read-only and writes to them are no longer possible.
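If you want to see how close your nodes are to those limits, the disk-based allocation watermarks (including the flood stage) can be read back from the cluster settings API. A minimal sketch, assuming the defaults (85% low, 90% high, 95% flood stage) have not been overridden:

```
# View the disk-based allocation watermarks, including the flood stage threshold.
# Once a node's disk usage exceeds the flood stage, indices with shards on that
# node are marked read-only.
GET /_cluster/settings?include_defaults=true&filter_path=**.disk.watermark*
```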
How to resolve
To resolve this issue, follow these steps:
1. Increase the concurrent rebalance setting: This setting controls how many shards can be relocated between nodes at the same time. A value that is too low slows down the rebalancing process. To increase the setting, use the following command:
```
PUT /_cluster/settings
{
  "transient": {
    "cluster.routing.allocation.cluster_concurrent_rebalance": <new_value>
  }
}
```
Replace `<new_value>` with the desired number of concurrent shard relocations (the default is 2). It is recommended to set this value based on the cluster size and hardware capabilities. Note that `transient` settings do not survive a full cluster restart; use `persistent` if the change should be kept permanently.
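Once the new value is in place, it is worth confirming that shard relocations are actually happening. A small sketch using standard cluster APIs:

```
# Number of shards currently being relocated across the cluster
GET /_cluster/health?filter_path=relocating_shards

# List shards with their state per node; look for rows in the RELOCATING state
GET /_cat/shards?v&h=index,shard,prirep,state,node
```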
2. Monitor disk space usage: Keep an eye on the disk space usage of your Elasticsearch nodes to ensure that they do not reach the disk flood stage threshold. When using Opster AutoOps, you can simply turn to the Data section of the Node View dashboard to troubleshoot this. If you aren’t using AutoOps, you can use other monitoring tools or Elasticsearch APIs to check disk usage, as shown in the example below.
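For example, the cat APIs report per-node disk usage directly and only require REST access to the cluster:

```
# Per-node disk usage and shard counts
GET /_cat/allocation?v

# Per-node disk usage with selected columns only
GET /_cat/nodes?v&h=name,disk.used_percent,disk.avail,disk.total
```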
3. Optimize index settings: Review your index settings and consider optimizing them to reduce disk space usage. This may include adjusting the number of replicas, using force merge, or adjusting the shard size; see the example requests below.
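As an illustration only, the requests below lower the replica count of a single index and force-merge it; `my-index` is a placeholder name, and the right replica count and segment target depend entirely on your own resiliency and sizing requirements:

```
# Reduce the number of replica copies for a specific index (placeholder name)
PUT /my-index/_settings
{
  "index": {
    "number_of_replicas": 1
  }
}

# Merge the index down to a single segment to reclaim space from deleted documents
# (best done on indices that are no longer being written to)
POST /my-index/_forcemerge?max_num_segments=1
```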
4. Add more nodes or increase disk space: If the issue persists, consider adding more nodes to the cluster or increasing the disk space available to the existing nodes.
Conclusion
By following this guide, you should be able to resolve the issue of Elasticsearch’s cluster concurrent rebalance setting being set lower than recommended. Keeping your cluster properly configured and monitored will help maintain its health and performance.