Overview
OpenSearch lets you limit the number of shards that can be allocated to each node. Once that limit is reached, any further shards remain unallocated. If replica shards are unallocated, you have no replica copies of that data and could lose it if the primary shard is lost or corrupted (cluster status yellow).
If primary shards are unallocated, you cannot write data to the index at all (cluster status red). If you get this warning, it is important to fix it as soon as possible.
The shards per node limit may have been set at the index level or at the cluster level, so you need to find out which setting is causing this warning.
How to fix it
Check to see whether the limit is at a cluster level or index level.
Cluster level shards limit
Run:
GET /_cluster/settings
Look for a setting:
cluster.routing.allocation.total_shards_per_node
If you don’t see the above setting, then ignore this section, and go to index level shards limit below.
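As an illustration, the check above can be scripted. The following Python sketch assumes you have already fetched the JSON body of GET /_cluster/settings into a dict; the function names are hypothetical, and it only inspects the response rather than calling the cluster. It handles both the default nested form and the flat form returned with ?flat_settings=true, and it checks transient settings first because they take precedence over persistent ones.

```python
from typing import Any, Optional, Tuple

SETTING = "cluster.routing.allocation.total_shards_per_node"


def lookup(section: dict, dotted_key: str) -> Optional[Any]:
    """Find dotted_key in one settings section, in flat or nested form."""
    if dotted_key in section:  # flat form, e.g. with ?flat_settings=true
        return section[dotted_key]
    node: Any = section
    for part in dotted_key.split("."):  # nested form (the default)
        if not isinstance(node, dict) or part not in node:
            return None
        node = node[part]
    return node


def find_cluster_shard_limit(response: dict) -> Optional[Tuple[str, int]]:
    """Return (scope, limit) if the cluster-level limit is explicitly set."""
    for scope in ("transient", "persistent"):
        value = lookup(response.get(scope, {}), SETTING)
        if value is not None:
            return scope, int(value)  # the API returns values as strings
    return None


# Hypothetical response body, shaped like the default nested output.
example = {
    "persistent": {
        "cluster": {"routing": {"allocation": {"total_shards_per_node": "1000"}}}
    },
    "transient": {},
}
print(find_cluster_shard_limit(example))
```

A result of None means the setting does not appear in the response, in which case the index level limit is the next thing to check.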
As a quick fix, you can either delete old indices or raise the shards per node limit to the value you need. Be aware that a large number of shards on a node can cause performance problems and, in extreme cases, even bring your cluster down.
PUT /_cluster/settings
{
  "transient": {
    "cluster.routing.allocation.total_shards_per_node": 1000
  }
}
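When choosing a new value, a rough lower bound can be worked out with simple arithmetic: every shard copy has to fit somewhere, so the limit cannot be lower than the total number of shard copies divided by the number of data nodes, rounded up. The Python sketch below illustrates this. The function name and the example figures are hypothetical, and it ignores other allocation constraints such as disk watermarks.

```python
import math


def min_total_shards_per_node(total_shards: int, data_nodes: int) -> int:
    """Smallest per-node limit that still lets every shard copy be allocated.

    total_shards counts primaries and replicas across all indices.
    Illustrative only: real allocation also depends on disk watermarks,
    awareness rules and other allocation deciders.
    """
    if data_nodes < 1:
        raise ValueError("at least one data node is required")
    return math.ceil(total_shards / data_nodes)


# Hypothetical cluster: 2,400 shard copies spread over 3 data nodes.
print(min_total_shards_per_node(2400, 3))  # 800
# If one node is lost, the two remaining nodes need a higher limit.
print(min_total_shards_per_node(2400, 2))  # 1200
```

In practice you would probably leave some headroom above this floor, while keeping in mind that a very high shard count per node carries its own performance cost.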
It is preferable to apply a permanent fix. For examples of solutions, see "Shards Too Small (Oversharding) in Elasticsearch – Explained" (the same principles apply to OpenSearch) and "How to Optimize Search Performance and Eliminate Latency in OpenSearch".
Index level shards limit
It is possible to limit the number of shards per node for a given index. Check the settings for the yellow or red index with:
GET /<index>/_settings/index.routing*
Look for the setting: index.routing.allocation.total_shards_per_node
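The same check can be sketched in Python. This assumes the JSON body of the GET request above has been loaded into a dict keyed by index name, in the default nested form; the function name and the index names in the example are hypothetical.

```python
from typing import Dict


def index_shard_limits(response: dict) -> Dict[str, int]:
    """Map each index to its total_shards_per_node value, if one is set."""
    path = ("index", "routing", "allocation", "total_shards_per_node")
    limits: Dict[str, int] = {}
    for index_name, body in response.items():
        node = body.get("settings", {})
        for part in path:
            node = node.get(part) if isinstance(node, dict) else None
            if node is None:
                break  # setting is not present for this index
        if node is not None:
            limits[index_name] = int(node)  # values arrive as strings
    return limits


# Hypothetical response covering two indices, only one of which has a limit.
example = {
    "logs-1": {
        "settings": {
            "index": {"routing": {"allocation": {"total_shards_per_node": "2"}}}
        }
    },
    "logs-2": {"settings": {}},
}
print(index_shard_limits(example))
```

Indices missing from the result have no index level limit set, so any allocation problem on them has another cause.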
The index.routing.allocation.total_shards_per_node setting is sometimes used to force OpenSearch to spread the shards of a given index across the nodes of a cluster, but it may conflict with other cluster allocation settings (e.g. if the disk is getting full on one node, or if the number of nodes has been reduced).
Before changing the setting, it is worth considering why OpenSearch is unable to respect the rule, and fixing the root cause (i.e. deleting old indices, or recovering or replacing a node that is down). However, if that is not possible, if the current setting is simply wrong, or if you only need a short-term fix, you can change the index level setting using the following:
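To see how a reduced node count can make the rule impossible to satisfy, consider the arithmetic in the Python sketch below. It is a simplification under stated assumptions: the function name and figures are hypothetical, and it only compares the number of shard copies with the capacity the limit allows, ignoring disk watermarks and the rule that a replica cannot share a node with its primary.

```python
def unassignable_copies(primaries: int, replicas: int,
                        data_nodes: int, limit: int) -> int:
    """Shard copies of one index that cannot be placed under the limit.

    limit is index.routing.allocation.total_shards_per_node;
    a negative value (-1) means unbounded.
    """
    total_copies = primaries * (1 + replicas)
    if limit < 0:
        return 0
    capacity = limit * data_nodes
    return max(0, total_copies - capacity)


# Hypothetical index: 3 primaries with 1 replica each, limit of 2 per node.
print(unassignable_copies(3, 1, data_nodes=3, limit=2))  # 0
# After losing one node, two copies no longer fit anywhere.
print(unassignable_copies(3, 1, data_nodes=2, limit=2))  # 2
```

A non-zero result suggests that either the missing node needs to come back or the limit needs to be raised.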
PUT /<index>/_settings
{
  "index.routing.allocation.total_shards_per_node": -1
}
Note that in the request above, -1 means unbounded (no limit). Alternatively, set the number to whatever you need.