Overview
Adaptive replica selection is a process intended to prevent a distressed OpenSearch node from delaying the response to queries, while reducing the search load on that node.
To understand how it works, imagine a situation where a single node is in distress. This could be caused by hardware, network, or configuration issues; as a consequence, response times for shards on that node are much longer than for shards on the other nodes.
When an OpenSearch node receives a query, it needs a response from every shard in every index covered by that query, so multiple nodes are usually involved in producing the response. Without adaptive replica selection, OpenSearch would check which copies of each shard are available on all of the nodes, including the node in distress, and would spread requests across those copies using a “round robin” approach, so the distressed node would receive its share of requests regardless of how slowly it responds. With adaptive replica selection, OpenSearch favors the shard copies that are responding best, so it requests data from shards on a distressed node only when there is no better alternative (e.g. when there are no other replicas). This results in reduced load on distressed nodes and shorter response times.
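The difference between the two approaches can be illustrated with a small simulation. This is only a minimal sketch under stated assumptions, not OpenSearch's actual ranking logic: the node names, the response times, the smoothing factor `alpha`, and the rule of ranking copies by a moving average of response time alone are all hypothetical, and the sketch does not model a distressed node recovering.

```python
from itertools import cycle

# Hypothetical average response times (ms) for three copies of one shard.
# node_c plays the role of the distressed node.
RESPONSE_MS = {"node_a": 20.0, "node_b": 25.0, "node_c": 400.0}


def round_robin(copies, n_requests):
    """Rotate through every copy in turn, ignoring how each node performs."""
    rotation = cycle(copies)
    return [next(rotation) for _ in range(n_requests)]


def adaptive(copies, n_requests, alpha=0.3):
    """Send each request to the copy with the lowest moving-average latency.

    Simplified stand-in for illustration only: copies are ranked purely by
    an exponentially weighted moving average of observed response time.
    """
    ewma = {node: 0.0 for node in copies}  # untried copies look attractive
    picks = []
    for _ in range(n_requests):
        node = min(copies, key=lambda n: ewma[n])
        picks.append(node)
        observed = RESPONSE_MS[node]
        ewma[node] = alpha * observed + (1 - alpha) * ewma[node]
    return picks


def mean_latency(picks):
    """Average response time (ms) over a sequence of chosen copies."""
    return sum(RESPONSE_MS[node] for node in picks) / len(picks)


copies = list(RESPONSE_MS)
rr_picks = round_robin(copies, 30)
ars_picks = adaptive(copies, 30)
print("round robin hits on node_c:", rr_picks.count("node_c"))
print("adaptive hits on node_c:   ", ars_picks.count("node_c"))
```

In this sketch the round robin approach sends a third of the requests to the slow node, while the adaptive approach tries it once, observes the long response time, and then keeps routing to the healthier copies.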
How to enable it
Adaptive replica selection is enabled by default. If it has been turned off, you can re-enable it by running the following:
PUT /_cluster/settings
{
  "transient": {
    "cluster.routing.use_adaptive_replica_selection": true
  }
}
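The same request can also be sent from a script. The following is a minimal sketch using only the Python standard library; the helper names `build_ars_request` and `send` are hypothetical, and the base URL `http://localhost:9200` assumes an unsecured local cluster. A cluster with security enabled would additionally need credentials and TLS settings, which are not shown here.

```python
import json
import urllib.request

SETTING = "cluster.routing.use_adaptive_replica_selection"


def build_ars_request(base_url, enabled=True):
    """Build (but do not send) the PUT /_cluster/settings request."""
    body = {"transient": {SETTING: enabled}}
    return urllib.request.Request(
        url=f"{base_url}/_cluster/settings",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


def send(request):
    """Send the request and return the decoded JSON response."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


# Assumed address of a local, unsecured cluster; adjust for your environment.
request = build_ars_request("http://localhost:9200")
# Uncomment to apply the setting against a running cluster:
# print(send(request))
```

Building the request separately from sending it makes it easy to inspect the method, URL, and body before anything is applied to the cluster.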