Quick links
- Definition – What is segment replication in OpenSearch?
- How to enable segment replication in OpenSearch
- Conclusion
Definition
Segment replication is an experimental feature in OpenSearch 2.3 that changes the way data is replicated from primary shards to replica shards. It copies segments directly to the replica nodes' disks after a refresh, which can significantly improve indexing throughput.
Currently, OpenSearch uses document replication by default. An indexing operation runs on the primary shard, and then each node holding a replica runs the same indexing operation again.
A widely known tip is to disable replicas during large ingestions and re-enable them after completion, because running the indexing operations on every replica is much slower than copying finished segments from disk.
Segment replication follows this pattern without having to disable and then re-enable replicas. After a regular indexing operation on the primary, and after the refresh (when the newly indexed documents are written out as segments), the segments are copied directly to the replica nodes' disks, improving indexing throughput significantly.
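To make the difference concrete, here is a toy model in Python. It is only a sketch: the `Shard` class and both functions are hypothetical illustrations, not OpenSearch code, and the model counts indexing work only, ignoring network transfer, merges, and failure handling.

```python
from dataclasses import dataclass, field


@dataclass
class Shard:
    """Toy shard: tracks its segments and how many documents it indexed itself."""
    segments: list = field(default_factory=list)
    docs_indexed: int = 0

    def index_and_refresh(self, docs):
        # Indexing is the expensive step: every document is processed and
        # written into a new segment when the shard refreshes.
        self.docs_indexed += len(docs)
        self.segments.append(tuple(docs))


def document_replication(primary, replicas, docs):
    # Every copy of the shard repeats the same indexing work.
    primary.index_and_refresh(docs)
    for replica in replicas:
        replica.index_and_refresh(docs)


def segment_replication(primary, replicas, docs):
    # Only the primary indexes; replicas receive the finished segments.
    primary.index_and_refresh(docs)
    for replica in replicas:
        replica.segments = list(primary.segments)


docs = [f"doc-{i}" for i in range(1000)]

p1, r1 = Shard(), [Shard(), Shard()]
document_replication(p1, r1, docs)

p2, r2 = Shard(), [Shard(), Shard()]
segment_replication(p2, r2, docs)

doc_total = p1.docs_indexed + sum(r.docs_indexed for r in r1)
seg_total = p2.docs_indexed + sum(r.docs_indexed for r in r2)
print(doc_total, seg_total)  # 3000 1000
```

With one primary and two replicas, the toy cluster indexes 3,000 documents under document replication and 1,000 under segment replication, while the replicas still end up with the same segments as the primary.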
*Tests were conducted by the OpenSearch team; view the entire document here.
P0: Minimum value observed
P50: Median or 50th percentile value observed
P99: Value at or below which 99% of the observations fall
P100: Maximum value observed
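As a quick illustration of these definitions, the sketch below computes the four values with the nearest-rank method. The latency numbers are made up for the example, and the benchmark tooling used by the OpenSearch team may interpolate percentiles differently.

```python
import math


def percentile(values, p):
    """Nearest-rank percentile: the smallest observed value that has at
    least p% of the observations at or below it."""
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values)
    if p <= 0:
        return ordered[0]  # P0 is the minimum
    rank = math.ceil(p / 100 * len(ordered))
    return ordered[rank - 1]


# Made-up latencies in milliseconds, for illustration only.
latencies_ms = [12, 15, 11, 40, 13, 14, 90, 16, 12, 13]

summary = {f"P{p}": percentile(latencies_ms, p) for p in (0, 50, 99, 100)}
print(summary)  # {'P0': 11, 'P50': 13, 'P99': 90, 'P100': 90}
```

With only ten observations, P99 and P100 coincide; the two diverge once there are more than a hundred samples.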
The results are promising, and the roadmap includes many planned optimizations.
How to enable segment replication in OpenSearch
Go to config/jvm.options, and add the following line:
-Dopensearch.experimental.feature.replication_type.enabled=true
Alternatively, set an environment variable:
export OPENSEARCH_JAVA_OPTS="-Dopensearch.experimental.feature.replication_type.enabled=true"
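If you manage several nodes, the first option can be scripted. The following is a minimal sketch, assuming the file lives at config/jvm.options relative to the OpenSearch home directory; the helper name `enable_feature_flag` is hypothetical. It appends the flag only when it is not already present, so it is safe to run more than once. The demo writes to a temporary file rather than a real installation.

```python
import tempfile
from pathlib import Path

FLAG = "-Dopensearch.experimental.feature.replication_type.enabled=true"


def enable_feature_flag(jvm_options: Path) -> bool:
    """Append FLAG to jvm.options unless it is already there.

    Returns True if the file was changed, False otherwise.
    """
    text = jvm_options.read_text() if jvm_options.exists() else ""
    if FLAG in (line.strip() for line in text.splitlines()):
        return False
    if text and not text.endswith("\n"):
        text += "\n"  # keep the flag on its own line
    jvm_options.write_text(text + FLAG + "\n")
    return True


# Demo against a throwaway file.
with tempfile.TemporaryDirectory() as tmp:
    demo_file = Path(tmp) / "jvm.options"
    demo_file.write_text("-Xms1g\n-Xmx1g\n")
    first_run = enable_feature_flag(demo_file)
    second_run = enable_feature_flag(demo_file)
    print(first_run, second_run)  # True False
```

On a real node you would call `enable_feature_flag(Path("config/jvm.options"))` and then restart OpenSearch so the JVM picks up the option.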
To configure an index to use segment replication, set its replication type to SEGMENT when you create the index:
PUT /test-index
{
  "settings": {
    "index": {
      "replication.type": "SEGMENT"
    }
  }
}
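The same request can be sent from a script. Here is a minimal sketch using only the Python standard library. The base URL http://localhost:9200 is an assumption (a local cluster without security enabled), and the helper names are hypothetical. The block only builds the request; the lines that send it are left commented out so it can be run without a cluster.

```python
import json
import urllib.request


def segment_replication_settings() -> dict:
    # Same body as the console request above.
    return {"settings": {"index": {"replication.type": "SEGMENT"}}}


def build_create_index_request(base_url: str, index: str) -> urllib.request.Request:
    body = json.dumps(segment_replication_settings()).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/{index}",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


# Assumed local endpoint; adjust host, port, and authentication for your cluster.
request = build_create_index_request("http://localhost:9200", "test-index")
print(request.get_method(), request.full_url)

# To send it against a running cluster:
# with urllib.request.urlopen(request) as response:
#     print(response.read().decode("utf-8"))
```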
Known limitations at this stage of development
There are several known limitations to segment replication, including the need to reindex when enabling it for an existing index and the lack of support for rolling upgrades. Full cluster restarts are currently required for index upgrades using segment replication (see Issue 3881).
Additionally, cross-cluster replication does not use segment replication to copy between clusters. Primary shards may experience increased network congestion, and shard allocation algorithms have not been updated to evenly distribute primary shards across nodes.
Finally, integration with remote-backed storage as a replication source is not currently supported. New versions of this feature will address these limitations.
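The reindexing requirement mentioned above usually means two steps: create a new index with segment replication enabled, then copy the documents into it with the reindex API. The sketch below only builds and prints the two requests. The index names `logs-v1` and `logs-v1-segrep` are hypothetical, and a real migration would also carry over the source index's mappings and other settings.

```python
import json


def reindex_plan(source_index: str, dest_index: str) -> list:
    """Return the (method, path, body) requests needed to move an existing
    index onto segment replication."""
    create_dest = (
        "PUT",
        f"/{dest_index}",
        {"settings": {"index": {"replication.type": "SEGMENT"}}},
    )
    copy_docs = (
        "POST",
        "/_reindex",
        {"source": {"index": source_index}, "dest": {"index": dest_index}},
    )
    return [create_dest, copy_docs]


# Hypothetical index names, for illustration only.
plan = reindex_plan("logs-v1", "logs-v1-segrep")
for method, path, body in plan:
    print(method, path)
    print(json.dumps(body, indent=2))
```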
Conclusion
- OpenSearch has released an experimental feature called “Segment Replication,” which changes how data is replicated from primary shards to replica shards.
- The feature copies segments directly to the replica nodes' disks after the refresh, significantly improving indexing throughput.
- The OpenSearch team's tests show promising results, and the roadmap includes many optimizations.
- As this feature is still experimental, there are some limitations. Known limitations of segment replication include the need to reindex existing indices when enabling the feature, lack of support for rolling upgrades, and increased network congestion on primary shards.