Elasticsearch OpenSearch Segment Replication

By Opster Expert Team - Gustavo

Updated: Sep 21, 2023



Definition

Figure: simplified architecture of OpenSearch, showing document replication and segment replication.
*This chart is a simplified version of the architecture design. For the full chart, see [Segment Replication] Design Proposal · Issue #2229 · opensearch-project/OpenSearch (github.com).
What is segment replication in OpenSearch?

Segment replication is an experimental feature introduced in OpenSearch 2.3 that changes the way data is replicated from primary shards to replica shards. Instead of re-running indexing on every replica, it copies segments directly to the replica nodes' disks after a refresh, which can significantly improve indexing throughput.

By default, OpenSearch uses document replication: an indexing operation runs on the primary shard, and then each node holding a replica runs the same indexing operation.

A well-known tip is to disable replicas during large ingestions and re-enable them once ingestion completes, because re-running every indexing operation on each replica is much slower than copying the finished segments from disk when the replicas are rebuilt.
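The tip above boils down to two index settings updates. The following is a minimal sketch that builds the request bodies for PUT /<index>/_settings; the helper names are hypothetical, while index.number_of_replicas is the standard index setting. It only builds and prints the bodies and does not contact a cluster.

```python
import json

# Hypothetical helpers: they only build request bodies for
# PUT /<index>/_settings; nothing is sent to a cluster.

def disable_replicas_body() -> dict:
    """Settings body applied before a large ingestion."""
    return {"index": {"number_of_replicas": 0}}


def restore_replicas_body(replicas: int) -> dict:
    """Settings body applied once the ingestion has finished."""
    if replicas < 0:
        raise ValueError("replica count must be non-negative")
    return {"index": {"number_of_replicas": replicas}}


print("PUT /test-index/_settings  (before ingestion)")
print(json.dumps(disable_replicas_body(), indent=2))
print("PUT /test-index/_settings  (after ingestion)")
print(json.dumps(restore_replicas_body(1), indent=2))
```

When the replica count is raised again, the new replicas are populated by copying segment files from the primary rather than by replaying every indexing operation, which is exactly the behavior segment replication makes permanent.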

Segment replication follows the same idea without having to disable and re-enable replicas. After a regular indexing operation on the primary, and after the refresh (which writes the buffered documents into new segments), the segments are copied directly to the replica nodes' disks, which can significantly improve indexing throughput.
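To make the difference concrete, here is a deliberately simplified toy model (hypothetical, not OpenSearch code) that counts how many indexing operations each strategy performs. It assumes one primary and two replicas, treats a refresh as packing the buffered documents into one immutable segment, and ignores the translog, merges and network cost.

```python
from dataclasses import dataclass, field

# Toy model only: a "segment" is a tuple of documents produced by a refresh.


@dataclass
class Shard:
    buffer: list = field(default_factory=list)
    segments: list = field(default_factory=list)
    index_ops: int = 0  # indexing operations executed on this shard

    def index(self, doc):
        self.index_ops += 1
        self.buffer.append(doc)

    def refresh(self):
        if self.buffer:
            self.segments.append(tuple(self.buffer))
            self.buffer = []


def document_replication(docs, n_replicas):
    primary = Shard()
    replicas = [Shard() for _ in range(n_replicas)]
    for doc in docs:
        primary.index(doc)
        for r in replicas:
            r.index(doc)  # every replica repeats the indexing work
    primary.refresh()
    for r in replicas:
        r.refresh()
    return primary, replicas


def segment_replication(docs, n_replicas):
    primary = Shard()
    replicas = [Shard() for _ in range(n_replicas)]
    for doc in docs:
        primary.index(doc)  # only the primary runs indexing
    primary.refresh()
    for r in replicas:
        r.segments = list(primary.segments)  # copy segments after refresh
    return primary, replicas


docs = [f"doc-{i}" for i in range(1000)]
for name, strategy in [("document", document_replication),
                       ("segment", segment_replication)]:
    p, reps = strategy(docs, 2)
    total_ops = p.index_ops + sum(r.index_ops for r in reps)
    in_sync = all(r.segments == p.segments for r in reps)
    print(f"{name}: indexing ops={total_ops}, replicas match primary={in_sync}")
```

Both strategies end with identical replicas, but in this model document replication performs three times the indexing work (1,000 operations per copy), while segment replication indexes once and pays for it with segment copies instead.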

Table: main-branch document replication benchmark results in OpenSearch, including P0, P50, P99 and P100 values.

*Tests were conducted by the OpenSearch team; view the entire document here.

P0: Minimum value observed
P50: Median or 50th percentile value observed
P99: Value at or below which 99% of the observations fall
P100: Maximum value observed
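The percentile definitions above can be computed with the nearest-rank method, sketched below. This is one common definition; benchmark tools may interpolate between neighboring values instead, so their numbers can differ slightly. The latency sample is made up purely for illustration and is not taken from the OpenSearch results.

```python
import math


def percentile(values, p):
    """Nearest-rank percentile: the smallest observed value such that at
    least p% of the observations are at or below it."""
    if not values:
        raise ValueError("values must be non-empty")
    if not 0 <= p <= 100:
        raise ValueError("p must be between 0 and 100")
    ordered = sorted(values)
    if p == 0:
        return ordered[0]  # P0 is simply the minimum
    rank = math.ceil(p / 100 * len(ordered))  # 1-based rank
    return ordered[rank - 1]


latencies_ms = [12, 15, 11, 40, 13, 14, 90, 16, 12, 13]  # illustrative sample
for p in (0, 50, 99, 100):
    print(f"P{p}: {percentile(latencies_ms, p)} ms")
```

With only ten observations, P99 and P100 both land on the single slowest value (90 ms), which is why tail percentiles are only meaningful over large samples.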

The results are promising, and the project roadmap includes many further optimizations.

How to enable segment replication in OpenSearch

On each node, open config/jvm.options, add the following line, and restart the node so the option takes effect:

-Dopensearch.experimental.feature.replication_type.enabled=true

Alternatively, set an environment variable:

export OPENSEARCH_JAVA_OPTS="-Dopensearch.experimental.feature.replication_type.enabled=true"

To configure an index to use segment replication, set the index setting replication.type to SEGMENT when creating the index:

PUT /test-index
{
  "settings": {
    "index": {
      "replication.type": "SEGMENT" 
    }
  }
}
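The same create-index request can be built from Python with only the standard library. This is a minimal sketch assuming a local, unsecured node at http://localhost:9200 (a hypothetical address; a real cluster usually needs TLS and authentication). It builds the request without sending it; the commented lines show how it could be sent.

```python
import json
import urllib.request


def create_segrep_index_request(base_url: str, index: str) -> urllib.request.Request:
    """Build (but do not send) PUT /<index> with replication.type=SEGMENT."""
    body = {"settings": {"index": {"replication.type": "SEGMENT"}}}
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/{index}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


req = create_segrep_index_request("http://localhost:9200", "test-index")
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))

# Sending it requires a running node with the experimental flag enabled:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp))
```

Keeping request construction separate from sending makes the body easy to inspect or reuse with another HTTP client.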

Known limitations at this stage of development

Segment replication has several known limitations, including the need to reindex to enable it on an existing index and the lack of support for rolling upgrades. Full cluster restarts are currently required when upgrading indices that use segment replication (see Issue 3881).
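Because an existing index must be reindexed to use segment replication, the migration amounts to creating a new index with replication.type set to SEGMENT and then copying the data with the _reindex API. The sketch below only lists those REST calls; the helper name and the index names logs-v1 and logs-v2 are hypothetical, and it assumes you would also carry over the source index's mappings and switch any aliases, which it does not do.

```python
import json


def segrep_migration_plan(source_index: str, target_index: str):
    """Hypothetical helper: return (method, path, body) tuples for moving
    data from an existing index into a new segment-replicated index."""
    if source_index == target_index:
        raise ValueError("the target must be a new index")
    return [
        # 1. Create the new index with segment replication enabled.
        ("PUT", f"/{target_index}",
         {"settings": {"index": {"replication.type": "SEGMENT"}}}),
        # 2. Copy every document from the old index into the new one.
        ("POST", "/_reindex",
         {"source": {"index": source_index}, "dest": {"index": target_index}}),
    ]


for method, path, body in segrep_migration_plan("logs-v1", "logs-v2"):
    print(method, path)
    print(json.dumps(body, indent=2))
```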

Additionally, cross-cluster replication does not use segment replication to copy between clusters. Primary shards may experience increased network congestion, since they send segment files to every replica, and shard allocation algorithms have not been updated to evenly distribute primary shards across nodes.

Finally, using remote-backed storage as a replication source is not currently supported. Future versions of the feature are expected to address these limitations.

Conclusion

  • OpenSearch has released an experimental feature called “Segment Replication,” which changes how data is replicated from primary shards to replica shards.
  • The feature copies segments directly to the replica nodes' disks after a refresh, which can significantly improve indexing throughput.
  • Tests by the OpenSearch team show promising results, and the roadmap includes many further optimizations.
  • Because the feature is still experimental, it has some limitations, including the need to reindex existing indices to enable it, no support for rolling upgrades, and increased network congestion on primary shards.