Reindex Elasticsearch more efficiently and improve reindexing performance by following these tips:
- Disable Replicas
Disable replicas (set `index.number_of_replicas` to `0`) when building a new index from scratch that is not yet serving search traffic. The replica count can be changed dynamically once re-indexing has been completed.
- Disable Refresh Interval
Disable the refresh interval (set `index.refresh_interval` to `-1`) while re-indexing. It can be re-enabled once re-indexing has been completed.
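The two settings above can be applied together before the load and restored afterwards through the index settings endpoint (`PUT /<index>/_settings`). The following is a minimal sketch, not a definitive implementation: the helper names, the cluster URL `http://localhost:9200`, and the index name `my-new-index` are hypothetical, and the restore values (1 replica, `1s` refresh) are assumptions to replace with whatever the index needs in production.

```python
import json
import urllib.request


def bulk_load_settings():
    """Index settings for the initial load: no replicas, no periodic refresh."""
    return {"index": {"number_of_replicas": 0, "refresh_interval": "-1"}}


def restore_settings(replicas=1, refresh_interval="1s"):
    """Settings to apply after re-indexing (the default values are assumptions)."""
    return {"index": {"number_of_replicas": replicas,
                      "refresh_interval": refresh_interval}}


def build_settings_request(base_url, index, settings):
    """Build a PUT /<index>/_settings request without sending it."""
    return urllib.request.Request(
        url=f"{base_url}/{index}/_settings",
        data=json.dumps(settings).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


def send(request):
    """Send a prepared request and return the decoded response body."""
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")
```

A typical flow would call `send(build_settings_request("http://localhost:9200", "my-new-index", bulk_load_settings()))` before the load, and the same call with `restore_settings()` once re-indexing has finished.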
- Use Bulk API
Use the bulk API with multiple clients to get the maximum throughput from Elasticsearch. Benchmark your Elasticsearch cluster first to find a bulk size and client count that do not cause performance issues.
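A bulk request body is newline-delimited JSON: an action line followed by the document source, with a trailing newline at the end. The sketch below only builds such bodies and splits the documents into batches; the function names and the index name are hypothetical, and the batch size is an illustrative value that should come from benchmarking.

```python
import json


def chunked(docs, size):
    """Yield successive lists of at most `size` documents."""
    for start in range(0, len(docs), size):
        yield docs[start:start + size]


def build_bulk_body(index, docs):
    """Serialize (doc_id, source) pairs into an NDJSON body for POST /_bulk."""
    lines = []
    for doc_id, source in docs:
        # Action line: index this document into `index` under `doc_id`.
        lines.append(json.dumps({"index": {"_index": index, "_id": doc_id}}))
        # Source line: the document itself.
        lines.append(json.dumps(source))
    # Every line, including the last one, must end with a newline.
    return "".join(line + "\n" for line in lines)


if __name__ == "__main__":
    documents = [("1", {"title": "a"}), ("2", {"title": "b"}), ("3", {"title": "c"})]
    for batch in chunked(documents, 2):
        print(build_bulk_body("my-new-index", batch), end="")
```

Each body would be sent as `POST /_bulk` with the `Content-Type: application/x-ndjson` header. Because the batches are independent, several worker threads or processes can send them in parallel.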
- Increase Buffer Size
Increase the index buffer size (the `indices.memory.index_buffer_size` node setting) and use Opster’s detailed documentation to fine-tune it.
- Use Reindex API
If the `_source` field is enabled and you are re-indexing because of a breaking change, such as changing the analyzer on existing fields, use the Elasticsearch Reindex API.
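The Reindex API copies documents from one index to another with a single `POST /_reindex` call. It does not copy settings or mappings, so the destination index has to be created with the new analyzer first. A minimal sketch of the request, where the helper name, the cluster URL, and the index names are hypothetical:

```python
import json
import urllib.request


def build_reindex_request(base_url, source_index, dest_index):
    """Build a POST /_reindex request that copies source_index into dest_index."""
    body = {"source": {"index": source_index}, "dest": {"index": dest_index}}
    return urllib.request.Request(
        url=f"{base_url}/_reindex",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

For example, `build_reindex_request("http://localhost:9200", "products-v1", "products-v2")` prepares the request, which can then be sent with `urllib.request.urlopen`.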
- Disable Merge Throttling
Disable merge throttling by setting `indices.store.throttle.type` to `none`. If you have a massive write-heavy index, you can make this change permanent. Note that this is only relevant for Elasticsearch versions older than 6.x.
- Ensure Optimal Scalability Settings
Choosing the optimal number of primary shards is crucial for scalability, because the number of primary shards cannot be changed later on. Refer to Opster’s guide to shards and replicas to understand more. Also, make sure you don’t end up creating “hotspots” in the cluster, that is, nodes that hold a disproportionate share of the shards or traffic.
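Since the primary shard count is fixed when the index is created, it belongs in the body of the create-index request (`PUT /<index>`). The sketch below builds such a body; the function name is hypothetical and the shard count in the usage note is purely illustrative, not a recommendation.

```python
def build_create_index_body(primary_shards, replicas=0):
    """Return the create-index body with the primary shard count set up front."""
    if primary_shards < 1:
        raise ValueError("an index needs at least one primary shard")
    return {
        "settings": {
            "index": {
                "number_of_shards": primary_shards,
                # Replicas start at 0 for the load and can be raised later.
                "number_of_replicas": replicas,
            }
        }
    }
```

Calling `build_create_index_body(3)` yields the settings for a three-shard index with no replicas, ready to be serialized as JSON and sent with the create-index request.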