Overview
Elasticsearch indices are stored in shards, and each shard in turn stores its data on disk in segments. Operations such as indexing, updates and deletions result in many small segments being created on disk. Elasticsearch merges these into larger segments to optimize disk usage and to reclaim the space held by deleted documents. Merging consumes CPU, memory and disk I/O, so heavy merge activity can slow down the cluster's response times.
How to fix it
In general, Elasticsearch manages merging automatically in the background, balancing it against the cluster's other resource needs such as search and indexing. It is therefore neither necessary nor desirable to interfere with it.
However, if you need to reduce or limit merging temporarily, you have two options.
The less aggressive option is to reduce the merge scheduler's max_thread_count setting (index.merge.scheduler.max_thread_count).
This lowers the maximum number of threads that may merge at once on a single shard of the index. The value cannot be reduced to zero; the minimum is 1.
PUT my_index/_settings
{
  "index.merge.scheduler.max_thread_count": 1
}
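If you do take this step, restore the default as soon as possible. With fewer merge threads, small segments can accumulate faster than they are merged, which can lead to other problems such as an excessive number of open file descriptors on the nodes. Setting the value to null restores the default: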
PUT my_index/_settings
{
  "index.merge.scheduler.max_thread_count": null
}
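If you apply this change from a script rather than from the console, a minimal Python sketch could look like the following. The helper names default_max_thread_count and merge_thread_setting_request are hypothetical. The default formula, max(1, min(4, node.processors / 2)), is the one given in the Elasticsearch merge scheduling documentation, evaluated here with integer division. The code only builds the request; sending it is left to whatever HTTP client you already use.

```python
import json


def default_max_thread_count(processors: int) -> int:
    """Documented default for index.merge.scheduler.max_thread_count.

    max(1, min(4, processors / 2)), computed with integer division.
    """
    return max(1, min(4, processors // 2))


def merge_thread_setting_request(index: str, threads):
    """Build (method, path, body) for the index settings API.

    Pass threads=None to reset the setting to its default; None is
    serialised as JSON null.
    """
    if threads is not None and threads < 1:
        # The setting cannot be reduced to zero.
        raise ValueError("max_thread_count cannot be lower than 1")
    body = {"index.merge.scheduler.max_thread_count": threads}
    return "PUT", f"/{index}/_settings", json.dumps(body)


if __name__ == "__main__":
    # First limit merging to one thread, then restore the default.
    for threads in (1, None):
        method, path, body = merge_thread_setting_request("my_index", threads)
        print(method, path)
        print(body)
```

Keeping the request as plain data makes it easy to log what was changed, and the None case makes the "restore the default" step hard to forget.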
The other option is to roll over the index so that no further writes go to it, and then cancel the merge tasks already running on that index, using the following steps.
Roll over the index
To roll over the index, you need to be using an index alias.
You can either create the index with the alias already attached:
PUT /logs-000001
{
  "aliases": {
    "my_log_alias": {}
  }
}
Or add an alias to an existing index:
POST /_aliases
{
  "actions": [
    { "add": { "index": "logs-000001", "alias": "my_log_alias", "is_write_index": true } }
  ]
}
Your application must then write to the alias rather than directly to the index.
Once the alias is in place, you can force a rollover by creating a new index and moving the write index to it. The actions in a single _aliases request are applied atomically, so the alias is never left without a write index:
PUT logs-000002
POST /_aliases
{
  "actions": [
    { "add": { "index": "logs-000001", "alias": "my_log_alias", "is_write_index": false } },
    { "add": { "index": "logs-000002", "alias": "my_log_alias", "is_write_index": true } }
  ]
}
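If you repeat this manual rollover regularly, the two pieces that change each time are the next index name and the alias swap. The Python sketch below generates both. It assumes your indices follow the zero-padded numbering used in the example above (logs-000001, logs-000002, ...); next_index_name and swap_write_index_body are hypothetical helpers, not part of any Elasticsearch client.

```python
import json
import re


def next_index_name(current: str) -> str:
    """Increment a trailing zero-padded counter: logs-000001 -> logs-000002."""
    match = re.fullmatch(r"(.*-)(\d+)", current)
    if match is None:
        raise ValueError(f"{current!r} does not end in -<number>")
    prefix, digits = match.groups()
    # Keep the original padding width (six digits in the example).
    return f"{prefix}{int(digits) + 1:0{len(digits)}d}"


def swap_write_index_body(alias: str, old_index: str, new_index: str) -> str:
    """Body for POST /_aliases that moves the write flag to new_index."""
    actions = [
        {"add": {"index": old_index, "alias": alias, "is_write_index": False}},
        {"add": {"index": new_index, "alias": alias, "is_write_index": True}},
    ]
    return json.dumps({"actions": actions})


if __name__ == "__main__":
    old = "logs-000001"
    new = next_index_name(old)
    print("PUT", new)
    print("POST /_aliases")
    print(swap_write_index_body("my_log_alias", old, new))
```

Both actions are sent in one request body rather than as two separate calls, which preserves the atomic swap described above.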
Cancel the merge task
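You should generally avoid cancelling merge tasks, because merging is an optimization that the cluster regulates automatically. However, as a last resort to free up resources for heavy search or indexing operations, you can cancel the task as shown below. The ID in the example is only a placeholder: a task ID has the form node_id:task_number, and you can list running tasks with GET _tasks.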
POST _tasks/aitueURTbdu58VeiohTt8A:12345/_cancel
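To avoid copying task IDs by hand, you could filter the output of GET _tasks in a script. The Python sketch below assumes the default response layout, grouped by node ({"nodes": {node_id: {"tasks": {task_id: {"action": ...}}}}}), and assumes that the merge you want to stop shows up as a task whose action name contains "merge" (for example a force merge). find_task_ids and cancel_path are hypothetical helpers, and the sample response is made up for illustration.

```python
import json


def find_task_ids(tasks_response: dict, action_substring: str) -> list:
    """Collect "node_id:task_number" IDs whose action contains action_substring.

    Assumes the default GET _tasks layout, grouped by node.
    """
    ids = []
    for node in tasks_response.get("nodes", {}).values():
        for task_id, task in node.get("tasks", {}).items():
            if action_substring in task.get("action", ""):
                ids.append(task_id)
    return sorted(ids)


def cancel_path(task_id: str) -> str:
    """Build the cancel endpoint for a single node_id:task_number ID."""
    node_id, sep, number = task_id.partition(":")
    if not sep or not node_id or not number.isdigit():
        raise ValueError(f"expected node_id:task_number, got {task_id!r}")
    return f"/_tasks/{task_id}/_cancel"


if __name__ == "__main__":
    # Made-up response; action names are illustrative only.
    sample = json.loads("""{"nodes": {"aitueURTbdu58VeiohTt8A": {"tasks": {
        "aitueURTbdu58VeiohTt8A:12345": {"action": "indices:admin/forcemerge"},
        "aitueURTbdu58VeiohTt8A:12346": {"action": "indices:data/read/search"}}}}}""")
    for task_id in find_task_ids(sample, "merge"):
        print("POST", cancel_path(task_id))
```

Review the matched IDs before sending any cancel request, since a loose substring match can pick up tasks you did not intend to stop.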
How to prevent heavy merge activity
Heavy merge activity usually occurs on indices that are being actively indexed or updated. It may also indicate other problems in your cluster; in particular, your shards may be too large. Learn more about how that happens and how to address it here: Shards Too Large in Elasticsearch – A Complete Guide.