Elasticsearch Long Running DeleteByQuery Task

By Opster Team

Updated: Mar 10, 2024

What does this mean? 

A DeleteByQuery operation in Elasticsearch is taking an unusually long time to complete. DeleteByQuery is an Elasticsearch API that deletes the documents in an index that match a specific query. When this operation runs for a long time, it can cause performance issues in the cluster.

Why does this occur?

There could be several reasons for a long running DeleteByQuery task:

  1. The DeleteByQuery operation is matching a large number of documents, which can take a long time to process and delete.
  2. The cluster is experiencing high load or resource contention, causing the DeleteByQuery operation to take longer than expected.
  3. The index mappings and analysis settings are not optimized, leading to slower query performance.

Possible impact and consequences of long running DeleteByQuery operations

The impact of a long running DeleteByQuery task can be significant, as it may affect the overall performance of the Elasticsearch cluster. This can lead to slower query response times, increased resource usage, and potential instability in the cluster.

How to resolve

To resolve the issue of a long running DeleteByQuery task, consider the following recommendations:

1. Try to improve the query used in the DeleteByQuery API. Wherever possible, use filter clauses in the query and reduce the number of documents matched. If DeleteByQuery matches a huge number of documents, run it in multiple batches. If the index refresh interval is currently less than 30s, consider increasing it to 30s.

To update the refresh interval of an index, use the following command:

PUT /your_index_name/_settings
{
  "index" : {
    "refresh_interval" : "30s"
  }
}
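
The batching advice in step 1 can be sketched as follows. This is a minimal illustration rather than a definitive implementation: it only builds the request, it does not send it. The helper name build_delete_batch_request and the field and value ("status", "expired") are hypothetical; the filter clause and the max_docs body parameter are standard parts of the DeleteByQuery API.

```python
import json


def build_delete_batch_request(index, field, value, batch_size=10000):
    """Return (method, path, body) for one bounded DeleteByQuery batch.

    The helper name, field and value are illustrative assumptions.
    """
    path = f"/{index}/_delete_by_query"
    body = {
        # max_docs caps how many documents a single request deletes
        "max_docs": batch_size,
        # a bool/filter clause matches documents without scoring them
        "query": {"bool": {"filter": [{"term": {field: value}}]}},
    }
    return "POST", path, body


method, path, body = build_delete_batch_request(
    "your_index_name", "status", "expired", batch_size=5000
)
print(method, path)
print(json.dumps(body, indent=2))
```

One way to use this is to send the same request repeatedly until the "deleted" count in the response reaches 0, so that each run stays small.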

2. Review the index mappings and analysis, and optimize where possible. You can use the free Opster Template Analyzer tool to help you with this.

3. Monitor the resource usage of your Elasticsearch cluster and ensure that it has adequate resources (CPU, memory, disk space, and I/O) to handle the DeleteByQuery operations. If necessary, consider scaling up your cluster or adding more nodes to distribute the load. Note that adding nodes only helps if the index being updated has more primary shards than there are data nodes; otherwise the extra nodes will not receive any of its shards.

4. Use the Elasticsearch Task Management API to monitor the progress of the DeleteByQuery task and identify any bottlenecks or issues. For example, you can run the DeleteByQuery task in the background with the following command:

POST /<index_name>/_delete_by_query?wait_for_completion=false

The response from the previous command will include a <task_id>, which you can then use in the following command to retrieve the status of the ongoing task:

GET /_tasks/<task_id>
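
As a rough sketch of how to read the task status, the function below estimates progress from the body returned by GET /_tasks/<task_id>. The function name delete_progress and the sample response are hypothetical and abbreviated; the fields it reads (completed, task.status.total, deleted, version_conflicts) are the ones the Task Management API reports for DeleteByQuery tasks. Counting only deleted documents and version conflicts as "processed" is a simplifying assumption.

```python
def delete_progress(task_response):
    """Summarize progress from a GET /_tasks/<task_id> response body."""
    status = task_response["task"]["status"]
    total = status.get("total", 0)
    deleted = status.get("deleted", 0)
    conflicts = status.get("version_conflicts", 0)
    # Assumption: documents are either deleted or skipped on a version conflict
    processed = deleted + conflicts
    percent = 100.0 * processed / total if total else 0.0
    return {
        "completed": task_response.get("completed", False),
        "processed": processed,
        "total": total,
        "percent": round(percent, 1),
    }


# Hypothetical, abbreviated response used only for illustration
sample = {
    "completed": False,
    "task": {
        "status": {
            "total": 200000,
            "deleted": 50000,
            "version_conflicts": 0,
            "batches": 50,
        }
    },
}
print(delete_progress(sample))
```

If the percentage barely moves between polls, the bottleneck is more likely cluster load or the query itself than the size of the task.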

5. If the DeleteByQuery operation is still taking too long, you can cancel it with the command below, then consider breaking the work into smaller tasks using slicing:

POST /_tasks/<task_id>/_cancel
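
The slicing mentioned in step 5 can be requested with the slices query parameter of the DeleteByQuery API. The sketch below only builds the request path, under stated assumptions: the helper name sliced_delete_path is hypothetical, and the optional requests_per_second throttle is an extra knob not covered in this guide, included only as a possible way to limit load.

```python
from urllib.parse import urlencode


def sliced_delete_path(index, slices="auto", requests_per_second=None):
    """Build a background DeleteByQuery path that splits the work into slices."""
    params = {"slices": slices, "wait_for_completion": "false"}
    if requests_per_second is not None:
        # Optional throttle; an assumption beyond what this guide describes
        params["requests_per_second"] = requests_per_second
    return f"/{index}/_delete_by_query?{urlencode(params)}"


print(sliced_delete_path("your_index_name"))
print(sliced_delete_path("your_index_name", slices=4, requests_per_second=500))
```

With slices=auto, Elasticsearch chooses the number of slices itself; an explicit number can be tried if finer control is needed.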