Mastering the Update by Query Functionality in Elasticsearch

By Opster Team

Updated: Jul 23, 2023

Introduction

Elasticsearch’s Update by Query API is a powerful tool that allows you to modify documents that match a specific query. This feature is particularly useful when you need to update multiple documents in a single operation, saving you the time and resources of updating each document individually.

Understanding the Update by Query API

The Update by Query API works by running a single query and applying an update script to each document that matches the query. The API uses Elasticsearch’s Query DSL to define the query and the scripting language Painless to define the update script.

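Here’s a basic example of an Update by Query request. The API requires a target (an index, alias, or data stream) in the path; my-index is a placeholder index name:

json
POST /my-index/_update_by_query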
{
  "script": {
    "source": "ctx._source.field += 'updated'",
    "lang": "painless"
  },
  "query": {
    "match": {
      "field": "value"
    }
  }
}

In this example, the Update by Query API will update all documents where the field “field” matches the value “value”. The update script will append the string “updated” to the existing value of the field.
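
The same request can be assembled from client code. The following is a minimal sketch using only the Python standard library; the base URL http://localhost:9200, the index name my-index, and the helper name build_update_by_query are assumptions made for illustration, not part of the original example. It only builds the request and does not send it.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import Request


def build_update_by_query(base_url, index, script_source, query, **params):
    """Build (but do not send) an Update by Query request.

    Hypothetical helper for illustration; not part of any Elasticsearch client.
    """
    url = f"{base_url.rstrip('/')}/{quote(index)}/_update_by_query"
    if params:
        # URL parameters such as conflicts=proceed go in the query string.
        url += "?" + urlencode(params)
    body = {
        "script": {"source": script_source, "lang": "painless"},
        "query": query,
    }
    return Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_update_by_query(
    "http://localhost:9200",  # assumed local cluster address
    "my-index",               # hypothetical index name
    "ctx._source.field += 'updated'",
    {"match": {"field": "value"}},
)
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

To actually run the update, the returned object could be passed to urllib.request.urlopen, assuming a reachable cluster and whatever authentication it requires.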

Using the Update by Query API with Conflicts

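One of the challenges of using the Update by Query API is handling version conflicts. The API takes a snapshot of the index when it starts, so a document that changes between the snapshot and the moment its update is applied causes a version conflict. By default, the API aborts the operation when it encounters one; updates that were already applied are not rolled back. You can change this behavior by setting the “conflicts” parameter to “proceed”. The API then continues past conflicting documents and reports how many it hit in the “version_conflicts” field of the response.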

Here’s an example of an Update by Query request with the “conflicts” parameter set to “proceed”:

json
POST /my-index/_update_by_query?conflicts=proceed
{
  "script": {
    "source": "ctx._source.field += 'updated'",
    "lang": "painless"
  },
  "query": {
    "match": {
      "field": "value"
    }
  }
}
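
When conflicts are allowed to proceed, it is worth checking the response to see whether any documents were left behind. Below is a small sketch of how that check might look in Python. The function name summarize_update_by_query is hypothetical and the sample response values are invented for illustration; only the field names (total, updated, version_conflicts, failures) come from the API's response format.

```python
def summarize_update_by_query(response):
    """Summarize an Update by Query response body (already parsed from JSON)."""
    conflicts = response.get("version_conflicts", 0)
    failures = response.get("failures", [])
    summary = (
        f"{response.get('updated', 0)} of {response.get('total', 0)} documents updated, "
        f"{conflicts} version conflict(s), {len(failures)} failure(s)"
    )
    # Assumption: a conflicting document is left unchanged, so a non-zero
    # count suggests running the request again to pick up the stragglers.
    needs_rerun = conflicts > 0
    return summary, needs_rerun


# Illustrative response body; the numbers are made up for this example.
sample_response = {
    "took": 147,
    "timed_out": False,
    "total": 120,
    "updated": 118,
    "version_conflicts": 2,
    "failures": [],
}

summary, needs_rerun = summarize_update_by_query(sample_response)
print(summary)
print("rerun recommended:", needs_rerun)
```

Whether a rerun is safe depends on the script: appending a string, as in the example above, would modify already-updated documents a second time unless the query excludes them.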

Run the Update by Query API asynchronously

By default, an Update by Query request blocks until the operation finishes. If the operation takes too long, the client connection may time out and you won’t see the result. To avoid this, set the “wait_for_completion” parameter to “false”. The request then returns immediately with a task ID that you can use to monitor the running task.

json
POST /my-index/_update_by_query?wait_for_completion=false
{
  "script": {
    "source": "ctx._source.field += 'updated'",
    "lang": "painless"
  },
  "query": {
    "match": {
      "field": "value"
    }
  }
}
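
The asynchronous response is a small JSON object whose “task” field holds the task ID in the form node_id:task_number. As a rough sketch, the ID can be turned into the URL of the task status endpoint like this; the base URL, the helper name task_status_url, and the sample task ID are assumptions for illustration only.

```python
from urllib.parse import quote


def task_status_url(base_url, async_response):
    """Return the task API URL for a wait_for_completion=false response."""
    task_id = async_response["task"]  # looks like "node_id:task_number"
    # Keep the colon unescaped; it separates the node ID from the task number.
    return f"{base_url.rstrip('/')}/_tasks/{quote(task_id, safe=':')}"


# Made-up task ID, used only to show the shape of the response.
async_response = {"task": "oTUltX4IQMOUUVeiohTt8A:12345"}
status_url = task_status_url("http://localhost:9200", async_response)
print(status_url)
```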

Monitoring the Progress of an Update by Query Operation

The Update by Query API provides a way to monitor the progress of an operation. When you send an Update by Query request to be executed asynchronously, the API returns a task ID that you can use to retrieve the status of the operation.

json
GET /_tasks/<task_id>

You can also retrieve the status of all executing Update by Query operations using the command below:

json
GET /_tasks?detailed=true&actions=*byquery

In this example, the GET request returns detailed information about all running tasks whose action name ends in “byquery”. That covers Update by Query operations and also Delete by Query operations.
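
The task status includes counters that can be used to estimate how far an operation has progressed: the sum of “updated”, “created”, and “deleted” approaches “total” as the task runs. The sketch below shows one way to compute that fraction from a parsed GET /_tasks/<task_id> response. The function name estimate_progress is hypothetical and the sample values are invented.

```python
def estimate_progress(task_response):
    """Estimate the completed fraction (0.0 to 1.0) of a by-query task."""
    if task_response.get("completed"):
        return 1.0
    status = task_response["task"]["status"]
    total = status.get("total", 0)
    if total == 0:
        # Nothing known yet (or nothing matched); report no progress.
        return 0.0
    done = sum(status.get(key, 0) for key in ("updated", "created", "deleted"))
    return done / total


# Illustrative task response; the counter values are made up.
sample_task = {
    "completed": False,
    "task": {
        "action": "indices:data/write/update/byquery",
        "status": {
            "total": 1000,
            "updated": 250,
            "created": 0,
            "deleted": 0,
            "batches": 1,
            "version_conflicts": 0,
            "noops": 0,
        },
    },
}

print(f"{estimate_progress(sample_task):.0%} done")
```

This is only an estimate: documents skipped as no-ops or version conflicts are not counted here, so the fraction may stay below 100% until the task reports “completed”.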

Optimizing Update by Query Operations

There are several ways to tune Update by Query operations. One is the “slices” parameter, which divides the operation into multiple sub-tasks that run in parallel. Another is the “scroll” parameter, which controls how long the operation keeps the search context alive.

Here’s an example of an Update by Query request with the “slices” and “scroll” parameters:

json
POST /my-index/_update_by_query?slices=5&scroll=5m
{
  "script": {
    "source": "ctx._source.field += 'updated'",
    "lang": "painless"
  },
  "query": {
    "match": {
      "field": "value"
    }
  }
}

In this example, the Update by Query operation is divided into five slices, and the search context is kept alive for five minutes.
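
The “slices” parameter handles the split automatically. The API also accepts a “slice” object in the request body, with an “id” and a “max”, for cases where you want to submit each slice yourself. The following sketch generates one request body per slice; the helper name manual_slice_bodies is hypothetical, and sending the requests is left out.

```python
import json


def manual_slice_bodies(script_source, query, slice_count):
    """Return one Update by Query request body per manual slice."""
    if slice_count < 2:
        # Assumption: a slice definition needs max > 1; a single slice is
        # just an ordinary, unsliced request.
        raise ValueError("slice_count must be at least 2")
    bodies = []
    for slice_id in range(slice_count):
        bodies.append({
            "slice": {"id": slice_id, "max": slice_count},
            "script": {"source": script_source, "lang": "painless"},
            "query": query,
        })
    return bodies


bodies = manual_slice_bodies(
    "ctx._source.field += 'updated'",
    {"match": {"field": "value"}},
    5,
)
for body in bodies:
    print(json.dumps(body["slice"]))
```

Each body would be posted as its own Update by Query request, which lets you schedule, monitor, or retry the slices independently.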

Conclusion 

The Update by Query API is a versatile tool that can greatly simplify the process of updating multiple documents in Elasticsearch. By understanding how to use this API effectively, you can make your Elasticsearch operations more efficient and reliable.