What is async search?
When you query large amounts of data, waiting for the full payload to reach the client can take a very long time.
The async search API is designed to retrieve large amounts of data incrementally rather than in a single blocking request.
Instead of waiting for the query to finish and return all results at once, an async query makes partial results available as it collects them.
The query returns an ID and other status indicators, so you can close your Kibana Dev Tools console or terminal and come back later to check the query’s progress and the results fetched so far.
Running an async search query
An async search query accepts the same parameters as a regular search.
Let’s index some documents and run a query.
POST test_async/_doc
{ "text": "Doc1" }

POST test_async/_doc
{ "text": "Doc2" }

POST test_async/_doc
{ "text": "Example doc" }

POST test_async/_async_search
The response will look like this:
{
  "id" : "SOME_ID",
  "is_partial" : false,
  "is_running" : false,
  "start_time_in_millis" : 1636010235096,
  "expiration_time_in_millis" : 1636442235096,
  "response" : {
    "took" : 719,
    "timed_out" : false,
    "_shards" : {
      "total" : 1,
      "successful" : 1,
      "skipped" : 0,
      "failed" : 0
    },
    "hits" : {
      "total" : {
        "value" : 3,
        "relation" : "eq"
      },
      "max_score" : 1.0,
      "hits" : [
        {
          "_index" : "test_async",
          "_type" : "_doc",
          "_id" : "0JjG6XwBpL6RE1SX6qi6",
          "_score" : 1.0,
          "_source" : { "text" : "Example doc" }
        },
        {
          "_index" : "test_async",
          "_type" : "_doc",
          "_id" : "0ZjO6XwBpL6RE1SX0Kgt",
          "_score" : 1.0,
          "_source" : { "text" : "Doc1" }
        },
        {
          "_index" : "test_async",
          "_type" : "_doc",
          "_id" : "0pjO6XwBpL6RE1SX1KgU",
          "_score" : 1.0,
          "_source" : { "text" : "Doc2" }
        }
      ]
    }
  }
}
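The same request can also be sent from a script. The following is a minimal sketch using only the Python standard library. It assumes an unsecured development cluster at http://localhost:9200 (an assumption, not something stated above), and the helper name build_async_search_request is hypothetical. The request is built but not sent, so the sketch runs without a cluster.

```python
import json
import urllib.request

# Assumption: an unsecured development cluster listening on localhost.
BASE_URL = "http://localhost:9200"


def build_async_search_request(index, query, base_url=BASE_URL):
    """Build (but do not send) a POST <index>/_async_search request."""
    body = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/{index}/_async_search",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_async_search_request("test_async", {"match_all": {}})
print(request.get_method(), request.full_url)

# To run it against a live cluster, send the request and parse the JSON:
# with urllib.request.urlopen(request) as resp:
#     result = json.load(resp)
```

The body is an ordinary search body, which matches the point that async search takes the same parameters as a regular search.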
Important properties in async search queries
Field | Description |
---|---|
id | If the query takes longer than the time set in wait_for_completion_timeout, an ID is generated so that the query status and results can be retrieved later. |
is_partial | Always true while the query is running. Once the query is no longer running, it indicates whether the query failed (true) or completed successfully on all shards (false). |
is_running | Indicates whether the query is still running or has finished. |
_shards.total | Total number of shards the query will be executed against. |
_shards.successful | Number of shards on which the query has been successfully executed so far. |
hits.total.value | Number of documents matched by the query so far. These documents come from the shards counted in _shards.successful. |
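As a rough illustration of how these fields fit together, the sketch below condenses an async search response (already parsed into a Python dict) into a one-line progress summary. The function name summarize and the wording of the states are hypothetical; only the field names come from the response shown earlier.

```python
def summarize(result):
    """Return a one-line progress summary for an async search response."""
    response = result.get("response", {})
    shards = response.get("_shards", {})
    hits = response.get("hits", {}).get("total", {}).get("value", 0)

    if result.get("is_running"):
        state = "running"
    elif result.get("is_partial"):
        # Not running but still partial: the search failed or did not
        # complete on every shard.
        state = "stopped with partial results"
    else:
        state = "complete"

    return (
        f"{state}: {shards.get('successful', 0)}/{shards.get('total', 0)} "
        f"shards, {hits} hits so far"
    )


sample = {
    "id": "SOME_ID",
    "is_partial": False,
    "is_running": False,
    "response": {
        "_shards": {"total": 1, "successful": 1, "skipped": 0, "failed": 0},
        "hits": {"total": {"value": 3, "relation": "eq"}},
    },
}
print(summarize(sample))  # complete: 1/1 shards, 3 hits so far
```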
How to retrieve status and hits
To retrieve the status and hits of our async query we just need to run a GET request:
GET /_async_search/SOME_ID
The current status and hits of the async query will be returned.
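In a script, this GET request is typically repeated until is_running becomes false. The sketch below shows one way to structure such a polling loop. The fetch argument is a hypothetical callable that performs the GET and returns the parsed JSON; here it is replaced by a stand-in so the example runs without a cluster.

```python
import time


def poll_until_done(fetch, search_id, interval=1.0, max_polls=60):
    """Call fetch(search_id) until is_running is false, then return the result."""
    for _ in range(max_polls):
        result = fetch(search_id)
        if not result.get("is_running", False):
            return result
        time.sleep(interval)
    raise TimeoutError(f"search {search_id} still running after {max_polls} polls")


# Stand-in for a real GET /_async_search/<id> call (hypothetical data).
_fake_responses = iter([
    {"is_running": True, "is_partial": True},
    {"is_running": True, "is_partial": True},
    {"is_running": False, "is_partial": False},
])


def fake_fetch(search_id):
    return next(_fake_responses)


final = poll_until_done(fake_fetch, "SOME_ID", interval=0)
print(final)  # {'is_running': False, 'is_partial': False}
```

Passing the fetch function in as a parameter keeps the loop independent of any particular HTTP client and makes it easy to test.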
How to retrieve status alone
If we don’t need the hits of the query and only want to check the status, we can call the status endpoint:
GET /_async_search/status/SOME_ID
The response will look like this:
{
  "id" : "FmRldE8zREVEUzA2ZVpUeGs2ejJFUFEaMkZ5QTVrSTZSaVN3WlNFVmtlWHJsdzoxMDc=",
  "is_running" : true,
  "is_partial" : true,
  "start_time_in_millis" : 1583945890986,
  "expiration_time_in_millis" : 1584377890986,
  "_shards" : {
    "total" : 562,
    "successful" : 188,
    "skipped" : 0,
    "failed" : 0
  }
}
The “successful” property indicates the number of shards on which the query has been executed so far.
For an async search that has been completed, the status response has an additional completion_status field that shows the HTTP status code of the completed async search.
For example, if the query executed correctly:
"completion_status" : 200
If the query had errors:
"completion_status" : 503
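A status response can be turned into a short human-readable message along these lines. This is only a sketch: describe_status is a hypothetical helper, and treating any 2xx code as success is an assumption made for the example.

```python
def describe_status(status):
    """Describe a parsed GET /_async_search/status/<id> response."""
    if status.get("is_running"):
        shards = status.get("_shards", {})
        done = shards.get("successful", 0)
        total = shards.get("total", 0)
        return f"running ({done}/{total} shards done)"

    code = status.get("completion_status")
    if code is None:
        return "finished (no completion_status reported)"
    if 200 <= code < 300:  # assumption: any 2xx code counts as success
        return f"completed successfully (HTTP {code})"
    return f"completed with errors (HTTP {code})"


in_progress = {
    "is_running": True,
    "is_partial": True,
    "_shards": {"total": 562, "successful": 188, "skipped": 0, "failed": 0},
}
print(describe_status(in_progress))
print(describe_status({"is_running": False, "completion_status": 200}))
print(describe_status({"is_running": False, "completion_status": 503}))
```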
How to delete a query
If you want to cancel the async query at any point, send a DELETE request with the query ID and the query will be canceled.
DELETE /_async_search/SOME_ID
If OpenSearch security features are enabled, a query can be deleted only by one of the following users:
1. The authenticated user who submitted the query
2. A user with the cancel_task cluster privilege
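From a script, the cancellation could be prepared as in the sketch below. It assumes an unsecured development cluster at http://localhost:9200, and build_delete_request is a hypothetical helper. With security enabled, credentials for one of the users described above would also have to be attached, which is not shown here.

```python
import urllib.request
from urllib.parse import quote

# Assumption: an unsecured development cluster listening on localhost.
BASE_URL = "http://localhost:9200"


def build_delete_request(search_id, base_url=BASE_URL):
    """Build (but do not send) a DELETE /_async_search/<id> request."""
    # Percent-encode the ID so characters such as '=' are safe in the path.
    url = f"{base_url}/_async_search/{quote(search_id, safe='')}"
    return urllib.request.Request(url=url, method="DELETE")


delete_request = build_delete_request("SOME_ID")
print(delete_request.get_method(), delete_request.full_url)

# To send it against a live cluster:
# with urllib.request.urlopen(delete_request) as resp:
#     print(resp.status)
```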
Additional parameters
Field | Description |
---|---|
wait_for_completion_timeout | How long the request blocks while waiting for the query to complete before returning. Defaults to 1 second. If the query finishes within this time, the results are not stored and no ID field is returned. |
keep_on_completion | Stores the results even if the query finishes within wait_for_completion_timeout. |
keep_alive | Determines how long the async query's status and results remain available. Defaults to 5 days. After this time, ongoing queries and stored results are deleted. |
batched_reduce_size | Number of shard results reduced at a time. Partial results become available each time a batch of shard results is reduced, so this value controls how often they are updated. Defaults to 5. |
request_cache | Enables or disables caching on a *per-request* basis. Defaults to true. |
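These parameters are passed as query-string parameters on the async search request. The sketch below builds such a URL with the Python standard library. The base URL, the helper name async_search_url, and the parameter values are illustrative assumptions.

```python
from urllib.parse import urlencode


def async_search_url(index, base_url="http://localhost:9200", **params):
    """Return the _async_search URL for an index with optional parameters."""
    url = f"{base_url}/{index}/_async_search"
    query_string = urlencode(params)
    return f"{url}?{query_string}" if query_string else url


url = async_search_url(
    "test_async",
    wait_for_completion_timeout="2s",  # block for up to 2 seconds
    keep_on_completion="true",         # store results even if it finishes in time
    keep_alive="1d",                   # keep status and results for one day
)
print(url)
```

With keep_on_completion set to true, the response includes an ID even when the query finishes before the timeout, so the results can still be fetched later.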
The following parameters cannot be changed but are worth mentioning:
Field | Description |
---|---|
pre_filter_shard_size | Set to 1. Enforces the execution of a pre-filter roundtrip so that shards that cannot contain any matching documents are skipped. |
ccs_minimize_roundtrips | Indicates whether network round-trips should be minimized as part of cross-cluster search requests execution. Set to false. |
By default, OpenSearch does not limit the size of async query responses. Storing very large responses might destabilize the cluster. To limit the maximum response size, change the search.max_async_search_response_size cluster setting.
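One way to change this setting is through the cluster settings API. The sketch below prepares such a request from Python without sending it. The localhost URL, the choice of a persistent setting, and the 10mb value are assumptions made for illustration, not recommendations.

```python
import json
import urllib.request

# Assumption: an unsecured development cluster listening on localhost.
BASE_URL = "http://localhost:9200"


def build_response_size_limit_request(limit, base_url=BASE_URL):
    """Build (but do not send) a PUT _cluster/settings request."""
    settings = {
        "persistent": {
            "search.max_async_search_response_size": limit,
        }
    }
    return urllib.request.Request(
        url=f"{base_url}/_cluster/settings",
        data=json.dumps(settings).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


# "10mb" is an arbitrary example value.
settings_request = build_response_size_limit_request("10mb")
print(settings_request.get_method(), settings_request.full_url)
print(settings_request.data.decode("utf-8"))
```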
Conclusion
Async search is a good fit when you need to run resource-intensive queries and want to retrieve partial results as they become available instead of waiting for the query to finish.