Elasticsearch Optimize for Search Speed

By Opster Expert Team - Brillian

Updated: Mar 14, 2024

Search speed is the major selling point of Elasticsearch. Most of the time, it’s the reason people decide to use Elasticsearch in the first place – which is why it’s key to ensure it produces results quickly.

By optimizing and maintaining Elasticsearch search speed, you can improve your product’s user experience and in turn improve your product’s conversion rate.

In this article, we will detail how to increase Elasticsearch search speed by optimizing your queries and your Elasticsearch settings.

How to detect slowness in your Elasticsearch

Before looking at how to increase search speed, it’s important to cover how to detect slowness in your Elasticsearch cluster. One convenient way to do so is with slow logs:

Using slow logs

Elasticsearch provides a very convenient feature called slow logs. When configured correctly, Elasticsearch logs every query that exceeds the thresholds you set, so you can debug and improve those specific queries. You can configure slow logs on the index level or the Elasticsearch level.

To configure it on the index level:

curl --request PUT \
  --url http://localhost:9200/search-speed \
  --header 'Content-Type: application/json' \
  --data '{
	"settings": {
		"index.search.slowlog.threshold.query.warn": "10s",
		"index.search.slowlog.threshold.query.info": "5s",
		"index.search.slowlog.threshold.query.debug": "2s",
		"index.search.slowlog.threshold.query.trace": "500ms",
		"index.search.slowlog.threshold.fetch.warn": "1s",
		"index.search.slowlog.threshold.fetch.info": "800ms",
		"index.search.slowlog.threshold.fetch.debug": "500ms",
		"index.search.slowlog.threshold.fetch.trace": "200ms",
		"index.search.slowlog.level": "info"
	}
}'

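To configure it on the Elasticsearch level, apply the same properties to all indices, for example through an index template. Since Elasticsearch 5.0, index-level settings like these are rejected if placed in elasticsearch.yml: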

index.search.slowlog.threshold.query.warn: 10s
index.search.slowlog.threshold.query.info: 5s
index.search.slowlog.threshold.query.debug: 2s
index.search.slowlog.threshold.query.trace: 500ms

index.search.slowlog.threshold.fetch.warn: 1s
index.search.slowlog.threshold.fetch.info: 800ms
index.search.slowlog.threshold.fetch.debug: 500ms
index.search.slowlog.threshold.fetch.trace: 200ms
index.search.slowlog.level: info

If you want to know more about slow logs, you can check out our article on how to configure slow logs properly.
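
If you manage slow log thresholds from code, a small helper can build the settings body for you. The following is a minimal Python sketch, not an official client: the helper names (to_millis, slowlog_settings) are hypothetical, nothing is sent to Elasticsearch, and the ordering check is a local sanity check rather than something Elasticsearch itself enforces.

```python
import json

LEVELS = ["warn", "info", "debug", "trace"]  # most severe first


def to_millis(value: str) -> int:
    """Convert a duration such as "500ms" or "2s" to milliseconds."""
    if value.endswith("ms"):
        return int(value[:-2])
    if value.endswith("s"):
        return int(value[:-1]) * 1000
    raise ValueError(f"unsupported duration: {value!r}")


def slowlog_settings(query: dict, fetch: dict) -> dict:
    """Build a settings body with the slow log keys shown above."""
    body = {}
    for phase, thresholds in (("query", query), ("fetch", fetch)):
        millis = [to_millis(thresholds[level]) for level in LEVELS]
        # Local sanity check (an assumption of this sketch): each less
        # severe level should trigger at a strictly lower duration.
        if any(a <= b for a, b in zip(millis, millis[1:])):
            raise ValueError(f"{phase} thresholds must decrease from warn to trace")
        for level in LEVELS:
            body[f"index.search.slowlog.threshold.{phase}.{level}"] = thresholds[level]
    return body


if __name__ == "__main__":
    settings = slowlog_settings(
        {"warn": "10s", "info": "5s", "debug": "2s", "trace": "500ms"},
        {"warn": "1s", "info": "800ms", "debug": "500ms", "trace": "200ms"},
    )
    # Body you could send as the "settings" object of the curl example.
    print(json.dumps(settings, indent=2))
```

Keeping the thresholds in one place like this makes it easier to loosen or tighten them per environment.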

Optimizing your query

Optimizing your queries is one of the most direct ways to improve Elasticsearch’s search performance. A query that collects more documents than needed will decrease your search speed.

Don’t set a large size parameter

The size parameter determines how many documents Elasticsearch returns in a response. A large size value reduces search speed because Elasticsearch has to construct a large number of documents. On top of that, transferring them between Elasticsearch and the client adds latency.

It’s recommended to double-check that the value matches the number of documents you actually need.

The default size for a query is 10. You can change the size in the search parameter:

curl --request GET \
 --url 'http://localhost:9200/search-speed/_search?size=100' \
 --header 'Content-Type: application/json'
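
One way to keep size under control is to cap it in the client before the request is built. The following is a minimal Python sketch under stated assumptions: BASE_URL points at the same local node as the curl examples, and the function name and the cap of 100 are hypothetical choices, not Elasticsearch defaults.

```python
from urllib.parse import urlencode

BASE_URL = "http://localhost:9200"  # assumption: local node, as in the curl examples


def search_url(index: str, page_size: int, max_size: int = 100) -> str:
    """Return a _search URL whose size parameter never exceeds max_size."""
    if page_size < 0:
        raise ValueError("page_size must not be negative")
    size = min(page_size, max_size)
    return f"{BASE_URL}/{index}/_search?{urlencode({'size': size})}"


if __name__ == "__main__":
    # A caller asking for 5000 documents is limited to the cap.
    print(search_url("search-speed", 5000))
```

A cap like this protects the cluster from an accidental oversized request while still letting callers ask for fewer documents.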

Get only the fields you need

Similar to retrieving more documents than you need, retrieving fields you don’t use will also slow down your search. The reason is the same as before: Elasticsearch needs to construct and transfer more data to the client. A large size parameter combined with many fields will slow your search significantly.

Because of that, it’s recommended to retrieve only the fields that you truly need.

There are multiple methods for selecting which fields to return. Here are a few you can use:

Using _source in the request body:

curl --request POST \
 --url http://localhost:9200/search-speed/_search \
 --header 'Content-Type: application/json' \
 --data '{
"_source":["name"]
}'

Using “fields” in the request body and turning off _source:

curl --request POST \
  --url http://localhost:9200/search-speed/_search \
  --header 'Content-Type: application/json' \
  --data '{
	"fields": [
		"name",
		"description"
	],
	"_source": false
}'

Please note that the search result looks different with this method: the documents’ values are returned under fields instead of _source, and each value is wrapped in an array:

{
  "took": 2,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 2,
      "relation": "eq"
    },
    "max_score": 1.0,
    "hits": [
      {
        "_index": "search-speed",
        "_type": "_doc",
        "_id": "r2cy4HUB4Qqjzh5nmZLw",
        "_score": 1.0,
        "fields": {
          "name": [
            "hello"
          ],
          "description": [
            "world"
          ]
        }
      },
      {
        "_index": "search-speed",
        "_type": "_doc",
        "_id": "sGc14HUB4Qqjzh5n6JJx",
        "_score": 1.0,
        "fields": {
          "name": [
            "Opster"
          ],
          "description": [
            "Opster"
          ]
        }
      }
    ]
  }
}

Using docvalue_fields in the request body (note that this method is not supported for the text field type):

curl --request POST \
  --url http://localhost:9200/search-speed/_search \
  --header 'Content-Type: application/json' \
  --data '{
	"docvalue_fields": [
		"name.keyword"
	]
}'

There are additional methods you can use, such as stored fields, script fields and runtime fields, but the above are the most basic solutions for selecting the fields you want Elasticsearch to return.
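
When you turn off _source and request fields instead, client code has to read each hit’s values from the fields object, where every value is an array. The following is a minimal Python sketch of that; the helper name field_values is hypothetical, and the sample is a trimmed copy of the response shown earlier.

```python
def field_values(response: dict, field: str) -> list:
    """Collect the first value of `field` from each hit of a search response.

    Values under "fields" are arrays, even for single-valued fields,
    so the first element is taken from each one.
    """
    values = []
    for hit in response["hits"]["hits"]:
        found = hit.get("fields", {}).get(field)
        if found:
            values.append(found[0])
    return values


# Trimmed copy of the sample response from this section.
sample = {
    "hits": {
        "hits": [
            {"_id": "r2cy4HUB4Qqjzh5nmZLw",
             "fields": {"name": ["hello"], "description": ["world"]}},
            {"_id": "sGc14HUB4Qqjzh5n6JJx",
             "fields": {"name": ["Opster"], "description": ["Opster"]}},
        ]
    }
}

if __name__ == "__main__":
    print(field_values(sample, "name"))  # prints ['hello', 'Opster']
```

Hits that lack the requested field are skipped rather than raising an error, which is usually what you want for sparse documents.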

Avoid using scripts

Scripting is a feature in Elasticsearch that allows you to evaluate custom expressions. It is a powerful feature, but it can significantly affect your search speed.

You should be careful when using scripts because Elasticsearch has to run the script for every document it evaluates, and scripts generally can’t take advantage of the index structures that make ordinary queries fast. The more data you have in the index, the slower the search becomes.
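
One common way to avoid a search-time script is to compute the value once at index time and store it as a regular field. The following is a minimal Python sketch of that idea; the field names (price, quantity, total) and both helper names are hypothetical and only serve as an illustration.

```python
def with_total(doc: dict) -> dict:
    """Return a copy of doc with a precomputed `total` field (hypothetical schema)."""
    enriched = dict(doc)
    enriched["total"] = doc["price"] * doc["quantity"]
    return enriched


def total_at_least(minimum: float) -> dict:
    """Range query on the precomputed field, so no script runs at search time."""
    return {"query": {"range": {"total": {"gte": minimum}}}}


if __name__ == "__main__":
    # Enrich the document before indexing it, then query the stored value.
    print(with_total({"name": "hello", "price": 4, "quantity": 3}))
    print(total_at_least(10))
```

The cost of the calculation moves from every search to a single indexing step, which usually pays off when the same expression is queried often.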

Avoid leading wildcard queries

Wildcard queries in Elasticsearch are similar to LIKE queries in SQL. For example, if you query *elastic*, the query returns all results containing a term with the substring elastic. The real problem with wildcard queries is the leading wildcard, e.g. *elastic.

Elasticsearch is designed to search exact tokens efficiently. With a leading wildcard, though, it can’t use the sorted term dictionary to narrow the search. Instead, the query has to go through every term in the field’s inverted index to discover which ones match.

Because of that, it’s recommended to configure your analyzer so that it supports the query you want to run, instead of using a leading wildcard query.
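
One analyzer-based approach is to index a reversed copy of the field, so that a suffix search such as *elastic becomes a prefix search on the reversed terms. The following is a minimal Python sketch, assuming the built-in standard tokenizer and the lowercase and reverse token filters; the analyzer name (reversed_terms), the sub-field name (name.reversed) and the helper name are hypothetical.

```python
# Index body: a "name" text field with a sub-field whose terms are stored reversed.
REVERSED_INDEX_BODY = {
    "settings": {
        "analysis": {
            "analyzer": {
                "reversed_terms": {  # hypothetical analyzer name
                    "type": "custom",
                    "tokenizer": "standard",
                    "filter": ["lowercase", "reverse"],
                }
            }
        }
    },
    "mappings": {
        "properties": {
            "name": {
                "type": "text",
                "fields": {
                    "reversed": {"type": "text", "analyzer": "reversed_terms"}
                },
            }
        }
    },
}


def suffix_query(suffix: str, field: str = "name.reversed") -> dict:
    """Express the wildcard *suffix as a prefix query on the reversed sub-field.

    The prefix value is not analyzed in this sketch, so the term is
    lowercased and reversed here to match the indexed form.
    """
    return {"query": {"prefix": {field: {"value": suffix.lower()[::-1]}}}}


if __name__ == "__main__":
    print(suffix_query("elastic"))
```

The trade-off is extra index size for the additional sub-field in exchange for a query that can use the term dictionary efficiently.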

Use timeout when searching

A slow query uses significant computational resources, can occupy a search thread for a long time and slows down your Elasticsearch server. To keep queries from running too long, Elasticsearch offers a timeout feature. With a timeout, Elasticsearch stops collecting hits once the time limit is reached and returns the results gathered so far. It’s a very important feature to configure.

To set a search timeout, you can define it in the search parameter:

curl --request POST \
 --url 'http://localhost:9200/search-speed/_search?timeout=5s' \
 --header 'Content-Type: application/json' \
 --data '{}'

You can also set a default for all searches with this cluster setting:

search.default_search_timeout
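
A search that hits its timeout can still return a response, so client code should check the timed_out flag before trusting the hits as complete. The following is a minimal Python sketch of the request bodies involved and of that check. The helper names are hypothetical, and the sketch assumes the default is applied through the cluster settings API as a persistent setting.

```python
def search_with_timeout(query: dict, timeout: str = "5s") -> dict:
    """Search body that sets the timeout in the request body instead of the URL."""
    return {"timeout": timeout, "query": query}


def default_timeout_settings(timeout: str) -> dict:
    """Body for PUT /_cluster/settings that sets the cluster-wide default."""
    return {"persistent": {"search.default_search_timeout": timeout}}


def is_partial(response: dict) -> bool:
    """True if the search stopped early and the hits may be incomplete."""
    return bool(response.get("timed_out", False))


if __name__ == "__main__":
    print(search_with_timeout({"match_all": {}}))
    print(default_timeout_settings("5s"))
    print(is_partial({"took": 5003, "timed_out": True, "hits": {"hits": []}}))
```

How to react to a partial result (retry, show what you have, or report an error) depends on your product, so the sketch only reports the condition.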

Avoid complex aggregations if you don’t need them

Aggregations are a powerful Elasticsearch feature that you can use for many different things. Many people, especially in data analytics, use Elasticsearch only because of its aggregations. But using too many aggregations comes with a price: slower searches. You need to be careful with aggregations, especially if the query needs to be fast, as with autocomplete.

Tune your Elasticsearch settings

Tuning Elasticsearch settings is always hard to do. You want to ensure high availability, plan for scalability and achieve excellent search performance, all while trying to minimize cost. You also need to keep adjusting your settings as your product’s usage changes. In this section, we’ll cover configuration tips you can apply to your Elasticsearch settings to improve search performance.

Freeze unused indices

Elasticsearch’s indices keep data structures in memory for faster performance. The problem is that unused indices still occupy that memory. Too many unused indices will hog your memory and slow down searches on your other indices. In Elasticsearch 7.x, you can use the freeze API to stop unused indices from holding that memory. You can still search a frozen index, but the search will be slower because its data structures are no longer kept in memory. Note that the freeze API was deprecated in Elasticsearch 7.14 and removed in 8.0.

To freeze an index, you can use the _freeze API:

curl --request POST \
 --url http://localhost:9200/search-speed/_freeze

Increase refresh interval

If you’ve been using Elasticsearch for some time, you might have noticed that you can’t search a document as soon as it’s indexed. This happens because before documents are written to the shard, where they become searchable, they go through an in-memory buffer first. Elasticsearch uses an in-memory buffer because batching documents in memory is more efficient during heavy indexing. The process of moving documents from the in-memory buffer to the shard is called “refresh”.

Refresh is an expensive process that can reduce your shard performance, and therefore also your search performance. By default, a refresh occurs every 1 second. Reducing the interval is usually not advised, because the in-memory buffer can’t work efficiently if it is emptied before it has accumulated many documents.

So, if reducing the interval will slow down your performance, what about increasing it? Well, it’s not that simple.

Increasing the refresh interval will generally improve your search performance, but you need to be careful. If you increase it too much, each refresh becomes heavier and takes longer to finish, which can harm your search performance instead of improving it. Another thing to note is that your documents will also take longer to become searchable.

You will need trial and error to determine the most efficient refresh interval for your system. Usually, the default refresh interval of 1 second works well for most use cases.

You can change the refresh interval in the index settings:

curl --request PUT \
  --url http://localhost:9200/search-speed/_settings \
  --header 'Content-Type: application/json' \
  --data '{
	"settings": {
		"index": {
			"refresh_interval": "5s"
		}
	}
}'

You can also set it for new indices, for example in an index template, using the key:

index.refresh_interval
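
Since finding a good interval takes trial and error, it can help to generate the settings body from code and validate the value before sending it. The following is a minimal Python sketch; the helper name is hypothetical, and the accepted formats are a deliberately narrow subset chosen for this sketch (whole numbers with ms, s, m, h or d, plus "-1", which disables periodic refresh).

```python
import re

# Narrow, sketch-level validation: "-1" or a whole number with a common unit.
_INTERVAL = re.compile(r"-1|\d+(ms|s|m|h|d)")


def refresh_interval_body(interval: str) -> dict:
    """Body for PUT /<index>/_settings that changes the refresh interval."""
    if not _INTERVAL.fullmatch(interval):
        raise ValueError(f"unsupported refresh interval: {interval!r}")
    return {"index": {"refresh_interval": interval}}


if __name__ == "__main__":
    # Candidate values you might compare while measuring search latency.
    for candidate in ("1s", "5s", "30s"):
        print(refresh_interval_body(candidate))
```

Measuring search latency and indexing throughput for each candidate value is what tells you which interval suits your workload.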

Increase node query cache size

Elasticsearch uses the node query cache to cache the results of queries used in filter context, so it can answer faster when the same filter is used again. The cache implements an LRU policy, so when it becomes full, it evicts the least recently used entries.

If your node query cache is too small, some of your filters might not stay cached, and your search performance may decrease.

To change the size, set this key in elasticsearch.yml on each data node (the default is 10% of the heap):

indices.queries.cache.size

You can read more about this setting in the node query cache section of the Elasticsearch documentation.
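
The node query cache only applies to clauses that run in filter context, so how you structure a query affects whether it can benefit at all. The following is a minimal Python sketch of a bool query that keeps the scored full-text part in must and moves the yes/no condition into filter. The price field and the helper name are hypothetical, and whether a given filter is actually cached is decided by Elasticsearch’s own heuristics.

```python
def filtered_search(text: str, min_price: float) -> dict:
    """Bool query with a scored match clause and an unscored range filter."""
    return {
        "query": {
            "bool": {
                # Scored clause: contributes to _score.
                "must": [{"match": {"description": text}}],
                # Filter context: no scoring, eligible for the node query cache.
                "filter": [{"range": {"price": {"gte": min_price}}}],
            }
        }
    }


if __name__ == "__main__":
    print(filtered_search("world", 10))
```

Keeping reusable conditions in filter also avoids scoring work for clauses that don’t need a relevance score.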

Optimize shards and replicas

Shards and replicas are part of Elasticsearch’s foundation. They’re responsible for its high availability, scalability, and fast performance. Shards and replicas allow Elasticsearch to search concurrently and hence improve your search performance. But you need to be careful when increasing their numbers, because every shard consumes resources, and too many shards and replicas will lower your search performance and can even bring your Elasticsearch server down.

For a deeper understanding of how shards and replicas work, we recommend you read Opster’s guide on Elasticsearch Shards and Replicas.

There is also an article with a root cause analysis (RCA) of an Elasticsearch outage. Check it out if you want to know how a single incorrect shard setting can affect your Elasticsearch. For more tips on how to improve Elasticsearch search performance, read this blog.

Increase hardware resources

Increasing your hardware resources is the most obvious way to increase performance. The problem with this approach is that it’s very expensive. It should be the last thing you try when improving Elasticsearch performance, and in many cases it will not solve search latency issues or improve search performance.
