How to Increase OpenSearch Search Speed

By Opster Expert Team - Brillian

Updated: Apr 15, 2024

Background

Search speed is the major selling point of OpenSearch. Most of the time, it’s the reason people decide to use OpenSearch in the first place – which is why it’s key to ensure it produces results quickly.

By optimizing and maintaining OpenSearch search speed, you can improve your product’s user experience and in turn improve your product’s conversion rate.

In this article, we will detail how to increase OpenSearch search speed by optimizing your queries and your OpenSearch settings.


How to detect slowness in your OpenSearch

Before we learn how to increase your OpenSearch search speed, it’s important to first cover how to detect slowness in your OpenSearch. Slow logs are a convenient way to do so.

Using slow logs

OpenSearch provides a very convenient feature called slow logs. When configured correctly, OpenSearch will log any query that exceeds the configured thresholds, so you can debug and improve those specific queries. You can configure slow logs on the index level or the OpenSearch level.

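To configure it on the index level when creating an index (for an index that already exists, send the same settings to the index’s _settings endpoint instead):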

curl --request PUT \
  --url http://localhost:9200/search-speed \
  --header 'Content-Type: application/json' \
  --data '{
	"settings": {
		"index.search.slowlog.threshold.query.warn": "10s",
		"index.search.slowlog.threshold.query.info": "5s",
		"index.search.slowlog.threshold.query.debug": "2s",
		"index.search.slowlog.threshold.query.trace": "500ms",
		"index.search.slowlog.threshold.fetch.warn": "1s",
		"index.search.slowlog.threshold.fetch.info": "800ms",
		"index.search.slowlog.threshold.fetch.debug": "500ms",
		"index.search.slowlog.threshold.fetch.trace": "200ms",
		"index.search.slowlog.level": "info"
	}
}'

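To configure it on the OpenSearch level, you can change the following properties (if your version rejects index-level settings in the node configuration file, apply them through an index template or the settings API instead):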

index.search.slowlog.threshold.query.warn: 10s
index.search.slowlog.threshold.query.info: 5s
index.search.slowlog.threshold.query.debug: 2s
index.search.slowlog.threshold.query.trace: 500ms

index.search.slowlog.threshold.fetch.warn: 1s
index.search.slowlog.threshold.fetch.info: 800ms
index.search.slowlog.threshold.fetch.debug: 500ms
index.search.slowlog.threshold.fetch.trace: 200ms
index.search.slowlog.level: info
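To make the thresholds concrete, here is a minimal Python sketch of how a query’s duration could map to a slow log level. It is an illustration, not OpenSearch’s implementation: the helper names `to_millis` and `slowlog_level` are hypothetical, the thresholds are copied from the query settings above, and the additional filtering done by index.search.slowlog.level is not modeled.

```python
import re

# Query-phase thresholds copied from the configuration above.
QUERY_THRESHOLDS = {
    "warn": "10s",
    "info": "5s",
    "debug": "2s",
    "trace": "500ms",
}

# Only the units used in this article are supported in this sketch.
_UNITS_MS = {"ms": 1, "s": 1000, "m": 60_000}


def to_millis(value):
    """Convert a duration such as '500ms' or '10s' to milliseconds."""
    match = re.fullmatch(r"(\d+)(ms|s|m)", value)
    if match is None:
        raise ValueError(f"unsupported duration: {value!r}")
    amount, unit = match.groups()
    return int(amount) * _UNITS_MS[unit]


def slowlog_level(took_ms, thresholds=QUERY_THRESHOLDS):
    """Return the most severe level whose threshold took_ms exceeds, or None."""
    for level in ("warn", "info", "debug", "trace"):
        if took_ms > to_millis(thresholds[level]):
            return level
    return None
```

With these thresholds, a query that takes 6 seconds would be reported at the info level, while one that takes 100 milliseconds would not be reported at all.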

If you want to know more about slow logs, you can check out our article on how to configure slow logs properly.


Optimizing your query

Optimizing your queries is one thing you can do to improve OpenSearch’s search performance. A bad query that collects more document results than needed will decrease your search speed.

Don’t set a large value for the size parameter

The size parameter in OpenSearch determines how many documents OpenSearch will return in a response. A large value in the size parameter will reduce your search speed because OpenSearch needs to construct a large number of documents. On top of that, transferring them between OpenSearch and the client adds latency.

It’s recommended to double check that you set the value to the number of documents you actually need.

The default size for a query is 10. You can change the size in the search parameter:

curl --request GET \
 --url 'http://localhost:9200/search-speed/_search?size=100' \
 --header 'Content-Type: application/json'

Get only the fields you need

Similar to retrieving more documents than you need, fetching fields you don’t use will also slow down your search speed. This is due to the same reason we mentioned earlier: OpenSearch will need to construct and transfer more data to the client. If you combine a large size parameter with many fields, together they will significantly slow your search speed.

Because of that, it’s recommended to only get the fields that you truly need.

There are multiple methods for configuring which fields you want to get. Here are a few methods you can use:

Using _source in the request body:

curl --request POST \
 --url http://localhost:9200/search-speed/_search \
 --header 'Content-Type: application/json' \
 --data '{
"_source":["name"]
}'

Using “fields” in the request body and turning off _source:

curl --request POST \
  --url http://localhost:9200/search-speed/_search \
  --header 'Content-Type: application/json' \
  --data '{
	"fields": [
		"name",
		"description"
	],
	"_source": false
}'

Please note that the search result with this method will look different. You will need to read each document’s values from “fields” instead of _source:

{
  "took": 2,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 2,
      "relation": "eq"
    },
    "max_score": 1.0,
    "hits": [
      {
        "_index": "search-speed",
        "_type": "_doc",
        "_id": "r2cy4HUB4Qqjzh5nmZLw",
        "_score": 1.0,
        "fields": {
          "name": [
            "hello"
          ],
          "description": [
            "world"
          ]
        }
      },
      {
        "_index": "search-speed",
        "_type": "_doc",
        "_id": "sGc14HUB4Qqjzh5n6JJx",
        "_score": 1.0,
        "fields": {
          "name": [
            "Opster"
          ],
          "description": [
            "Opster"
          ]
        }
      }
    ]
  }
}

Using docvalue_fields in the request body (note that this method is not supported for the text field type):

curl --request POST \
  --url http://localhost:9200/search-speed/_search \
  --header 'Content-Type: application/json' \
  --data '{
	"docvalue_fields": [
		"name.keyword"
	]
}'

There are additional methods you can use, like stored fields, script fields and runtime fields, but the above are the most basic solutions for selecting the fields you want OpenSearch to return.
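As a rough illustration of the “fields” approach, the Python sketch below builds a request body that limits both the number of hits and the returned fields, then flattens a response shaped like the example above. The helper names `build_search_body` and `flatten_hits` are hypothetical and not part of any client library; nothing is sent to a cluster.

```python
import json


def build_search_body(size=10, fields=None):
    """Build a search body that returns only `fields` and disables _source."""
    body = {"size": size}
    if fields:
        body["fields"] = list(fields)
        body["_source"] = False
    return body


def flatten_hits(response):
    """Turn each hit's `fields` object into a flat dict keyed by field name.

    Values arrive as lists; single-element lists are unwrapped for convenience.
    """
    rows = []
    for hit in response["hits"]["hits"]:
        row = {"_id": hit["_id"]}
        for name, values in hit.get("fields", {}).items():
            row[name] = values[0] if len(values) == 1 else values
        rows.append(row)
    return rows


if __name__ == "__main__":
    # Print the JSON body that would be sent with the search request.
    print(json.dumps(build_search_body(size=2, fields=["name", "description"])))
```

Unwrapping single-element lists is a convenience choice for this sketch; keep the lists as they are if your fields can hold multiple values.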

Avoid using scripts

Scripting is a feature in OpenSearch that allows you to evaluate custom expressions. It is a powerful feature, but it can majorly affect your search speed.

You should be careful when using scripts because OpenSearch will apply the script to every matching document. The more data you have in the index, the slower the search will become as the script runs over each of those documents.

Avoid leading wildcard queries

Wildcard queries in OpenSearch are similar to LIKE queries in SQL. For example, if you query *OpenSearch*, the query will return all results containing the word OpenSearch. The real problem with wildcard queries in OpenSearch is the leading wildcard, e.g. *OpenSearch.

OpenSearch is designed to search exact tokens efficiently. With a leading wildcard query, though, OpenSearch can’t carry out the search efficiently: the query needs to go through the whole inverted index to discover which terms in the entire index contain the queried term.

Because of that, it’s recommended to configure your analyzer so it supports the query you want to run instead of using a leading wildcard query.
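The difference can be sketched with a toy term dictionary in Python. This is a simplified model under stated assumptions, not how terms are actually stored: terms live in a sorted list, so a prefix can be located directly, while a suffix forces a scan of every term. The last function shows one analyzer-side idea, indexing reversed tokens so that a suffix search becomes a prefix search. All function names are hypothetical.

```python
from bisect import bisect_left


def prefix_lookup(sorted_terms, prefix):
    """Find terms starting with `prefix` by jumping to its position in sorted order."""
    start = bisect_left(sorted_terms, prefix)
    matches = []
    for term in sorted_terms[start:]:
        if not term.startswith(prefix):
            break  # sorted order guarantees no later term can match
        matches.append(term)
    return matches


def suffix_scan(sorted_terms, suffix):
    """Leading-wildcard lookup: every term has to be inspected."""
    return [term for term in sorted_terms if term.endswith(suffix)]


def suffix_lookup_reversed(sorted_reversed_terms, suffix):
    """Same result as suffix_scan, using a dictionary of reversed terms."""
    hits = prefix_lookup(sorted_reversed_terms, suffix[::-1])
    return sorted(term[::-1] for term in hits)


terms = sorted(["elasticsearch", "opensearch", "opster", "research", "search"])
reversed_terms = sorted(term[::-1] for term in terms)
```

Here `suffix_scan(terms, "search")` and `suffix_lookup_reversed(reversed_terms, "search")` return the same terms, but only the first has to look at every entry in the dictionary.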

Use timeout when searching

A slow query uses significant computational resources, often ties up a search thread, and slows down your OpenSearch server. To avoid queries that take too long to complete, OpenSearch offers a timeout feature. With a timeout, a query that runs too long is stopped instead of being left to finish. It’s a very important feature to configure.

To set search timeout, you can define it in the search parameter:

curl --request POST \
 --url 'http://localhost:9200/search-speed/_search?timeout=5s' \
 --header 'Content-Type: application/json' \
 --data '{}'

You can also define it in the global settings with this key:

search.default_search_timeout
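When a timeout is set, it’s worth checking the timed_out flag in the response, since a search that hit its timeout may return incomplete hits. The Python sketch below prepares such a request with the standard library and checks the flag. The helper names `build_search_request` and `is_partial` are hypothetical, the URL is the local example used throughout this article, and the request is only constructed, not sent.

```python
import json
import urllib.parse
import urllib.request


def build_search_request(base_url, index, body, timeout="5s"):
    """Prepare (but do not send) a search request with a per-request timeout."""
    query = urllib.parse.urlencode({"timeout": timeout})
    return urllib.request.Request(
        url=f"{base_url}/{index}/_search?{query}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def is_partial(response):
    """True when the search hit its timeout and the hits may be incomplete."""
    return bool(response.get("timed_out", False))
```

To actually run the search, pass the prepared request to `urllib.request.urlopen` and decode the JSON response before calling `is_partial`.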

Avoid complex aggregations if you don’t need them

Aggregations are a powerful OpenSearch feature you can use for many different things. Many people, especially in data analytics, use OpenSearch only because of its aggregations. But using too many aggregations comes with a price: slower search speeds. You need to be careful with aggregations, especially if your query requires fast search speed, as autocomplete does.


Tune your OpenSearch settings

Tuning OpenSearch settings is always hard to do. You want to ensure high availability, plan for scalability and achieve excellent search performance, all while trying to minimize the cost. You also need to keep adjusting your settings based on your product’s users. In this section, we’ll cover configuration tips you can apply to your OpenSearch settings to improve search performance.

Freeze unused indices

OpenSearch’s indices use memory to store data structures for faster performance. The problem is that unused indices still occupy that memory. Too many unused indices will hog memory and slow down searches on your other indices. Fortunately, you can use the freeze API to stop unused indices from using memory. You can still search a frozen index, but the search will be slower because the index no longer keeps its data structures in memory.

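To freeze an index, you can use the _freeze API (check that your distribution supports it; it originated as an Elasticsearch feature and may not be available in OpenSearch):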

curl --request POST \
 --url http://localhost:9200/search-speed/_freeze

Increase refresh interval

If you’ve been using OpenSearch for some time, you might’ve noticed that you can’t search a document as soon as it’s indexed. This happens because before a document is written into the shard and becomes searchable, it goes through an in-memory buffer first. OpenSearch uses an in-memory buffer because it’s more efficient to first store the tokens in memory during large indexing processes. The process of moving tokens from the in-memory buffer to the shard is called “refresh”.
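The visibility delay can be pictured with a toy model in Python. This is only a sketch of the behavior described above, not OpenSearch’s internals; the `ToyShard` class and its methods are hypothetical names.

```python
class ToyShard:
    """Toy model: documents wait in a buffer until refresh() makes them searchable."""

    def __init__(self):
        self._buffer = []      # indexed but not yet searchable
        self._searchable = []  # visible to search

    def index(self, doc):
        """Accept a document; it is not searchable yet."""
        self._buffer.append(doc)

    def refresh(self):
        """Move everything buffered so far into the searchable set."""
        self._searchable.extend(self._buffer)
        self._buffer.clear()

    def search(self, word):
        """Return searchable documents containing `word` as a whole token."""
        return [doc for doc in self._searchable if word in doc.split()]


shard = ToyShard()
shard.index("hello world")
before_refresh = shard.search("hello")  # document is still in the buffer
shard.refresh()
after_refresh = shard.search("hello")   # document is now visible
```

In this model, the refresh interval is simply how often `refresh()` is called: a longer interval batches more documents per refresh but delays when they become visible.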

Refresh is an expensive process that can reduce your shard performance, and therefore also your search performance. By default, a refresh occurs every 1 second. Reducing the interval is usually not advised, because the in-memory buffer then can’t work efficiently: it won’t be able to collect many tokens before they are written to the shard.

So, if reducing the interval will slow down your performance, what about increasing it? Well, it’s not that simple.

Increasing the refresh interval generally will increase your search performance, but you need to be careful. If you increase the refresh interval too much, the refresh process will be heavy and take longer to finish, which can harm your search performance instead of improving it. Another thing to note when increasing the refresh interval is that your document will also take longer to become searchable.

You will need trial and error to determine the most efficient refresh interval for your system. Usually, the default refresh interval of 1 second works pretty well with most use cases.

You can change the refresh interval in the index settings:

curl --request PUT \
  --url http://localhost:9200/search-speed/_settings \
  --header 'Content-Type: application/json' \
  --data '{
	"settings": {
		"index": {
			"refresh_interval": "5s"
		}
	}
}'

You can also change it in OpenSearch’s settings with this key:

index.refresh_interval

Increase node query cache size

OpenSearch uses the node query cache to cache the results of queries used in a filter context, so it can return them faster when the same query runs again. The cache implements an LRU policy, so when it becomes full, it evicts the least recently used data.

If your node query cache size is too small, part of your query might not be cached and because of that, your OpenSearch’s search performance may decrease. 
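To illustrate the eviction policy, here is a minimal LRU cache sketch in Python. It only demonstrates the policy on a tiny capacity and makes no claim about how OpenSearch implements its cache; the `LRUCache` class is a hypothetical name.

```python
from collections import OrderedDict


class LRUCache:
    """Minimal LRU cache: the least recently used entry is evicted when full."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._entries = OrderedDict()

    def get(self, key):
        """Return the cached value and mark it as recently used, or None."""
        if key not in self._entries:
            return None
        self._entries.move_to_end(key)
        return self._entries[key]

    def put(self, key, value):
        """Store a value, evicting the least recently used entry if needed."""
        self._entries[key] = value
        self._entries.move_to_end(key)
        if len(self._entries) > self.capacity:
            self._entries.popitem(last=False)

    def keys(self):
        """Keys ordered from least to most recently used."""
        return list(self._entries)


cache = LRUCache(capacity=2)
cache.put("filter_a", "result_a")
cache.put("filter_b", "result_b")
cache.get("filter_a")              # filter_a becomes the most recently used
cache.put("filter_c", "result_c")  # cache is full, so filter_b is evicted
```

With a capacity that is too small, entries like `filter_b` are pushed out before they can be reused, which is the situation the paragraph above describes.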

To change the size, you can change the global setting with this key:

indices.queries.cache.size

You can read more information about node query cache setting here.

Optimize shards and replicas

Shards and replicas are part of OpenSearch’s foundation. They’re responsible for OpenSearch’s high availability, scalability, and fast performance. Shards and replicas will allow OpenSearch to search concurrently and hence will improve your search performance. But you also need to be careful when increasing their numbers, because too many shards and replicas will lower your search performance and can make your OpenSearch server shut down. 

For a deeper understanding of how shards and replicas work, we recommend you read Opster’s guide on OpenSearch Shards and Replicas.

There is also an article about RCA Analysis of OpenSearch outage – check it out if you want to know how a single incorrectly set shard setting affects your OpenSearch. For more tips on how to improve OpenSearch search performance, read this blog.

Increase hardware resources

Increasing your hardware resources is the most obvious way to increase performance. The problem with this approach is that it’s very expensive. It should be the last thing you try when improving OpenSearch performance, and in many cases it will not solve search latency issues.