Quick links
- Background
- How to detect slowness in your OpenSearch
- Optimizing your query
- Tune your OpenSearch settings
- Increase OpenSearch speed with tools
Background
Search speed is the major selling point of OpenSearch. Most of the time, it’s the reason people decide to use OpenSearch in the first place – which is why it’s key to ensure it produces results quickly.
By optimizing and maintaining OpenSearch search speed, you can improve your product’s user experience and in turn improve your product’s conversion rate.
In this article, we will detail how to increase OpenSearch speed by optimizing query and OpenSearch settings.
How to detect slowness in your OpenSearch
Before we look at how to increase your OpenSearch search speed, it’s important to first cover how to detect slowness in your OpenSearch deployment. Here are a few different ways you can do so:
Using slow logs
OpenSearch provides a very convenient feature called slow logs. When configured correctly, OpenSearch logs any query that exceeds the thresholds you set, so you can debug and improve those specific queries. You can configure slow logs at the index level or the OpenSearch level.
To configure it on the index level:
curl --request PUT \
  --url http://localhost:9200/search-speed \
  --header 'Content-Type: application/json' \
  --data '{
    "settings": {
      "index.search.slowlog.threshold.query.warn": "10s",
      "index.search.slowlog.threshold.query.info": "5s",
      "index.search.slowlog.threshold.query.debug": "2s",
      "index.search.slowlog.threshold.query.trace": "500ms",
      "index.search.slowlog.threshold.fetch.warn": "1s",
      "index.search.slowlog.threshold.fetch.info": "800ms",
      "index.search.slowlog.threshold.fetch.debug": "500ms",
      "index.search.slowlog.threshold.fetch.trace": "200ms",
      "index.search.slowlog.level": "info"
    }
  }'
To configure it on the OpenSearch level, you can change the properties:
index.search.slowlog.threshold.query.warn: 10s
index.search.slowlog.threshold.query.info: 5s
index.search.slowlog.threshold.query.debug: 2s
index.search.slowlog.threshold.query.trace: 500ms
index.search.slowlog.threshold.fetch.warn: 1s
index.search.slowlog.threshold.fetch.info: 800ms
index.search.slowlog.threshold.fetch.debug: 500ms
index.search.slowlog.threshold.fetch.trace: 200ms
index.search.slowlog.level: info
If you want to know more about slow logs, you can check out our article on how to configure slow logs properly.
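To make the threshold values above more concrete, here is a minimal Python sketch of how a query duration maps to a slow log level. It is an illustration only, not OpenSearch’s implementation: the function name `slowlog_level` is hypothetical, and the strict "greater than" comparison is an assumption.

```python
# Query thresholds from the configuration above, converted to milliseconds.
QUERY_THRESHOLDS_MS = {
    "warn": 10_000,
    "info": 5_000,
    "debug": 2_000,
    "trace": 500,
}


def slowlog_level(took_ms, thresholds=QUERY_THRESHOLDS_MS):
    """Return the most severe level whose threshold took_ms exceeds, or None.

    Hypothetical helper: the strict '>' comparison is an assumption.
    """
    # Check the largest threshold first so the most severe level wins.
    ordered = sorted(thresholds.items(), key=lambda item: item[1], reverse=True)
    for level, limit_ms in ordered:
        if took_ms > limit_ms:
            return level
    return None


for took in (300, 750, 2_500, 6_000, 12_000):
    print(took, slowlog_level(took))
```

With these thresholds, a query that takes 6 seconds is classified as info, while one that takes 300 milliseconds is not logged at all.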
Optimizing your query
Optimizing your queries is one thing you can do to improve OpenSearch’s search performance. A bad query that collects more document results than needed will decrease your search speed.
Don’t set a large value for the size parameter
The size parameter in OpenSearch determines how many documents OpenSearch returns in a response. A large size value reduces your search speed because OpenSearch needs to construct a large number of documents. In addition, transferring the larger response between OpenSearch and the client adds latency.
It’s recommended to double check and ensure that you set the value to the number of documents you actually need.
The default size for a query is 10. You can change the size in the search parameter:
curl --request GET \
  --url 'http://localhost:9200/search-speed/_doc/_search?size=100' \
  --header 'Content-Type: application/json'
Get only the fields you need
Similar to retrieving more documents than you need, getting too many fields you don’t use will also slow down your search speed. This is due to the same reason we mentioned earlier – OpenSearch will need to construct and transfer more documents to the client. If you combine both a large size parameter and many fields, together they will significantly slow your search speed.
Because of that, it’s recommended to only get the fields that you truly need.
There are multiple methods for configuring which fields you want to get. Here are a few methods you can use:
Using _source in the request body:
curl --request POST \
  --url http://localhost:9200/search-speed/_doc/_search \
  --header 'Content-Type: application/json' \
  --data '{
    "_source": ["name"]
  }'
Using “fields” in the request body and turning off _source:
curl --request POST \
  --url http://localhost:9200/search-speed/_doc/_search \
  --header 'Content-Type: application/json' \
  --data '{
    "fields": [
      "name",
      "description"
    ],
    "_source": false
  }'
Please note that the search result looks different with this method: you need to read the documents’ values from fields instead of _source, and each value is returned as an array:
{
  "took": 2,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 2,
      "relation": "eq"
    },
    "max_score": 1.0,
    "hits": [
      {
        "_index": "search-speed",
        "_type": "_doc",
        "_id": "r2cy4HUB4Qqjzh5nmZLw",
        "_score": 1.0,
        "fields": {
          "name": [ "hello" ],
          "description": [ "world" ]
        }
      },
      {
        "_index": "search-speed",
        "_type": "_doc",
        "_id": "sGc14HUB4Qqjzh5n6JJx",
        "_score": 1.0,
        "fields": {
          "name": [ "Opster" ],
          "description": [ "Opster" ]
        }
      }
    ]
  }
}
Using docvalue_fields in the request body (note that this method is not supported for the text field type):
curl --request POST \
  --url http://localhost:9200/search-speed/_doc/_search \
  --header 'Content-Type: application/json' \
  --data '{
    "docvalue_fields": [
      "name.keyword"
    ]
  }'
There are additional methods you can use, like stored fields, script fields and runtime fields, but the above are the most basic solutions for selecting the fields you want OpenSearch to return.
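Because the response shape depends on which method you pick, client code has to read values from either _source or fields. The following Python sketch shows one way to build a request body that limits both size and fields, and to read a field from either response shape. The helper names `build_search_body` and `field_values` are hypothetical, and the second sample hit is made up for illustration.

```python
import json


def build_search_body(fields, size=10, use_source=True):
    """Build a search request body that returns only the requested fields."""
    body = {"size": size}
    if use_source:
        # Filter the stored _source down to the listed fields.
        body["_source"] = list(fields)
    else:
        # Ask for the "fields" section instead and turn _source off.
        body["fields"] = list(fields)
        body["_source"] = False
    return body


def field_values(hit, name):
    """Return one field's values as a list, whichever shape the hit has."""
    if name in hit.get("fields", {}):
        return list(hit["fields"][name])
    value = hit.get("_source", {}).get(name)
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


# First hit copied from the sample response above; second hit is illustrative.
hit_with_fields = {"_id": "r2cy4HUB4Qqjzh5nmZLw",
                   "fields": {"name": ["hello"], "description": ["world"]}}
hit_with_source = {"_id": "example-id", "_source": {"name": "hello"}}

print(json.dumps(build_search_body(["name", "description"], size=20, use_source=False)))
print(field_values(hit_with_fields, "name"), field_values(hit_with_source, "name"))
```

Normalizing both shapes to a list keeps the calling code the same if you later switch between _source filtering and fields.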
Avoid using scripts
Scripting is a feature in OpenSearch that allows you to evaluate custom expressions. It is a powerful feature, but it can majorly affect your search speed.
You should be careful when using scripts because OpenSearch will apply the script to every result. The more data you have in the index, the slower the search will become as it goes over every result.
Avoid leading wildcard queries
Wildcard queries in OpenSearch are similar to LIKE queries in SQL. For example, if you query *OpenSearch*, the query will return all results containing the word OpenSearch. The real problem with wildcard queries in OpenSearch is the leading wildcard, e.g. *OpenSearch.
OpenSearch is designed to search exact tokens efficiently. With a leading wildcard query, though, OpenSearch can’t carry out the search efficiently. When you search OpenSearch with a leading wildcard query, the query needs to go through the whole inverted index to discover which terms in the entire index match the queried pattern.
Because of that, it’s recommended to configure your analyzer so that it supports the query you want to run, instead of using a leading wildcard query.
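The difference can be illustrated with a toy model of a term dictionary. The Python sketch below is a simplification under stated assumptions (a sorted list of terms searched with `bisect`), not how OpenSearch stores or scans terms internally. The term list and function names are hypothetical.

```python
from bisect import bisect_left

# Toy term dictionary: the unique terms of an index, kept in sorted order.
TERMS = sorted(["elastic", "index", "opensearch", "query",
                "research", "search", "shard"])


def prefix_match(prefix):
    """Trailing wildcard (prefix*): jump to the first candidate, then walk."""
    matches, comparisons = [], 0
    position = bisect_left(TERMS, prefix)
    while position < len(TERMS):
        comparisons += 1
        if not TERMS[position].startswith(prefix):
            break  # sorted order guarantees no later term can match
        matches.append(TERMS[position])
        position += 1
    return matches, comparisons


def suffix_match(suffix):
    """Leading wildcard (*suffix): sort order does not help, test every term."""
    matches = [term for term in TERMS if term.endswith(suffix)]
    return matches, len(TERMS)


print(prefix_match("sea"))
print(suffix_match("search"))
```

In this model the prefix lookup inspects only the terms near its match, while the leading wildcard has to inspect every term, and that cost grows with the number of unique terms in the index.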
Use timeout when searching
A slow query uses significant computational resources, often blocks a thread and slows down your OpenSearch server. To avoid queries that take too long to complete, OpenSearch offers a timeout feature. By setting a timeout, you can stop a query that is taking too long to finish. It’s a very important feature to configure.
To set search timeout, you can define it in the search parameter:
curl --request POST \
  --url 'http://localhost:9200/search-speed/_doc/_search?timeout=5s' \
  --header 'Content-Type: application/json' \
  --data '{}'
You can also define it in the global settings with this key:
search.default_search_timeout
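On the client side, it is worth checking whether a response was cut short by the timeout. The Python sketch below builds the request without sending it and inspects the timed_out flag that appears in search responses. The helper names are hypothetical, the sample response is made up, and the URL uses the /search-speed/_search form rather than the /_doc/_search form shown above.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request


def build_search_request(base_url, index, body, timeout="5s"):
    """Prepare (but do not send) a search request with a timeout parameter."""
    url = f"{base_url}/{index}/_search?{urlencode({'timeout': timeout})}"
    return Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def is_partial(response):
    """True when the response reports that the search timed out."""
    return bool(response.get("timed_out", False))


request = build_search_request("http://localhost:9200", "search-speed",
                               {"query": {"match_all": {}}})
print(request.get_method(), request.full_url)

# Illustrative response, not captured from a real cluster.
sample_response = {"took": 5003, "timed_out": True, "hits": {"hits": []}}
print(is_partial(sample_response))
```

To actually send the request, you could pass it to `urllib.request.urlopen` against a running cluster. A response flagged as timed out may contain only partial results, so treat it accordingly.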
Avoid complex aggregations if you don’t need them
Aggregations are a powerful OpenSearch feature you can use for many different things. Many people, especially in data analytics, use OpenSearch only because of its aggregations feature. But using too many aggregations comes with a price: slower search speeds. You need to be careful with aggregations, especially if your query requires fast search speed, as autocomplete does.
Tune your OpenSearch settings
Tuning OpenSearch settings is always hard to do. You want to ensure high availability, plan for scalability and achieve excellent search performance, all while trying to minimize the cost. You also need to constantly adjust your settings based on your product’s users. In this section, we’ll cover configuration tips you can apply to your OpenSearch settings to improve search performance.
Freeze unused indices
OpenSearch’s indices use memory to store data structures for faster performance. The problem is that unused indices still occupy that memory. Too many unused indices will hog your memory, slowing down your other indices’ search speed. Fortunately, you can use OpenSearch’s freeze API to stop the unused indices from using your memory. You can still search a frozen index, but note that the search will be slower because the index no longer keeps those data structures in memory.
To freeze an index you can use _freeze API:
curl --request POST \
  --url http://localhost:9200/search-speed/_freeze
Increase refresh interval
If you’ve been using OpenSearch for some time, you might’ve noticed that you can’t search a document as soon as it’s indexed. This happens because before documents are written to the shard (and become searchable), they go through an in-memory buffer first. OpenSearch uses an in-memory buffer because it’s more efficient to first store the tokens in memory when there are large indexing processes. The process of moving tokens from the in-memory buffer to the shard is called “refresh”.
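The behavior described above can be sketched with a toy model in Python. This is a deliberate simplification for illustration, not OpenSearch’s implementation: the class `ToyShard` and its methods are hypothetical, and real shards work with segments and tokens rather than whole strings.

```python
class ToyShard:
    """Toy model: documents sit in a buffer until refresh makes them searchable."""

    def __init__(self):
        self.buffer = []      # indexed, but not yet visible to searches
        self.searchable = []  # visible to searches

    def index(self, document):
        self.buffer.append(document)

    def refresh(self):
        # Move everything from the buffer into the searchable part.
        self.searchable.extend(self.buffer)
        self.buffer.clear()

    def search(self, word):
        return [doc for doc in self.searchable if word in doc]


shard = ToyShard()
shard.index("hello world")
before_refresh = shard.search("hello")
shard.refresh()
after_refresh = shard.search("hello")
print(before_refresh, after_refresh)
```

The search before the refresh finds nothing even though the document has been indexed, which mirrors the delay you see between indexing a document and being able to search it.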
Refresh is a very expensive process that can reduce your shard performance, and therefore also reduce your search performance. By default, the refresh process occurs every 1 second. Reducing its interval is usually not advised. Reducing refresh interval means that the in-memory buffer won’t work efficiently because it won’t be able to store many tokens before indexing it to the shard.
So, if reducing the interval will slow down your performance, what about increasing it? Well, it’s not that simple.
Increasing the refresh interval generally will increase your search performance, but you need to be careful. If you increase the refresh interval too much, the refresh process will be heavy and take longer to finish, which can harm your search performance instead of improving it. Another thing to note when increasing the refresh interval is that your document will also take longer to become searchable.
You will need trial and error to determine the most efficient refresh interval for your system. Usually, the default refresh interval of 1 second works pretty well with most use cases.
You can change the refresh interval in the index settings:
curl --request PUT \
  --url http://localhost:9200/search-speed/_settings \
  --header 'Content-Type: application/json' \
  --data '{
    "settings": {
      "index": {
        "refresh_interval": "5s"
      }
    }
  }'
You can also change it in OpenSearch’s settings with this key:
index.refresh_interval
Increase node query cache size
OpenSearch uses the node query cache to cache query results so it can return them faster when the same query runs again. The cache implements an LRU (least recently used) policy, so when it becomes full, it evicts the least recently used data.
If your node query cache size is too small, part of your query might not be cached and because of that, your OpenSearch’s search performance may decrease.
To change the size, you can change the global setting with this key:
indices.queries.cache.size
You can read more information about node query cache setting here.
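To illustrate the LRU eviction policy mentioned above, here is a minimal Python sketch built on `collections.OrderedDict`. It is a generic LRU cache, not the node query cache itself: the class name, the capacity counted in entries (rather than memory), and the sample keys are all assumptions made for the example.

```python
from collections import OrderedDict


class LRUCache:
    """Generic LRU cache: evicts the least recently used entry when full."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()  # oldest use first, newest use last

    def get(self, key):
        if key not in self.entries:
            return None
        self.entries.move_to_end(key)  # mark as most recently used
        return self.entries[key]

    def put(self, key, value):
        self.entries[key] = value
        self.entries.move_to_end(key)
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # drop the least recently used


cache = LRUCache(capacity=2)
cache.put("query-1", [1, 2])
cache.put("query-2", [3])
cache.get("query-1")        # query-1 is now the most recently used
cache.put("query-3", [4])   # cache is full, so query-2 is evicted
print(list(cache.entries))
```

With a capacity of two, the third entry pushes out the one that was used longest ago. A cache that is too small behaves the same way at scale: useful entries are evicted before they can be reused.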
Optimize shards and replicas
Shards and replicas are part of OpenSearch’s foundation. They’re responsible for OpenSearch’s high availability, scalability, and fast performance. Shards and replicas will allow OpenSearch to search concurrently and hence will improve your search performance. But you also need to be careful when increasing their numbers, because too many shards and replicas will lower your search performance and can make your OpenSearch server shut down.
For a deeper understanding of how shards and replicas work, we recommend you read Opster’s guide on OpenSearch Shards and Replicas.
There is also an article about RCA Analysis of OpenSearch outage – check it out if you want to know how a single incorrectly set shard setting affects your OpenSearch. For more tips on how to improve OpenSearch search performance, read this blog.
Increase hardware resources
Increasing your hardware resources is the most obvious way to increase your performance. The problem with this approach is that it’s very expensive. Increasing your OpenSearch hardware resources should be the last thing you do when you’re trying to improve OpenSearch performance, and in many cases it will not solve search latency issues or improve search performance.