Background
Users now expect autocomplete in any search box. Elasticsearch keeps releasing features that make autocomplete more flexible and powerful, and easier to adapt to each use case’s requirements.
There are many ways to approach autocomplete (learn more about autocomplete in Elasticsearch here).
Terms enum API
In version 7.14, Elasticsearch released a new API called Terms enum API.
The Terms enum API looks up the terms in an index that begin with a given partial string. This approach can help us run low-latency lookups in our fields.
As the name “terms enum” suggests, this API works with terms, in other words with keyword type fields. If you want to learn more about mappings and field types, read more here.
By default, the Terms enum API matches terms case-sensitively, which keeps the query as light as possible. It’s also important to note that terms_enum always matches from the beginning of the field value.
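To make these two rules concrete, here is a small local simulation in Python. It does not call Elasticsearch and is not how the API is implemented; the function name match_terms and the sample terms are made up for illustration.

```python
def match_terms(terms, string, case_insensitive=False):
    """Return the terms that start with `string`, in sorted order.

    A local illustration of prefix matching only, not Elasticsearch code.
    """
    if case_insensitive:
        needle = string.lower()
        hits = [t for t in terms if t.lower().startswith(needle)]
    else:
        hits = [t for t in terms if t.startswith(string)]
    return sorted(hits)


# Hypothetical sample terms.
sample = ["Kibana", "kibana plugin", "Open Kibana"]

print(match_terms(sample, "kib"))                         # ['kibana plugin']
print(match_terms(sample, "kib", case_insensitive=True))  # ['Kibana', 'kibana plugin']
```

“Open Kibana” never matches in either call, because the comparison is anchored at the start of the term.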
The Terms enum API has a “timeout” property that stops the query after the defined time. After a timeout, the query might return partial or empty results. We can also limit the number of terms returned using the “size” property.
There are two more parameters in Terms enum API that are more advanced: index_filter and search_after.
index_filter lets us leave indices out when we call the Terms enum API against many indices using a wildcard: indices where the filter query cannot match any document are skipped (see the in-depth explanation below). search_after lets us paginate our terms by defining the term after which the results should start, similar to the “from” parameter in a search query (see the example below).
How to use Terms enum API
Let’s try indexing some documents to a new index.
POST _bulk
{ "index" : { "_index" : "test_terms_enum" } }
{ "name": "Star wars" }
{ "index" : { "_index" : "test_terms_enum" } }
{ "name": "Star trek" }
{ "index" : { "_index" : "test_terms_enum" } }
{ "name": "Shrek" }
{ "index" : { "_index" : "test_terms_enum" } }
{ "name": "Heaven is full of stars" }
{ "index" : { "_index" : "test_terms_enum" } }
{ "name": "Starman" }
{ "index" : { "_index" : "test_terms_enum" } }
{ "name": "The last status" }
{ "index" : { "_index" : "test_terms_enum" } }
{ "name": "starter pack" }
Now, let’s try to call the _terms_enum API:
GET test_terms_enum/_terms_enum
{
  "field": "name",
  "string": "star"
}
We should see at least some “star” related results:
{
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "failed" : 0
  },
  "terms" : [ ],
  "complete" : true
}
However, our results show nothing. This is because, as we mentioned, this API works with keyword field types, and our name field was dynamically mapped as text.
With dynamic mapping, Elasticsearch also generates a .keyword sub-field for string fields, so let’s try again with name.keyword:
GET test_terms_enum/_terms_enum
{
  "field": "name.keyword",
  "string": "star"
}
This should now return Star wars, Star trek, Starman and starter pack, right?
{
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "failed" : 0
  },
  "terms" : [
    "starter pack"
  ],
  "complete" : true
}
Once again, not the results we were expecting. This is because of the case sensitivity of the API. We can change this as follows:
GET test_terms_enum/_terms_enum
{
  "size": 10,
  "timeout": "1s",
  "field": "name.keyword",
  "string": "star",
  "case_insensitive": true
}
Let’s check the response:
{
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "failed" : 0
  },
  "terms" : [
    "Star trek",
    "Star wars",
    "Starman",
    "starter pack"
  ],
  "complete" : true
}
Perfect. Now we have the expected results. Note the “size” property, set here to its default value of 10, which limits the number of terms returned.
What about “Heaven is full of stars”? Because the Terms enum API matches from the beginning of the field value, the query string would have to be something like “hea” for this one to match.
Also note the “timeout” property seen here, one second in this case, which will stop the query after the time defined.
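The same request can be sent from application code. The following is a minimal Python sketch using only the standard library; the helper names build_terms_enum_body and terms_enum are hypothetical, and it assumes an unsecured cluster reachable at http://localhost:9200, so adapt the URL and authentication to your setup.

```python
import json
import urllib.request


def build_terms_enum_body(field, string, size=None, timeout=None,
                          case_insensitive=None, index_filter=None,
                          search_after=None):
    """Assemble a Terms enum request body, leaving out unset options."""
    body = {"field": field, "string": string}
    optional = {
        "size": size,
        "timeout": timeout,
        "case_insensitive": case_insensitive,
        "index_filter": index_filter,
        "search_after": search_after,
    }
    body.update({k: v for k, v in optional.items() if v is not None})
    return body


def terms_enum(base_url, index, body):
    """POST the body to <index>/_terms_enum and return the parsed JSON.

    Assumption: no authentication or TLS setup is needed for base_url.
    """
    req = urllib.request.Request(
        f"{base_url}/{index}/_terms_enum",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)


body = build_terms_enum_body("name.keyword", "star", size=10,
                             timeout="1s", case_insensitive=True)
print(json.dumps(body, indent=2))

# With a running cluster (address assumed), the call would look like:
# response = terms_enum("http://localhost:9200", "test_terms_enum", body)
# print(response["terms"])
```

Keeping the body builder separate from the HTTP call makes it easy to check the request locally before sending anything to the cluster.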
Now let’s dive into two more important parameters: index_filter and search_after.
index_filter
You can run the Terms enum API against many indices using a wildcard (*), and use index_filter to leave out the indices whose documents cannot match a given query.
For example:
GET test_terms_enum*/_terms_enum
{
  "field": "name.keyword",
  "string": "star",
  "case_insensitive": true,
  "index_filter": {
    "range": {
      "rating": {
        "gte": 10
      }
    }
  }
}
This will query all indices starting with test_terms_enum and skip those where the filter, a rating field greater than or equal to 10, matches no documents (that is, where it rewrites to match_none).
As we don’t have that rating field in our index, the filter will match nothing and the index will be omitted, so no terms will be shown even though “star” has matches.
Important note
The filtering is done on a best-effort basis: it uses index statistics and mappings to rewrite queries to match_none instead of fully executing the request. For instance, a range query over a date field can rewrite to match_none if all documents within a shard (including deleted documents) are outside of the provided range. However, not all queries can rewrite to match_none, so this API may return an index even if the provided filter matches no document.
See the source here.
search_after
This parameter works as a pointer for pagination.
In this context, you can use search_after to paginate your terms: you just define the term after which the results should start, similar to the “from” parameter in a search query.
Pagination is important to keep your payload light and to request only the data the user needs to see at a given moment. It makes no sense to request 1000 results at once if the user can only see 10 per page. It’s a better idea to run a new request on each page change and request 10 at a time.
GET test_terms_enum/_terms_enum
{
  "field": "name.keyword",
  "case_insensitive": true,
  "string": "star",
  "search_after": "Star wars"
}
This query will start searching after the term “Star wars”. The idea is to set this parameter dynamically, based on the last term of the current page.
Response without search_after:
{
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "failed" : 0
  },
  "terms" : [
    "Star trek",
    "Star wars",
    "Starman",
    "starter pack"
  ],
  "complete" : true
}
Response with search_after:
{
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "failed" : 0
  },
  "terms" : [
    "Starman",
    "starter pack"
  ],
  "complete" : true
}
Note how “Star wars” works as a pointer: the second search returns only the terms that come after it.
This is a work in progress; you can check out the discussion on this topic here.
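The idea of setting search_after from the last term of each page can be sketched as a small Python loop. Everything here is illustrative: iter_terms and fake_fetch are hypothetical names, and fake_fetch stands in for a real Terms enum request so that the sketch runs without a cluster.

```python
def iter_terms(fetch_page, page_size=10):
    """Yield every term, requesting one page at a time.

    `fetch_page(search_after, size)` is a hypothetical callable that returns
    the "terms" list of one Terms enum response.
    """
    search_after = None
    while True:
        page = fetch_page(search_after, page_size)
        if not page:
            return  # an empty page means there is nothing left to read
        yield from page
        search_after = page[-1]  # the last term becomes the next pointer


# Stand-in for the real request, using the terms from the responses above.
ALL_TERMS = ["Star trek", "Star wars", "Starman", "starter pack"]


def fake_fetch(search_after, size):
    remaining = [t for t in ALL_TERMS
                 if search_after is None or t > search_after]
    return remaining[:size]


print(fake_fetch("Star wars", 10))                 # ['Starman', 'starter pack']
print(list(iter_terms(fake_fetch, page_size=2)))
# ['Star trek', 'Star wars', 'Starman', 'starter pack']
```

Since a timeout can produce partial or empty results, a real client would probably also want to check the “complete” flag of each response before treating an empty page as the end.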
Conclusion
We learned how to use the new Terms enum API. This API might be very useful in some use cases, and combined with other approaches can improve your performance and relevance significantly.
PROS
- Easy to use
- Fast
- No reindexing or new fields needed
- Case sensitive or insensitive
- Supports pagination
CONS
- Only matches the start of the term
- Fuzziness is not supported