Elasticsearch keyword vs. text vs. wildcard: Elasticsearch string types explained
Overview
String fields in Elasticsearch come in different flavors. The keyword, wildcard and text field types have different features and suit different use cases. Below is an explanation of the differences between them and of when to use each type for your string fields.
Text vs. Keyword
By default, recent versions of Elasticsearch dynamically map every string field as both text and keyword: the field itself is indexed as text and gets a keyword sub-field.
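One way to check this on a running cluster is to index a document into an index that has no explicit mapping and then read the generated mapping back. This is a minimal sketch assuming Elasticsearch 7 or later; the index name dynamic-strings-demo and the sample value are hypothetical. The response should show new_string_field mapped as text with a keyword sub-field, although index templates on your cluster may change that.

```json
# Index a document into a new index (hypothetical name) that has no explicit mapping
PUT dynamic-strings-demo/_doc/1
{
  "new_string_field": "Quick Brown Fox"
}

# Read back the mapping that dynamic mapping generated for the field
GET dynamic-strings-demo/_mapping
```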
The difference between text and keyword
In early Elasticsearch versions there was a field type called “string”. This was used to enable full text search. These fields would go through an analysis pipeline that performs operations such as lowercasing, removing punctuation, splitting the document into single tokens and filtering them further by stopwords etc.
This process works perfectly for searching larger documents, but sometimes this isn’t the ideal behavior. When you want to filter by certain values or list them all using aggregations, you need a different type because you don’t want the input document to go through an analysis pipeline. You want it to stay not analyzed.
So if you wanted to use a field for exact filtering or term aggregations, you had to configure the field of type “string” with "index": "not_analyzed".
"old_string_field": {
  "type": "string",
  "fields": {
    "keyword": {
      "type": "string",
      "index": "not_analyzed"
    }
  }
}
This was exactly how you could differentiate between text and keyword. Since this was not very intuitive for users unfamiliar with information retrieval, two new types were created: text and keyword.
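As of Elasticsearch version 5, the default dynamic mapping for string fields is:
"new_string_field": {
  "type": "text",
  "fields": {
    "keyword": {
      "type": "keyword",
      "ignore_above": 256
    }
  }
}
The ignore_above setting means that values longer than 256 characters are not indexed in the keyword sub-field.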
So the differences are:
- Text is fully analyzed and can be used for full-text search, where a query matches individual words or phrases within the field.
- Keyword is indexed as is, without any modification. It’s ideal for term aggregations and for filtering exact values.
Reference: https://www.elastic.co/guide/en/elasticsearch/reference/7.17/text.html#text-field-type
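The difference can be made visible with the _analyze API, which returns the tokens an analyzer produces for a given input. This is a sketch with a hypothetical sample sentence; the keyword analyzer is used here only to mimic how a keyword field keeps its value, which is an illustration rather than a description of the field's internals.

```json
# Analyze a sample sentence the way a text field with the standard analyzer would
GET _analyze
{
  "analyzer": "standard",
  "text": "The Quick-Brown Fox!"
}

# The keyword analyzer emits the whole input as one unmodified token,
# which mirrors how a keyword field keeps its value
GET _analyze
{
  "analyzer": "keyword",
  "text": "The Quick-Brown Fox!"
}
```

With default settings, the first request should return the lowercased tokens the, quick, brown and fox, while the second should return the single token The Quick-Brown Fox!. That is why a match query for fox finds the document in the text field, but a term query on the keyword field only matches the complete original value.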
Keyword vs. Wildcard
When you’re planning to run many wildcard queries you should use the wildcard type. It works well for machine-generated content like log messages that you would typically grep through in the terminal.
Wildcard queries usually perform poorly on regular text or keyword fields, particularly patterns that start with a wildcard, such as *error*. If you already know your users will run wildcard queries, use the wildcard field type to maintain cluster stability. The Elasticsearch documentation on the wildcard field type describes how it processes queries internally.
The wildcard type was introduced in Elasticsearch version 7.9.
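As a rough illustration of the grep-like use case, the requests below create an index with a wildcard field, store one log line and search it with a pattern that has both a leading and a trailing wildcard. The index name wildcard-demo, the field name log_line and the log line itself are hypothetical.

```json
# Hypothetical index with a single wildcard-typed field (requires 7.9 or later)
PUT wildcard-demo
{
  "mappings": {
    "properties": {
      "log_line": { "type": "wildcard" }
    }
  }
}

# Store a sample log line and make it searchable immediately
PUT wildcard-demo/_doc/1?refresh=true
{
  "log_line": "2021-03-01T10:15:00 ERROR [db-pool] connection timeout after 30s"
}

# Grep-like search: match the pattern anywhere in the stored value
GET wildcard-demo/_search
{
  "query": {
    "wildcard": {
      "log_line": {
        "value": "*connection timeout*"
      }
    }
  }
}
```

The pattern is matched against the whole stored value, so the surrounding * characters are needed to find the phrase in the middle of the line.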
Text vs. Match Only Text
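The type “match_only_text” is very similar to “text”, but it saves disk space by sacrificing granular scoring: all queries return a constant score, and queries that need word positions, such as phrase queries, run slower. The Elasticsearch documentation for the text field type family covers it in more detail.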
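A minimal sketch of a match_only_text mapping, assuming a cluster recent enough to ship this type. The index name match-only-demo and the sample document are hypothetical.

```json
# Hypothetical index whose message field trades scoring for disk space
PUT match-only-demo
{
  "mappings": {
    "properties": {
      "message": { "type": "match_only_text" }
    }
  }
}

PUT match-only-demo/_doc/1?refresh=true
{
  "message": "Connection to the database timed out"
}

# Full-text search works as on a text field, but hits are not ranked by relevance
GET match-only-demo/_search
{
  "query": {
    "match": { "message": "database" }
  }
}
```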
Code samples
Create a multi-field mapping that indexes the field message as text, keyword and wildcard:
PUT string-types
{
  "mappings": {
    "properties": {
      "message": {
        "type": "text",
        "analyzer": "standard",
        "fields": {
          "keyword": {
            "type": "keyword"
          },
          "wildcard_field": {
            "type": "wildcard"
          }
        }
      }
    }
  }
}
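To see how the sub-fields of this mapping behave, one could index a document and query each sub-field with the query type it is meant for. The document content below is a hypothetical example; the field names come from the mapping above.

```json
# Index one sample document (hypothetical content) into the string-types index
PUT string-types/_doc/1?refresh=true
{
  "message": "Payment service returned HTTP 503 for order-4711"
}

# Full-text search on the analyzed text field
GET string-types/_search
{
  "query": {
    "match": { "message": "payment order" }
  }
}

# Exact-value lookup on the keyword sub-field: the whole value must match
GET string-types/_search
{
  "query": {
    "term": { "message.keyword": "Payment service returned HTTP 503 for order-4711" }
  }
}

# Pattern search on the wildcard sub-field; ? stands for exactly one character
GET string-types/_search
{
  "query": {
    "wildcard": {
      "message.wildcard_field": { "value": "*HTTP 5??*" }
    }
  }
}
```

Each of the three searches should return the sample document. A term query for just payment on message.keyword would not, because the keyword sub-field only holds the complete, unmodified value.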
Which Elasticsearch string type should I use?
Use the text field if:
- You’re planning to perform regular full-text search, i.e. search for a specific word or phrase
- The content is regular written text that a person could easily read
Use the keyword type if:
- You’re planning to filter exact values
- You’re planning to filter on prefix character sequences
- You’re planning to perform term aggregations, for example for faceted navigation on a website
Use the wildcard type if:
- You’re trying to find the needle in poorly tokenized or machine-generated text
- You do not intend to use queries that rely on word positions
Use match_only_text if:
- You intend to run full-text search but granular scores are not very important to you
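The keyword use cases listed above (prefix filtering and term aggregations) can be combined in a single request. This is a sketch against the string-types index from the code sample; the prefix value Payment and the aggregation name top_messages are hypothetical.

```json
# Filter on a prefix of the keyword value and count the most frequent exact values
GET string-types/_search
{
  "size": 0,
  "query": {
    "bool": {
      "filter": [
        { "prefix": { "message.keyword": { "value": "Payment" } } }
      ]
    }
  },
  "aggs": {
    "top_messages": {
      "terms": { "field": "message.keyword", "size": 10 }
    }
  }
}
```

Setting size to 0 skips the hits and returns only the aggregation buckets, which is usually what a faceted navigation needs.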