Quick links:
- What is a boolean query?
- Filter vs. Query
- How can boosting be achieved in boolean queries
- Types of boolean clauses in Elasticsearch
- Conclusion
What is a boolean query?
Boolean queries combine search conditions with the logical operators “AND”, “OR” and “NOT”. Elasticsearch provides the same ability through the “bool” query, whose clauses can be combined to fit your requirements.
We can add any type of query inside each bool clause, such as terms, match and query_string.
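The logic of the four clauses can be pictured with a small in-memory model. This is a hypothetical Python sketch, not Elasticsearch's implementation: each clause is a plain predicate, scoring is ignored, and the helper name `bool_matches` is made up for illustration. It does encode the documented default that should clauses are only required when the bool query has no must or filter clause.

```python
from typing import Callable, Dict, Sequence

Doc = Dict[str, object]
Predicate = Callable[[Doc], bool]


def bool_matches(
    doc: Doc,
    must: Sequence[Predicate] = (),
    filter_: Sequence[Predicate] = (),
    should: Sequence[Predicate] = (),
    must_not: Sequence[Predicate] = (),
) -> bool:
    """Toy model: True if doc would be in the result set (scores ignored)."""
    if not all(p(doc) for p in must):          # must: every clause (AND)
        return False
    if not all(p(doc) for p in filter_):       # filter: every clause, unscored
        return False
    if any(p(doc) for p in must_not):          # must_not: no clause (NOT)
        return False
    if should and not must and not filter_:
        # Only should clauses constrain the results: at least one must match (OR).
        return any(p(doc) for p in should)
    # With must or filter present, should clauses are optional (scoring only).
    return True


def is_blue(d: Doc) -> bool:
    return d.get("color") == "blue"


def in_stock(d: Doc) -> bool:
    return d.get("in_stock") is True


docs = [
    {"color": "blue", "in_stock": True},
    {"color": "red", "in_stock": True},
    {"color": "blue", "in_stock": False},
]

print([bool_matches(d, should=[is_blue]) for d in docs])                      # [True, False, True]
print([bool_matches(d, filter_=[in_stock], should=[is_blue]) for d in docs])  # [True, True, False]
print([bool_matches(d, must_not=[is_blue]) for d in docs])                    # [False, True, False]
```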
Filter vs. Query
When should you use filters?
- The “filter” clause can be used when each document needs a yes-or-no answer for the given query. For instance: “is this product currently in stock or not?” and “is this record within the specified price/date range or not?”
- Such clauses reduce the search space to a specific set of documents and do not contribute to the score.
When should you use queries?
- In query context, Elasticsearch searches for one or more terms and measures how well each document matches them. The “must” and “should” clauses are used in these cases.
- These types of queries do contribute to your score.
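To make the split concrete, here is a hypothetical Python helper that builds a request body: yes/no conditions go into filter and the free-text part goes into must. The helper name and the `price` field are assumptions for illustration; `in_stock` and `p_type` follow the product examples used later in this article. The resulting dictionary could be sent as the JSON body of `GET /products/_search`.

```python
import json


def product_search_body(text: str, min_price: float, max_price: float) -> dict:
    """Hypothetical helper: binary checks in filter, relevance in must."""
    return {
        "query": {
            "bool": {
                # Query context: scored by how well p_type matches the text.
                "must": [{"match": {"p_type": text}}],
                # Filter context: yes/no conditions, not scored.
                "filter": [
                    {"term": {"in_stock": True}},
                    {"range": {"price": {"gte": min_price, "lte": max_price}}},
                ],
            }
        }
    }


body = product_search_body("floral top", 10, 50)
print(json.dumps(body, indent=2))
```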
How can boosting be achieved in boolean queries
By default, Elasticsearch sorts the results by relevance score. Scores are computed only for clauses that run in “query context” (must and should), and each query inside these clauses can take a “boost” parameter that increases or decreases its contribution to the score. You will find more examples in the next section.
Filter clauses, on the other hand, do not contribute to the score: if a bool query contains only filter clauses, every matching document gets a score of 0.0. There are a few cases where those filtered documents need a score as well. In such cases, wrap the filter in a “constant_score” query, which gives every matching document the same score, equal to its “boost” value (1.0 by default).
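As a sketch, written as a Python dictionary that serializes to the request JSON, the body below keeps every in-stock product through a filter and places a constant_score query in a should clause, so blue products receive a fixed score of 1.5 while other in-stock products score 0.0. The boost value 1.5 is an arbitrary choice for illustration.

```python
import json

# Every in-stock product is returned (filter). Because a filter clause is
# present, the should clause is optional and only adds score: blue products
# get the constant score 1.5, the rest get 0.0.
body = {
    "query": {
        "bool": {
            "filter": [{"term": {"in_stock": True}}],
            "should": [
                {
                    "constant_score": {
                        "filter": {"term": {"color": "blue"}},
                        "boost": 1.5,
                    }
                }
            ],
        }
    }
}

print(json.dumps(body, indent=2))  # request body for GET /products/_search
```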
Types of boolean clauses in Elasticsearch
- Filter
- Must
- Should
- Must_not
A single bool query can contain a combination of these clauses. For example:
```
GET /index_name/_search
{
  "query": {
    "bool": {
      "filter": [
        { "term": { "FIELD": "VALUE" } }
      ],
      "should": [
        { "match": { "FIELD": "TEXT" } },
        { "query_string": { "query": "VALUE", "default_field": "FIELD" } }
      ],
      "must": [
        { "term": { "FIELD": "VALUE" } }
      ],
      "must_not": [
        { "term": { "FIELD": "VALUE" } }
      ]
    }
  }
}
```
1. Filter
The filter clause restricts the results to documents that match its queries.
All conditions are mandatory: Elasticsearch returns only documents that match every filter clause. However, these clauses are not scored.
Elasticsearch can cache the results of frequently used filter clauses, which helps when the same static filter on a field is applied repeatedly: later requests with that filter can be answered from the cache instead of being re-evaluated.
```
GET /products/_search
{
  "query": {
    "bool": {
      "filter": [
        { "term": { "color": "blue" } },
        { "term": { "in_stock": true } }
      ]
    }
  }
}
```
In the example above, we are filtering for documents whose color is “blue” and that are in stock. Here, color is a “keyword” field, so a document matches only if its value is exactly the term “blue”; the comparison is case-sensitive, so “Blue” would not match. The “in_stock” field is a boolean, so the term query on true keeps only the products that are in stock.
As mentioned earlier, scores are not computed for these clauses. If the query contains only filters, every returned document, and therefore the max score, has a score of “0.0”.
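The filter behavior can be mimicked with a toy Python function over a few hypothetical documents. It is not Elasticsearch code: it simply compares values exactly, as a term query does on a keyword field without a normalizer, and assigns 0.0 to every hit because filter context is unscored. The helper name `filter_only` is made up for illustration.

```python
from typing import Dict, List, Tuple

products = [
    {"id": 1, "color": "blue", "in_stock": True},
    {"id": 2, "color": "Blue", "in_stock": True},   # different keyword value
    {"id": 3, "color": "blue", "in_stock": False},
]


def filter_only(docs: List[Dict], conditions: Dict[str, object]) -> List[Tuple[int, float]]:
    """Toy model of a filter-only bool query over keyword/boolean fields."""
    hits = []
    for doc in docs:
        # term on a keyword field: exact, case-sensitive comparison.
        if all(doc.get(field) == value for field, value in conditions.items()):
            hits.append((doc["id"], 0.0))  # filter context: no score
    return hits


print(filter_only(products, {"color": "blue", "in_stock": True}))  # [(1, 0.0)]
```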
2. Must
The “must” clause is also mandatory, so only documents that match all clauses will be returned.
It is like the logical operator “AND”. All the queries inside “must” will be combined with the “AND” operator internally.
This type of query contributes to the score: the scores of all matching must clauses are added together.
```
GET /products/_search
{
  "query": {
    "bool": {
      "must": [
        { "term": { "color": { "value": "blue", "boost": 5 } } },
        { "match": { "p_type": { "query": "Floral Top", "operator": "and" } } }
      ]
    }
  }
}
```
In the example above, we are trying to match documents whose color is “blue” and whose p_type contains both “floral” and “top”. Here “p_type” is a text type field and color is of the keyword type.
Because p_type is analyzed, it is queried with a match query using the “and” operator, which matches both the terms “floral” and “top” case-insensitively (with the standard analyzer). A term query would instead look for the single unanalyzed token “Floral Top” and match nothing. The color field gets an exact, case-sensitive match.
In addition, the term query on color has a “boost” parameter of 5. The relevance score that this clause contributes (computed with BM25 by default, so it is not simply 1.0) is multiplied by 5.0 and then added to the score of the p_type clause.
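The arithmetic can be sketched with a toy Python function. The per-clause scores below are made-up numbers standing in for the BM25 scores Elasticsearch would compute; the function, whose name `bool_score` is hypothetical, only illustrates that each matching clause's score is multiplied by its boost and that the bool query adds the clause scores together.

```python
from typing import Dict


def bool_score(clause_scores: Dict[str, float], boosts: Dict[str, float]) -> float:
    """Toy model: sum of each matching clause's score times its boost.

    clause_scores holds hypothetical per-clause relevance scores; a clause
    without an entry in boosts uses the default boost of 1.0.
    """
    return sum(score * boosts.get(name, 1.0) for name, score in clause_scores.items())


# Hypothetical raw scores for a document that matches both must clauses.
raw = {"color": 0.5, "p_type": 1.25}
print(bool_score(raw, {"color": 5.0}))  # 0.5 * 5.0 + 1.25 = 3.75
```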
3. Should
Clauses in the “should” query are combined with the “OR” operator. If the bool query has no “must” or “filter” clause, Elasticsearch returns documents that match at least one of the should clauses. If it does have a “must” or “filter” clause, the should clauses become optional by default: they no longer restrict the results and only raise the score of documents that match them.
To require more than one match, add “minimum_should_match” (often abbreviated “mm”). At the bool level it sets how many should clauses must match; inside a full-text query such as match or query_string it sets how many of the query's terms must match.
Elasticsearch supports multiple formats for the minimum_should_match value, most commonly fixed numbers and percentages. A number such as 2 means that at least two clauses (or terms) must match. A percentage is applied to the total and rounded down: with “30%” and a four-word phrase, 30% of 4 is 1.2, so at least one word must match. Negative values such as “-1” or “-25%” instead specify how many clauses may be missing.
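The rounding rules for the simple formats can be sketched in Python. This is a hypothetical helper, not Elasticsearch code: it handles positive and negative integers and percentages, ignores combination formats such as "3<90%", and clamps the result to the range 0 to the number of optional clauses as a simplifying assumption.

```python
def min_should_match(optional_clauses: int, spec: str) -> int:
    """Sketch of the simple minimum_should_match formats."""
    n = optional_clauses
    spec = spec.strip()
    if spec.endswith("%"):
        pct = int(spec[:-1])
        amount = n * abs(pct) // 100          # percentages are rounded down
        # Positive: that many must match. Negative: that many may be missing.
        required = amount if pct >= 0 else n - amount
    else:
        value = int(spec)
        required = value if value >= 0 else n + value
    # Simplification for this sketch: keep the result within 0..n.
    return max(0, min(n, required))


print(min_should_match(4, "30%"))   # 1  (30% of 4 = 1.2, rounded down)
print(min_should_match(4, "50%"))   # 2
print(min_should_match(4, "-25%"))  # 3  (25% of 4 = 1 clause may be missing)
print(min_should_match(4, "2"))     # 2
```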
```
GET /products/_search
{
  "query": {
    "bool": {
      "should": [
        { "term": { "color": { "value": "blue", "boost": 2.0 } } },
        {
          "query_string": {
            "query": "Floral Long Sleeve Dress",
            "fields": ["_text_"],
            "minimum_should_match": "50%"
          }
        }
      ]
    }
  }
}
```
In the example above, we are trying to match documents whose color is “blue” or whose “_text_” field matches “Floral Long Sleeve Dress” (case-insensitive search).
Since the bool query has only should clauses, a document must match at least one of them. Within the query_string clause, “minimum_should_match” is 50%, so at least 2 of the 4 words must match for that clause to count. Documents that match either clause are returned, and the score is computed from the clauses they match.
The first term query has an additional “boost” parameter of 2.0, which doubles the relevance score that the color clause contributes for matching documents.
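A toy Python model of the should query above may help. It assumes the `_text_` field holds plain words and uses lowercasing plus whitespace splitting as a crude stand-in for Elasticsearch's analysis; the helper name `should_matches` and the sample documents are hypothetical.

```python
from typing import Dict

SEARCH_WORDS = ["floral", "long", "sleeve", "dress"]
REQUIRED = len(SEARCH_WORDS) * 50 // 100  # "50%" of 4 words -> 2


def should_matches(doc: Dict[str, str]) -> bool:
    """Toy model of the should query above: either clause may match."""
    color_clause = doc.get("color") == "blue"             # exact keyword match
    words = set(doc.get("_text_", "").lower().split())    # crude stand-in for analysis
    text_clause = sum(w in words for w in SEARCH_WORDS) >= REQUIRED
    return color_clause or text_clause


docs = [
    {"color": "red", "_text_": "Floral Dress"},   # 2 of 4 words -> matches
    {"color": "red", "_text_": "Long Coat"},      # 1 of 4 words -> no match
    {"color": "blue", "_text_": "Plain Shirt"},   # matches via color
]
print([should_matches(d) for d in docs])  # [True, False, True]
```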
4. Must_not
The must_not clause also runs in filter context. It is like the logical operator “NOT”: documents that match any of its queries are not returned.
It does not contribute to the final score, and its results can also be cached.
```
GET /products/_search
{
  "query": {
    "bool": {
      "must_not": [
        { "term": { "color": "blue" } }
      ]
    }
  }
}
```
The above query returns all documents whose color is not “blue”, including documents that have no color value at all.
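The effect of this must_not clause can be mimicked with a short Python list comprehension over hypothetical documents. It models only the matching logic: a term query on a keyword field does not match a document that lacks the field, so such a document is kept.

```python
products = [
    {"id": 1, "color": "blue"},
    {"id": 2, "color": "red"},
    {"id": 3},  # no color field at all
]

# must_not with a single term clause: drop documents whose color is exactly "blue".
hits = [p["id"] for p in products if p.get("color") != "blue"]
print(hits)  # [2, 3]
```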
Conclusion
In this article, we learned about the bool query in Elasticsearch and how to use it in various scenarios. A well-framed boolean query retrieves relevant results and can also improve query performance, for example by placing non-scoring conditions in cacheable filter clauses.