Quick links
- Introduction and background
- What is a boosting query used for?
- How to Implement Boosting Query
- Notes and good things to know
Introduction and Background
Elasticsearch has a comprehensive Query DSL (Domain Specific Language) that is based on JSON for defining queries. The query DSL uses two separate types of clauses: leaf query clauses and compound query clauses.
- Leaf query clauses – Like match, term, and range queries, leaf query clauses look for a specific value in a particular field. These queries can be used by themselves.
- Compound query clauses – these wrap other leaf or compound queries. They either combine multiple queries logically (as the bool and dis_max queries do) or change the behavior of the wrapped queries (as the constant_score and boosting queries do), for example by switching from query context to filter context.
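As a rough illustration of the two clause types, the sketch below writes one leaf query and one compound query as Python dictionaries and serializes the result to the JSON form the Query DSL expects. The field names `title` and `description` and the search text are hypothetical, chosen only for this example.

```python
import json

# Leaf clause: looks for a value in a single field and can be used by itself.
# The field name "title" is a hypothetical example.
leaf_query = {"match": {"title": "elasticsearch"}}

# Compound clause: wraps other leaf or compound clauses.
# Here a dis_max query combines the leaf above with a second match query
# on a hypothetical "description" field.
compound_query = {
    "dis_max": {
        "queries": [
            leaf_query,
            {"match": {"description": "elasticsearch"}},
        ]
    }
}

# A search request body places either kind of clause under the "query" key.
request_body = {"query": compound_query}
print(json.dumps(request_body, indent=2))
```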
For an overview on relevance scores, query context and filter context, see this guide.
A boosting query returns only documents that match the positive query, and it reduces the score of those documents that also match the negative query.
What is a boosting query used for?
Boosting queries return documents that match a positive query while decreasing the relevance score of documents matching the negative query. You can demote specific documents with the boosting query without omitting them from the search results.
To better understand the use case, imagine that you have a set of search criteria and a set of documents that satisfies them. A subset of these documents shares a characteristic that can be detected with an Elasticsearch query. You still want the user to see all of the matching documents, but the ones in that subset should be pushed toward the end of the results. A boosting query handles this by applying a negative boost to that subset.
How to Implement Boosting Query
Boosting queries have three top-level parameters, which are:
- positive: a required query object parameter that represents the query you want to run. Each document returned must match the query. It is the main query that defines the criteria on which the documents are to be returned.
- negative: a required query object parameter that represents the query used to reduce the relevance score of matching documents. The boosting query determines the final relevance score as follows: if a returned document matches both the positive query and the negative query, take the relevance score from the positive query and multiply it by the negative_boost value. A document that matches only the positive query keeps its original score.
- negative_boost: a required parameter of type float. The relevance scores of documents that match the negative query are reduced using this floating-point number, which ranges from 0 to 1.0.
The negative_boost value is a multiplier between 0 and 1 that is applied to the score of documents matching the negative query. With negative_boost set to 0.25, such a document keeps a quarter of the score it received from the positive query; with 0.5, half; with 0.1, one-tenth. This gives you fine-grained control over how strongly matching documents are demoted.
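The scoring rule described above can be sketched in a few lines of Python. This only mirrors the arithmetic from the text; it is not Elasticsearch code, and the function name `boosted_score` and the sample score of 4.0 are assumptions made for illustration.

```python
def boosted_score(positive_score: float, matches_negative: bool,
                  negative_boost: float) -> float:
    """Return the final score under the rule described in the text.

    positive_score: score the document received from the positive query.
    matches_negative: whether the document also matches the negative query.
    negative_boost: multiplier between 0 and 1.0.
    """
    if matches_negative:
        # Demote, but do not exclude, the document.
        return positive_score * negative_boost
    # Documents matching only the positive query keep their score.
    return positive_score


# Hypothetical positive score of 4.0, demoted with different multipliers.
for multiplier in (0.5, 0.25, 0.1):
    print(multiplier, boosted_score(4.0, True, multiplier))

# A document that does not match the negative query is unaffected.
print(boosted_score(4.0, False, 0.25))
```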
In the example below, the boosting query still includes results that appear to be about C++ programming, but ranks them lower than they would otherwise be.
GET books/_search
{
  "query": {
    "boosting": {
      "positive": {
        "match": { "text": "Programming" }
      },
      "negative": {
        "term": { "text": "C++" }
      },
      "negative_boost": 0.5
    }
  }
}
The boosting query takes both a positive and a negative query. Only documents that match the positive query (a match on “Programming”) are included in the results list. Documents that also match the negative query (the term “C++”) are downgraded by multiplying their original _score by negative_boost, so in the example above their _score is reduced by half. Note that a term query is not analyzed, so this assumes the text field is indexed in a way that preserves the exact token “C++”.
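If you build request bodies in application code, the same query can be assembled programmatically. The sketch below uses a hypothetical helper, `boosting_query`, that is not part of any Elasticsearch client; it only produces the JSON body from the example above and checks that negative_boost stays in the documented range.

```python
import json


def boosting_query(positive: dict, negative: dict, negative_boost: float) -> dict:
    """Build a search body containing a boosting query.

    Hypothetical helper for illustration; not a library function.
    """
    if not 0.0 <= negative_boost <= 1.0:
        raise ValueError("negative_boost must be between 0 and 1.0")
    return {
        "query": {
            "boosting": {
                "positive": positive,
                "negative": negative,
                "negative_boost": negative_boost,
            }
        }
    }


# Same clauses as the C++ example in the text.
body = boosting_query(
    positive={"match": {"text": "Programming"}},
    negative={"term": {"text": "C++"}},
    negative_boost=0.5,
)
print(json.dumps(body, indent=2))
```

The printed JSON is what would be sent as the body of the `GET books/_search` request.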
In the second example below, we want books with Elasticsearch in the title, but we also want to reduce the relevance of books published before 2017.
GET books/_search
{
  "_source": [ "title", "publish_date" ],
  "query": {
    "boosting": {
      "positive": {
        "match": { "title": "Elasticsearch" }
      },
      "negative": {
        "range": {
          "publish_date": { "lt": "2017-01-01" }
        }
      },
      "negative_boost": 0.2
    }
  }
}
In the example above, we used a range query as the negative query to deflate relevance. Books with “Elasticsearch” in the title that were published on or after January 1, 2017 keep their full score and therefore tend to rank highest, while those published earlier have their score multiplied by 0.2.
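To see how this affects ordering, the following sketch simulates the demotion on a handful of made-up documents. The titles, dates, and positive scores are invented for illustration and do not come from a real index; only the cutoff date and the 0.2 multiplier are taken from the example above.

```python
from datetime import date

CUTOFF = date(2017, 1, 1)   # "lt": "2017-01-01" in the range query
NEGATIVE_BOOST = 0.2

# (title, publish_date, score from the positive query) -- all hypothetical.
books = [
    ("Book A", date(2015, 6, 1), 3.0),
    ("Book B", date(2018, 3, 15), 2.0),
    ("Book C", date(2017, 1, 1), 1.5),
]

ranked = []
for title, published, score in books:
    # The negative query matches books published strictly before the cutoff.
    if published < CUTOFF:
        score *= NEGATIVE_BOOST
    ranked.append((title, score))

# Highest final score first, as in a search result list.
ranked.sort(key=lambda item: item[1], reverse=True)
for title, score in ranked:
    print(title, round(score, 2))
```

Book A starts with the highest positive score but ends up last because it predates the cutoff; it is still present in the results. Book C, published exactly on the cutoff date, is not demoted because the range uses `lt`.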
Notes and good things to know
- In query context, scores are calculated as single-precision floating-point numbers, which have only 24 bits of precision for the significand. Score calculations that exceed the significand’s precision are converted to floats with some loss of precision.
- Use query context only for conditions that should affect the score of matching documents (that is, how well the document matches), and use filter context for all other query clauses.
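The precision limit mentioned in the first note can be observed outside Elasticsearch. The sketch below round-trips Python floats (double precision) through a 4-byte IEEE 754 single-precision value using the standard `struct` module; the helper name `to_float32` is an assumption for this example, and the code only demonstrates the 24-bit significand limit, not Elasticsearch's internal scoring.

```python
import struct


def to_float32(value: float) -> float:
    """Round a double-precision float to the nearest single-precision value."""
    return struct.unpack("<f", struct.pack("<f", value))[0]


limit = 2 ** 24  # 16777216: largest range in which every integer fits in 24 bits

# Exactly representable in single precision.
print(to_float32(float(limit)))

# 2**24 + 1 needs 25 significand bits, so it rounds back to 16777216.0.
print(to_float32(float(limit + 1)))

# Fractions lose digits as well: the single-precision 0.1 differs from the double.
print(to_float32(0.1) == 0.1)
```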