Quick links
- Definition
- How to implement the intervals query
- Example of an interval query
- Filters in Elasticsearch
- Notes and good things to know
Interval queries definition
The intervals query is a type of query that provides fine-grained control over which words a text must contain, and at what positions relative to each other, for a document to match the query.
How to implement the intervals query
At the top level of the query is the field to be searched. Its value is a single rule, which can be one of the following (all_of and any_of combine several nested rules):
- match
- prefix
- fuzzy
- wildcard
- all_of (used to chain multiple conditions as AND)
- any_of (used to chain multiple conditions as OR)
1. Match
- query – text we are looking for
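- max_gaps – maximum number of positions allowed between the matching words (default -1). -1 means there is no limit on the distance between the words; 0 means the words must appear next to each other.
- ordered – if true, the words must appear in the same order as in the query (default false)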
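To make max_gaps and ordered concrete, the sketch below models an ordered match rule in Python. It is a simplification under stated assumptions, not how Elasticsearch implements the rule. The hypothetical tokenize helper only lowercases and splits on whitespace, and gaps are counted as the positions the match spans minus the number of query words.

```python
def tokenize(text: str) -> list[str]:
    # Rough stand-in for an analyzer: lowercase, then split on whitespace.
    return text.lower().split()


def ordered_match(text: str, query: str, max_gaps: int = -1) -> bool:
    """Simplified model of a match rule with "ordered": true.

    Returns True if the query words occur in order with at most max_gaps
    unmatched positions between them (-1 means no limit).
    """
    tokens = tokenize(text)
    words = tokenize(query)
    for start, token in enumerate(tokens):
        if token != words[0]:
            continue
        pos = start
        complete = True
        # Greedily take the earliest occurrence of each following word.
        for word in words[1:]:
            try:
                pos = tokens.index(word, pos + 1)
            except ValueError:
                complete = False
                break
        if not complete:
            continue
        # Gaps = positions spanned by the match minus the matched words.
        gaps = (pos - start + 1) - len(words)
        if max_gaps == -1 or gaps <= max_gaps:
            return True
    return False


phrase = "if you go down to the woods"
print(ordered_match("if you go down to the woods today", phrase, max_gaps=0))        # True
print(ordered_match("if you go down to the green woods today", phrase, max_gaps=0))  # False
print(ordered_match("if you go down to the green woods today", phrase, max_gaps=1))  # True
```

Because “green” sits between “the” and “woods”, the second text has one gap. It therefore only matches once max_gaps is raised to 1.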
2. Prefix
- prefix – the characters that a word in the text must start with
3. Fuzzy
- term – the word being searched for
- fuzziness – maximum number of single-character edits allowed to turn the search term into a word in the text (default auto)
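The idea behind fuzziness can be sketched with an edit-distance function. This Python sketch uses plain Levenshtein distance with an integer limit, so treat it as an approximation. Elasticsearch also accepts values such as auto and, by default, counts swapping two adjacent characters as a single edit (the fuzzy rule's transpositions option). The helper names are hypothetical.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: single-character insertions, deletions, substitutions."""
    # prev holds the previous DP row: distances from a[:i-1] to each prefix of b.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (no cost if equal)
            ))
        prev = curr
    return prev[-1]


def fuzzy_matches(term: str, tokens: list[str], fuzziness: int) -> list[str]:
    # Keep the tokens that are within `fuzziness` edits of the search term.
    return [t for t in tokens if edit_distance(term, t) <= fuzziness]


print(fuzzy_matches("surprize", "a big surprise in the woods".split(), fuzziness=1))
# ['surprise']
```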
4. Wildcard
- pattern – a string containing ? for any single character or * for 0 or more characters.
- e.g., 123?4*6 would match 123A456 or 123B4something6, but not 123something456
- use_field – an optional parameter if you want to use a different field than the top level field
- analyzer – an optional parameter if you want to use a different search analyzer to normalize the pattern.
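The wildcard semantics described above can be checked by translating a pattern into a regular expression in Python. This is an illustrative sketch with hypothetical helper names. It compares raw strings, whereas Elasticsearch matches the pattern against the terms in the index, after any normalization by the analyzer.

```python
import re


def wildcard_to_regex(pattern: str) -> re.Pattern:
    """Translate a wildcard pattern (? and * only) into a regular expression."""
    parts = []
    for ch in pattern:
        if ch == "?":
            parts.append(".")             # exactly one character
        elif ch == "*":
            parts.append(".*")            # zero or more characters
        else:
            parts.append(re.escape(ch))   # everything else is literal
    return re.compile("".join(parts), re.DOTALL)


def wildcard_match(pattern: str, term: str) -> bool:
    # fullmatch: the pattern must cover the whole term, not just part of it.
    return wildcard_to_regex(pattern).fullmatch(term) is not None


for term in ["123A456", "123B4something6", "123something456"]:
    print(term, wildcard_match("123?4*6", term))
# 123A456 True
# 123B4something6 True
# 123something456 False
```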
5. All_of
- intervals – an array of rules, all of which must match
- max_gaps – maximum number of positions allowed between the intervals produced by the rules (default -1). -1 means there is no limit; 0 means the matches must be next to each other.
- ordered – if true, the rules must match in the text in the order in which they are listed in the query (default false)
6. Any_of
- intervals – an array of rules, at least one of which must match in the text
Example of an interval query
The query below combines “all_of” (AND) with “any_of” (OR). It would match:
- if you go down to the woods today you’re in for a big surprise
- if you go down to the woods today you’ll never believe your eyes
Would not match:
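- if you go down to the green woods today you’re in for a big surprise (because max_gaps is 0, so no extra word may appear inside the phrase “if you go down to the woods”)
- you’re in for a big surprise if you go down to the woods today (because “ordered”: true requires “big surprise” or “eyes” to come after the first phrase)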
POST _search
{
  "query": {
    "intervals": {
      "my_field": {
        "all_of": {
          "ordered": true,
          "intervals": [
            {
              "match": {
                "query": "if you go down to the woods",
                "max_gaps": 0,
                "ordered": true
              }
            },
            {
              "any_of": {
                "intervals": [
                  { "match": { "query": "big surprise" } },
                  { "match": { "query": "eyes" } }
                ]
              }
            }
          ]
        }
      }
    }
  }
}
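The following Python sketch models this query to show how all_of and any_of combine intervals. It is a simplification, not Elasticsearch's algorithm, and the helper names are hypothetical. Text is lowercased and split on whitespace, and every match rule is treated as ordered. The real “big surprise” rule is unordered, which makes no difference for these sentences. all_of_ordered handles exactly two rules with no max_gaps, and interval minimization is not modeled.

```python
from typing import Callable

# A rule maps a list of tokens to the (start, end) positions it matches.
Rule = Callable[[list[str]], list[tuple[int, int]]]


def tokenize(text: str) -> list[str]:
    # Rough stand-in for an analyzer: lowercase, then split on whitespace.
    return text.lower().split()


def match(query: str, max_gaps: int = -1) -> Rule:
    """Hypothetical ordered match rule: query words in order, within max_gaps."""
    words = tokenize(query)

    def rule(tokens: list[str]) -> list[tuple[int, int]]:
        intervals = []
        for start, token in enumerate(tokens):
            if token != words[0]:
                continue
            pos = start
            for word in words[1:]:
                if word not in tokens[pos + 1:]:
                    break  # this start position cannot complete the phrase
                pos = tokens.index(word, pos + 1)
            else:
                gaps = (pos - start + 1) - len(words)
                if max_gaps == -1 or gaps <= max_gaps:
                    intervals.append((start, pos))
        return intervals

    return rule


def any_of(*rules: Rule) -> Rule:
    # OR: every interval produced by any of the sub-rules.
    return lambda tokens: [iv for r in rules for iv in r(tokens)]


def all_of_ordered(first: Rule, second: Rule) -> Rule:
    # AND with "ordered": true and no max_gaps: an interval from `first`
    # must end before an interval from `second` starts.
    return lambda tokens: [
        (a[0], b[1]) for a in first(tokens) for b in second(tokens) if a[1] < b[0]
    ]


query = all_of_ordered(
    match("if you go down to the woods", max_gaps=0),
    any_of(match("big surprise"), match("eyes")),
)


def matches(text: str) -> bool:
    return bool(query(tokenize(text)))


sentences = [
    "if you go down to the woods today you're in for a big surprise",
    "if you go down to the woods today you'll never believe your eyes",
    "if you go down to the green woods today you're in for a big surprise",
    "you're in for a big surprise if you go down to the woods today",
]
print([matches(s) for s in sentences])  # [True, True, False, False]
```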
Filters in Elasticsearch
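Filters attach additional conditions to a rule. A filter restricts the intervals a rule produces, based either on their relationship to the intervals of another rule (for example, containing, not_containing, before or after) or on a script.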
POST content_articles_v7/_search
{
  "query": {
    "intervals": {
      "content.en.short_description": {
        "match": {
          "query": "reconstruction substations",
          "max_gaps": 9,
          "filter": {
            "not_containing": {
              "match": {
                "query": "heating"
              }
            }
          }
        }
      }
    }
  }
}
The query above would return “reconstruction of electrical substations” but not “reconstruction of heating substations.”
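To show what not_containing does, this Python sketch models the query above. It rests on two assumptions. First, the hypothetical tokenize helper only lowercases and splits on whitespace. Second, the match rule is treated as ordered; the real rule is unordered by default, which makes no difference for these examples. The filter is applied to each matched interval, not to the whole document, so the third text still matches even though it mentions “heating”.

```python
def tokenize(text: str) -> list[str]:
    # Rough stand-in for an analyzer: lowercase, then split on whitespace.
    return text.lower().split()


def ordered_intervals(tokens: list[str], words: list[str],
                      max_gaps: int = -1) -> list[tuple[int, int]]:
    """(start, end) spans where `words` occur in order within max_gaps."""
    intervals = []
    for start, token in enumerate(tokens):
        if token != words[0]:
            continue
        pos = start
        for word in words[1:]:
            if word not in tokens[pos + 1:]:
                break
            pos = tokens.index(word, pos + 1)
        else:
            if max_gaps == -1 or (pos - start + 1) - len(words) <= max_gaps:
                intervals.append((start, pos))
    return intervals


def not_containing(intervals: list[tuple[int, int]], tokens: list[str],
                   word: str) -> list[tuple[int, int]]:
    # Keep only the intervals whose span does not include the filter word.
    return [(s, e) for s, e in intervals if word not in tokens[s:e + 1]]


def matches(text: str) -> bool:
    tokens = tokenize(text)
    hits = ordered_intervals(tokens, ["reconstruction", "substations"], max_gaps=9)
    return bool(not_containing(hits, tokens, "heating"))


print(matches("reconstruction of electrical substations"))         # True
print(matches("reconstruction of heating substations"))            # False
print(matches("heating bills and reconstruction of substations"))  # True
```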
The following example uses a script to filter intervals based on their positions. The script can access an interval variable with these properties:
- interval.start – the position where the interval starts
- interval.end – the position where the interval ends
- interval.gaps – the number of gaps between the words
POST my_index/_search
{
  "query": {
    "intervals": {
      "my_text": {
        "match": {
          "query": "hot meal",
          "filter": {
            "script": {
              "source": "interval.start > 5 && interval.end < 30 && interval.gaps == 0"
            }
          }
        }
      }
    }
  }
}
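A script filter acts as a predicate evaluated once per interval. The Python sketch below mirrors the script's condition. The Interval type and the candidate intervals are hypothetical stand-ins for what the query would produce from a real document.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    start: int  # position of the first word in the interval
    end: int    # position of the last word in the interval
    gaps: int   # positions inside the interval not taken by query words


def keep(interval: Interval) -> bool:
    # Same condition as the script in the query above.
    return interval.start > 5 and interval.end < 30 and interval.gaps == 0


# Hypothetical intervals for "hot meal" found in some document.
candidates = [Interval(2, 3, 0), Interval(10, 11, 0), Interval(12, 14, 1), Interval(40, 41, 0)]
print([iv for iv in candidates if keep(iv)])
# [Interval(start=10, end=11, gaps=0)]
```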
Notes and good things to know
Using any_of with two rules where one is a prefix of the other (for example, “quick” and “quick brown”) can produce unexpected results. This is because the intervals query always minimizes intervals so that queries can run in linear time. For example, an interval that starts at the same position as a shorter interval is discarded.
Consider the following example, which searches for “the”, immediately followed by “quick” or “quick brown”, immediately followed by “fox”. Contrary to expectations, this query does not match the document “the quick brown fox”. The any_of rule in the middle only produces intervals for “quick”, because the longer intervals for “quick brown” are minimized away.
POST my_index/_search
{
  "query": {
    "intervals": {
      "my_text": {
        "all_of": {
          "intervals": [
            { "match": { "query": "the" } },
            {
              "any_of": {
                "intervals": [
                  { "match": { "query": "quick" } },
                  { "match": { "query": "quick brown" } }
                ]
              }
            },
            { "match": { "query": "fox" } }
          ],
          "max_gaps": 0,
          "ordered": true
        }
      }
    }
  }
}
For this reason, spell out each complete alternative explicitly in a top-level any_of, as shown here:
POST my_index/_search
{
  "query": {
    "intervals": {
      "my_text": {
        "any_of": {
          "intervals": [
            {
              "match": {
                "query": "the quick brown fox",
                "ordered": true,
                "max_gaps": 0
              }
            },
            {
              "match": {
                "query": "the quick fox",
                "ordered": true,
                "max_gaps": 0
              }
            }
          ]
        }
      }
    }
  }
}
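To see why the first query misses “the quick brown fox” while the second finds it, here is a simplified Python model. Its minimize step drops any interval that contains a different interval. That rule is an assumption that reproduces the behavior described above; it is not Elasticsearch's actual implementation, and all function names are hypothetical.

```python
Interval = tuple[int, int]  # (start, end) token positions, inclusive


def term_intervals(tokens: list[str], word: str) -> list[Interval]:
    # A single word produces a one-position interval wherever it occurs.
    return [(i, i) for i, t in enumerate(tokens) if t == word]


def ordered_sequence(rule_intervals: list[list[Interval]], max_gaps: int) -> list[Interval]:
    """Chain one interval per rule, in order, with at most max_gaps uncovered positions."""
    results: list[Interval] = []

    def extend(idx: int, chain: list[Interval]) -> None:
        if idx == len(rule_intervals):
            gaps = sum(b[0] - a[1] - 1 for a, b in zip(chain, chain[1:]))
            if max_gaps == -1 or gaps <= max_gaps:
                results.append((chain[0][0], chain[-1][1]))
            return
        for iv in rule_intervals[idx]:
            if not chain or chain[-1][1] < iv[0]:
                extend(idx + 1, chain + [iv])

    extend(0, [])
    return results


def phrase(tokens: list[str], text: str) -> list[Interval]:
    # Words of `text` next to each other and in order ("ordered": true, "max_gaps": 0).
    return ordered_sequence([term_intervals(tokens, w) for w in text.split()], max_gaps=0)


def minimize(intervals: list[Interval]) -> list[Interval]:
    # Simplified minimization: drop any interval that contains a different one.
    return sorted(
        iv for iv in set(intervals)
        if not any(o != iv and iv[0] <= o[0] and o[1] <= iv[1] for o in intervals)
    )


tokens = "the quick brown fox".split()

# First query: "quick brown" starts where "quick" starts but is longer, so it is dropped.
raw_middle = phrase(tokens, "quick") + phrase(tokens, "quick brown")
middle = minimize(raw_middle)
print(middle)  # [(1, 1)]
print(ordered_sequence([phrase(tokens, "the"), middle, phrase(tokens, "fox")], max_gaps=0))  # []
# Without minimization, the document would have matched.
print(ordered_sequence([phrase(tokens, "the"), raw_middle, phrase(tokens, "fox")], max_gaps=0))  # [(0, 3)]

# Second query: each alternative is a complete phrase, so the match survives.
print(minimize(phrase(tokens, "the quick brown fox") + phrase(tokens, "the quick fox")))  # [(0, 3)]
```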