Overview
When mappings are generated dynamically, OpenSearch maps any field that contains an object or an array of objects as the “object” type. This works in many cases, but sometimes the mapping needs to be adjusted. Below we cover the different scenarios and how to choose the right mapping for each one.
Object fields
One of the advantages of document-based structures is that their properties can be grouped hierarchically. These groups are what we call objects.
{ "name":"I'm an object", "category": "single-object" }
Objects can be embedded inside objects and go as deep as needed.
{
  "name": "Duveteuse",
  "category": "dog",
  "human_partner": {
    "full_name": "Ami Chien",
    "address": {
      "street": "Jolie Rue #1234",
      "city": "Paris",
      "country": { "name": "France", "code": "FR" }
    }
  }
}
It doesn’t matter how deep the object inside object relation goes because OpenSearch internally will flatten it out (see explanation below).
Arrays of objects can be created as property values.
{
  "name": "Father object",
  "age": 50,
  "category": "self-explaining",
  "children": [
    { "name": "Child object1", "age": 1, "category": "learning-objects" },
    { "name": "Child object2", "age": 2, "category": "learning-objects" },
    { "name": "Child object3", "age": 3, "category": "learning-objects" }
  ]
}
In this situation the field type matters, and sometimes we will have to switch from the default object type to a nested type.
Nested field type
Nested is a special type of object field in which each object in the array is indexed as a separate hidden document, stored alongside the containing document. This preserves the relationship between the properties of each inner object, so we can query them together.
The problem with using object fields
To demonstrate the use of object fields vs. nested field types, we’ll first index some documents. Examples can be executed in OpenSearch Dashboards.
PUT books_test
PUT books_test/_doc/1
{
  "name": "An Awesome Book",
  "tags": [{ "name": "best-seller" }, { "name": "summer-sale" }],
  "authors": [
    { "name": "Gustavo Llermaly", "age": "32", "country": "Chile" },
    { "name": "John Doe", "age": "20", "country": "USA" }
  ]
}
PUT books_test/_doc/2
{
  "name": "A Regular Book",
  "tags": [{ "name": "free-shipping" }, { "name": "summer-sale" }],
  "authors": [
    { "name": "Regular author", "age": "40", "country": "USA" },
    { "name": "John Doe", "age": "20", "country": "USA" }
  ]
}
OpenSearch will dynamically generate these mappings:
GET books_test/_mapping
{
  "books_test": {
    "mappings": {
      "properties": {
        "authors": {
          "properties": {
            "age": {
              "type": "text",
              "fields": { "keyword": { "type": "keyword", "ignore_above": 256 } }
            },
            "country": {
              "type": "text",
              "fields": { "keyword": { "type": "keyword", "ignore_above": 256 } }
            },
            "name": {
              "type": "text",
              "fields": { "keyword": { "type": "keyword", "ignore_above": 256 } }
            }
          }
        },
        "name": {
          "type": "text",
          "fields": { "keyword": { "type": "keyword", "ignore_above": 256 } }
        },
        "tags": {
          "properties": {
            "name": {
              "type": "text",
              "fields": { "keyword": { "type": "keyword", "ignore_above": 256 } }
            }
          }
        }
      }
    }
  }
}
Let’s focus on the “authors” and “tags” fields. Both were mapped as “object” type fields (the mapping shows no explicit type for them because object is the default). This means OpenSearch flattens their properties. Internally, document 1 looks roughly like this:
{
  "name": "An Awesome Book",
  "tags.name": ["best-seller", "summer-sale"],
  "authors.name": ["Gustavo Llermaly", "John Doe"],
  "authors.age": ["32", "20"],
  "authors.country": ["Chile", "USA"]
}
As you can see, the “tags” field looks like a regular string array, but the “authors” field looks different: it was split into several array fields, one per property.
The issue is that OpenSearch no longer keeps the properties of each “authors” object together. Once flattened, there is no record of which name, age, and country belonged to the same author.
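The flattening described above can be imitated in a few lines of Python. This is only a sketch of the idea, not how OpenSearch or Lucene implement it; the function name `flatten_document` is made up for this illustration.

```python
from collections import defaultdict


def flatten_document(doc, prefix=""):
    """Collapse inner objects and arrays of objects into dotted field names.

    Hypothetical helper: it mimics the effect of the object type,
    not the actual indexing code.
    """
    flat = defaultdict(list)
    for key, value in doc.items():
        path = f"{prefix}{key}"
        # Treat a single value as a one-element array so both cases share a code path.
        items = value if isinstance(value, list) else [value]
        for item in items:
            if isinstance(item, dict):
                # Recurse into the inner object and merge its values under the dotted path.
                for sub_path, sub_values in flatten_document(item, f"{path}.").items():
                    flat[sub_path].extend(sub_values)
            else:
                flat[path].append(item)
    return dict(flat)


book = {
    "name": "An Awesome Book",
    "tags": [{"name": "best-seller"}, {"name": "summer-sale"}],
    "authors": [
        {"name": "Gustavo Llermaly", "age": "32", "country": "Chile"},
        {"name": "John Doe", "age": "20", "country": "USA"},
    ],
}

flat_book = flatten_document(book)
for field, values in flat_book.items():
    print(field, values)
```

After flattening, `authors.country` and `authors.age` are two independent lists, and nothing records which age went with which country.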
To illustrate the problem with this mapping, let’s look at the two following queries.
Query 1: Books with authors from Chile or authors who are 30 years old or younger.
Spoiler: Both books meet these conditions.
To find books meeting these criteria, we would run the following query:
GET books_test/_search
{
  "query": {
    "bool": {
      "should": [
        { "term": { "authors.country.keyword": "Chile" } },
        { "range": { "authors.age": { "lte": 30 } } }
      ]
    }
  }
}
Both books are returned, which is correct because Gustavo Llermaly is from Chile, and John Doe is less than 30 years old.
Query 2: Books written by authors who are 30 years old or younger AND are from Chile.
Spoiler: No books meet the criteria.
GET books_test/_search
{
  "query": {
    "bool": {
      "filter": [
        { "term": { "authors.country.keyword": "Chile" } },
        { "range": { "authors.age": { "lte": 30 } } }
      ]
    }
  }
}
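This query returns document 1 (“An Awesome Book”), and that’s incorrect. The only author from Chile is 32 years old, so no single author meets both criteria. Document 1 matches anyway because one of its authors is from Chile and a different author is 20 years old, and OpenSearch didn’t store the relation between each author and their age. (Document 2 is not returned, since none of its authors is from Chile.)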
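To see why the flattened representation produces a false positive, the two filter clauses can be simulated in plain Python. This is a rough model under stated assumptions: the `flat_books` dictionary and the `matches_flat` function are invented for this sketch, and ages are compared as integers for readability, whereas the real query compares indexed terms.

```python
# Flattened view of the two example books: author properties merged per field.
flat_books = {
    1: {"authors.country": ["Chile", "USA"], "authors.age": ["32", "20"]},
    2: {"authors.country": ["USA", "USA"], "authors.age": ["40", "20"]},
}


def matches_flat(doc):
    """Check each clause against the whole array, independently of the other clause."""
    from_chile = "Chile" in doc["authors.country"]
    # Simplification: compare ages as integers rather than as indexed terms.
    young_enough = any(int(age) <= 30 for age in doc["authors.age"])
    return from_chile and young_enough


hits = [doc_id for doc_id, doc in flat_books.items() if matches_flat(doc)]
print(hits)
```

In this model, book 1 is a hit because "Chile" comes from one author and the age of 20 comes from another. Each clause is satisfied somewhere in the array, which is all the object type can check.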
How to resolve it
To accurately complete the second query, we need to use a different field type called nested.
As described above, a nested field indexes each inner object as a separate hidden document, which keeps each author’s country and age together.
The type of an existing field cannot be changed in place, so we need to create a new index with the right mapping and reindex our data into it.
First, create a new, empty index that explicitly maps the “authors” field as nested. Otherwise, dynamic mapping would map it as an object again:
PUT books_test_nested
{
  "mappings": {
    "properties": {
      "authors": { "type": "nested" }
    }
  }
}
Note: OpenSearch will still generate all the other mappings dynamically, based on the documents we index.
Now use the reindex API to move the documents from our old index to the new one:
POST _reindex
{
  "source": { "index": "books_test" },
  "dest": { "index": "books_test_nested" }
}
Run this to ensure the documents transferred properly:
GET books_test_nested/_search
If we now run the same two queries against the new index, both will return 0 results. This is because nested fields can only be searched with a dedicated query type, the nested query, which takes the path of the nested field and the query to run against each inner object.
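The only structural difference between the original queries and their nested versions is the extra `nested` wrapper with a `path`. A small Python helper can make that wrapping explicit. The helper name `nested_query` is hypothetical; it only builds the request body as a dictionary and does not talk to a cluster.

```python
import json


def nested_query(path, bool_clause, conditions):
    """Wrap a bool query in a nested query for the given path (illustrative helper)."""
    return {
        "query": {
            "nested": {
                "path": path,
                "query": {"bool": {bool_clause: conditions}},
            }
        }
    }


conditions = [
    {"term": {"authors.country.keyword": "Chile"}},
    {"range": {"authors.age": {"lte": 30}}},
]

# "filter" requires every condition to hold for the same inner object;
# "should" would require at least one of them.
body = nested_query("authors", "filter", conditions)
print(json.dumps(body, indent=2))
```

The printed JSON can be pasted as the body of a `GET books_test_nested/_search` request.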
Answering the two questions again with nested queries goes as follows:
Query 1: Books with authors from Chile or authors who are 30 years old or younger.
GET books_test_nested/_search
{
  "query": {
    "nested": {
      "path": "authors",
      "query": {
        "bool": {
          "should": [
            { "term": { "authors.country.keyword": "Chile" } },
            { "range": { "authors.age": { "lte": 30 } } }
          ]
        }
      }
    }
  }
}
Both books are still returned, which is the correct result.
Query 2: Books written by authors who are 30 years old or younger AND are from Chile.
GET books_test_nested/_search
{
  "query": {
    "nested": {
      "path": "authors",
      "query": {
        "bool": {
          "filter": [
            { "term": { "authors.country.keyword": "Chile" } },
            { "range": { "authors.age": { "lte": 30 } } }
          ]
        }
      }
    }
  }
}
No books are returned, which is the expected result.
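The behavior of the nested queries can be modelled by evaluating the conditions once per author instead of once per flattened array. The sketch below is a plain Python simulation under stated assumptions: the names `nested_books`, `search_filter`, and `search_should` are invented for illustration, and ages are compared as integers for readability.

```python
# Each book keeps its authors as separate objects, as the nested type does.
nested_books = {
    1: [
        {"name": "Gustavo Llermaly", "age": "32", "country": "Chile"},
        {"name": "John Doe", "age": "20", "country": "USA"},
    ],
    2: [
        {"name": "Regular author", "age": "40", "country": "USA"},
        {"name": "John Doe", "age": "20", "country": "USA"},
    ],
}


def from_chile(author):
    return author["country"] == "Chile"


def young_enough(author):
    # Simplification: compare ages as integers rather than as indexed terms.
    return int(author["age"]) <= 30


def search_filter(books):
    """Query 2: a book matches only if ONE author satisfies both conditions."""
    return [
        book_id
        for book_id, authors in books.items()
        if any(from_chile(a) and young_enough(a) for a in authors)
    ]


def search_should(books):
    """Query 1: a book matches if some author satisfies at least one condition."""
    return [
        book_id
        for book_id, authors in books.items()
        if any(from_chile(a) or young_enough(a) for a in authors)
    ]


print(search_should(nested_books))
print(search_filter(nested_books))
```

Because both conditions are tested against the same author object, the combination of one author's country with another author's age can no longer produce a match.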
Why this is important
Using the nested field type for every array of objects “just in case we need it later” sounds tempting, but it should be used only when needed. Under the hood, Lucene creates a separate document for each object in the array, which can degrade indexing and search performance, and a large number of distinct nested fields can contribute to a mapping explosion.
To avoid poor performance, the number of nested fields per index is limited to 50 by default, and the number of nested objects per document is limited to 10000 by default.
Both settings can be changed but it is not recommended:
index.mapping.nested_fields.limit
index.mapping.nested_objects.limit
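Before indexing, it can be useful to estimate how many nested objects a document would create and compare that with the limit. The following is a rough sketch only: the function `count_nested_objects` is hypothetical, the constant reuses the 10000 figure quoted above, and the exact way OpenSearch counts nested objects internally may differ.

```python
# Limit quoted in the text for index.mapping.nested_objects.limit.
NESTED_OBJECTS_LIMIT = 10_000


def count_nested_objects(doc, nested_paths):
    """Count the inner objects found under each dotted path mapped as nested."""
    total = 0
    for path in nested_paths:
        values = [doc]
        for part in path.split("."):
            next_values = []
            for value in values:
                child = value.get(part) if isinstance(value, dict) else None
                if isinstance(child, list):
                    next_values.extend(child)
                elif child is not None:
                    next_values.append(child)
            values = next_values
        # Only objects count; scalar values under the path are ignored.
        total += sum(1 for value in values if isinstance(value, dict))
    return total


book = {
    "name": "An Awesome Book",
    "tags": [{"name": "best-seller"}, {"name": "summer-sale"}],
    "authors": [
        {"name": "Gustavo Llermaly", "age": "32", "country": "Chile"},
        {"name": "John Doe", "age": "20", "country": "USA"},
    ],
}

nested_count = count_nested_objects(book, ["authors"])
print(nested_count, nested_count <= NESTED_OBJECTS_LIMIT)
```

Only `authors` is passed as a nested path here, so the two `tags` objects are not counted. That matches the example mapping, where `tags` remains a regular object field.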
If you need to index a large and unpredictable number of keyword fields on inner objects, you can use the flat_object field type (the OpenSearch counterpart of the Elasticsearch flattened type), which maps the entire object content into a single field and supports basic query operations.
Summary
- Fields based on objects or arrays of objects are created with object type by default.
- The object field type does not preserve the relationship between properties of the same inner object, so it cannot correctly answer queries that tie those properties together.
- Do not use the nested type if there will only ever be one inner object per outer object.
- Use the nested type only if you need to query two or more fields within the same inner object of an array; otherwise, use the object type.
- Too many nested objects could cause performance degradation or mapping explosion.
- Use the flat_object field type (flattened in Elasticsearch) to map all keyword fields of an inner object into a single field.