Quick Links
- Parent child relationship
- Use cases
- Implementing parent child relationships
- Querying parent child relationships
- Limitations
Introduction
Elasticsearch supports parent-child relationships, which let you link separate documents in a way that mirrors real-world relationships. This feature is particularly useful for complex data structures that require a hierarchical relationship between documents.
The parent-child relationship
The parent-child relationship in Elasticsearch is a connection between documents that live in the same index. The parent document and child document are independent, meaning that each can be indexed, updated, and deleted separately. However, every child document records which parent document it belongs to.
This relationship is different from the nested object model, where all nested documents are stored within the parent document and cannot be accessed or updated separately. In the parent-child model, the child documents are stored separately but are linked to the parent document through a parent ID.
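For contrast, here is a minimal sketch of the nested model applied to the same kind of data. The index name `my_nested_index` and the `answers` field are hypothetical names chosen for illustration, not part of the original example.

```json
# Nested model: the answers live inside the question document.
PUT /my_nested_index
{
  "mappings": {
    "properties": {
      "content": { "type": "text" },
      "answers": {
        "type": "nested",
        "properties": {
          "content": { "type": "text" }
        }
      }
    }
  }
}

# Both answers are indexed as part of the question document.
PUT /my_nested_index/_doc/1
{
  "content": "This is a question",
  "answers": [
    { "content": "This is an answer" },
    { "content": "This is another answer" }
  ]
}
```

With this layout, adding or changing a single answer means reindexing the whole question document, which is the cost the parent-child model avoids.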
Use cases for parent-child relationships
Parent-child relationships are useful in scenarios where there is a one-to-many relationship between entities. For example, in an e-commerce application, a single customer (parent) can have multiple orders (children). Similarly, in a blogging platform, a single blog post (parent) can have multiple comments (children).
Implementing parent-child relationships
To implement a parent-child relationship in Elasticsearch, you need to use the `join` data type.
Here’s how you can do it:
Step 1:
Define the parent/child relationship using the join data type, for instance for questions and answers:
```json
PUT /my_index
{
  "mappings": {
    "properties": {
      "content": { "type": "text" },
      "join_field": {
        "type": "join",
        "relations": {
          "question": "answer"
        }
      }
    }
  }
}
```
Step 2:
Index a parent question document:
```json
PUT /my_index/_doc/1
{
  "content": "This is a question",
  "join_field": {
    "name": "question"
  }
}
```
Step 3:
Index a child answer document and specify the ID of the parent question document as well as the routing to make sure that the child document is stored on the same shard as the parent document:
```json
PUT /my_index/_doc/2?routing=1
{
  "content": "This is an answer",
  "join_field": {
    "name": "answer",
    "parent": "1"
  }
}
```
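To check that the child was linked as intended, one option is the `parent_id` query, which returns the children of a given parent. This is a minimal sketch that assumes the index and document IDs used in the steps above. The explicit refresh is only there so the new documents are searchable right away.

```json
# Make the documents indexed above visible to search immediately.
POST /my_index/_refresh

# Return every "answer" document whose parent has the ID 1.
GET /my_index/_search
{
  "query": {
    "parent_id": {
      "type": "answer",
      "id": "1"
    }
  }
}
```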
Querying parent-child relationships
You can query parent-child relationships using the `has_child` and `has_parent` queries. The `has_child` query returns parent documents that have at least one child document that matches the specified query. The `has_parent` query returns child documents that have a parent document that matches the specified query.
Here’s an example of a `has_child` query:
```json
GET /my_index/_search
{
  "query": {
    "has_child": {
      "type": "answer",
      "query": {
        "match": {
          "content": "some answer"
        }
      }
    }
  }
}
```
And here’s an example of a `has_parent` query:
```json
GET /my_index/_search
{
  "query": {
    "has_parent": {
      "parent_type": "question",
      "query": {
        "match": {
          "content": "some question"
        }
      }
    }
  }
}
```
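Both queries accept a few optional parameters that are often useful in practice. The sketch below reuses the index and field names from the examples above. The threshold of two children and the `max` score mode are arbitrary values picked for illustration.

```json
# Questions with at least two matching answers, scored by the best-matching
# answer. "inner_hits" returns the matching answers alongside each question.
GET /my_index/_search
{
  "query": {
    "has_child": {
      "type": "answer",
      "min_children": 2,
      "score_mode": "max",
      "query": { "match": { "content": "some answer" } },
      "inner_hits": {}
    }
  }
}

# Answers whose question matches. With "score": true, the relevance score
# of the matching parent is carried over to the child hits.
GET /my_index/_search
{
  "query": {
    "has_parent": {
      "parent_type": "question",
      "score": true,
      "query": { "match": { "content": "some question" } }
    }
  }
}
```

By default, `has_child` ignores child scores (`score_mode` is `none`) and `has_parent` does not carry over the parent score, so these parameters matter only when the relevance of the related document should influence ranking.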
Limitations of parent-child relationships
While parent-child relationships offer a lot of flexibility, they also come with some limitations. The main limitation is that the parent and child documents must be indexed on the same shard. This is because the relationship between the parent and child documents is managed at the shard level. This can lead to uneven data distribution if a large number of child documents are associated with a single parent document.
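In practice, the same-shard requirement means the routing value follows the child document around. The requests below are a sketch that assumes the documents indexed in the steps above. A read without the routing value may look on the wrong shard and not find the child.

```json
# Fetch the child from step 3. The routing value must match the one used
# when the document was indexed.
GET /my_index/_doc/2?routing=1

# Limit a search to the shard that holds parent 1 and its children.
GET /my_index/_search?routing=1
{
  "query": {
    "parent_id": { "type": "answer", "id": "1" }
  }
}

# List per-shard document counts and store sizes to spot skew.
GET /_cat/shards/my_index?v
```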
Regarding performance, be aware that the `has_child` and `has_parent` queries perform joins at query time and can therefore be slower than queries that do not. Where your data model allows it, denormalizing your data is generally the better choice than using the `join` data type.
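As a rough sketch of the denormalized alternative, each answer can carry a copy of the question fields it needs, so no join is required at query time. The index name `answers_denormalized` and the fields `question_id` and `question_content` are hypothetical names chosen for illustration.

```json
# Hypothetical index: the answer document embeds the question data it needs.
PUT /answers_denormalized/_doc/2
{
  "content": "This is an answer",
  "question_id": "1",
  "question_content": "This is a question"
}

# A plain match query replaces the has_parent join.
GET /answers_denormalized/_search
{
  "query": {
    "match": { "question_content": "some question" }
  }
}
```

The trade-off is write amplification: if the question text changes, every answer that copied it has to be updated as well.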