Elasticsearch Nested Aggregation

By Opster Team

Updated: Mar 10, 2024

Introduction

Handling complex data structures is a common challenge when working with Elasticsearch. One such structure is the nested document, and Elasticsearch’s nested aggregation lets you compute metrics over nested documents. This article explains what nested aggregations are, when to use them, and how to implement them.

Understanding Nested Documents

Before diving into nested aggregations, it’s essential to understand nested documents. In Elasticsearch, a field can hold a collection of related sub-documents, known as nested documents. When the field is mapped with the nested type, each sub-document is indexed as a separate hidden document in the same index, stored next to its parent. This keeps the fields of each sub-document together, so they can be queried and aggregated independently of the other sub-documents.
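To see why this matters, here is a minimal Python sketch of the difference between the default object mapping and the nested mapping. It is a simplified model rather than Elasticsearch’s actual internals, it does not talk to a cluster, and the function names (flatten_as_object, matches_flattened, matches_nested) are hypothetical.

```python
# Simplified model: how an array of objects behaves under the default
# `object` mapping versus the `nested` mapping. Not Elasticsearch internals.
reviews = [
    {"author": "Alice", "rating": 5},
    {"author": "Bob", "rating": 1},
]


def flatten_as_object(field, items):
    """Model of the `object` type: values are merged per field name,
    so the link between an author and their rating is lost."""
    flat = {}
    for item in items:
        for key, value in item.items():
            flat.setdefault(f"{field}.{key}", []).append(value)
    return flat


def matches_flattened(flat, author, rating):
    """Both values exist somewhere in the document, not necessarily together."""
    return author in flat["reviews.author"] and rating in flat["reviews.rating"]


def matches_nested(items, author, rating):
    """Model of the `nested` type: both values must be in the same sub-document."""
    return any(i["author"] == author and i["rating"] == rating for i in items)


flat = flatten_as_object("reviews", reviews)
print(flat)
print(matches_flattened(flat, "Alice", 1))  # True: a false positive
print(matches_nested(reviews, "Alice", 1))  # False: Alice gave a 5, not a 1
```

In this model, the flattened form reports a match for "Alice gave a rating of 1" even though no such review exists, while the nested form keeps each review intact.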

Use Cases for Nested Aggregation

When to use nested aggregations in Elasticsearch?

Nested aggregations are particularly useful when each document contains a collection of related sub-documents, such as:
– E-commerce platforms: Analyzing product reviews, where each product has multiple reviews, and each review has multiple attributes such as rating, author, and date.
– Social media platforms: Analyzing user-generated content, where each user has multiple posts, and each post has multiple attributes such as likes, comments, and shares.
– Log analysis: Analyzing logs with multiple levels of information, such as server logs containing request and response data, each with its attributes.

Implementing Nested Aggregation

How to implement nested aggregations in Elasticsearch

  1. Define the Nested Mapping

  2. Index the Documents

  3. Perform Nested Aggregation

Now, let’s walk through the process of implementing nested aggregation in Elasticsearch step-by-step.

Step 1: Define the Nested Mapping

First, you need to define the nested mapping for the field containing nested documents. For example, let’s consider an e-commerce platform with products and reviews:

PUT /products
{
  "mappings": {
    "properties": {
      "name": {
        "type": "text",
        "fields": {
          "keyword": {
            "type": "keyword",
            "ignore_above": 256
          }
        }
      },
      "reviews": {
        "type": "nested",
        "properties": {
          "rating": {
            "type": "integer"
          },
          "author": {
            "type": "text"
          },
          "date": {
            "type": "date"
          }
        }
      }
    }
  }
}

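The keyword sub-field on name is needed because we are going to create one bucket per product with a terms aggregation, which by default cannot run on a text field and needs an exact-value field such as keyword. The average review rating is then calculated within each product bucket.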
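As a quick sanity check on a mapping like this one, the following Python sketch walks the mapping as a plain dictionary and lists the fields declared as nested, which are the values a nested aggregation can use as its path. The helper name nested_paths is hypothetical, and the sketch works on a local copy of the mapping instead of fetching it from a cluster.

```python
# Local copy of the mapping body from Step 1.
mapping = {
    "mappings": {
        "properties": {
            "name": {
                "type": "text",
                "fields": {"keyword": {"type": "keyword", "ignore_above": 256}},
            },
            "reviews": {
                "type": "nested",
                "properties": {
                    "rating": {"type": "integer"},
                    "author": {"type": "text"},
                    "date": {"type": "date"},
                },
            },
        }
    }
}


def nested_paths(properties, prefix=""):
    """Return the dotted paths of all fields whose type is `nested`."""
    paths = []
    for name, spec in properties.items():
        full_name = f"{prefix}{name}"
        if spec.get("type") == "nested":
            paths.append(full_name)
        # Recurse into sub-properties to find nested fields at deeper levels.
        paths.extend(nested_paths(spec.get("properties", {}), full_name + "."))
    return paths


print(nested_paths(mapping["mappings"]["properties"]))  # ['reviews']
```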

Step 2: Index the Documents

Next, index the two documents below, each of which contains several reviews in the form of nested documents.

PUT /products/_doc/1
{
  "name": "Product A",
  "reviews": [
    {
      "rating": 5,
      "author": "Alice",
      "date": "2021-01-01"
    },
    {
      "rating": 4,
      "author": "Bob",
      "date": "2021-01-02"
    }
  ]
}

PUT /products/_doc/2
{
  "name": "Product B",
  "reviews": [
    {
      "rating": 1,
      "author": "John",
      "date": "2021-01-03"
    },
    {
      "rating": 2,
      "author": "Mary",
      "date": "2021-01-04"
    },
    {
      "rating": 3,
      "author": "James",
      "date": "2021-01-05"
    },
    {
      "rating": 4,
      "author": "Elisabeth",
      "date": "2021-01-06"
    },
    {
      "rating": 5,
      "author": "Richard",
      "date": "2021-01-07"
    }
  ]
}
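When there are many products to index, the two requests above could instead be sent in a single call to the _bulk endpoint, which expects newline-delimited JSON: one action line followed by one source line per document, with a trailing newline. The Python sketch below only builds that request body with the standard library. The helper name build_bulk_body is hypothetical, and sending the body (for example with an HTTP client and the application/x-ndjson content type) is left out.

```python
import json

# The two example documents from Step 2, keyed by document ID.
products = {
    "1": {
        "name": "Product A",
        "reviews": [
            {"rating": 5, "author": "Alice", "date": "2021-01-01"},
            {"rating": 4, "author": "Bob", "date": "2021-01-02"},
        ],
    },
    "2": {
        "name": "Product B",
        "reviews": [
            {"rating": 1, "author": "John", "date": "2021-01-03"},
            {"rating": 2, "author": "Mary", "date": "2021-01-04"},
            {"rating": 3, "author": "James", "date": "2021-01-05"},
            {"rating": 4, "author": "Elisabeth", "date": "2021-01-06"},
            {"rating": 5, "author": "Richard", "date": "2021-01-07"},
        ],
    },
}


def build_bulk_body(index, docs):
    """Build an NDJSON body: an `index` action line, then the document source."""
    lines = []
    for doc_id, source in docs.items():
        lines.append(json.dumps({"index": {"_index": index, "_id": doc_id}}))
        lines.append(json.dumps(source))
    # The bulk body must end with a newline.
    return "\n".join(lines) + "\n"


body = build_bulk_body("products", products)
print(body)
```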

Step 3: Perform Nested Aggregation

Now you can run a nested aggregation on the nested documents. For example, let’s calculate the average rating for each product:

GET /products/_search
{
  "size": 0,
  "aggs": {
    "products": {
      "terms": {
        "field": "name.keyword"
      },
      "aggs": {
        "reviews": {
          "nested": {
            "path": "reviews"
          },
          "aggs": {
            "average_rating": {
              "avg": {
                "field": "reviews.rating"
              }
            }
          }
        }
      }
    }
  }
}

We first create one bucket per product with the terms aggregation. Inside each bucket, the nested aggregation switches the context to the reviews nested documents of that product. Finally, a metric aggregation runs over those nested documents; in our example, it is the average of reviews.rating.
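The three stages can be mirrored in a few lines of plain Python, which may help when checking the numbers by hand. This is only a local model of the terms, nested, and avg stages applied to the two example documents. It does not call Elasticsearch, and the function name average_rating_per_product is hypothetical.

```python
from collections import defaultdict

# The example documents, reduced to the fields the aggregation uses.
docs = [
    {"name": "Product A", "reviews": [{"rating": 5}, {"rating": 4}]},
    {
        "name": "Product B",
        "reviews": [
            {"rating": 1},
            {"rating": 2},
            {"rating": 3},
            {"rating": 4},
            {"rating": 5},
        ],
    },
]


def average_rating_per_product(documents):
    """Model of terms (bucket per name) -> nested (reviews) -> avg (rating)."""
    ratings_by_name = defaultdict(list)
    for doc in documents:
        # terms stage: the product name selects the bucket.
        # nested stage: step into the reviews of that product.
        ratings_by_name[doc["name"]].extend(r["rating"] for r in doc["reviews"])
    return {
        name: {
            "doc_count": len(ratings),  # number of nested review documents
            # avg stage: no value when a product has no reviews.
            "average_rating": sum(ratings) / len(ratings) if ratings else None,
        }
        for name, ratings in ratings_by_name.items()
    }


result = average_rating_per_product(docs)
print(result["Product A"])  # {'doc_count': 2, 'average_rating': 4.5}
print(result["Product B"])  # {'doc_count': 5, 'average_rating': 3.0}
```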

Here is what the response should look like:

{
…
  "aggregations": {
    "products": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": "Product A",
          "doc_count": 1,
          "reviews": {
            "doc_count": 2,
            "average_rating": {
              "value": 4.5
            }
          }
        },
        {
          "key": "Product B",
          "doc_count": 1,
          "reviews": {
            "doc_count": 5,
            "average_rating": {
              "value": 3.0
            }
          }
        }
      ]
    }
  }
}
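In application code, a response like this is usually reduced to a simpler structure. The Python sketch below assumes the response has already been parsed into a dictionary with the shape shown above (only the aggregations part is reproduced here), and the helper name ratings_by_product is hypothetical.

```python
# Parsed response, limited to the aggregations section shown above.
response = {
    "aggregations": {
        "products": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
                {
                    "key": "Product A",
                    "doc_count": 1,
                    "reviews": {"doc_count": 2, "average_rating": {"value": 4.5}},
                },
                {
                    "key": "Product B",
                    "doc_count": 1,
                    "reviews": {"doc_count": 5, "average_rating": {"value": 3.0}},
                },
            ],
        }
    }
}


def ratings_by_product(resp):
    """Map each product bucket key to its average review rating."""
    buckets = resp["aggregations"]["products"]["buckets"]
    return {b["key"]: b["reviews"]["average_rating"]["value"] for b in buckets}


summary = ratings_by_product(response)
print(summary)  # {'Product A': 4.5, 'Product B': 3.0}
```

Note the two different doc_count values in each bucket: the outer one counts product documents, while the one under reviews counts the nested review documents.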

Conclusion

Elasticsearch nested aggregation is a powerful technique for analyzing complex data structures that contain nested documents. By defining the nested mapping, indexing nested documents, and performing nested aggregations, you can efficiently analyze and extract insights from your data. Whether you’re working with e-commerce, social media, or log analysis, nested aggregation can help you uncover valuable information and improve your search and analytics capabilities.

To learn more, see the related guides on Opster’s website: one on Elasticsearch’s aggregation capabilities, one that explains the difference between the object and nested types, and one that goes into detail on how to create relationships in Elasticsearch using the nested field type.