Elasticsearch Aggregation Query

By Opster Team

Updated: Aug 26, 2023

Introduction

Elasticsearch aggregations group documents and compute statistics over them, which makes them the basis for most analytics built on Elasticsearch. They fall into three families: metrics, bucketing, and pipeline aggregations.

Metrics aggregations

Metrics aggregations compute values over a set of documents. They include single-value aggregations such as min, max, sum, and avg, as well as multi-value ones such as stats, extended_stats, and percentiles.

For instance, if you want to find the average price of all products in your e-commerce store, you could use the following aggregation query:

json
GET /products/_search
{
  "size": 0,
  "aggs": {
    "average_price": {
      "avg": {
        "field": "price"
      }
    }
  }
}

In this example, “size”: 0 suppresses the search hits so that the response contains only the aggregation results. The “average_price” aggregation calculates the average of the “price” field across all documents in the products index that have a value for that field.
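As a minimal sketch of how the result of this query could be consumed, the Python snippet below builds the same request body as a dict and reads the metric from a response. The helper name read_metric and the sample response are illustrative assumptions, not output from a real cluster; the sketch relies only on the fact that a single-value metric aggregation is returned under "aggregations" with a "value" key.

```python
import json

# Request body from the example above, expressed as a Python dict.
avg_query = {
    "size": 0,  # return aggregation results only, no search hits
    "aggs": {"average_price": {"avg": {"field": "price"}}},
}


def read_metric(response, agg_name):
    """Return the value of a single-value metric aggregation, or None if absent."""
    return response.get("aggregations", {}).get(agg_name, {}).get("value")


# Hand-written sample response; the numbers are invented for illustration.
sample_response = {
    "hits": {"total": {"value": 3, "relation": "eq"}, "hits": []},
    "aggregations": {"average_price": {"value": 20.0}},
}

print(json.dumps(avg_query))
print(read_metric(sample_response, "average_price"))
```

The helper returns None rather than raising when the aggregation is missing, which also covers the case where no document has a value for the field and Elasticsearch reports the average as null.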

Bucketing aggregations

Bucketing aggregations, on the other hand, are used to group documents into buckets based on certain criteria. Common types of bucketing aggregations include terms, date histogram, and range.

For example, to group products by category, you could use a terms aggregation like this:

json
GET /products/_search
{
  "size": 0,
  "aggs": {
    "products_by_category": {
      "terms": {
        "field": "category.keyword"
      }
    }
  }
}

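In this query, the “products_by_category” aggregation creates one bucket per unique value of the “category.keyword” field. By default, a terms aggregation returns only the ten buckets with the highest document counts; set the “size” parameter inside the terms aggregation to return more.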
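The sketch below shows one way to read the buckets of such a terms aggregation in Python. The function name summarize_terms and the sample response are assumptions made for illustration; the keys "buckets", "key", "doc_count", and "sum_other_doc_count" are the ones a terms aggregation response uses.

```python
def summarize_terms(response, agg_name):
    """Return ({term: doc_count}, number of documents in buckets not returned)."""
    agg = response["aggregations"][agg_name]
    counts = {bucket["key"]: bucket["doc_count"] for bucket in agg["buckets"]}
    # sum_other_doc_count is non-zero when some terms did not fit in the response.
    omitted = agg.get("sum_other_doc_count", 0)
    return counts, omitted


# Hand-written sample response; categories and counts are invented.
sample_response = {
    "aggregations": {
        "products_by_category": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 5,
            "buckets": [
                {"key": "electronics", "doc_count": 42},
                {"key": "books", "doc_count": 17},
            ],
        }
    }
}

counts, omitted = summarize_terms(sample_response, "products_by_category")
print(counts)
print(omitted)
```

A non-zero omitted count is a hint that the bucket list was truncated and that a larger “size” may be needed.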

Pipeline aggregations

Pipeline aggregations operate on the output of other aggregations rather than on documents, which makes more complex analytics possible. They include derivatives, moving functions (moving_fn, which replaced the older moving_avg aggregation), and bucket scripts.

For instance, to calculate the month-over-month change in sales, you could use a derivative pipeline aggregation on a date histogram of sales like this:

json
GET /sales/_search
{
  "size": 0,
  "aggs": {
    "sales_over_time": {
      "date_histogram": {
        "field": "date",
        "calendar_interval": "month"
      },
      "aggs": {
        "sales": {
          "sum": {
            "field": "amount"
          }
        },
        "sales_growth": {
          "derivative": {
            "buckets_path": "sales"
          }
        }
      }
    }
  }
}

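In this query, the “sales_over_time” aggregation creates a bucket for each calendar month, and the “sales” sub-aggregation calculates the total sales for each month. The “sales_growth” pipeline aggregation then calculates the difference between each month's total and the previous month's total. This is an absolute change, not a percentage, and the first bucket has no “sales_growth” value because it has no predecessor.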
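Because the derivative is an absolute difference, a percentage growth rate has to be derived from it. The Python sketch below does this on the client side, dividing each month's change by the previous month's total. The function name monthly_growth and the sample response are assumptions for illustration; the sketch assumes each date histogram bucket carries "key_as_string" plus the “sales” and “sales_growth” values from the query above.

```python
def monthly_growth(response, agg_name="sales_over_time"):
    """Return (month, total, change, percent_change) tuples, one per bucket."""
    rows = []
    previous_total = None
    for bucket in response["aggregations"][agg_name]["buckets"]:
        total = bucket["sales"]["value"]
        # The first bucket has no derivative, so the key may be missing.
        change = bucket.get("sales_growth", {}).get("value")
        percent = None
        if change is not None and previous_total:
            percent = 100.0 * change / previous_total
        rows.append((bucket["key_as_string"], total, change, percent))
        previous_total = total
    return rows


# Hand-written sample response; dates and amounts are invented.
sample_response = {
    "aggregations": {
        "sales_over_time": {
            "buckets": [
                {"key_as_string": "2023-01-01T00:00:00.000Z", "doc_count": 4,
                 "sales": {"value": 1000.0}},
                {"key_as_string": "2023-02-01T00:00:00.000Z", "doc_count": 5,
                 "sales": {"value": 1250.0}, "sales_growth": {"value": 250.0}},
                {"key_as_string": "2023-03-01T00:00:00.000Z", "doc_count": 3,
                 "sales": {"value": 1000.0}, "sales_growth": {"value": -250.0}},
            ]
        }
    }
}

for row in monthly_growth(sample_response):
    print(row)
```

The percentage is left as None for the first month and for any month that follows a total of zero, since the rate is undefined in both cases.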

Nested aggregations

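Elasticsearch also supports nesting aggregations inside one another as sub-aggregations, so that a metric is computed separately within each bucket. This is particularly useful when dealing with hierarchical or multi-level data. It should not be confused with the dedicated nested aggregation, which is used to aggregate over fields of the nested type.

For example, to find the average price of products within each category, you could place an avg aggregation inside a terms aggregation like this: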

json
GET /products/_search
{
  "size": 0,
  "aggs": {
    "products_by_category": {
      "terms": {
        "field": "category.keyword"
      },
      "aggs": {
        "average_price": {
          "avg": {
            "field": "price"
          }
        }
      }
    }
  }
}

In this query, the “products_by_category” aggregation creates a bucket for each category it returns, and the “average_price” sub-aggregation calculates the average price of the documents within each of those buckets.
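In the response, each bucket carries its sub-aggregation result under the sub-aggregation's name. The Python sketch below flattens that structure into a simple mapping; the function name average_price_by_category and the sample response are illustrative assumptions rather than real cluster output.

```python
def average_price_by_category(response):
    """Map each category bucket key to the value of its average_price sub-aggregation."""
    buckets = response["aggregations"]["products_by_category"]["buckets"]
    return {bucket["key"]: bucket["average_price"]["value"] for bucket in buckets}


# Hand-written sample response; categories and prices are invented.
sample_response = {
    "aggregations": {
        "products_by_category": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
                {"key": "electronics", "doc_count": 2,
                 "average_price": {"value": 150.0}},
                {"key": "books", "doc_count": 4,
                 "average_price": {"value": 12.5}},
            ],
        }
    }
}

print(average_price_by_category(sample_response))
```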