Quick Links
- Implementing moving functions in Elasticsearch
- Advanced usage of moving functions
- Use cases for moving functions
Introduction
Moving functions are a key tool for time-series analysis and forecasting in Elasticsearch. They slide a window across a series of buckets and apply a function to each window, which can smooth out noise and make trends or seasonality in your data easier to see. This article covers how to implement moving functions, some advanced usage, and common use cases.
Implementing moving functions in Elasticsearch
Elasticsearch provides the moving function aggregation (`moving_fn`), a pipeline aggregation, as part of its analytics capabilities. It slides a window across the buckets of a parent aggregation and runs a function on the values in each window. The function can be a custom script defined by the user, or one of the pre-built functions available out of the box, such as min, max, sum, standard deviation, and several kinds of moving average.
Here’s how to implement it:
1. Define your base aggregation. The moving function aggregation needs sequentially ordered buckets, so it must be embedded in a histogram or date histogram aggregation. A terms aggregation is not a suitable parent.
2. Add a metric aggregation as a sub-aggregation. This can be any type of aggregation that computes some metric (such as sum, min, max, etc) over all the documents present in the buckets created by the previous aggregation.
3. Add a moving function aggregation as another sub-aggregation. Specify either a custom function script or a pre-built one as well as the `buckets_path` to point to the previous metric aggregation for which you want to calculate the moving function.
4. Define the window size for the moving function calculation. This is the number of buckets the window covers; by default, these are the buckets immediately preceding the current one.
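To make step 4 more concrete, the following Python sketch mimics how a window slides across the bucket values. It assumes the default behaviour (a `shift` of 0), in which the window holds up to `window` values preceding the current bucket and excludes the current bucket itself. `sliding_windows` and the sample numbers are hypothetical, written only for illustration; they are not part of Elasticsearch or any client library.

```python
def sliding_windows(values, window):
    """Yield (current_value, preceding_window) pairs.

    Assumption: mirrors the default moving_fn behaviour, where the window
    contains up to `window` values before the current bucket and does not
    include the current bucket.
    """
    for i, current in enumerate(values):
        start = max(0, i - window)
        yield current, values[start:i]


# Made-up monthly totals, oldest first.
monthly_sales = [100, 120, 90, 150, 130]
for current, preceding in sliding_windows(monthly_sales, window=3):
    print(current, preceding)
```

Note that the first bucket sees an empty window, so a function applied to it has nothing to work with yet.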
Here’s an example of how to implement a moving function aggregation for computing the moving average of monthly sales:
```json
GET /_search
{
  "size": 0,
  "aggs": {
    "sales_over_time": {
      "date_histogram": {
        "field": "date",
        "calendar_interval": "month"
      },
      "aggs": {
        "sales": {
          "sum": {
            "field": "sales"
          }
        },
        "sales_moving_avg": {
          "moving_fn": {
            "buckets_path": "sales",
            "script": "MovingFunctions.linearWeightedAvg(values)",
            "window": 3
          }
        }
      }
    }
  }
}
```
In this example, we’re calculating the sum of sales for each month, and then calculating a 3-month moving linear weighted average of sales.
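As a rough illustration of what a linear weighted average does with a single window, the Python sketch below weights values by position so that the most recent value counts the most. It is a textbook formulation with made-up numbers; the built-in `MovingFunctions.linearWeightedAvg` may differ in details such as normalization and how it treats missing values or empty windows.

```python
def linear_weighted_avg(window_values):
    """Textbook linearly weighted average.

    Weights are 1, 2, ..., n from the oldest to the newest value.
    Returns None for an empty window (an assumption made for this sketch).
    """
    if not window_values:
        return None
    weights = range(1, len(window_values) + 1)
    weighted_sum = sum(w * v for w, v in zip(weights, window_values))
    return weighted_sum / sum(weights)


# Hypothetical sales totals for three consecutive months, oldest first.
# (100*1 + 120*2 + 90*3) / (1 + 2 + 3) = 610 / 6
print(linear_weighted_avg([100, 120, 90]))
```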
Advanced usage of moving functions
Moving functions in Elasticsearch support several moving average models: unweighted (`unweightedAvg`), linear weighted (`linearWeightedAvg`), exponentially weighted (`ewma`, aka single exponential), `holt` (aka double exponential) and `holtWinters` (aka triple exponential). Each model has its own strengths and weaknesses, and the right choice depends on the characteristics of your data.
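For example, if your data shows a clear trend but no seasonality, the linear weighted average or the holt model, which tracks a trend component, might be appropriate. If your data shows both a trend and seasonality, the holtWinters model, which also tracks a seasonal component, might be a better choice.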
Here’s an example of how to use the exponential weighted moving average model:
```json
GET /_search
{
  "size": 0,
  "aggs": {
    "sales_over_time": {
      "date_histogram": {
        "field": "date",
        "calendar_interval": "month"
      },
      "aggs": {
        "sales": {
          "sum": {
            "field": "sales"
          }
        },
        "sales_moving_avg": {
          "moving_fn": {
            "buckets_path": "sales",
            "window": 3,
            "script": "MovingFunctions.ewma(values, 0.3)"
          }
        }
      }
    }
  }
}
```
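In this example, we’re using the ewma moving average function with an alpha (smoothing factor) of 0.3. The most recent value in the window gets a weight of 0.3, and each step further back multiplies the weight by 1 - 0.3 = 0.7, so older data points become exponentially less important, rather than linearly less important as with the linear weighted average. Larger alpha values make the weights decay faster.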
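The recurrence behind an exponentially weighted average can be sketched in a few lines of Python. This assumes the common formulation in which the second argument is a smoothing factor `alpha` applied to the newest value; the function and the sample numbers are illustrative only, and the built-in `MovingFunctions.ewma` may handle edge cases such as missing values differently.

```python
def ewma(window_values, alpha):
    """Exponentially weighted moving average over one window.

    The first value seeds the average; each later value is blended in with
    weight `alpha`, while the running average keeps weight (1 - alpha).
    Returns None for an empty window (an assumption made for this sketch).
    """
    avg = None
    for v in window_values:
        if avg is None:
            avg = v
        else:
            avg = alpha * v + (1 - alpha) * avg
    return avg


# Hypothetical monthly totals, oldest first.
# Step 1: avg = 100
# Step 2: avg = 0.3 * 120 + 0.7 * 100 = 106
# Step 3: avg = 0.3 * 90  + 0.7 * 106 = 101.2
print(ewma([100, 120, 90], 0.3))
```

Under this formulation, an `alpha` close to 1 makes the result follow the newest value almost exactly, while an `alpha` close to 0 makes it react slowly.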
Use cases for moving functions
Moving functions are useful in a variety of scenarios:
1. Trend Analysis: Moving averages can help identify underlying trends in your data by smoothing out short-term fluctuations.
2. Anomaly Detection: By comparing the actual values to the moving average, you can identify anomalies. A significant deviation from the moving average might indicate an anomaly.
3. Forecasting: Moving averages can be used to forecast future values. However, keep in mind that moving averages are a simple form of forecasting and might not be accurate for complex data patterns.
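The anomaly detection idea above can be sketched client-side in Python. The example assumes you already have the bucket values paired with their moving averages (for instance, read from an aggregation response). The `find_anomalies` helper, the 30% threshold, and the sample numbers are all hypothetical choices for illustration.

```python
def find_anomalies(actuals, moving_avgs, threshold=0.3):
    """Return indices where the actual value deviates from its moving
    average by more than `threshold`, as a fraction of the moving average."""
    anomalies = []
    for i, (actual, avg) in enumerate(zip(actuals, moving_avgs)):
        if avg is None or avg == 0:
            continue  # no usable baseline for this bucket
        if abs(actual - avg) / abs(avg) > threshold:
            anomalies.append(i)
    return anomalies


# Made-up monthly totals and the unweighted average of up to three
# preceding months (None where no preceding data exists).
actuals = [100, 120, 90, 300, 110]
moving_avgs = [None, 100.0, 110.0, 103.3, 170.0]
print(find_anomalies(actuals, moving_avgs))  # [3, 4]
```

Bucket 3 is flagged because of the spike itself. Bucket 4 is also flagged, because the spike inflates the average it is compared against, which is a caveat to keep in mind when choosing a window size and threshold.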