OpenSearch Transforms: How to Leverage OpenSearch Index Transforms

By Opster Expert Team - Gustavo

Updated: Sep 21, 2023

Background

Imagine a huge data set against which the same aggregations are run over and over, for example by a dashboard that displays a chart.

This chart illustrates the number of flights per airline by destination.

The chart can be built from the raw data (13,059 documents, one per flight) with two terms aggregations: one on the airline and, nested inside it, one on the destination.

Nested aggregations can become very expensive at scale.
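As a rough sketch, the request body behind such a chart might look like the one built below. The index name `opensearch_dashboards_sample_data_flights`, the field names `Carrier` and `Dest`, and the bucket sizes are assumptions based on the sample data set, and `flights_per_airline_by_destination` is a hypothetical helper.

```python
import json

# Assumed name of the sample flights index; adjust it to your cluster.
RAW_INDEX = "opensearch_dashboards_sample_data_flights"


def flights_per_airline_by_destination(airline_size=10, dest_size=200):
    """Build a search body with one terms aggregation nested in another."""
    return {
        "size": 0,  # only the aggregation buckets are needed, not the hits
        "aggs": {
            "airlines": {
                "terms": {"field": "Carrier", "size": airline_size},
                "aggs": {
                    # Evaluated once per airline bucket, which is why the
                    # cost grows quickly with the amount of raw data.
                    "destinations": {
                        "terms": {"field": "Dest", "size": dest_size}
                    }
                },
            }
        },
    }


body = flights_per_airline_by_destination()
print(f"POST {RAW_INDEX}/_search")
print(json.dumps(body, indent=2))
```

Every time the dashboard refreshes, a request of this shape has to scan the matching flight documents again.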

OpenSearch is optimized for these types of operations, but pre-computing the results with transforms can save a great deal of query time.

Definition

What are OpenSearch data transforms?

Transforms allow users to generate new indices from aggregations over existing data. Instead of running an aggregation every time, the results are pre-computed and stored, ready to be displayed.

In this example, instead of one document per flight, the new index holds one document per airline per destination, each containing the airline name, the destination name, and the number of flights.
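To make the change in document shape concrete, here is a minimal pure-Python sketch of the same idea. It does not call OpenSearch; the flight records are made-up values, `collapse_flights` is a hypothetical helper, and the field names `Carrier`, `Dest`, and `count_Dest` are assumed to mirror the sample data and the transform output.

```python
from collections import Counter

# A handful of made-up flight documents (illustrative values only).
flights = [
    {"Carrier": "Airline A", "Dest": "Airport X"},
    {"Carrier": "Airline A", "Dest": "Airport X"},
    {"Carrier": "Airline A", "Dest": "Airport Y"},
    {"Carrier": "Airline B", "Dest": "Airport X"},
    {"Carrier": "Airline B", "Dest": "Airport X"},
    {"Carrier": "Airline B", "Dest": "Airport X"},
]


def collapse_flights(docs):
    """Collapse one-document-per-flight into one per (airline, destination)."""
    counts = Counter((d["Carrier"], d["Dest"]) for d in docs)
    return [
        {"Carrier": carrier, "Dest": dest, "count_Dest": n}
        for (carrier, dest), n in sorted(counts.items())
    ]


summary = collapse_flights(flights)
for doc in summary:
    print(doc)
```

Six flight documents collapse into three summary documents, and the counts still add up to the original number of flights.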

How to create transforms

This example uses the sample flight data set. To load it, go to Home → Add data → “Sample Flight Data”.

Step 1 to create transforms in OpenSearch:  Go to Home → Add data →“Sample Flight Data”

Then go to Index Management → Transform Jobs.

Step 2 to create transforms in OpenSearch:   Select "Index Management" and then "Transform Jobs"

Create a new transform job.

Next, choose a name, a source index, and a destination index. The destination can be an existing index or a new one:

Step 3 to create transforms in OpenSearch:  choosing a name, a source, and a destination index.

The source data can optionally be filtered with a DSL query.

In the next window click the plus sign and set the following: 

Carrier → Group by name
Dest → Group by name
Dest → Aggregate by count 

Users should see the following:

Step 4 to create transforms in OpenSearch: set the carrier & dests.

Each of these rows will be a document in the new index.
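Expressed as a job definition rather than clicks, the selections above might correspond to the `groups` and `aggregations` sections of a transform job. The sketch below builds only that fragment, not a complete request. The target field names and the mapping of "Aggregate by count" to a `value_count` metric are assumptions, and `build_grouping` is a hypothetical helper.

```python
import json


def build_grouping(group_fields, count_field):
    """Return the groups/aggregations fragment of a transform job (sketch)."""
    return {
        # One terms group per field: each distinct combination of values
        # becomes one document in the target index.
        "groups": [
            {"terms": {"source_field": field, "target_field": field}}
            for field in group_fields
        ],
        # Assumption: "Aggregate by count" maps to a value_count metric
        # stored under the name count_<field>.
        "aggregations": {
            f"count_{count_field}": {"value_count": {"field": count_field}}
        },
    }


fragment = build_grouping(["Carrier", "Dest"], "Dest")
print(json.dumps(fragment, indent=2))
```

Keeping the target field names identical to the source names is a choice made here for readability; the names generated by the wizard may differ.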

Click next.

Step 5 to create transforms in OpenSearch: specify schedule.

The job can be triggered manually or run continuously, depending on your needs. The frequency is set with the “Transform Execution Interval” parameter.

Now review the settings and create the job. If “Job enabled by default” is set and the continuous option is disabled, as in the screenshot, the job runs once and is then disabled.
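For readers who prefer to see these options as data, the sketch below collects the scheduling-related fields as they might appear in a transform job definition. The field names `enabled`, `continuous`, and `schedule.interval` reflect our understanding of the transform API and should be checked against the documentation for your version. The epoch-millisecond start time is an assumption, and `schedule_settings` is a hypothetical helper.

```python
import json


def schedule_settings(interval_minutes, continuous, start_time_ms):
    """Return the scheduling-related fields of a transform job (sketch)."""
    return {
        "enabled": True,           # corresponds to "Job enabled by default"
        "continuous": continuous,  # False: run once, then the job is disabled
        "schedule": {
            "interval": {
                "period": interval_minutes,
                "unit": "Minutes",
                # Assumption: the start time is given in epoch milliseconds.
                "start_time": start_time_ms,
            }
        },
    }


# Arbitrary example start time; a real job would use the current time.
one_off = schedule_settings(
    interval_minutes=1, continuous=False, start_time_ms=1695254400000
)
print(json.dumps(one_off, indent=2))
```

These fields sit next to the source index, target index, groups, and aggregations in the same job definition.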

Step 6 to create transforms in OpenSearch: review settings and create job.

Visualizing data 

Now that the index is ready, it can be visualized. First, create an index pattern for the transform index.

To create an index pattern, go to Stack Management → OpenSearch Dashboards → Index Pattern → Create new index pattern.

In the new index, the number of flights per destination for each airline is already computed; the documents just need to be grouped correctly.

To verify that the data is consistent, compare the two tables below. The first uses the raw index with 13,059 documents, and the second uses the transformed index:

Tables used when probing the integrity of data.

Regular Index

The visualization on the regular index uses the following configuration: two terms aggregations, with the document count as the metric for each bucket.

Screenshot displaying a regular index configuration.

Transformed Index

For the transformed index, use the same aggregations, but instead of counting documents, take the value of “count_Dest”, which holds the number of flights each airline had to each destination:

Screenshot showing a transformed index configuration.

Each bucket (airline × destination) always contains exactly one document, shown as one row, so Min, Max, and Top hit all yield the same result.

The resulting visualization looks just like the one built on the raw index:
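Outside the visualization editor, the same configuration might translate into a search body like the one sketched here. The index name `flights_by_airline_dest` is hypothetical, and the field names `Carrier`, `Dest`, and `count_Dest` depend on how the transform job named its target fields, so treat them as assumptions.

```python
import json

# Hypothetical name of the index produced by the transform job.
TRANSFORM_INDEX = "flights_by_airline_dest"


def flights_from_transform_index(metric="max"):
    """Build a search body that reads pre-computed counts (sketch)."""
    if metric not in ("min", "max"):
        raise ValueError("metric must be 'min' or 'max'")
    return {
        "size": 0,
        "aggs": {
            "airlines": {
                "terms": {"field": "Carrier", "size": 10},
                "aggs": {
                    "destinations": {
                        "terms": {"field": "Dest", "size": 200},
                        # Each bucket holds a single document, so min and
                        # max both return that document's count_Dest.
                        "aggs": {"flights": {metric: {"field": "count_Dest"}}},
                    }
                },
            }
        },
    }


body = flights_from_transform_index()
print(f"POST {TRANSFORM_INDEX}/_search")
print(json.dumps(body, indent=2))
```

The metric reads a stored number instead of counting flight documents, which is where the saving comes from.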

Image showing how the visualization will look.

The results are the same, but the transformed index takes far less space and the query runs faster.

Transform indices are much lighter and more efficient for aggregation tasks, but keep in mind that:

  • The new index only contains the fields selected in the transformation phase
  • The new index in this example is “timeless”: it has no date field, so timeline charts are not possible

Example use case: keep the full index for the most recent data and store older data as much lighter transform indices, keeping only what is relevant and saving space.
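One way to select only the older data for such a job is a date range in the DSL query that filters the transform's source data. The sketch below assumes the source index has a date field called `timestamp`, as the sample flight data does, and uses a 30-day cut-off purely as an illustration; `older_than` is a hypothetical helper.

```python
import json


def older_than(days, field="timestamp"):
    """Build a range query matching documents older than `days` days."""
    if days <= 0:
        raise ValueError("days must be positive")
    # "now-30d/d" is date math: 30 days ago, rounded down to the day.
    return {"range": {field: {"lt": f"now-{days}d/d"}}}


query = older_than(30)
print(json.dumps(query, indent=2))
```

A transform job using this filter would summarize only documents older than the cut-off, leaving the recent raw documents untouched.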

Conclusion

Transforms are a handy way to condense data when only aggregated results are needed, not the individual documents. Operating on transform-based indices instead of raw data saves a considerable amount of space and processing resources when running queries.