Global ordinals in OpenSearch
Terms aggregations rely on an internal data structure known as global ordinals. These structures map each unique value of a given field to an incremental ordinal number. They are built and kept at the shard level, and the per-shard aggregation results are later combined in the reduce phase to produce the final result.
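For reference, a minimal terms aggregation on a hypothetical keyword field named email (the index and field names here are only illustrative) looks like this:
GET my_index/_search
{
  "size": 0,
  "aggregations": {
    "emails": {
      "terms": {
        "field": "email"
      }
    }
  }
}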
The performance of terms aggregations on a given field degrades as the number of unique values for that field grows (high cardinality). On top of that, global ordinals are built lazily by default, meaning the first aggregation executed after a refresh has to pay the cost of rebuilding them.
If you have a heavy indexing rate of documents containing high-cardinality fields and you frequently execute terms aggregations on those fields, your cluster might be struggling with this issue, since the global ordinals will frequently have to be rebuilt.
OpenSearch suggests three approaches for dealing with this issue:
1. First, consider that global ordinals are built per shard, so they only need to be rebuilt for the shards that were modified; for the other shards, the already built values can be reused. For that reason, if your indices hold time-series data, it is a good idea to organize them as time-based indices, in order to reduce the number of shards that are modified and, ultimately, the number of global ordinals that need to be rebuilt. So instead of keeping all your time-series data in one huge index, consider breaking it into smaller periodic indices, as sketched below.
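For illustration, a hypothetical index template (the logs-* pattern, field names, and settings below are assumptions, not part of the original setup) could back daily indices, so that only the current day's index, and therefore only its shards, keeps being modified:
PUT _index_template/logs_template
{
  "index_patterns": ["logs-*"],
  "template": {
    "settings": {
      "number_of_shards": 1
    },
    "mappings": {
      "properties": {
        "email": { "type": "keyword" },
        "@timestamp": { "type": "date" }
      }
    }
  }
}

POST logs-2024-05-01/_doc
{
  "email": "user@example.com",
  "@timestamp": "2024-05-01T10:00:00Z"
}
With this layout, older indices stop receiving writes, so their global ordinals do not need to be rebuilt between aggregations.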
2. Second, if you are willing to shift some of the cost from search time to ingest time, you can switch global ordinals building from lazy to eager. That means the cluster will rebuild them whenever a refresh creates new segments, instead of when the first aggregation after the refresh is executed. You can limit the extra ingest-time cost by also refreshing less frequently, i.e. increasing the index refresh interval.
You can enable eager building of global ordinals for a given field by setting the eager_global_ordinals parameter in the field’s mapping, like so:
PUT my_index/_mapping
{
  "properties": {
    "email": {
      "type": "keyword",
      "eager_global_ordinals": true
    }
  }
}
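If you go down this path, you may also want to raise the refresh interval so the eager rebuild runs less often. A possible companion change (the 30s value is just an illustrative assumption) would be:
PUT my_index/_settings
{
  "index": {
    "refresh_interval": "30s"
  }
}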
3. The third suggested approach (to be used with caution) is to not build global ordinals at all, and instead have the cluster run the aggregation directly over the raw values of the field. This eliminates the cost of building global ordinals for high-cardinality fields, but subsequent terms aggregations on that field will no longer benefit from them. It also increases memory usage, since OpenSearch has to build and keep a map of every unique value found in the matching documents.
This approach should only be used when the aggregation is expected to run over a small number of documents, usually because most documents have been filtered out.
If you choose to give it a try, the only thing you need to do is set an "execution_hint": "map" parameter in your terms aggregation, like so:
"aggregations": { "emails": { "terms": { "field": "email", "execution_hint": "map" } } }
Find out if you are struggling with excessive global ordinals calculation
You can find out whether the building of global ordinals is causing performance issues by temporarily enabling TRACE level logging. Execute the following command to change the relevant cluster setting in a transient way, meaning it will not survive a full cluster restart:
PUT _cluster/settings
{
  "transient": {
    "logger.org.opensearch.index.fielddata": "TRACE"
  }
}
After that, look for entries in the opensearch.log file such as the following:
global-ordinals [<email>][1014089] took [592.3ms]
That will give you a hint about how long building global ordinals is taking and whether that is what is hurting the performance of your aggregations. Be sure to set the logging back to a less verbose level once you’re done with your investigation.
You can also check the size of the global ordinals data structure by calling the _stats API and looking at the memory_size_in_bytes value:
GET <index_name>/_stats/fielddata?fielddata_fields=<field_name>
Just don’t forget to reset the transient setting after you are finished debugging. You do that by assigning it a null value:
PUT _cluster/settings
{
  "transient": {
    "logger.org.opensearch.index.fielddata": null
  }
}
Profiling
OpenSearch provides the Profile API as a debugging tool that allows you to get detailed information about the execution of your request, including your aggregations.
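For example, profiling the hypothetical terms aggregation used earlier only requires setting the profile flag to true in the search request (index and field names remain illustrative assumptions):
GET my_index/_search
{
  "size": 0,
  "profile": true,
  "aggregations": {
    "emails": {
      "terms": {
        "field": "email"
      }
    }
  }
}
The response will then include a profile section with per-shard timing breakdowns for the query and aggregation execution.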
When profiling your request, just be aware that:
- It introduces a non-negligible overhead to the overall execution time, since it needs to keep track of and collect metrics about each phase of the request execution.
- It currently does not measure the search fetch phase or the network overhead.
- Profiling of the reduce phase of aggregation is currently not available.
- OpenSearch considers this API experimental and still under development, since it instruments parts of Lucene that were not designed to be exposed to profiling, so strange behavior or numbers can occur. Be aware.