Introduction
An Elasticsearch index pattern allows users to define how to match and interact with multiple indices. It is a key feature when working with data in Elasticsearch because it allows users to focus exactly on the data set that they want to work on.
In this article, we will review how to use index patterns in Elasticsearch, as well as best practices.
Elasticsearch Index Patterns: Best Practices and Usage
1. Use Wildcards to Match Multiple Indices
When defining an index pattern, you can use wildcards (*) to match multiple indices. This is particularly useful when you have time-based indices, such as logs or metrics data, that are split into daily or monthly indices.
For example, if you have daily log indices like log-2023.01.01, log-2023.01.02, and so on, you can define the index pattern as log-* to match all daily log indices, or log-2023.* to match all indices created in 2023, as shown in Figure 1, below:
Figure 1: Different ways to specify wildcarded index patterns
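To make the two patterns above concrete, here is a minimal Python sketch that emulates simple `*` wildcard matching on the client side. The index names are hypothetical, and `fnmatchcase` only approximates how Elasticsearch expands wildcards, so treat it as an illustration rather than the actual resolution logic.

```python
from fnmatch import fnmatchcase

# Hypothetical index names, following the daily naming used in this article.
indices = [
    "log-2022.12.31",
    "log-2023.01.01",
    "log-2023.01.02",
    "metrics-2023.01.01",
]


def matching(pattern, names):
    """Return the names selected by a simple `*` wildcard pattern."""
    return [name for name in names if fnmatchcase(name, pattern)]


all_logs = matching("log-*", indices)        # every daily log index
logs_2023 = matching("log-2023.*", indices)  # only the 2023 log indices

print(all_logs)
print(logs_2023)
```

The broader pattern picks up the 2022 index as well, which is exactly how a wide wildcard ends up querying more data as indices accumulate.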
However, relying on wildcards alone can be rather cumbersome because users might have to compute dates in their own application logic. And, as the number of indices grows over time, a broad wildcard ends up querying more and more data, which makes the process even more inconvenient. Learning to leverage date math expressions can help alleviate these pain points, as we’ll see in the next section.
2. Utilize Date Math in Index Patterns
Date math expressions allow users to have Elasticsearch dynamically calculate dates based on the current date, which can be useful when working with time-based indices. Users can use date math in index patterns to match indices based on a specific time range.
In order to specify index names using date math expressions, a strict format must be followed, as shown in Figure 2, below:
Figure 2: Date math expression format
The static name can be anything users want as long as it adheres to index naming conventions. The date math expression is where users define the date expression that will be dynamically calculated by Elasticsearch at runtime. The date format is optional (defaults to `yyyy.MM.dd`) but, if specified, it must conform to the java-time date format. The time zone is also optional (defaults to `UTC`).
For example, to match the `log` index created 7 days prior, users can use an index pattern like <log-{now/d-7d{yyyy.MM.dd}}>. So, if today is October 5th, 2023, this index pattern resolves to log-2023.09.28.
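The arithmetic behind this example can be checked with a short Python sketch. It only emulates the `now/d-7d` computation with the default `yyyy.MM.dd` format for a fixed date; Elasticsearch resolves the real expression on the server, and the function name `resolve_daily_index` is purely illustrative.

```python
from datetime import date, timedelta


def resolve_daily_index(prefix, today, days_back=0):
    """Emulate <prefix{now/d-Nd{yyyy.MM.dd}}> for a given 'today' (UTC assumed)."""
    target = today - timedelta(days=days_back)
    # %Y.%m.%d is the strftime equivalent of the java-time format yyyy.MM.dd
    return f"{prefix}{target:%Y.%m.%d}"


resolved = resolve_daily_index("log-", date(2023, 10, 5), days_back=7)
print(resolved)  # log-2023.09.28
```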
There are, however, two downsides to using date math expressions: 1) it is not possible to express a time interval using a single index pattern, and 2) it is not possible to use wildcards. So, if users need to query all indices over the last three days (today included), they will need to list one index pattern per day:
<log-{now/d}>,<log-{now/d-1d}>,<log-{now/d-2d}>
It is also worth noting that when used in a request path, the index patterns need to be URL-encoded because they contain special characters:
%3Clog-%7Bnow%2Fd%7D%3E%2C%3Clog-%7Bnow%2Fd-1d%7D%3E%2C%3Clog-%7Bnow%2Fd-2d%7D%3E/_search
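Writing and encoding these patterns by hand is error-prone, so a small helper can generate them. The sketch below builds one date math pattern per day and percent-encodes the result with the standard library. The function name `last_days_patterns` is an assumption made for illustration, and an HTTP client that already encodes request paths may make the manual step unnecessary.

```python
from urllib.parse import quote


def last_days_patterns(prefix, days):
    """Build one date math index pattern per day, today first."""
    patterns = []
    for offset in range(days):
        expr = "now/d" if offset == 0 else f"now/d-{offset}d"
        # Doubled braces emit the literal { and } required by date math syntax.
        patterns.append(f"<{prefix}{{{expr}}}>")
    return patterns


patterns = last_days_patterns("log-", 3)
target = ",".join(patterns)

# safe="" forces "/" to be encoded too, since it appears inside the expression.
encoded_path = quote(target, safe="") + "/_search"

print(target)
print(encoded_path)
```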
In contrast to the wildcard approach of the previous section, users always query exactly three days of data, and they don’t need custom logic to compute the dates of the indices they want to query.
3. Keep Index Patterns Up-to-date
As new indices are created or old ones are deleted, it’s essential to keep index patterns up-to-date to ensure they match the correct set of indices. Regularly review index patterns and update them as needed to maintain accurate search results and analytics.
For instance, let’s assume that one year of data is kept in daily indices. If today is October 5th, 2023, then there are 365 indices going back to October 6th, 2022. The index pattern to query all data is easy to figure out: it’s going to be log-*. Now, suppose a user also needs two index patterns, one for querying the current year’s data (all of 2023 so far) and another for querying the rest of the previous year’s data (end of 2022). If the application code uses static index patterns, like log-2023* and log-2022*, they will need to be updated in 2024. This is because no pattern will match the new log-2024 indices, and log-2022* will no longer return anything.
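One way to avoid this yearly maintenance is to derive the patterns from the current date instead of hard-coding them. The following Python sketch shows the idea; the helper name `yearly_patterns` and the `log-` prefix are assumptions taken from this example, not part of any Elasticsearch API.

```python
from datetime import date


def yearly_patterns(today, prefix="log-"):
    """Return (current-year pattern, previous-year pattern) for a given date."""
    current = f"{prefix}{today.year}*"
    previous = f"{prefix}{today.year - 1}*"
    return current, previous


# In application code, date.today() would replace the fixed dates used here.
patterns_2023 = yearly_patterns(date(2023, 10, 5))
patterns_2024 = yearly_patterns(date(2024, 1, 15))

print(patterns_2023)
print(patterns_2024)
```

With this approach the patterns roll over on January 1st without a code change.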
4. Use Aliases for Flexibility
Index aliases provide a level of abstraction that allows users to use a single name when referencing one or more indices. This can be particularly useful when working with index patterns, as it allows the underlying indices to be changed without modifying the index pattern itself.
For example, users can create an alias called logs that refers to indices log-2023.01.01, log-2023.01.02, and so on. Then, that alias can be used as an index pattern to match all indices identified by the logs alias.
At a glance, it can look like aliases and index patterns function identically. However, aliases are a bit more flexible. While an index pattern can identify several indices based on a specific naming pattern, aliases allow users to group indices having different names under a single name, as shown in Figure 3, below:
Figure 3: Aliases add even more flexibility
As you can see, we have defined two index patterns, log-* and app-*, which are based on the index names. However, we have also defined an alias whose name (all-logs) has nothing in common with the index names beneath it. The code below shows that it leverages both index patterns, log-* and app-*, in order to target all indices. From there, the sky’s the limit and users can imagine all kinds of ways to group indices using index patterns, aliases, or both.
POST _aliases
{
  "actions": [
    { "add": { "index": "log-*", "alias": "all-logs" } },
    { "add": { "index": "app-*", "alias": "all-logs" } }
  ]
}
5. Optimize Index Patterns for Performance
When working with large datasets, it’s essential to optimize index patterns for performance. One way to do this is by limiting the number of indices matched by the index pattern. For example, instead of using a wildcard to match all indices, use a more specific pattern that only matches the most relevant indices.
To illustrate, suppose you are storing the logs of an application App1 in daily indices called log-app1-2023.01.01, log-app1-2023.01.02, etc., and the logs of another application App2 in log-app2-2023.01.01, log-app2-2023.01.02, etc. If you only want to query the logs of App1, use the index pattern log-app1-* instead of log-*, so that the query doesn’t have to visit the log-app2-* indices, which you are not interested in.
Additionally, consider using filtered aliases to limit the data returned by queries that use index patterns. This can help improve query performance by reducing the amount of data that needs to be processed.
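As a rough illustration of a filtered alias, the Python sketch below builds a request body for the `_aliases` endpoint in which the `add` action carries a `filter` query. The alias name `app1-errors` and the `level` field are hypothetical; substitute names that exist in your own mappings.

```python
import json


def filtered_alias_action(index_pattern, alias, field, value):
    """Build an 'add' alias action restricted to documents where field == value."""
    return {
        "add": {
            "index": index_pattern,
            "alias": alias,
            # Hypothetical filter: only documents matching this term query
            # are visible through the alias.
            "filter": {"term": {field: value}},
        }
    }


body = {
    "actions": [
        filtered_alias_action("log-app1-*", "app1-errors", "level", "error"),
    ]
}

# The printed JSON is what would be sent as the body of POST _aliases.
print(json.dumps(body, indent=2))
```

Searching the alias then returns only the documents that pass the filter, without each query having to repeat the condition.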
6. Test Index Patterns Before Deployment
Before deploying an index pattern in a production environment, it’s essential to test it thoroughly to ensure it matches the correct set of indices and provides accurate search results and analytics. Use tools like Kibana or Elasticsearch APIs to test index patterns and verify their accuracy.
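One possible way to automate such a check is to compare the indices that a pattern resolves to against the set you expect. The sketch below works on a hard-coded sample whose shape is assumed to follow the response of the resolve index API (`GET /_resolve/index/<pattern>`); verify the actual response format against your Elasticsearch version. The helper names and index names are illustrative.

```python
def resolved_index_names(response):
    """Extract index names from a resolve-index style response (assumed shape)."""
    return {entry["name"] for entry in response.get("indices", [])}


def check_pattern(response, expected):
    """Report indices that are missing from, or unexpected in, the resolved set."""
    actual = resolved_index_names(response)
    return {
        "missing": sorted(expected - actual),
        "unexpected": sorted(actual - expected),
    }


# Hard-coded sample standing in for a real response to a too-broad pattern.
sample_response = {
    "indices": [
        {"name": "log-app1-2023.01.01", "attributes": ["open"]},
        {"name": "log-app1-2023.01.02", "attributes": ["open"]},
        {"name": "log-app2-2023.01.01", "attributes": ["open"]},
    ],
    "aliases": [],
    "data_streams": [],
}

expected = {"log-app1-2023.01.01", "log-app1-2023.01.02"}
report = check_pattern(sample_response, expected)
print(report)
```

An empty `missing` list together with a non-empty `unexpected` list suggests that the pattern is broader than intended.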
Conclusion
Elasticsearch index patterns are a powerful way to interact with multiple indices, enabling users to perform searches and analytics across a wide range of data. By following the best practices and usage guidelines outlined in this article, users can effectively manage and optimize index patterns for improved performance and accuracy, and eventually leverage aliases for even more flexibility.