Elasticsearch Data Nodes: Optimizing Performance and Resource Utilization

By Opster Team

Updated: Jun 22, 2023


Introduction

Data nodes in Elasticsearch play a critical role in storing, indexing, and searching data. They are responsible for holding the actual data (shards) and executing data-related operations such as CRUD, search, and aggregations. In this article, we will discuss how to optimize data nodes for better performance and resource utilization.

Optimize data nodes for better performance and resource utilization

1. Configure Data Node Hardware

To get the best performance, choose data node hardware that matches your workload. Consider the following factors when selecting hardware for data nodes:

  • CPU: Choose a CPU with the appropriate core count to handle concurrent indexing and search requests efficiently.
  • Memory: Allocate sufficient memory to the data nodes to hold the JVM heap, file system cache, and other system processes.
  • Storage: Use SSDs for faster disk I/O and improved query performance. Ensure that the storage capacity is sufficient to hold the data and accommodate future growth.
  • Network: Use a high-speed network to minimize latency between nodes and improve cluster communication.

2. Configure JVM Heap Size

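The JVM heap size is a crucial factor in Elasticsearch performance. A common recommendation is to allocate no more than 50% of the available system memory to the JVM heap and to keep the heap below roughly 32GB, the threshold above which the JVM can no longer use compressed object pointers. The remaining memory stays available to the file system cache, which Lucene relies on heavily, and to other processes. Set the minimum (`-Xms`) and maximum (`-Xmx`) heap sizes to the same value. Elasticsearch 7.11 and later size the heap automatically by default, so a manual setting is only needed to override that. To set the heap size, create a custom JVM options file with an `.options` extension inside the `jvm.options.d` directory of the Elasticsearch configuration directory, for example: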

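```
# Example file: config/jvm.options.d/heap.options (16GB heap on a host with at least 32GB of RAM)
-Xms16g
-Xmx16g
```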
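The two rules above (at most half of RAM, and a cap below the compressed-pointer threshold) can be combined into a quick calculation. The sketch below is a hypothetical Python helper, not an Elasticsearch tool. It assumes 31GB as a conservative cap, because the exact cutoff for compressed object pointers depends on the JVM and platform; the node's startup log, which reports whether compressed ordinary object pointers are enabled, is the reliable confirmation.

```python
# Hypothetical helper: derive -Xms/-Xmx lines for a jvm.options.d file.
# Assumption: 31 GB is used as a conservative cap below the ~32 GB
# compressed-oops threshold; verify the actual cutoff for your JVM.
COMPRESSED_OOPS_CAP_GB = 31


def heap_size_gb(system_ram_gb: int) -> int:
    """Return a heap size of at most 50% of RAM, capped below ~32 GB."""
    if system_ram_gb < 2:
        raise ValueError("system RAM too small for this sketch")
    half = system_ram_gb // 2  # leave the other half for the file system cache
    return min(half, COMPRESSED_OOPS_CAP_GB)


def jvm_heap_options(system_ram_gb: int) -> str:
    """Render minimum and maximum heap flags set to the same value."""
    size = heap_size_gb(system_ram_gb)
    return f"-Xms{size}g\n-Xmx{size}g\n"


if __name__ == "__main__":
    for ram in (16, 32, 64, 128):
        print(f"{ram} GB RAM ->", jvm_heap_options(ram).split())
```

For a 64GB host this yields `-Xms31g` and `-Xmx31g`; for a 32GB host it yields 16GB, matching the example above.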

3. Configure Index Shards and Replicas

Properly configuring index shards and replicas can significantly improve data node performance. Consider the following guidelines:

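  • Shard size: Aim for a shard size between 10GB and 50GB. Larger shards take longer to recover and relocate after a node failure, while many small shards add per-shard overhead that consumes heap and other resources.
  • Number of shards: The number of primary shards is fixed when an index is created, so plan it up front and distribute shards evenly across data nodes to balance the load. If necessary, use the `_split` or `_shrink` APIs to create a copy of the index with more or fewer primary shards.
  • Replicas: Set the number of replicas according to your fault tolerance and search performance requirements. More replicas can increase search throughput, because any copy of a shard can serve a search, but each replica consumes additional disk space, memory, and indexing work.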
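As a rough planning aid, the shard guidelines can be turned into simple arithmetic. The following Python sketch is hypothetical: the 30GB target is an assumed value inside the 10GB to 50GB range, and rounding up to a multiple of the node count is one simple way to spread primaries evenly, not an algorithm Elasticsearch itself uses.

```python
import math

# Hypothetical planning helper. The 10 GB floor follows the guideline above;
# the 30 GB target is an assumed value inside the 10-50 GB range.
TARGET_SHARD_GB = 30
MIN_SHARD_GB = 10


def primary_shard_count(index_size_gb: float, data_nodes: int,
                        target_gb: float = TARGET_SHARD_GB) -> int:
    """Pick a primary shard count near the target size that spreads evenly."""
    if index_size_gb <= 0 or data_nodes <= 0:
        raise ValueError("index size and node count must be positive")
    # Enough shards that none exceeds the target size.
    shards = max(1, math.ceil(index_size_gb / target_gb))
    # Round up to a multiple of the node count so each node holds the same
    # number of primaries, unless that would push shards below the floor.
    even = math.ceil(shards / data_nodes) * data_nodes
    if index_size_gb / even >= MIN_SHARD_GB:
        shards = even
    return shards


def total_shard_copies(primaries: int, replicas: int) -> int:
    """Each replica adds a full copy of every primary shard."""
    return primaries * (1 + replicas)


if __name__ == "__main__":
    primaries = primary_shard_count(600, data_nodes=4)
    print(primaries, total_shard_copies(primaries, replicas=1))
```

For example, a 600GB index on four data nodes gives 20 primaries of about 30GB each, and with one replica the cluster holds 40 shard copies. The primary count would then be supplied as `index.number_of_shards` when the index is created.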

4. Optimize Indexing Performance

To improve indexing performance on data nodes, consider the following optimizations:

  • Bulk indexing: Use the bulk API to index multiple documents in a single request, reducing the overhead of individual indexing operations.
  • Refresh interval: Increase the refresh interval (default is 1 second) to reduce the frequency of segment creation and merging. This can be set on a per-index basis using the `index.refresh_interval` setting.
  • Index templates: Use index templates to pre-configure index settings and mappings, ensuring consistent configurations across indices.
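The bulk API expects newline-delimited JSON: an action line followed by the document source, with a trailing newline at the end of the body. The Python sketch below only builds request bodies and does not contact a cluster; the index name, the `myapp-*` pattern, the field names, and the 30-second refresh interval are illustrative assumptions. The bulk body would be sent to `POST /_bulk` with `Content-Type: application/x-ndjson`, the settings to `PUT /<index>/_settings`, and the template to `PUT /_index_template/<name>`.

```python
import json
from typing import Iterable, Mapping


def bulk_index_body(index: str, docs: Iterable[Mapping]) -> str:
    """Build an NDJSON body for the _bulk API: one action line plus one
    source line per document, terminated by a final newline."""
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index}}))
        lines.append(json.dumps(doc))
    return "\n".join(lines) + "\n"  # the bulk API requires a trailing newline


# Index settings for a bulk load: a longer refresh interval means new
# segments are created less often while data is being written.
BULK_LOAD_SETTINGS = {"index": {"refresh_interval": "30s"}}

# Composable index template body; "myapp-*" and the fields are hypothetical.
MYAPP_TEMPLATE = {
    "index_patterns": ["myapp-*"],
    "template": {
        "settings": {"number_of_shards": 2, "refresh_interval": "30s"},
        "mappings": {"properties": {"@timestamp": {"type": "date"},
                                    "message": {"type": "text"}}},
    },
}

if __name__ == "__main__":
    print(bulk_index_body("myapp-000001", [{"message": "a"}, {"message": "b"}]))
```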

5. Optimize Search Performance

To enhance search performance on data nodes, consider these optimizations:

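  • Query optimization: Put exact-match conditions in filter context, where they are not scored and can be cached, and use pagination and source filtering to reduce the amount of data returned by search queries.
  • Caching: Rely on the node query cache, which caches the results of filter clauses, and the shard request cache, which by default caches the results of `size: 0` searches such as aggregations. Both are enabled by default and reduce the load on data nodes for frequently repeated queries.
  • Search profiling: Use the search slow log to find slow queries, then use the Profile API (set `"profile": true` in the search request) to see where their time is spent and optimize them accordingly.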
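A search body that applies these ideas might look like the following Python sketch. The field names (`status`, `title`, `timestamp`) are hypothetical. Exact-match conditions go in a `bool` filter, `from`/`size` handle pagination, `_source` limits the returned fields, and `profile` is added only while investigating a slow query, since profiling adds overhead.

```python
from typing import Sequence


def build_search_body(status: str, page: int, page_size: int = 20,
                      fields: Sequence[str] = ("title", "timestamp"),
                      profile: bool = False) -> dict:
    """Search request body: exact-match conditions in filter context
    (no scoring, cacheable), paginated results, trimmed _source."""
    if page < 0 or page_size <= 0:
        raise ValueError("page must be >= 0 and page_size > 0")
    body = {
        "query": {"bool": {"filter": [{"term": {"status": status}}]}},
        "from": page * page_size,
        "size": page_size,
        "_source": list(fields),
    }
    if profile:
        body["profile"] = True  # ask the Profile API for per-component timings
    return body


def build_agg_body(field: str) -> dict:
    """An aggregation-only (size: 0) search, eligible for the shard request cache."""
    return {"size": 0, "aggs": {"by_value": {"terms": {"field": field}}}}
```

Note that `from` plus `size` cannot exceed `index.max_result_window` (10,000 by default); `search_after` is the usual alternative for deep pagination.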

6. Monitor Data Node Performance

Monitoring data node performance is essential to identify bottlenecks and ensure optimal resource utilization. Use the Elasticsearch monitoring APIs, such as the `_nodes/stats` and `_cluster/health` endpoints, to gather performance metrics and monitor the health of your data nodes.
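The raw stats are easier to act on when reduced to a few alerts. This Python sketch assumes you have already fetched and parsed `GET /_nodes/stats/jvm,fs` and `GET /_cluster/health` into dictionaries; it reads each node's `jvm.mem.heap_used_percent` and `fs.total` fields. The 85% thresholds are assumed values for illustration (85% happens to match the default low disk watermark).

```python
from typing import Any, Dict, List, Tuple

# Assumed alert thresholds for this sketch; tune them for your cluster.
HEAP_ALERT_PERCENT = 85
DISK_ALERT_PERCENT = 85


def flag_hot_nodes(nodes_stats: Dict[str, Any]) -> List[Tuple[str, str, float]]:
    """Scan a parsed _nodes/stats/jvm,fs response and return
    (node name, metric, value) tuples that exceed the thresholds."""
    alerts = []
    for node in nodes_stats.get("nodes", {}).values():
        name = node.get("name", "?")
        heap = node["jvm"]["mem"]["heap_used_percent"]
        if heap >= HEAP_ALERT_PERCENT:
            alerts.append((name, "heap_used_percent", float(heap)))
        fs = node["fs"]["total"]
        used = 100.0 * (1 - fs["available_in_bytes"] / fs["total_in_bytes"])
        if used >= DISK_ALERT_PERCENT:
            alerts.append((name, "disk_used_percent", round(used, 1)))
    return alerts


def health_summary(health: Dict[str, Any]) -> str:
    """Summarize a parsed _cluster/health response (green, yellow, or red)."""
    return f"{health['status']} ({health['unassigned_shards']} unassigned shards)"
```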

7. Perform Regular Maintenance

Regular maintenance tasks can help maintain data node performance and prevent issues:

  • Index lifecycle management: Use the index lifecycle management (ILM) feature to automate index management tasks such as rollover, force merge, and deletion.
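  • Segment merging: Monitor the segment count and use the force merge API to reduce the number of segments when necessary. Only force merge indices that are no longer being written to, because force merging an index that still receives writes can leave very large segments that the regular merge process struggles to clean up.
  • Cluster rebalancing: Elasticsearch rebalances shards across data nodes automatically. Use the cluster allocation settings to influence where shards are placed, and the `_cluster/reroute` API to move individual shards manually when you need finer control over resource utilization.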
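The maintenance tasks above can be expressed as request bodies. In the Python sketch below, the ILM policy rolls indices over in the hot phase, force merges them to one segment in the warm phase (once they are no longer written to), and deletes them later. The ages and sizes are assumed values, and `max_primary_shard_size` requires Elasticsearch 7.13 or later. The reroute helper builds a `move` command for manual shard placement; the index and node names passed to it are hypothetical.

```python
def ilm_policy(max_shard_size: str = "50gb", warm_after: str = "7d",
               delete_after: str = "90d") -> dict:
    """Body for PUT _ilm/policy/<name>: roll over in the hot phase,
    force merge after rollover, then delete. All ages are assumptions."""
    return {
        "policy": {
            "phases": {
                "hot": {"actions": {"rollover": {
                    "max_primary_shard_size": max_shard_size,
                    "max_age": "30d",  # assumed upper bound on index age
                }}},
                "warm": {"min_age": warm_after,
                         "actions": {"forcemerge": {"max_num_segments": 1}}},
                "delete": {"min_age": delete_after, "actions": {"delete": {}}},
            }
        }
    }


def move_shard_command(index: str, shard: int, from_node: str, to_node: str) -> dict:
    """Body for POST _cluster/reroute that moves one shard copy manually."""
    return {"commands": [{"move": {"index": index, "shard": shard,
                                   "from_node": from_node, "to_node": to_node}}]}
```

The policy body would be sent with `PUT /_ilm/policy/<name>` and the reroute body with `POST /_cluster/reroute`.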

Conclusion 

By following these best practices and optimizations, you can significantly improve the performance and resource utilization of your Elasticsearch data nodes, ensuring a responsive and efficient cluster.