Elasticsearch Configuration: Configuring Elasticsearch for Optimal Performance

By Opster Team

Updated: Nov 14, 2023

Introduction

Elasticsearch is a widely used search and analytics engine that provides powerful capabilities for handling large volumes of data. To ensure optimal performance, it is essential to configure Elasticsearch correctly. This article will discuss advanced configuration options and best practices to help you fine-tune your Elasticsearch cluster for maximum efficiency.

1. JVM Heap Size

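The Java Virtual Machine (JVM) heap size is a critical factor in Elasticsearch performance. The heap is the memory the JVM allocates to Elasticsearch for its in-memory data structures and for executing requests; the rest of system memory is left to the operating system's filesystem cache, which Lucene relies on heavily. The recommended heap size is no more than 50% of the available system memory, and it should stay below roughly 32GB so that the JVM can keep using compressed object pointers. To set the heap size, create a custom JVM options file (with a `.options` extension) in the Elasticsearch `config/jvm.options.d` directory: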

-Xms<size>g
-Xmx<size>g

Replace `<size>` with the desired heap size in gigabytes. Ensure that both values are the same to prevent heap resizing during runtime.
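The sizing rule above can be sketched in a few lines of Python. This is a minimal illustration, not an official tool: the helper name `heap_options` and the 31GB default cap (chosen here as a safety margin below the 32GB limit) are assumptions.

```python
from typing import List


def heap_options(total_ram_gb: int, cap_gb: int = 31) -> List[str]:
    """Return the -Xms/-Xmx lines for a node with `total_ram_gb` of RAM.

    Takes half of system memory, capped at `cap_gb` (an assumed margin
    below the 32GB limit discussed in the text).
    """
    if total_ram_gb < 2:
        raise ValueError("need at least 2GB of RAM for a whole-gigabyte heap")
    heap_gb = min(total_ram_gb // 2, cap_gb)
    # Identical minimum and maximum so the heap is never resized at runtime.
    return [f"-Xms{heap_gb}g", f"-Xmx{heap_gb}g"]


if __name__ == "__main__":
    # Lines suitable for a file under config/jvm.options.d/
    print("\n".join(heap_options(64)))
```

On a 16GB machine this yields an 8GB heap; on a 64GB machine the cap applies instead of the 50% rule.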

2. Thread Pool Configuration

Elasticsearch uses thread pools to manage concurrent tasks. Properly configuring thread pools can improve performance and prevent resource contention. The most important thread pools to configure are:

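Search: Handles search and aggregation operations.
Write: Handles single-document index, update, and delete operations, as well as bulk indexing requests. (The separate `bulk` thread pool existed only in older releases; in current versions bulk requests run on the `write` pool.)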

To configure thread pools, add the following settings to the `elasticsearch.yml` file:

thread_pool:
  search:
    size: <number_of_threads>
    queue_size: <queue_size>
  write:
    size: <number_of_threads>
    queue_size: <queue_size>

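Replace `<number_of_threads>` with the desired number of threads and `<queue_size>` with the desired queue size. Elasticsearch derives its defaults from the number of allocated processors: the `write` pool defaults to one thread per processor and the `search` pool to `int((processors * 3) / 2) + 1`. These defaults are a good starting point; adjust thread counts and queue sizes only when monitoring shows rejected requests or resource contention under your workload.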
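As a rough sketch of how the thread pool block could be generated for a given machine, the Python below computes the documented default sizes for the `search` and `write` pools and renders the YAML fragment. The function names are hypothetical, and the queue sizes in the example are illustrative values rather than recommendations.

```python
from typing import Dict


def default_pool_sizes(allocated_processors: int) -> Dict[str, int]:
    """Default thread counts for the search and write pools."""
    if allocated_processors < 1:
        raise ValueError("allocated_processors must be at least 1")
    return {
        # search: int((processors * 3) / 2) + 1
        "search": (allocated_processors * 3) // 2 + 1,
        # write: one thread per allocated processor
        "write": allocated_processors,
    }


def render_thread_pool_yaml(sizes: Dict[str, int], queue_sizes: Dict[str, int]) -> str:
    """Render the thread_pool block for elasticsearch.yml."""
    lines = ["thread_pool:"]
    for pool in ("search", "write"):
        lines.append(f"  {pool}:")
        lines.append(f"    size: {sizes[pool]}")
        lines.append(f"    queue_size: {queue_sizes[pool]}")
    return "\n".join(lines)


if __name__ == "__main__":
    sizes = default_pool_sizes(8)
    # Queue sizes below are illustrative; tune them against observed rejections.
    print(render_thread_pool_yaml(sizes, {"search": 1000, "write": 10000}))
```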

3. Index Settings

Index settings can significantly impact Elasticsearch performance. Some important settings to consider are:

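Number of shards: Determines how many primary shards the index is divided into. A higher number of shards can improve search performance by spreading work across more nodes, but each shard adds overhead, so too many shards hurts both indexing and search. The default value is 1. This is a static index setting: it is set per index when the index is created (in the create index request or an index template), not in the `elasticsearch.yml` file, and it cannot be changed on an existing index without reindexing or using the split or shrink APIs: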

  index.number_of_shards: <number_of_shards>

Number of replicas: Determines the number of copies of each primary shard. Increasing the number of replicas can improve search throughput and fault tolerance but increases indexing overhead, because every write is applied to each copy. The default value is 1. This is a dynamic index setting: it is set per index, not in the `elasticsearch.yml` file, and can be changed on a live index with the update index settings API (`PUT /<index>/_settings`):

  index.number_of_replicas: <number_of_replicas>

Refresh interval: Controls how often Elasticsearch refreshes the index to make new documents searchable. A longer refresh interval can improve indexing performance, but recently indexed documents take longer to appear in search results. The default value is 1s. This is a dynamic index setting: it is set per index, not in the `elasticsearch.yml` file, and can be changed with the update index settings API (`PUT /<index>/_settings`):

  index.refresh_interval: <refresh_interval>
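In recent Elasticsearch versions, index-level settings such as these are applied per index through the index APIs. The following Python sketch builds the two request bodies involved: one for creating an index with a shard and replica count, and one for updating dynamic settings such as the refresh interval. The helper names and the index name `logs-demo` are hypothetical, and the functions only build the requests; sending them with an HTTP client is left out.

```python
import json
from typing import Any, Dict, Tuple

Request = Tuple[str, str, Dict[str, Any]]

# Settings that cannot be changed on an existing index.
STATIC_SETTINGS = {"number_of_shards"}


def create_index_request(index: str, shards: int, replicas: int) -> Request:
    """Build a create index request with shard and replica counts."""
    body = {
        "settings": {
            "index": {
                "number_of_shards": shards,
                "number_of_replicas": replicas,
            }
        }
    }
    return "PUT", f"/{index}", body


def update_settings_request(index: str, **dynamic_settings: Any) -> Request:
    """Build an update index settings request for dynamic settings only."""
    rejected = STATIC_SETTINGS & set(dynamic_settings)
    if rejected:
        raise ValueError(f"static settings cannot be updated: {sorted(rejected)}")
    return "PUT", f"/{index}/_settings", {"index": dynamic_settings}


if __name__ == "__main__":
    for method, path, body in (
        create_index_request("logs-demo", shards=3, replicas=1),
        update_settings_request("logs-demo", refresh_interval="30s"),
    ):
        print(method, path)
        print(json.dumps(body, indent=2))
```

Keeping the static check in the helper mirrors the behaviour of the cluster itself, which rejects attempts to change the shard count of an existing index.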

4. Node Configuration

Elasticsearch clusters consist of one or more nodes, each of which can hold one or more roles. Configuring nodes correctly can improve cluster performance and stability. Some important node configurations are:

Node roles: Assign specific roles to nodes, such as data, master, or ingest, to distribute workload and prevent resource contention. To set node roles, add the following setting to the `elasticsearch.yml` file:

  node.roles: [<role1>, <role2>, ...]

Node attributes: Assign custom attributes to nodes, such as hardware specifications or datacenter location, to control shard allocation and routing. To set node attributes, add the following setting to the `elasticsearch.yml` file:

  node.attr.<attribute_name>: <attribute_value>
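The two node settings above can be generated and sanity-checked with a short Python sketch. The role list reflects the roles documented for recent releases and should be treated as an assumption about your version; the function name, the `rack` attribute, and its value are hypothetical.

```python
from typing import Dict, List

# Role names as documented for recent releases (assumption: check your version).
KNOWN_ROLES = {
    "master", "data", "data_content", "data_hot", "data_warm", "data_cold",
    "data_frozen", "ingest", "ml", "remote_cluster_client", "transform",
    "voting_only",
}


def render_node_config(roles: List[str], attributes: Dict[str, str]) -> str:
    """Render node.roles and node.attr.* lines for elasticsearch.yml."""
    unknown = sorted(set(roles) - KNOWN_ROLES)
    if unknown:
        raise ValueError(f"unknown node roles: {unknown}")
    lines = [f"node.roles: [{', '.join(roles)}]"]
    for name, value in attributes.items():
        lines.append(f"node.attr.{name}: {value}")
    return "\n".join(lines)


if __name__ == "__main__":
    # A hot data node that also runs ingest pipelines, tagged with its rack.
    print(render_node_config(["data_hot", "ingest"], {"rack": "r1"}))
```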

5. Cluster Settings

Cluster-wide settings can be updated dynamically using the Cluster Update Settings API. Some important cluster settings to consider are:

Shard allocation: Control how shards are allocated across nodes based on factors such as node attributes, disk usage, and index settings. To update shard allocation settings, use the following API request:

  PUT /_cluster/settings
  {
    "persistent": {
      "cluster.routing.allocation.<setting_name>": <setting_value>
    }
  }

Circuit breakers: Protect the cluster from running out of memory by limiting the memory usage of specific operations. To update circuit breaker settings, use the following API request:

  PUT /_cluster/settings
  {
    "persistent": {
      "indices.breaker.<breaker_name>.limit": "<percentage>%"
    }
  }

The values that `<breaker_name>` can take are `total`, `fielddata`, `request`, and `accounting`. The in-flight requests breaker uses a different prefix: its limit is set with `network.breaker.inflight_requests.limit`.
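The two cluster settings requests above can be combined into a single body. The Python sketch below builds that body for a couple of allocation settings and one circuit breaker limit; it only constructs the JSON and does not send it. The helper names and example values are hypothetical, and the `accounting` entry may not be available in every Elasticsearch version.

```python
import json
from typing import Any, Dict

# Full setting keys per breaker. Note that the in-flight requests breaker
# lives under the network.breaker prefix rather than indices.breaker.
BREAKER_SETTING_KEYS = {
    "total": "indices.breaker.total.limit",
    "fielddata": "indices.breaker.fielddata.limit",
    "request": "indices.breaker.request.limit",
    "accounting": "indices.breaker.accounting.limit",  # version-dependent
    "inflight_requests": "network.breaker.inflight_requests.limit",
}


def allocation_settings(settings: Dict[str, Any]) -> Dict[str, Any]:
    """Prefix short names with cluster.routing.allocation."""
    return {f"cluster.routing.allocation.{k}": v for k, v in settings.items()}


def breaker_limit(name: str, percent: int) -> Dict[str, str]:
    """Return the setting for one breaker limit as a percentage of the heap."""
    if name not in BREAKER_SETTING_KEYS:
        raise ValueError(f"unknown circuit breaker: {name}")
    if not 0 < percent <= 100:
        raise ValueError("percent must be in the range 1..100")
    return {BREAKER_SETTING_KEYS[name]: f"{percent}%"}


def cluster_settings_body(*groups: Dict[str, Any], persistent: bool = True) -> Dict[str, Any]:
    """Merge setting groups into a body for PUT /_cluster/settings."""
    merged: Dict[str, Any] = {}
    for group in groups:
        merged.update(group)
    return {"persistent" if persistent else "transient": merged}


if __name__ == "__main__":
    body = cluster_settings_body(
        allocation_settings({"awareness.attributes": "rack",
                             "disk.watermark.low": "85%"}),
        breaker_limit("request", 60),
    )
    print("PUT /_cluster/settings")
    print(json.dumps(body, indent=2))
```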

Conclusion

In conclusion, configuring Elasticsearch for optimal performance involves fine-tuning various settings, such as JVM heap size, thread pools, index settings, node configurations, and cluster settings. By following the best practices and recommendations outlined in this article, you can ensure that your Elasticsearch cluster operates efficiently and effectively.
