Guide to Resolving Disk Underutilization on the Hot Tier in Elasticsearch

By Opster Team

Updated: Mar 10, 2024

What does this mean? 

If more disk space is allocated to the hot nodes in an Elasticsearch cluster than they need, the cluster is not using the available disk space efficiently, which can lead to increased costs and suboptimal performance.
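
To make "more disk space than needed" concrete, here is a minimal Python sketch that turns per-node disk figures into a utilization percentage. It assumes rows shaped like the JSON returned by `GET _cat/allocation?format=json&bytes=b`, already restricted to hot nodes. The node names, byte counts, and the 50% threshold are made up for illustration and are not recommendations.

```python
# Sample rows shaped like the JSON from `GET _cat/allocation?format=json&bytes=b`.
# Node names and byte counts are hypothetical.
SAMPLE_ALLOCATION = [
    {"node": "hot-1", "disk.used": "200000000000", "disk.total": "1000000000000"},
    {"node": "hot-2", "disk.used": "650000000000", "disk.total": "1000000000000"},
    # The API can also return an UNASSIGNED row without disk figures.
    {"node": "UNASSIGNED", "disk.used": None, "disk.total": None},
]


def disk_utilization(rows):
    """Return {node: percent of disk used}, skipping rows without disk figures."""
    result = {}
    for row in rows:
        used, total = row.get("disk.used"), row.get("disk.total")
        if used is None or total is None or int(total) == 0:
            continue
        result[row["node"]] = round(100 * int(used) / int(total), 1)
    return result


def underutilized(rows, threshold_percent=50.0):
    """Return the nodes whose disk usage is below the (hypothetical) threshold."""
    usage = disk_utilization(rows)
    return sorted(node for node, pct in usage.items() if pct < threshold_percent)


if __name__ == "__main__":
    print(disk_utilization(SAMPLE_ALLOCATION))
    print(underutilized(SAMPLE_ALLOCATION))
```

Which threshold counts as "underutilized" depends on your growth expectations, so treat the default above as a placeholder.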

Why does this occur?

Disk underutilization can occur for several reasons, such as:

  1. Overestimation of storage requirements during the initial setup of the cluster.
  2. Decrease in data volume over time, leading to unused disk space.
  3. Inefficient data management practices, such as deleting old or unnecessary data without revising storage requirements.
  4. Removal of some replica shards that were added to support a high usage peak.

Possible impact and consequences of low disk utilization

The possible impact of this event includes:

  1. Increased costs: Over-allocating disk space leads to higher infrastructure costs, because you are paying for resources that are not being used effectively. This is not limited to storage: if the provisioned disk size resulted from a specific memory-to-disk ratio, you might also be paying for too much RAM.
  2. Suboptimal performance: Underutilized disk space can result in inefficient data storage and retrieval, which can negatively impact the performance of your Elasticsearch cluster.

How to resolve

To resolve the issue of disk underutilization on the hot tier in Elasticsearch, you can follow these recommendations:

1. Move to smaller disk capacity: By moving to smaller disks, you can reduce the amount of unused disk space and optimize your disk utilization. This can help you save money and improve the performance of your Elasticsearch cluster.
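
As a rough illustration of how a smaller disk size could be chosen, the sketch below divides the data a node holds by a target utilization. The function name and the 75% target are assumptions for this example. The target is picked only to leave headroom below Elasticsearch's default low disk watermark of 85%, and you should adjust it for expected data growth.

```python
import math


def required_disk_gb(data_gb, target_utilization=0.75):
    """Smallest disk size in GB (rounded up) that keeps usage at or below the target."""
    if data_gb < 0:
        raise ValueError("data_gb must not be negative")
    if not 0 < target_utilization <= 1:
        raise ValueError("target_utilization must be in (0, 1]")
    return math.ceil(data_gb / target_utilization)


if __name__ == "__main__":
    # A node holding 300 GB of data, with the hypothetical 75% target.
    print(required_disk_gb(300))
```

For instance, a node that holds 300 GB on a 2,000 GB disk is only 15% full. With the assumed 75% target, a 400 GB disk would hold the same data.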

2. Reduce the number of hot data nodes: Removing hot data nodes consolidates the same data onto fewer nodes, so each remaining node uses a larger share of its disk. This can also help you save money, provided that the remaining hot nodes have enough capacity to hold the relocated shards.

Command example to drain data to other hot nodes so that the specified hot node can be deprovisioned:

PUT /_cluster/settings
{
  "transient": {
    "cluster.routing.allocation.exclude._ip": "10.0.0.1"
  }
}
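
Before sending the request above, it is worth checking that the remaining hot nodes can absorb the data. The Python sketch below is one simplified way to do that. It assumes equally sized disks and evenly balanced shards, and the helper names, node names, and 75% ceiling are hypothetical. It also builds the request body shown above, plus a body that sets the same setting to `null`, which resets it once the node has been emptied and removed.

```python
import json

EXCLUDE_IP_SETTING = "cluster.routing.allocation.exclude._ip"


def can_drain(used_gb_by_node, disk_gb_per_node, node_to_remove, max_utilization=0.75):
    """True if the other nodes can hold all data while staying at or below max_utilization.

    Simplification: assumes identical disks and evenly balanced shards.
    """
    remaining = [node for node in used_gb_by_node if node != node_to_remove]
    if not remaining:
        return False
    total_data_gb = sum(used_gb_by_node.values())
    usable_gb = len(remaining) * disk_gb_per_node * max_utilization
    return total_data_gb <= usable_gb


def exclusion_body(ip):
    """Body for PUT /_cluster/settings that moves shards away from the given IP."""
    return {"transient": {EXCLUDE_IP_SETTING: ip}}


def reset_body():
    """Body for PUT /_cluster/settings that clears the exclusion again."""
    return {"transient": {EXCLUDE_IP_SETTING: None}}


if __name__ == "__main__":
    used = {"hot-1": 200, "hot-2": 250, "hot-3": 150}  # hypothetical GB per node
    if can_drain(used, disk_gb_per_node=1000, node_to_remove="hot-3"):
        print(json.dumps(exclusion_body("10.0.0.1"), indent=2))
    print(json.dumps(reset_body(), indent=2))
```

This check ignores replica placement: a primary and its replica cannot share a node, so the remaining node count must still exceed the highest replica count in use. The node is safe to deprovision once it no longer holds any shards, which `GET _cat/allocation?v` reports per node.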

3. Optimize data management practices: Regularly review your data management practices, especially your index lifecycle management (ILM) policies, to ensure that old or unnecessary data is deleted and that the provisioned disk space matches what the data actually needs.
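
As an illustration of the kind of ILM policy worth reviewing, the Python sketch below builds a policy body with a rollover action in the hot phase and a delete phase. The function name and every value (retention period, shard size, rollover age) are placeholders for this example, not recommendations. The resulting JSON could be sent with `PUT _ilm/policy/<policy_name>`, where the policy name is yours to choose.

```python
import json


def ilm_policy(delete_after="30d", max_primary_shard_size="50gb", rollover_max_age="7d"):
    """Build an ILM policy body: roll over in the hot phase, delete after a retention period.

    All default values are hypothetical placeholders.
    """
    return {
        "policy": {
            "phases": {
                "hot": {
                    "actions": {
                        "rollover": {
                            "max_primary_shard_size": max_primary_shard_size,
                            "max_age": rollover_max_age,
                        }
                    }
                },
                "delete": {
                    # min_age is counted from rollover for indices that were rolled over
                    "min_age": delete_after,
                    "actions": {"delete": {}},
                },
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(ilm_policy(delete_after="90d"), indent=2))
```

Comparing the retention period in a policy like this with the actual data volume on the hot tier is one way to revisit storage requirements after retention rules change.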

Conclusion

By following this guide, you can resolve the issue of disk underutilization on the hot tier in Elasticsearch. By optimizing your disk utilization and reducing the number of data nodes, you can save money and improve the performance of your Elasticsearch cluster.