This guide covers how to plan for scaling down resources in an Elasticsearch cluster. The instructions describe manual processes in Elasticsearch.
Quick links
- Introduction
- Performance issues that could be caused by draining
- Considerations when removing an Elasticsearch node
- How to remove a node from the elasticsearch.yml configuration
- How to rejoin excluded nodes
Introduction
When scaling down resources in an Elasticsearch cluster, it is important to understand what happens when you decrease the number of nodes, and to follow the correct removal procedure so that the cluster remains stable.
The main considerations are:
- Performance Issues
- Preserving quorum of master nodes
- Data integrity issues
Performance issues that could be caused by draining
Disk availability
Removing a data node will transfer all its data to the other data nodes. Make sure that the other data nodes have sufficient disk capacity to receive the extra data, without exceeding the low disk watermark.
Be aware that a Hot-Warm-Cold architecture or shard allocation awareness (zone awareness) restricts which shards can be allocated to which nodes. Take these restrictions into account when evaluating the available disk space on the remaining nodes. For a full explanation of how Elasticsearch manages disk space, see the Elasticsearch documentation on disk-based shard allocation.
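Before draining a node, you can check the current disk usage per data node with the cat allocation API (the column list below is one reasonable selection, not the only option):
GET /_cat/allocation?v=true&h=node,shards,disk.used,disk.avail,disk.percent
You can also review the configured disk watermarks, which default to 85% (low) and 90% (high) unless overridden:
GET /_cluster/settings?include_defaults=true&filter_path=*.cluster.routing.allocation.disk*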
Index and search rate
If you remove a data node, then the indexing and search activity of that node will need to be shared across other data nodes. Check your monitoring data to evaluate whether the remaining nodes have sufficient resources to handle the extra indexing and search activity.
As a rough guide, consider that the CPU usage on all remaining nodes will increase in proportion to the number of nodes reduced, divided by the total number of data nodes. For example, if you have 8 data nodes with a peak CPU of around 50%, after removing 1 node, you would expect a peak CPU of approx (50% * 8/7)= 57%. This is a very rough approximation, as true usage will depend on how shards are distributed across your cluster.
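To gather the inputs for this estimate, you can get a quick per-node view of current CPU usage, load, and heap with the cat nodes API:
GET /_cat/nodes?v=true&h=name,cpu,load_1m,heap.percent
For peak values over time rather than a point-in-time snapshot, rely on your monitoring data instead.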
Considerations when removing an Elasticsearch node
- Master eligible node
- Non-master eligible node
Removing master eligible nodes – Preserving quorum of master nodes
Important! A master eligible node is any node whose node.roles setting includes master, including data nodes that also carry the master role. Be particularly careful when removing master eligible nodes, because under some circumstances the remaining nodes may be unable to elect a new master, resulting in cluster downtime.
For a high availability Elasticsearch cluster, always have at least 3 master eligible nodes. This ensures that if one node is lost, the remaining master eligible nodes can still elect a new master. Furthermore, if you intend to remove half or more of the master eligible nodes in a short timeframe, you must first remove those nodes from the voting configuration using the command below, before shutting them down. This is because the cluster may not have sufficient time to automatically adjust the voting configuration as the nodes leave.
POST /_cluster/voting_config_exclusions?node_names=node_name1,node_name2
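Once the excluded nodes have been shut down, the exclusion list serves no further purpose and should be cleared so it does not interfere with future maintenance:
DELETE /_cluster/voting_config_exclusions
By default this call waits for the excluded nodes to leave the cluster; to clear the list without waiting, append ?wait_for_removal=false.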
Ensuring data integrity when removing data nodes
Removing a data node requires transferring its data to the other data nodes, which is resource intensive. To minimize the resources required and ensure data integrity, migrate the shards off the node in an orderly way by following the steps below.
How to migrate Elasticsearch shards correctly:
- Exclude the node from shard allocation so that its shards are migrated to the other data nodes (replace the IP address with that of the node you are removing)
PUT /_cluster/settings
{
  "transient": {
    "cluster.routing.allocation.exclude._ip": "192.168.1.150"
  }
}
- Check that there are no shards left on the node you want to shut down
GET /_cat/allocation?v=true
If the result of the above command shows 0 shards on the excluded node(s), you can safely shut them down to permanently remove them from the cluster.
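After shutting down the node(s), you can confirm that the cluster has stabilized, for example:
GET /_cluster/health?filter_path=status,relocating_shards,unassigned_shards
A green status with no relocating or unassigned shards indicates the scale-down completed cleanly.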
How to remove a node from the elasticsearch.yml configuration
For master eligible nodes, also remove the node's address from the elasticsearch.yml configuration of the remaining nodes:
discovery.seed_hosts:
  - 192.168.1.10:9300
  - 192.168.1.11
  - seeds.mydomain.com
cluster.initial_master_nodes:
  - master-node-a
  - master-node-b
  - master-node-c
Note that cluster.initial_master_nodes is only used when bootstrapping a brand-new cluster, so updating it matters mainly if you ever rebuild the cluster from these configuration files.
How to rejoin excluded nodes
If a data node was removed only temporarily and you later want to return it to the cluster, make sure the allocation exclusion setting created earlier is removed.
PUT /_cluster/settings
{
  "transient": {
    "cluster.routing.allocation.exclude._ip": null
  }
}
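You can verify that the exclusion is gone by listing the cluster settings; the exclude setting should no longer appear:
GET /_cluster/settings?flat_settings=true
Once the node rejoins, Elasticsearch will rebalance shards onto it automatically.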