Briefly, this error occurs when Elasticsearch is trying to recover indices and the number of concurrent small file streams is being updated. This could be due to a high load on the system or network issues. To resolve this, you can increase the limit of concurrent small file streams if your system resources allow it. Alternatively, you can try to reduce the load on your system by optimizing your queries or reducing the number of indices. Lastly, check your network connection and ensure it’s stable and fast enough to handle the data transfer.
This guide will help you check for common problems that cause the log ” updating [indices.recovery.concurrent_small_file_streams] from [{}] to [{}] ” to appear. To understand the issues related to this log, read the explanation below about the following Elasticsearch concepts: indices, recovery and settings.
Overview
In Elasticsearch, an index (plural: indices) contains a schema and can have one or more shards and replicas. An Elasticsearch index is divided into shards and each shard is an instance of a Lucene index.
Indices are used to store the documents in dedicated data structures corresponding to the data type of fields. For example, text fields are stored inside an inverted index whereas numeric and geo fields are stored inside BKD trees.
Examples
Create index
The following example is based on Elasticsearch version 5.x onwards. An index with two shards, each having one replica will be created with the name test_index1
PUT /test_index1?pretty { "settings" : { "number_of_shards" : 2, "number_of_replicas" : 1 }, "mappings" : { "properties" : { "tags" : { "type" : "keyword" }, "updated_at" : { "type" : "date" } } } }
List indices
All the index names and their basic information can be retrieved using the following command:
GET _cat/indices?v
Index a document
Let’s add a document in the index with the command below:
PUT test_index1/_doc/1 { "tags": [ "opster", "elasticsearch" ], "date": "01-01-2020" }
Query an index
GET test_index1/_search { "query": { "match_all": {} } }
Query multiple indices
It is possible to search multiple indices with a single request. If it is a raw HTTP request, index names should be sent in comma-separated format, as shown in the example below, and in the case of a query via a programming language client such as python or Java, index names are to be sent in a list format.
GET test_index1,test_index2/_search
Delete indices
DELETE test_index1
Common problems
- It is good practice to define the settings and mapping of an Index wherever possible because if this is not done, Elasticsearch tries to automatically guess the data type of fields at the time of indexing. This automatic process may have disadvantages, such as mapping conflicts, duplicate data and incorrect data types being set in the index. If the fields are not known in advance, it’s better to use dynamic index templates.
- Elasticsearch supports wildcard patterns in Index names, which sometimes aids with querying multiple indices, but can also be very destructive too. For example, It is possible to delete all the indices in a single command using the following commands:
DELETE /*
To disable this, you can add the following lines in the elasticsearch.yml:
action.destructive_requires_name: true
Overview
In Elasticsearch, recovery refers to the process of recovering a shard when something goes wrong. Shard recoveries can take place in various circumstances, such as when a node fails and a replica shard needs to be recreated from a primary shard, when the cluster needs to relocate shards to different nodes due to a rebalancing or a change in shard allocation settings, or when restoring an index from an Elasticsearch snapshot. Alternatively, Elasticsearch can sometimes perform recoveries automatically, such as when a node restarts or disconnects and connects again. In summary, recovery can happen in the following scenarios:
- Node startup or failure (local store recovery)
- Replication of primary shards to replica shards
- Relocation of a shard to a different node in the same cluster
- Restoration of a snapshot
Planned node restart
If you are planning to restart a node, there are some actions that you can take to speed up the shard recoveries when the node has restarted. For optimal recovery speed, you should stop any indexing to the shards that are hosted on the node that is about to be restarted. Once you’ve stopped your indexing process, you can perform the following actions:
1. Disable shard allocation to prevent shards from being reallocated to other nodes while the node is restarting using the following command:
PUT _cluster/settings { "persistent": { "cluster.routing.allocation.enable": "primaries" } }
It is worth noting that by default the shard relocation process only starts after one minute and that delay can be configured with the `index.unassigned.node_left.delayed_timeout` index setting.
2. Once shard relocation is disabled, you need to flush the transaction logs (using the command below), which will ensure that all operations currently stored in the transactions log are safely committed to the Lucene index on disk. That will save you time during the restart since no operations will need to be replayed, meaning that the recovery of your shards will be faster.
POST /_flush
Note that prior to ES 8.0, this operation was called synced-flush, but it was deprecated in 7.6 and removed in 8.0.
3. At this point, you can restart your node.
4. When the node has properly restarted, you can re-enable shard allocation using the following command:
PUT _cluster/settings { "persistent": { "cluster.routing.allocation.enable": null } }
If you have several nodes to restart or you are performing a full cluster restart, you can use the same procedure. The key points to remember for speeding up the recovery process are to stop any indexing and to flush your transaction log.
While the recovery process is in progress, there are a few API calls that allow us to monitor the status of the shard recoveries:
# Check the recovery status of a specific index
GET /<index>/_recovery
# Check the recovery status of all indexes
GET /_recovery
# Check the recovery status of all indexes (more concise format)
GET _cat/recovery
Tweaking recovery speed
If you cannot stop your indexing process for whatever reason, you can still perform the same procedure. However, since new data will keep flowing in while the node is restarting, all the indexing operations will need to be replayed, which will slow down the recovery process. However, there are a few knobs that you can tune to speed this up provided you have sufficient hardware resources (CPU, RAM, network).
By default, the total inbound and outbound recovery traffic on each hot and warm data node is limited to 40 Mbps. For dedicated cold and frozen nodes, that limit ranges from 40 Mbps to 250 Mbps depending on the total amount of memory available on those nodes. These default values have been determined empirically based on the assumption that the hardware is composed of standard SSD disks and a network interface with 1 Gbps throughput.If you have superior hardware (e.g., 10 Gbps network and 100K IOPS disks), you can increase the recovery traffic limit to a higher value using the following command:
PUT /_cluster/settings { "transient": { "indices.recovery.max_bytes_per_sec": "100mb" } }
You should be very careful when changing this setting as it can harm your cluster performance if the value you set is too high. Also, there are a few other expert settings that you can tweak if you want to optimize the recovery process, but changing the defaults on these expert settings is strongly discouraged unless you know exactly what you’re doing.
Conclusion
In this guide, we have explained what the shard recovery process is and under which circumstances it kicks in. We have also reviewed a few techniques to speed up the recovery process and highlighted what you need to pay attention to when you start tweaking the default recovery settings values.
Settings in Elasticsearch
In Elasticsearch, you can configure cluster-level settings, node-level settings and index level settings. Here is a quick rundown of each level.
A. Cluster settings
These settings can either be:
- Persistent, meaning they apply across restarts, or
- Transient, meaning they won’t survive a full cluster restart.
If a transient setting is reset, the first one of these values that is defined is applied:
- The persistent setting
- The setting in the configuration file
- The default value
The order of precedence for cluster settings is:
- Transient cluster settings
- Persistent cluster settings
- Settings in the elasticsearch.yml configuration file
Examples
An example of persistent cluster settings update:
PUT /_cluster/settings { "persistent" : { "indices.recovery.max_bytes_per_sec" : "500mb" } }
An example of a transient update:
PUT /_cluster/settings { "transient" : { "indices.recovery.max_bytes_per_sec" : "40mb" } }
B. Index settings
These are the settings that are applied to individual indices. There is an API to update index level settings.
Examples
The following API call will set the number of replica shards to 5 for my_index index.
PUT /my_index/_settings { "index" : { "number_of_replicas" : 5 } }
To revert a setting to the default value, use null.
PUT /my_index/_settings { "index" : { "refresh_interval" : null } }
C. Node settings
These settings apply to nodes. Nodes can fulfill different roles. These include the master, data, and coordination roles. Node settings are set through the elasticsearch.yml file for each node.
Examples
Setting a node to be a data node (in the elasticsearch.yml file):
node.data: true
Disabling the ingest role for the node (which is enabled by default):
node.ingest: false
For production clusters, you will need to run each type of node on a dedicated machine with two or more instances of each, for HA (minimum three for master nodes).
Notes and good things to know
- Learning more about the cluster settings and index settings is important – it can spare you a lot of trouble. For example, if you are going to ingest huge amounts of data into an index and the number of replica shards is set to say, 5, the indexing process will be super slow because the data will be replicated at the same time it is indexed. What you can do to speed up indexing is to set the replica shards to 0 by updating the settings, and set it back to the original number when indexing is done, using the settings API.
- Another useful example of using cluster-level settings is when a node has just joined the cluster and the cluster is not assigning any shards to the node. Although shard allocation is enabled by default on all nodes, someone may have disabled shard allocation at some point (for example, in order to perform a rolling restart), and forgot to re-enable it later. To enable shard allocation, you can update the Cluster Settings API:
PUT /_cluster/settings{"transient":{"cluster.routing.allocation.enable":"all"}}
- It’s better to set cluster-wide settings with Settings API instead of with the elasticsearch.yml file and to use the file only for local changes. This will keep the same setting on all nodes. However, if you define different settings on different nodes by accident using the elasticsearch.yml configuration file, it is hard to notice these discrepancies.
- See also: Recovery
Log Context
Log “updating [indices.recovery.concurrent_small_file_streams] from [{}] to [{}]” classname is RecoverySettings.java.
We extracted the following from Elasticsearch source code for those seeking an in-depth context :
RecoverySettings.this.concurrentStreamPool.setMaximumPoolSize(concurrentStreams); } int concurrentSmallFileStreams = settings.getAsInt(INDICES_RECOVERY_CONCURRENT_SMALL_FILE_STREAMS; RecoverySettings.this.concurrentSmallFileStreams); if (concurrentSmallFileStreams != RecoverySettings.this.concurrentSmallFileStreams) { logger.info("updating [indices.recovery.concurrent_small_file_streams] from [{}] to [{}]"; RecoverySettings.this.concurrentSmallFileStreams; concurrentSmallFileStreams); RecoverySettings.this.concurrentSmallFileStreams = concurrentSmallFileStreams; RecoverySettings.this.concurrentSmallFileStreamPool.setMaximumPoolSize(concurrentSmallFileStreams); } RecoverySettings.this.retryDelayNetwork = maybeUpdate(RecoverySettings.this.retryDelayNetwork; settings; INDICES_RECOVERY_RETRY_DELAY_NETWORK);
[ratemypost]