What are cluster pending tasks?
Cluster pending tasks are updates to the cluster state which may have been initiated directly by a user or by the cluster itself.
Note that cluster pending tasks relate specifically to the cluster state and are not necessarily the same as the tasks returned by the task API, although there may be some overlap. The task API covers tasks created by users or by the cluster, but these are not necessarily related to the cluster state. Cluster state tasks deserve particular attention because delays in applying them can cause the cluster to lose data coherence.
How to resolve
You can list the cluster pending tasks by running:
GET _cluster/pending_tasks
It is very common for this to return an empty list, because cluster tasks are usually carried out very quickly, and so the queue of tasks to be carried out is empty.
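For scripted checks, the same request can be issued from Python. The following is a minimal sketch using only the standard library. It assumes the cluster is reachable at http://localhost:9200 without authentication or TLS, which will not hold for many deployments, and the helper names pending_tasks_url and fetch_pending_tasks are illustrative rather than part of any client library.

```python
import json
import urllib.request


def pending_tasks_url(base_url):
    """Build the pending tasks endpoint URL from a cluster base URL."""
    return base_url.rstrip("/") + "/_cluster/pending_tasks"


def fetch_pending_tasks(base_url="http://localhost:9200", timeout=10):
    """Return the list of pending cluster tasks.

    The base URL is an assumption; adjust it (and add authentication)
    to match your own cluster.
    """
    url = pending_tasks_url(base_url)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        body = json.load(resp)
    # The response wraps the queue in a top-level "tasks" array.
    return body.get("tasks", [])
```

On an idle cluster, fetch_pending_tasks() returns an empty list.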
However, if there are pending tasks in the queue, you may get a response like this:
{
  "tasks": [
    {
      "insert_order": 109,
      "priority": "URGENT",
      "source": "create-index [logs_23], cause [api]",
      "executing": true,
      "time_in_queue_millis": 76,
      "time_in_queue": "76ms"
    },
    {
      "insert_order": 36,
      "priority": "HIGH",
      "source": "shard-started ([logs_21][1], node[dMooQyuriet30A], [P], s[INITIALIZING]), reason [after recovery from shard_store]",
      "executing": false,
      "time_in_queue_millis": 642,
      "time_in_queue": "642ms"
    },
    {
      "insert_order": 66,
      "priority": "HIGH",
      "source": "shard-started ([logs_2][0], node[dMooQyuriet30A], [P], s[INITIALIZING]), reason [after recovery from shard_store]",
      "executing": false,
      "time_in_queue_millis": 651,
      "time_in_queue": "651ms"
    }
  ]
}
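The insert_order field indicates the order in which tasks were added to the queue, priority shows the priority assigned to the task, executing (true or false) indicates whether the task is currently being executed, source explains why the task is required, and time_in_queue (also given in milliseconds as time_in_queue_millis) shows how long the task has been waiting.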
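As an illustration of how these fields can be read together, the sketch below summarizes a pending tasks response that has already been parsed into a Python dictionary. The function name summarize_pending_tasks and the shape of the summary are assumptions made for this example, and the sample data is a shortened version of the response shown in this guide.

```python
def summarize_pending_tasks(response):
    """Summarize a parsed _cluster/pending_tasks response."""
    # Sort by insert_order to see the tasks in the order they were queued.
    tasks = sorted(response.get("tasks", []), key=lambda t: t["insert_order"])
    longest = max(tasks, key=lambda t: t["time_in_queue_millis"], default=None)
    return {
        "count": len(tasks),
        "insert_orders": [t["insert_order"] for t in tasks],
        # Sources of the tasks that are running right now.
        "executing_sources": [t["source"] for t in tasks if t["executing"]],
        "longest_wait_millis": longest["time_in_queue_millis"] if longest else 0,
    }


# Shortened sample based on the response in this guide.
sample = {
    "tasks": [
        {"insert_order": 109, "priority": "URGENT",
         "source": "create-index [logs_23], cause [api]",
         "executing": True, "time_in_queue_millis": 76},
        {"insert_order": 36, "priority": "HIGH",
         "source": "shard-started ([logs_21][1], ...)",
         "executing": False, "time_in_queue_millis": 642},
        {"insert_order": 66, "priority": "HIGH",
         "source": "shard-started ([logs_2][0], ...)",
         "executing": False, "time_in_queue_millis": 651},
    ]
}

summary = summarize_pending_tasks(sample)
```

For the sample, the summary reports three tasks, one of which (the create-index task) is executing, and a longest wait of 651 milliseconds.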
Cluster pending tasks are usually carried out automatically and resolve themselves without operator intervention. Bear in mind that a backlog of pending tasks is likely to be the effect, not the cause, of a problem with the cluster.
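Because most tasks clear on their own, it can help to look only at tasks that have been queued for an unusually long time. The sketch below filters a list of task entries by time_in_queue_millis. The function name long_waiting_tasks and the 30-second default threshold are arbitrary assumptions for illustration, not recommended values.

```python
def long_waiting_tasks(tasks, max_wait_millis=30_000):
    """Return tasks queued longer than max_wait_millis, longest first.

    The threshold is a hypothetical value; choose one that fits
    what is normal for your cluster.
    """
    slow = [t for t in tasks if t["time_in_queue_millis"] > max_wait_millis]
    return sorted(slow, key=lambda t: t["time_in_queue_millis"], reverse=True)
```

A non-empty result is a prompt to investigate the master node, not a diagnosis in itself.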
To diagnose the issue further, check the logs on the currently active master node for errors or issues that prevent it from carrying out tasks as expected.
Typical causes may be:
- Other processes running on a master node (that is, a non-dedicated master node) that prevent it from having sufficient resources to carry out the tasks as required.
- Oversharding, as explained here: https://opster.com/guides/opensearch/opensearch-capacity-planning/opensearch-oversharding/
- Crashed nodes
- Full disks preventing shards from being allocated correctly