Elasticsearch Fsync Failed

By Opster Team

Updated: Jun 7, 2023

Troubleshooting and Resolving Fsync Failed Errors in Elasticsearch and OpenSearch

Fsync failed errors occur in Elasticsearch and OpenSearch clusters when a node cannot flush buffered file data, such as translog or segment files, from the operating system's cache to the storage device. Because fsync is what makes writes durable, a failure can lead to data loss and affect the stability of the cluster. This article covers the common causes of fsync failed errors and the ways to resolve them.

Common Causes of Fsync Failed Errors

1. Insufficient Disk Space: When the storage device runs out of space, the system cannot perform fsync operations, leading to fsync failed errors.

2. I/O Errors: Hardware issues or file system corruption can cause I/O errors, preventing the system from synchronizing the file system buffers with the storage device.

3. Slow Storage Devices: If the storage device is slow or experiencing high latency, fsync operations may stall or time out, which can surface as errors.

4. High System Load: A high system load can cause the system to become unresponsive, making it unable to perform fsync operations.

Solutions to Resolve Fsync Failed Errors

1. Free Up Disk Space: Check the available disk space on the storage device and delete any unnecessary files or indices to free up space. You can use the `_cat/allocation` API to check the disk space usage:

GET /_cat/allocation?v

If necessary, consider adding more storage capacity or using Index Lifecycle Management (ILM) in Elasticsearch, or Index State Management (ISM) in OpenSearch, to roll over and delete old indices automatically.
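As a rough sketch of how that disk check could be automated, the Python snippet below takes rows in the shape returned by `GET /_cat/allocation?format=json` and flags nodes at or above a disk-usage threshold. The function name `nodes_over_threshold`, the sample rows, and the 85% threshold (the default low disk watermark) are assumptions chosen for illustration; fetching the rows from a live cluster is left out.

```python
def nodes_over_threshold(rows, threshold_percent=85):
    """Return (node, percent) pairs for nodes at or above the threshold.

    `rows` is assumed to look like the JSON form of the _cat/allocation
    output, where numeric values arrive as strings and the row for
    unassigned shards has no disk figures.
    """
    flagged = []
    for row in rows:
        percent = row.get("disk.percent")
        if percent is None:
            continue  # e.g. the UNASSIGNED row carries no disk data
        if int(percent) >= threshold_percent:
            flagged.append((row["node"], int(percent)))
    # Fullest nodes first
    return sorted(flagged, key=lambda item: item[1], reverse=True)


# Hypothetical sample rows, not output from a real cluster
sample_rows = [
    {"node": "node-1", "disk.percent": "91", "disk.avail": "45gb"},
    {"node": "node-2", "disk.percent": "62", "disk.avail": "190gb"},
    {"node": "UNASSIGNED", "disk.percent": None, "disk.avail": None},
]

for node, percent in nodes_over_threshold(sample_rows):
    print(f"{node}: {percent}% of disk used")
```

Nodes reported by a check like this are the first candidates for cleanup or for moving shards elsewhere.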

2. Check for I/O Errors: Inspect the system logs for any I/O errors or file system corruption issues. If you find any, resolve the underlying hardware or file system issues. You may need to replace the storage device or run a file system check to fix the corruption.
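One way to make the log inspection repeatable is a small filter script. The sketch below is only an illustration: the patterns are examples of messages that often accompany disk problems, not a complete list, and the log lines are made up for the example. The names `IO_ERROR_PATTERNS` and `find_io_errors` are hypothetical.

```python
import re

# Illustrative patterns only; real messages vary by kernel, driver, and file system
IO_ERROR_PATTERNS = [
    re.compile(r"I/O error", re.IGNORECASE),
    re.compile(r"Input/output error", re.IGNORECASE),
    re.compile(r"fsync failed", re.IGNORECASE),
    re.compile(r"Read-only file system", re.IGNORECASE),
]


def find_io_errors(lines):
    """Return (line_number, text) for every line matching a pattern."""
    hits = []
    for number, line in enumerate(lines, start=1):
        if any(pattern.search(line) for pattern in IO_ERROR_PATTERNS):
            hits.append((number, line.rstrip()))
    return hits


# Hypothetical log lines made up for this example
sample_log = [
    "kernel: sd 0:0:0:0: [sda] attached SCSI disk",
    "kernel: Buffer I/O error on dev sda1, logical block 1024",
    "elasticsearch: java.io.IOException: Input/output error",
    "elasticsearch: started",
]

for number, text in find_io_errors(sample_log):
    print(f"line {number}: {text}")
```

In practice the input would be the lines of a system or node log file rather than an in-memory list.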

3. Optimize Storage Devices: If the storage device is slow or experiencing high latency, consider upgrading to faster storage devices, such as SSDs. Additionally, ensure that the storage devices are properly configured and optimized for performance.
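To get a rough feel for how a volume handles fsync, a quick measurement can help before deciding on an upgrade. The following sketch writes small blocks to a temporary file and times each `os.fsync` call. The function name `measure_fsync_latency` and its defaults are assumptions for illustration, and the numbers are only indicative; a dedicated benchmarking tool gives more reliable results.

```python
import os
import tempfile
import time


def measure_fsync_latency(directory, rounds=20, payload_size=4096):
    """Write a small payload and time os.fsync, returning seconds per round."""
    payload = b"\0" * payload_size
    latencies = []
    with tempfile.NamedTemporaryFile(dir=directory) as handle:
        for _ in range(rounds):
            handle.write(payload)
            handle.flush()  # push Python's buffer to the OS
            start = time.perf_counter()
            os.fsync(handle.fileno())  # ask the OS to persist to the device
            latencies.append(time.perf_counter() - start)
    return latencies


if __name__ == "__main__":
    # Assumption: point this at the volume backing the data path;
    # the system temp directory is used here only so the sketch runs anywhere.
    results = measure_fsync_latency(tempfile.gettempdir())
    print(f"avg: {sum(results) / len(results) * 1000:.2f} ms")
    print(f"max: {max(results) * 1000:.2f} ms")
```

Comparing the average and maximum across volumes, or before and after a hardware change, is more meaningful than any single absolute value.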

4. Monitor and Optimize System Load: Monitor the system load using tools like `top`, `htop`, or `iostat`. Identify any processes that are causing high system load and optimize or terminate them as needed. You can also consider adding more resources (CPU, memory) to the system or distributing the load across multiple nodes.
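For a quick scripted check alongside those tools, the load average can be normalized by the number of CPU cores. The sketch below is a minimal example; the helper names and the threshold of 1.0 per core are assumptions, not recommended values. On Linux, the load average also counts tasks blocked on disk I/O, so a high value with low CPU usage can itself point at storage pressure.

```python
import os


def load_per_core(load_average, cpu_count):
    """Normalize a load average by the number of CPU cores."""
    if cpu_count < 1:
        raise ValueError("cpu_count must be at least 1")
    return load_average / cpu_count


def is_overloaded(load_average, cpu_count, limit_per_core=1.0):
    """Return True when the normalized load exceeds the chosen limit."""
    return load_per_core(load_average, cpu_count) > limit_per_core


if __name__ == "__main__":
    # os.getloadavg is only available on Unix-like systems
    if hasattr(os, "getloadavg"):
        one_minute, _, _ = os.getloadavg()
        cores = os.cpu_count() or 1
        ratio = load_per_core(one_minute, cores)
        print(f"1-minute load per core: {ratio:.2f}")
        print("overloaded" if is_overloaded(one_minute, cores) else "ok")
```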

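5. Adjust Elasticsearch and OpenSearch Settings: You can change the `index.translog.durability` setting from its default of `request` to `async` to reduce the frequency of fsync operations. With `async`, the translog is fsynced on a timer (`index.translog.sync_interval`, 5 seconds by default) instead of after every request, so writes acknowledged since the last sync can be lost if a node crashes. This setting only reduces fsync pressure; it does not repair a full or failing disk. Use this option with caution and ensure that you have a proper backup and recovery strategy in place.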

PUT /_all/_settings
{
  "index.translog.durability": "async"
}

Conclusion

By addressing the common causes of fsync failed errors and applying the solutions above, you can improve the stability and reliability of your Elasticsearch and OpenSearch clusters. Monitor your cluster’s health and performance continuously so that you can identify and resolve issues before they cause failures.
