Remote file corruption on node recovering local checksum OK – How to solve this Elasticsearch error

Opster Team

Aug-23, Version: 6.8-7.15

Briefly, this error occurs when a file is reported as corrupted during shard recovery, but the copy on the source node (the primary shard) still passes its own checksum verification. The corruption therefore happened on the way to the replica or on the target node itself. This could be due to network issues, disk errors, or bugs. To resolve this, you can try the following:

1) Restart the Elasticsearch node, which will trigger a new recovery process.
2) Delete and recreate the corrupted replica shard.
3) Check for any underlying hardware or network issues that might be causing the corruption.

Always ensure you have a backup of your data to prevent data loss.
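As a rough illustration of steps 1 and 2, the sketch below builds the REST requests that could be used to retry a failed recovery and to drop and recreate replica copies. The class name `ReplicaRecoveryRequests`, the base URL, and the index name are hypothetical, and the sketch assumes a cluster reachable over plain HTTP without authentication. It only builds the requests and does not send them. Note that setting `number_of_replicas` to 0 removes every replica of the index, not just the corrupted one, so redundancy is reduced until the value is restored.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.net.http.HttpRequest.BodyPublishers;

// Illustrative helper, not part of Elasticsearch: builds (but does not send)
// REST requests for the remediation steps described above.
final class ReplicaRecoveryRequests {

    // Assumption: e.g. "http://localhost:9200", no TLS and no authentication.
    private final String baseUrl;

    ReplicaRecoveryRequests(String baseUrl) {
        this.baseUrl = baseUrl;
    }

    // Asks the cluster to retry shard allocations that failed too many times,
    // which starts a fresh recovery without restarting the node.
    HttpRequest retryFailedAllocations() {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/_cluster/reroute?retry_failed=true"))
            .POST(BodyPublishers.noBody())
            .build();
    }

    // JSON body for the update index settings API.
    static String replicaSettingsBody(int replicas) {
        return "{\"index\":{\"number_of_replicas\":" + replicas + "}}";
    }

    // Setting the count to 0 and then back to its previous value drops the
    // replica copies and rebuilds them from the primary.
    HttpRequest setReplicaCount(String index, int replicas) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/" + index + "/_settings"))
            .header("Content-Type", "application/json")
            .PUT(BodyPublishers.ofString(replicaSettingsBody(replicas)))
            .build();
    }

    // Lists recoveries for the index so the new attempt can be monitored.
    HttpRequest listRecoveries(String index) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/_cat/recovery/" + index + "?v"))
            .GET()
            .build();
    }
}
```

Each request can then be sent with `HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())`. Keeping request construction separate from sending makes it easy to review exactly what will be executed against the cluster first.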

This guide will help you check for common problems that cause the log “{} Remote file corruption on node {}, recovering {}. local checksum OK” to appear. To understand the issues related to this log, read the explanation below about the following Elasticsearch concepts: recovery, indices, node.

Log Context

The log “{} Remote file corruption on node {}, recovering {}. local checksum OK” is emitted from the class RecoverySourceHandler.java.
We extracted the following from the Elasticsearch source code for those seeking in-depth context:

                // a file failed the integrity check on the source: the local copy is corrupt
                throw localException;
            } else { // corruption has happened on the way to replica
                RemoteTransportException remoteException = new RemoteTransportException(
                    "File corruption occurred on recovery but checksums are ok", null);
                remoteException.addSuppressed(e);
                logger.warn(() -> new ParameterizedMessage("{} Remote file corruption on node {}, recovering {}. local checksum OK",
                    shardId, request.targetNode(), mds), corruptIndexException);
                throw remoteException;
            }
        }
        // not a corruption error: rethrow the original exception unchanged
        throw e;
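The decision made in the excerpt above can be modelled with a small standalone sketch. This is not Elasticsearch code: the class `CorruptionTriage`, its methods, and the use of a plain `CRC32` over in-memory byte arrays are assumptions chosen for illustration. The idea is that after the target reports corruption, the source re-verifies its own files. If any file fails, the source copy is corrupt; if all pass, the corruption is attributed to the transfer or the target, which is the case that produces this log.

```java
import java.util.Map;
import java.util.zip.CRC32;

// Simplified model of the branch in the excerpt; not Elasticsearch code.
final class CorruptionTriage {

    enum Verdict { LOCAL_CORRUPTION, REMOTE_CORRUPTION }

    // Checksum of a file's bytes (CRC32 is used here only for illustration).
    static long checksum(byte[] data) {
        CRC32 crc = new CRC32();
        crc.update(data);
        return crc.getValue();
    }

    // sourceFiles: bytes currently stored on the source node, keyed by file name.
    // expected: checksums recorded when each file was originally written.
    static Verdict triage(Map<String, byte[]> sourceFiles, Map<String, Long> expected) {
        for (Map.Entry<String, byte[]> file : sourceFiles.entrySet()) {
            Long recorded = expected.get(file.getKey());
            if (recorded == null || recorded != checksum(file.getValue())) {
                // The source copy no longer matches its recorded checksum.
                return Verdict.LOCAL_CORRUPTION;
            }
        }
        // Every source file verified, so the damage happened in transit
        // or on the target ("local checksum OK").
        return Verdict.REMOTE_CORRUPTION;
    }
}
```

A `REMOTE_CORRUPTION` verdict is the situation to investigate on the target node and the network path, since the primary's data has been verified as intact.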

 
