Failed to clear cache for realms – How to solve this Elasticsearch error

Opster Team

Aug-23, Version: 6.8-8.2

Briefly, this error occurs when Elasticsearch is unable to clear the cache for security realms, which could be due to a configuration issue or a problem with the underlying system resources. To resolve this issue, you can try restarting the Elasticsearch service, checking the configuration files for any inconsistencies, or increasing system resources if they are insufficient. Additionally, ensure that the user running Elasticsearch has the necessary permissions to modify the cache.

This guide will help you check for common problems that cause the log “Failed to clear cache for realms” to appear. It’s important to understand the issues related to the log, so to get started, read the general overview on common issues and tips related to the Elasticsearch concepts: cache, mapping and plugin.

Overview

This warning message is related to both the security configuration of Elasticsearch and the configuration of user roles. Even though you could be facing an issue with your configuration, you’ll see that prior to version 7.2 of Elasticsearch this was a very common warning message that could often be considered a false positive and in most cases could be safely ignored.

What it means

Elasticsearch uses the concept of realms to authenticate users against many different service providers. If you don’t have a paid subscription, you can only use the Native and File realms. On the other hand, if you do happen to have a paid subscription you can configure your cluster so the user authentication is provided by an external service provider, such as Active Directory, Kerberos and so on.

You’ll get this warning message whenever Elasticsearch couldn’t successfully fulfill a refresh/synchronization operation with one of the configured external realms.

Security realms

There are two main types of security realm:

  • Internal: these do not need to communicate with an external service, and only one realm of each internal type can be configured. For example: the native and file realms.
  • External: these need to communicate with external services, and you can set up as many as you need. For example: the ldap, active_directory, saml, kerberos, oidc and pki realms.

Why it occurs

When you modify the user roles configuration in your cluster, Elasticsearch will need to propagate and synchronize the changes with any external security realms you may have set up.

This synchronization involves calling whatever API the specific realm provides for that kind of operation. If any of those requests fails, Elasticsearch logs a warning message listing the realms that couldn’t be synchronized with the latest modifications to the user roles configuration.

However, prior to version 7.2, even if you didn’t have an external realm configured you would still get this warning message in your logs. That would happen because the piece of code that is responsible for refreshing the realms didn’t check whether or not there was a realm to refresh, and would simply execute every time, generating the warning message with an empty array of realms for which the refresh operation supposedly failed:

[2020-10-06T08:39:04,959][WARN ][o.e.x.s.a.s.m.NativeRoleMappingStore] [FkhZB6J] Failed to clear cache for realms [[]]

The fix for this was introduced in 7.2.0.
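
The behavior described above can be illustrated with a small, self-contained sketch. This is not the Elasticsearch source: the class RealmRefreshSketch, its method names and the always-failing clearCache stand-in are all hypothetical, chosen only to show how an unguarded refresh produces a warning with an empty realm list and how an emptiness check avoids it.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

/** Hypothetical, simplified model of the realm refresh step; not the Elasticsearch source. */
public class RealmRefreshSketch {

    /** Stand-in for the cache-clear request; it always fails here to simulate an error. */
    static void clearCache(String[] realmNames) {
        throw new IllegalStateException("simulated cache-clear failure");
    }

    /** Unguarded variant: attempts the refresh even when there is nothing to refresh. */
    public static Optional<String> refreshWithoutGuard(List<String> realmsToRefresh) {
        String[] realmNames = realmsToRefresh.toArray(new String[0]);
        try {
            clearCache(realmNames);
            return Optional.empty();
        } catch (RuntimeException ex) {
            // Returns the text that would be logged as a warning.
            return Optional.of("Failed to clear cache for realms [" + Arrays.toString(realmNames) + "]");
        }
    }

    /** Guarded variant: skips the refresh entirely when the realm list is empty. */
    public static Optional<String> refreshWithGuard(List<String> realmsToRefresh) {
        if (realmsToRefresh.isEmpty()) {
            return Optional.empty();
        }
        return refreshWithoutGuard(realmsToRefresh);
    }

    public static void main(String[] args) {
        System.out.println(refreshWithoutGuard(List.of())); // warning with an empty realm list
        System.out.println(refreshWithGuard(List.of()));    // no warning at all
    }
}
```

With an empty list, the unguarded variant still yields a warning whose realm list is empty, which is the shape of the log line shown above, while the guarded variant yields nothing.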

How to resolve it

If you are getting this message because you are running a version prior to 7.2, the only way to solve it is to upgrade. Note, however, that if you don’t actually have external realms configured and you are only getting the warning with an empty array, you can safely ignore it until you are ready to upgrade your cluster.
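
To tell the two cases apart when scanning logs, a small parser can extract the realm list from the warning. The sketch below is only an illustration: the class name RealmWarningParser is hypothetical, and the regular expression assumes the message format shown in the sample log line in this guide.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper that reads the realm list out of the warning message. */
public class RealmWarningParser {

    // Matches "Failed to clear cache for realms [[a, b]]" and captures "a, b".
    private static final Pattern WARNING =
        Pattern.compile("Failed to clear cache for realms \\[\\[(.*?)\\]\\]");

    /** Returns the listed realm names, or Optional.empty() if the line is not this warning. */
    public static Optional<List<String>> failedRealms(String logLine) {
        Matcher matcher = WARNING.matcher(logLine);
        if (!matcher.find()) {
            return Optional.empty();
        }
        List<String> realms = new ArrayList<>();
        for (String name : matcher.group(1).split(",")) {
            String trimmed = name.trim();
            if (!trimmed.isEmpty()) {
                realms.add(trimmed);
            }
        }
        return Optional.of(realms);
    }

    public static void main(String[] args) {
        String line = "[2020-10-06T08:39:04,959][WARN ][o.e.x.s.a.s.m.NativeRoleMappingStore] "
            + "[FkhZB6J] Failed to clear cache for realms [[]]";
        failedRealms(line).ifPresent(realms ->
            System.out.println(realms.isEmpty()
                ? "Empty realm list: likely the pre-7.2 false positive"
                : "Realms to investigate: " + realms));
    }
}
```

An empty result list corresponds to the case that can be ignored until the upgrade. A non-empty list names the realms worth investigating.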

On the other hand, if you do have external realms configured and are getting the warning message with some of these realms listed in the message’s text, you should take a look at the logs of that specific service to get a better idea of what is causing the refresh operation to fail.
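
It can also help to trigger the cache clear manually for one of the listed realms and inspect the response. Elasticsearch exposes a clear realm cache API at POST /_security/realm/<realms>/_clear_cache. The sketch below builds such a request with the JDK's HTTP client (Java 11 or later). The class name, base URL, realm name and credentials are placeholders, basic authentication is an assumption about your setup, and on older versions the endpoint path may differ.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

/** Hypothetical helper that builds a clear-realm-cache request for a single realm. */
public class ClearRealmCacheRequestSketch {

    public static HttpRequest build(String baseUrl, String realmName, String user, String password) {
        // Basic authentication is assumed here; adapt to whatever your cluster uses.
        String credentials = Base64.getEncoder()
            .encodeToString((user + ":" + password).getBytes(StandardCharsets.UTF_8));
        return HttpRequest.newBuilder(
                URI.create(baseUrl + "/_security/realm/" + realmName + "/_clear_cache"))
            .header("Authorization", "Basic " + credentials)
            .POST(HttpRequest.BodyPublishers.noBody())
            .build();
    }

    public static void main(String[] args) throws Exception {
        // Placeholder values: replace with your own node address, realm name and credentials.
        HttpRequest request = build("http://localhost:9200", "ldap1", "elastic", "changeme");
        HttpResponse<String> response = HttpClient.newHttpClient()
            .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode());
        System.out.println(response.body());
    }
}
```

If the manual call fails as well, the response body usually gives a more direct hint than the warning line. A cluster served over HTTPS with a private certificate authority would need additional TLS configuration, which this sketch leaves out.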

Another thing you should try is connecting to the external service from one of the cluster’s nodes using a client for that service, so you can rule out any network and credentials issues.
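
Before reaching for a full client, a plain TCP connection test run from a cluster node can rule out basic network problems. The following is a minimal sketch: the class name, host and port are placeholders, and a successful connection only shows that the port is reachable, not that the credentials or TLS settings are valid.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

/** Hypothetical reachability check for the endpoint of an external realm. */
public class RealmEndpointCheck {

    /** Returns true if a TCP connection to host:port succeeds within the timeout. */
    public static boolean canConnect(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException ex) {
            // Covers refused connections, timeouts and unresolvable host names.
            return false;
        }
    }

    public static void main(String[] args) {
        // Placeholder endpoint: replace with the host and port of your external service.
        String host = "ldap.example.com";
        int port = 636;
        System.out.println(host + ":" + port + " reachable: " + canConnect(host, port, 3000));
    }
}
```

If the port is unreachable from the node, the problem lies in the network path or firewall rules rather than in the realm configuration.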

Log Context

The log “Failed to clear cache for realms [{}]” is emitted by the class NativeRoleMappingStore (NativeRoleMappingStore.java).
For those seeking more in-depth context, we extracted the following from the Elasticsearch source code:

                        Arrays.toString(realmNames)
                    )
                );
                listener.onResponse(result);
            }, ex -> {
                logger.warn(new ParameterizedMessage("Failed to clear cache for realms [{}]", Arrays.toString(realmNames)), ex);
                listener.onFailure(ex);
            })
        );
    }

 
