Failed to clear cache for realms - Common causes and quick fixes

Opster Team

Aug-23, Version: 6.8-8.2

Briefly, this error occurs when Elasticsearch is unable to clear the cache for security realms, which could be due to a configuration issue or a problem with the underlying system resources. To resolve this issue, you can try restarting the Elasticsearch service, checking the configuration files for any inconsistencies, or increasing system resources if they are insufficient. Additionally, ensure that the user running Elasticsearch has the necessary permissions to modify the cache.

This guide will help you check for common problems that cause the log “Failed to clear cache for realms” to appear. It’s important to understand the issues related to the log, so to get started, read the general overview on common issues and tips related to the Elasticsearch concepts: cache, mapping and plugin.

Overview

This warning message is related to both the security configuration of Elasticsearch and the configuration of user roles. Even though you could be facing an issue with your configuration, you’ll see that prior to version 7.2 of Elasticsearch this was a very common warning message that could often be considered a false positive and in most cases could be safely ignored.

What it means

Elasticsearch uses the concept of realms to authenticate users against many different service providers. If you don’t have a paid subscription, you can only use the Native and File realms. On the other hand, if you do happen to have a paid subscription you can configure your cluster so the user authentication is provided by an external service provider, such as Active Directory, Kerberos and so on.

You’ll get this warning message whenever Elasticsearch couldn’t successfully fulfill a refresh/synchronization operation with one of the configured external realms.

Security realms

There are two main types of security realm:

Internal: no need to communicate with an external service and there can be only one configured. For example: the native and file realms.
External: needs to communicate with external services and you can set up as many as you need. For example, the ldap, active_directory, saml, kerberos, oidc and pki realms.

Why it occurs

When you modify the user roles configuration in your cluster, Elasticsearch will need to propagate and synchronize the changes with any external security realms you may have set up.

This synchronization will involve calling whatever API the specific realm provides for that kind of operation and, if any of those requests fails for some of the realms, then Elasticsearch will output a warning message indicating which realms couldn’t be synchronized with the latest modifications in the user roles configuration.

However, prior to version 7.2, even if you didn’t have an external realm configured you would still get this warning message in your logs. That would happen because the piece of code that is responsible for refreshing the realms didn’t check whether or not there was a realm to refresh, and would simply execute every time, generating the warning message with an empty array of realms for which the refresh operation supposedly failed:

[2020-10-06T08:39:04,959][WARN ][o.e.x.s.a.s.m.NativeRoleMappingStore] [FkhZB6J] Failed to clear cache for realms [[]]

The fix for this was introduced in 7.2.0.
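
To tell the harmless variant apart from a real failure, you can inspect the realm list at the end of the message. The following is a minimal sketch that assumes the log line has the same shape as the sample above; the function name and the regular expression are illustrative and are not part of Elasticsearch.

```python
import re

# Matches the realm list that Arrays.toString() renders inside "[{}]",
# e.g. "[[]]" for no realms or "[[ldap1, ad1]]" for two realms.
REALM_WARNING = re.compile(r"Failed to clear cache for realms \[\[(.*?)\]\]")


def failed_realms(log_line):
    """Return the realm names in the warning, or None if the line is unrelated."""
    match = REALM_WARNING.search(log_line)
    if match is None:
        return None
    return [name.strip() for name in match.group(1).split(",") if name.strip()]


line = (
    "[2020-10-06T08:39:04,959][WARN ][o.e.x.s.a.s.m.NativeRoleMappingStore] "
    "[FkhZB6J] Failed to clear cache for realms [[]]"
)
realms = failed_realms(line)
if realms == []:
    print("empty realm list: likely the pre-7.2 false positive")
elif realms:
    print("refresh failed for:", ", ".join(realms))
```

An empty list corresponds to the pre-7.2 behavior described above, while a non-empty list names the realms that need investigating.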

How to resolve it

If you are getting this message because you are running a version prior to 7.2, the only way to get rid of it is to upgrade. Note, however, that if you don’t actually have external realms configured and you are only getting the warning with an empty array, you can safely ignore it until you are ready to upgrade your cluster.

On the other hand, if you do have external realms configured and are getting the warning message with some of these realms listed in the message’s text, you should take a look at the logs of that specific service to get a better idea of what is causing the refresh operation to fail.

Another thing you should try is connecting to the external service from one of the cluster’s nodes using a client for that service, so you can rule out any network and credentials issues.
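
As a first step before using a full client, a plain TCP check run from a cluster node can rule out basic network problems. This is a minimal sketch using only the Python standard library. The host and port in the usage comment are hypothetical, and a successful connection says nothing about credentials or TLS settings, which still need to be checked with a real client for that service.

```python
import socket


def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and DNS resolution failures.
        return False


# Hypothetical LDAP endpoint; replace with the host and port of your own realm:
# print(can_connect("ldap.example.com", 636))
```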

Overview

Elasticsearch uses three types of cache to improve the efficiency of operations:

Node query cache
Shard request cache
Field data cache

How they work

The node query cache maintains the results of queries used in a filter context. The results are evicted on a least recently used basis.

The shard request cache maintains the results of frequently used queries where size=0, particularly the results of aggregations. This cache is particularly relevant for logging use cases where data is not updated on old indices, and regular aggregations can be kept in cache to be reused.

The field data cache is used for sorting and aggregations. To keep these operations quick, Elasticsearch loads the field values into memory.

Examples

Elasticsearch usually manages its caches behind the scenes, without the need for any specific settings. However, it is possible to limit the amount of memory used on each node for a given cache type by putting the following in elasticsearch.yml:

indices.queries.cache.size: 10%

indices.fielddata.cache.size: 30%

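Note that 10% is already the default for indices.queries.cache.size, so there is no need to set it explicitly. The field data cache, by contrast, is unbounded by default, so the 30% shown above is an explicit limit rather than the default. The default values are good for most use cases, and should rarely be modified.
You can monitor the use of caches on each node like this: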

GET /_nodes/stats/indices/fielddata

GET /_nodes/stats/indices/query_cache

GET /_nodes/stats/indices/request_cache
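
The responses to these requests are grouped by node. The following is a minimal sketch of how the field data response could be summarized per node. The sample body is illustrative, trimmed to the fields used here, and its values are made up. The function name is hypothetical.

```python
def fielddata_usage(stats):
    """Map node name -> (memory bytes, evictions) from a fielddata stats response."""
    usage = {}
    for node in stats["nodes"].values():
        fielddata = node["indices"]["fielddata"]
        usage[node["name"]] = (
            fielddata["memory_size_in_bytes"],
            fielddata["evictions"],
        )
    return usage


# Illustrative response body; node IDs, names and numbers are invented.
sample = {
    "nodes": {
        "node-id-1": {
            "name": "node-1",
            "indices": {"fielddata": {"memory_size_in_bytes": 1048576, "evictions": 0}},
        },
        "node-id-2": {
            "name": "node-2",
            "indices": {"fielddata": {"memory_size_in_bytes": 5242880, "evictions": 12}},
        },
    }
}

for name, (size, evictions) in fielddata_usage(sample).items():
    print(f"{name}: {size} bytes, {evictions} evictions")
```

A steadily growing eviction count on a node is usually the more interesting signal, since it suggests the cache is too small for the workload.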

Notes and good things to know

Construct your queries with reusable filters. Certain parts of a query are good candidates for reuse across a large number of queries, and you should design your queries with this in mind. Anything that does not need to be scored should go in the filter section of a bool query. For example, time ranges, language selectors, or clauses that exclude inactive documents are all likely to be included in a large number of queries, and should be placed in the filter part of the query so that they can be cached and reused.

In particular, take care with time filters. “now-15m” cannot be reused, because “now” changes continually as the time window moves on. On the other hand, “now-15m/m” is rounded to the minute, so the resolved value stays the same (and can be reused via the cache) for up to 60 seconds before rolling over to the next minute.
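
The effect of rounding can be illustrated outside Elasticsearch. The sketch below only mimics how the two date math expressions resolve for a gte bound. It is not how Elasticsearch implements date math, and the function names are made up for the illustration.

```python
from datetime import datetime, timedelta


def now_minus_15m(now):
    """Resolve "now-15m": the value differs on every call, so it cannot be reused."""
    return now - timedelta(minutes=15)


def now_minus_15m_rounded(now):
    """Resolve "now-15m/m": subtract 15 minutes, then round down to the minute."""
    return (now - timedelta(minutes=15)).replace(second=0, microsecond=0)


# Two requests arriving 45 seconds apart within the same minute.
first = datetime(2024, 1, 1, 12, 30, 5)
second = datetime(2024, 1, 1, 12, 30, 50)

print(now_minus_15m(first) == now_minus_15m(second))                  # False
print(now_minus_15m_rounded(first) == now_minus_15m_rounded(second))  # True
```

Because both requests resolve to the same rounded bound, the filter is identical and its cached result can be reused.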

For example when a user enters the search term “brexit”, we may want to also filter on language and time period to return relevant articles. The query below leaves only the query term “brexit” in the “must” part of the query, because this is the only part which should affect the relevance score. The time filter and language filter can be reused time and time again for new queries for different searches.

POST results/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "message": {
              "query": "brexit"
            }
          }
        }
      ],
      "filter": [
        {
          "range": {
            "@timestamp": {
              "gte": "now-10d/d"
            }
          }
        },
        {
          "term": {
            "lang.keyword": {
              "value": "en",
              "boost": 1
            }
          }
        }
      ]
    }
  }
}

Limit the use of field data. Be careful about using fielddata=true in your mapping where the number of terms will result in a high cardinality. If you must use fielddata=true, you can also reduce the requirement of fielddata cache by limiting the requirements for fielddata for a given index using a field data frequency filter.
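
As a rough illustration, the sketch below builds a mapping body that enables fielddata on a text field together with a fielddata_frequency_filter. The field name and the threshold values are assumptions chosen for the example, and the helper function is hypothetical. On 7.x and later, a body of this shape could be sent with PUT results/_mapping. Older versions also require a mapping type.

```python
import json


def fielddata_mapping(field, min_freq, max_freq, min_segment_size):
    """Build a mapping body that loads only terms within a frequency range."""
    return {
        "properties": {
            field: {
                "type": "text",
                "fielddata": True,
                "fielddata_frequency_filter": {
                    # Only load terms that appear in 0.1% to 10% of the documents
                    # of a segment (with the example values used below).
                    "min": min_freq,
                    "max": max_freq,
                    # Skip the filter on segments with fewer documents than this.
                    "min_segment_size": min_segment_size,
                },
            }
        }
    }


# Example values only; tune them to the term distribution of your own data.
body = fielddata_mapping("message", 0.001, 0.1, 500)
print(json.dumps(body, indent=2))
```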

Log Context

The log message “Failed to clear cache for realms [{}]” is emitted by the class NativeRoleMappingStore (NativeRoleMappingStore.java).
The following excerpt from the Elasticsearch source code provides more in-depth context:

                        Arrays.toString(realmNames)
                    )
                );
                listener.onResponse(result);
            }, ex -> {
                logger.warn(new ParameterizedMessage("Failed to clear cache for realms [{}]", Arrays.toString(realmNames)), ex);
                listener.onFailure(ex);
            })
        );
    }
