Quick links
- Background
- The main differences between cross-cluster search in Elasticsearch and OpenSearch
- Cross-cluster search in Elasticsearch
- Cross-cluster search in OpenSearch
- Summary
Background
Cross-cluster search enables users to execute a query across multiple clusters, performing one search request between one or more remote clusters. For instance, users can filter and analyze logs stored on clusters in several data centers using cross-cluster searches.
Cross-cluster search enables unified search queries throughout all the configured clusters, while allowing users to configure several remote clusters across various geographical locations and providers. Send a search request from a local cluster and receive responses from all connected remote clusters.
Cross-cluster search is possible in both Elasticsearch and OpenSearch. We’re going to review how it works in each and compare the two below.
The main differences between cross-cluster search in Elasticsearch and OpenSearch
Elasticsearch cross-cluster search | OpenSearch cross-cluster search |
---|---|
There is a remote cluster UI in Kibana to add the remote cluster to the local cluster. | There is no remote cluster UI to add the remote cluster to the coordinating cluster through the dashboard. |
Read and read cross-cluster privileges for target indices on the remote cluster, in addition, to the remote search role in the local cluster. | Users must have permission for the index indices:admin/shards/search_shards, in addition, to READ or SEARCH permissions. |
Users must have read and read_cross_cluster privileges for the remote index, in addition to the remote-search role in the local cluster. | Users must have READ or SEARCH permission for the remote index, and also indices:admin/shards/search_shards in case they use ccs_minimize_roundtrips=false |
Role and mapping are required on both clusters. | Role and mapping are only required in remote clusters. |
There are constraints on local and remote cluster versions to be able to perform a cross-cluster search. | There are constraints on local and remote cluster versions to be able to perform a cross-cluster search |
Privileges are configured after registering remote clusters. | Privileges are configured before registering remote clusters. |
Cross-cluster search in Elasticsearch
To perform a cross-cluster search in Elasticsearch, deployments should be version 6.7.x or later and be compatible. To perform a cross-cluster search, configure remote clusters in such a way that a connection between them can be made. Additionally, they must trust each other.
Prerequisites for performing cross-cluster searches in Elasticsearch
1. Remote clusters are necessary for cross-cluster searches. Ensure cross-cluster search is supported by your remote cluster configuration. Connect the local cluster to the remote cluster’s nodes using either proxy or sniff mode (default) to register a remote cluster. Next, configure privileges for cross-cluster searches.
- Sniff mode is the default connection mode. To establish a cluster in sniff mode, use a name and a list of seed nodes. Once a remote cluster is registered, up to three gateway nodes are chosen to be part of remote cluster requests, and the cluster state is obtained from one of the seed nodes. The local cluster must be able to access published addresses of the gateway node in order for this mode to work. Remote nodes’ versions have to be compatible with the cluster version that they are registered to. Any non-master-eligible node can function as a gateway node by default – keep in mind that dedicated master nodes are never chosen as gateway nodes. By setting cluster.remote.node.attr.gateway to true, users can specify which nodes serve as cluster gateway nodes.
- A remote cluster is established in proxy mode using a name and a single proxy address. When a remote cluster is registered, a customizable number of socket connections are opened to the proxy IP. These connections must be forwarded to the distant cluster through the proxy. Remote cluster nodes do not need to have reachable published addresses to operate in proxy mode. Proxy mode must be configured because it is not the default connection mode. The version of the cluster that remote nodes are registered to must be compatible with the versions of the remote nodes.
2. The remote_cluster_client node role is required for the local coordinating node.
node.roles: [ remote_cluster_client ]
3. When using sniff mode, the local coordinating node needs to be allowed to connect to both seed and gateway nodes on the remote cluster. It’s also recommended to employ gateway nodes that can act as coordinating nodes (a subset of these gateway nodes may serve as seed nodes).
4. The local coordinating node needs to connect to the configured proxy_address when using proxy mode. Connections to the remote cluster’s gateway and coordinating nodes must be routed by the proxy at this address.
5. Different security privileges are needed on local and remote clusters for cross-cluster search.
- The read and read_cross_cluster privileges for the target indices are necessary for the cross-cluster search role on the remote cluster. The authenticating user must have the run_as privilege on the remote cluster if requests will be made on behalf of other users.
POST /_security/role/remote-search { "indices": [ { "names": [ "target-indices" ], "privileges": [ "read", "read_cross_cluster" ] } ] }
- A user only needs the remote-search role with no specific privileges on the local cluster, which is the cluster used to begin the cross-cluster search. The following request is used to create a remote-search role on the local cluster:
POST /_security/role/remote-search {}
- After creating the remote-search role, create a user on the local cluster with that role. This user only has to be created on the local cluster. The remote-search role then enables users to search across both clusters.
POST /_security/user/cross-search-user { "password" : "Us3r-pA$$w0rd", "roles" : [ "remote-search" ] }
- A two-step authorization procedure governs the user’s access to data streams and indices on remote clusters when using Kibana to search across multiple clusters. The cluster to which Kibana is linked is the local cluster, and it first ascertains whether the user has permission to access remote clusters. The remote cluster then determines whether the user has access to the given data streams and indices. To set up, you need to assign Kibana users a local role with read privileges to the remote clusters’ indices to provide them access to remote clusters. In a remote cluster, specify data streams and indices as <remote_cluster_name>:<target>. A matching role must be established on the remote clusters that grants read_cross_cluster privilege with access to the relevant data streams and indices in order to give users read access to the remote data streams and indices.
Configuring remote clusters in Elasticsearch
The following is an example of the cluster update settings API request to add three remote clusters to the local cluster settings: cluster_one, cluster_two, and cluster_three. Now, those three clusters will be remote clusters for the cluster on which this API request is performed.
PUT _cluster/settings { "persistent": { "cluster": { "remote": { "cluster_one": { "seeds": [ "130.0.0.1:9302" ] }, "cluster_two": { "seeds": [ "130.0.0.1:9301" ] }, "cluster_three": { "seeds": [ "130.0.0.1:9300" ] } } } } }
Users can also configure remote clusters in Kibana starting from version 7.7.0. From stack management, select “Remote Clusters” from the side navigation, as shown in the image below. Then, add a remote cluster and specify the following properties: name, seed nodes, and node connections.
Example of cross-cluster search in Elasticsearch
To search an index on a remote cluster, prefix the index name with the remote cluster name, as shown in the example below. Clarify data streams and indices on the remote cluster as <remote_cluster_name>:<target> in the search request.
GET cluster_two:books/_search { "query": { "match": { "title": "Deep learning" } } }
The example above will search the books index on cluster_two for books with “Deep learning” titles.
To perform a search across multiple clusters, simply list cluster names and indices as shown in the example below:
GET books,ABC_cluster:books,cluster_*:books/_search { "query": { "match": { "title": "Deep learning" } } }
The example above will search the books index in the local cluster, ABC_cluster, and in all the clusters whose name starts with “cluster_” such as, cluster_one, cluster_two, and cluster_three. Wildcards can be used to name remote clusters.
All results retrieved from a remote index in the search response body will be prefixed with the remote cluster’s name in the _index field, as shown by the following result of the previous query. When a cluster name is not included in the document’s _index parameter,the document was created locally. See below:
"hits": [ { "_index": "ABC_cluster:books", "_id": "3s1CKnIBCLh5xF6i4Y2g", "_score": 4.8329377, "_source": { "title": "Deep learning", ... } }, { "_index": "books", "_id": "Mc1CKmISCLh8xF6i7Y", "_score": 4.561167, "_source": { "title": "Deep learning", ... } },
If a remote cluster included in the request either produces an error or is unavailable, the cross-cluster search will fail by default. To make a certain remote cluster optional for cross-cluster search, use the skip_unavailable cluster setting. Using the cluster update settings API request, users can change the cluster settings to set the skip_unavailable parameter to true,as shown in the example below:
PUT _cluster/settings { "persistent": { "cluster.remote.ABC_cluster.skip_unavailable": true } }
After performing the API request above, if the ABC_cluster is unavailable or disconnected through a cross-cluster search, matching documents from that cluster won’t be included in the final results returned by Elasticsearch.
When the skip_unavailable parameter is true, a cross-cluster search will skip the cluster if any of the remote cluster’s nodes are down during the search. A count of any skipped clusters is contained in the _cluster.skipped value of the response. Also, any errors that may have been returned by the remote cluster will be ignored, like those relating to unavailable shards or indices. This may involve issues with search parameters like ignore_unavailable and allow_no_indices. In addition, the allow_partial_search_results parameter and the related search.default_allow_partial_results cluster setting will be ignored when searching the remote cluster. As a result, searches on the remote cluster might only produce partial results.
Any network delays may slow down cross-cluster searches because these require sending queries to remote clusters. Cross-cluster search offers two ways to deal with network delays in order to prevent delayed searches:
- Minimize network round trips
Elasticsearch minimizes the number of network round trips between remote clusters by default. In doing so, the effect of network delays on search speed is lessened. For extensive search requests, like those that include a scroll or inner hits, Elasticsearch is unable to minimize network roundtrips.
- Don’t minimize network round trips
Elasticsearch sends numerous outgoing and incoming requests to each remote cluster for search requests containing a scroll or inner hits. By setting the ccs_minimize_roundtrips parameter to false, you can also select this option. This method, although often slower, might be effective in low latency networks.
Elasticsearch allows searches from local clusters to remote clusters that are running: the earlier minor version, the identical version, and the newer minor version in the same major version. For example, a local 7.17 cluster can search any remote 7. x cluster, such as 7.15, 7.17, and 7.18. Additionally, Elasticsearch enables searches from a local cluster running the most current minor version of a major version to a remote cluster running any minor version of the major version that comes after it. For example, a local 7.17 cluster can search any remote 8. x cluster.
Only features that are present in every searched cluster are supported. The use of unsupported features with a remote cluster will lead to unpredictable behavior. If users select an unsupported configuration, a cross-cluster search can still be successful. Elasticsearch does not test or guarantee the behavior of these types of searches.
Notes for cross-cluster search in Elasticsearch
Security needs to be enabled and configured on both local and remote clusters. An Elasticsearch superuser on the local cluster, such as the elastic user, gains complete read access to remote clusters when linking them from local clusters. Enable security on all associated clusters, and establish Transport Layer Security (TLS) on each node, at least at the transport level, in order to use cross-cluster searches in a secure manner.
Moreover, a remote cluster might theoretically be taken over by a local administrator at the operating system level with sufficient access to Elasticsearch configuration files and private keys. Make sure your security plan covers operating system-level security for both local and remote clusters.
To search across clusters, users are no longer required to use a specific cross-cluster search deployment template in Elastic Cloud. When utilizing stack version 6.7 or higher, users can configure remote clusters using any deployment template, performing searches across them. The cross-cluster search deployment template has been retired. Existing deployments produced with this template are not impacted by this modification, however, they must be switched to another template before upgrading to the 8.0 major version.
The ccs_minimize_roundtrips parameter is not supported by the vector tile search API, since it always minimizes network round trips.
When network round trips aren’t minimized, searches are carried out as if all the data were in the cluster of the coordinating node. It is recommended that cluster-level parameters that restrict searches, such as action.search.shard_count.limit, pre_filter_shard_size, and max_concurrent_shard_requests be updated. The search can be disregarded if these limitations are set too low.
Running the same version of Elasticsearch in each cluster is the simplest way to guarantee that clusters enable cross-cluster search. Users can keep an exclusive cluster for cross-cluster searches if they need to have clusters run various versions.
To search other clusters, keep this cluster updated to the most current version. If, for instance, users have both 7.17 and 8.x clusters, a separate 7.17 cluster used as the local cluster for cross-cluster searches can be used. Don’t space out each cluster with more than one minor version. This enables cross-cluster searches using any cluster as the local cluster.
While running a rolling upgrade on the local cluster, users can still search for a remote cluster. However, the gateway node of the remote cluster must support both the “upgrade from” and “upgrade to” versions of the local coordinating node.
Using different Elasticsearch versions in the same cluster once an upgrade is complete is not recommended.
Cross-cluster search in OpenSearch
In OpenSearch, a cross-cluster search enables any cluster node to conduct searches against other clusters. Cross-cluster search is not a native feature, but is supported by the security plugin.
The security plugin authenticates the user on the coordinating cluster when reaching a remote cluster from a coordinating cluster utilizing a cross-cluster search. The call, together with the authenticated user, is passed to the remote cluster when the security plugin retrieves the users backend role on the coordinating cluster. User permissions on the remote cluster are also assessed. Although the remote and coordinating cluster can have different authentication and authorization configurations, it is advised to set the same settings on both.
As shown in the example below: roles.yml file configuration, users must have the indices:admin/shards/search_shards privilege on the remote index, but only if they decide to run the search with ccs_minimize_roundtrips=false. In addition, they must also have either READ or SEARCH permissions on the queried indices present on the remote clusters. In OpenSearch, users have the ability to create roles using OpenSearch Dashboards, the REST API or the roles.yml configuration file.
artificialintelligence: cluster: - CLUSTER_TWO indices: 'artificialintelligence': '*': - READ - indices:admin/shards/search_shards
Unless users need to create new reserved or hidden users, it’s highly recommended to use OpenSearch Dashboards or the REST API to create new users, roles, and role mappings. The yaml files are for initial setup, not ongoing use. To add the previous role from OpenSearch dashboards, users have to choose Security, Roles, and Create Role. Then, give the role the name artificialintelligence, and specify the index pattern that they want to search. Add the READ and indices:admin/shards/search_shards permissions. Then, click submit.
Example of cross-cluster search in OpenSearch
In this example, cluster_two was used as the remote cluster with port 9300, and cluster_one was used as the coordinating cluster with port 9200. The goal is to search the artificialintelligence index in the remote cluster.
First, add the remote cluster IP address and name (with port 9300) for each seed node. When there’s one seed node, do the following:
curl -k -XPUT -H 'Content-Type: application/json' -u 'admin:admin' 'https://localhost:9200/_cluster/settings' -d ' { "persistent": { "cluster.remote": { "cluster_two": { "seeds": ["142.81.0.4:9300"] } } } }'
The example above will add cluster_two as a remote cluster to cluster_one.
Now, command below will search the artificialintelligence index in cluster two using the admin user:
curl -XGET -k -u 'admin:admin' 'https://localhost:9250/cluster_two:artificialintelligence/_search?pretty'
The following hits are returned from the search above, since there’s only one document in the index that has the “Data Mining” value in the title field.
{ ... "hits": [{ "_index": "cluster_two:artificialintelligence", "_id": "1", "_score": 1.0, "_source": { "Title": "Data Mining" } }] }
Notes for cross-cluster search in OpenSearch
- Users must exist in both clusters, but the role and mapping are only required in the remote cluster; in this scenario, the coordinating cluster manages authentication, while the remote cluster manages authorisation.
- Using API or Dashboards to add new users/roles is a best practice.
Summary
The table below summarizes the differences between cross-cluster searches in OpenSearch and in Elasticsearch.
Elasticsearch cross-cluster search | OpenSearch cross-cluster search |
---|---|
The cross-cluster search functionality is provided as a native feature. | The cross-cluster search functionality is provided by the security plugin. |
There is a remote cluster UI in Kibana to add the remote cluster to the local cluster. | There is NO remote cluster UI in OpenSearch Dashboards to add the remote cluster to the local cluster. The only way to do it is through the REST API. |
Users must have read and read_cross_cluster privileges for the remote index, in addition to the remote-search role in the local cluster. | Users must have READ or SEARCH permission for the remote index, and also indices:admin/shards/search_shards in case they use ccs_minimize_roundtrips=false. |
Users must have the run_as privilege on the remote cluster if requests are made on behalf of other users. Role and mapping are required on both clusters. | Users must exist in both clusters. Role and mapping are only required in remote clusters. |
There are constraints on local and remote cluster versions to be able to perform a cross-cluster search. | There are constraints on local and remote cluster versions to be able to perform a cross-cluster search. |
Privileges are configured after registering remote clusters. | Privileges are configured before registering remote clusters. |
Additional notes
Elasticsearch and OpenSearch are both powerful search and analytics engines, but Elasticsearch has several key advantages. Elasticsearch boasts a more mature and feature-rich development history, translating to a better user experience, more features, and continuous optimizations. Our testing has consistently shown that Elasticsearch delivers faster performance while using fewer compute resources than OpenSearch. Additionally, Elasticsearch’s comprehensive documentation and active community forums provide invaluable resources for troubleshooting and further optimization. Elastic, the company behind Elasticsearch, offers dedicated support, ensuring enterprise-grade reliability and performance. These factors collectively make Elasticsearch a more versatile, efficient, and dependable choice for organizations requiring sophisticated search and analytics capabilities.