Elasticsearch Wildcard Search on Multiple Fields

By Opster Team

Updated: Jul 23, 2023

Introduction

Elasticsearch provides a powerful and flexible search capability, allowing users to search for data across multiple fields using wildcard queries. Wildcard queries match partial values in the indexed data by using the wildcard characters ‘*’ (any sequence of characters) and ‘?’ (any single character). In this article, we will discuss how to perform wildcard searches on multiple fields in Elasticsearch. The misuse of wildcards in Elasticsearch and the details of how wildcard queries work are covered in separate guides.

Using the Query String Query with Wildcard Fields

One way to perform a wildcard search on multiple fields is to use the query_string query. The query_string query lets you define a query in a compact syntax based on the Lucene query syntax, in which each clause can name its field as `field:value`. Here’s an example of how to use the query_string query with wildcard fields:

GET /_search
{
  "query": {
    "query_string": {
      "query": "first_name:joh* OR last_name:joh*"
    }
  }
}

In this example, we are searching for documents where the `first_name` or `last_name` fields contain a term that starts with “joh”. The ‘*’ wildcard character is used to match any sequence of characters, and the ‘OR’ operator is used to combine the two wildcard queries.
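When the list of fields is not fixed, the query text above can be assembled programmatically. The following is a minimal Python sketch rather than part of any Elasticsearch client: the helper name `build_query_string_body` is hypothetical, and it assumes that the field names and the pattern contain no characters that need escaping in the query syntax.

```python
import json

def build_query_string_body(fields, pattern):
    """Return a search body whose query_string ORs `field:pattern` across fields.

    Hypothetical helper for illustration; no escaping of reserved characters.
    """
    if not fields:
        raise ValueError("at least one field is required")
    # One "field:pattern" clause per field, joined with the OR operator.
    clauses = [f"{field}:{pattern}" for field in fields]
    return {"query": {"query_string": {"query": " OR ".join(clauses)}}}

body = build_query_string_body(["first_name", "last_name"], "joh*")
print(json.dumps(body, indent=2))
```

The resulting dictionary mirrors the request body shown above. The query_string query also accepts a `fields` array, so the same search can be written with `"query": "joh*"` and `"fields": ["first_name", "last_name"]` instead of repeating the pattern per field.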

Using the Simple Query String Query with Wildcard Fields

The simple_query_string query is a more user-friendly alternative to the query_string query. It provides a simpler syntax and is more lenient with user input: instead of returning an error for invalid syntax, it ignores the invalid parts of the query. Here’s an example of how to use the simple_query_string query with wildcard fields:

GET /_search
{
  "query": {
    "simple_query_string": {
      "query": "joh*",
      "fields": ["first_name", "last_name"]
    }
  }
}

In this example, we are searching for documents where the `first_name` or `last_name` fields contain a term that starts with “joh”. Unlike the query_string query, the simple_query_string syntax has no `field:value` notation, so the target fields are listed in the `fields` parameter and the query text is applied to each of them. A ‘*’ at the end of a term turns it into a prefix match, and the ‘|’ operator can be used to combine several terms with OR.

Using the Wildcard Query

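The wildcard query matches the terms of a single field against a pattern that contains wildcard characters. To search several fields, wrap one wildcard clause per field in the `should` section of a bool query. Here’s an example of how to use the wildcard query: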

GET /_search
{
  "query": {
    "bool": {
      "should": [
        {
          "wildcard": {
            "first_name": {
              "value": "joh*"
            }
          }
        },
        {
          "wildcard": {
            "last_name": {
              "value": "joh*"
            }
          }
        }
      ]
    }
  }
}

In this example, we are searching for documents where the `first_name` or `last_name` fields contain a term that starts with “joh”. The ‘*’ wildcard character matches any sequence of characters and can be placed anywhere in the value. It is also possible to use the ‘?’ wildcard to match a single character. Beware, though, that patterns beginning with a wildcard (for example, ‘*ohn’) can be expensive to execute and put a heavy burden on your cluster.
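To make the meaning of the two wildcard characters concrete, here is a small local Python sketch that translates a pattern into a regular expression and tests it against whole terms. It only illustrates the matching semantics described above; it is not how Elasticsearch executes the query, it does not model escaping or text analysis, and the function names and sample terms are made up for the example.

```python
import re

def wildcard_to_regex(pattern):
    """Translate a pattern using '*' and '?' into a compiled regex."""
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(".*")   # any sequence of characters, including none
        elif ch == "?":
            parts.append(".")    # exactly one character
        else:
            parts.append(re.escape(ch))  # every other character is literal
    return re.compile("".join(parts), re.DOTALL)

def matches(pattern, term):
    """True if the whole term matches the wildcard pattern."""
    return wildcard_to_regex(pattern).fullmatch(term) is not None

terms = ["john", "johnson", "joan", "jon"]
for pattern in ["joh*", "jo?n", "*son"]:
    print(pattern, [t for t in terms if matches(pattern, t)])
```

Note that the pattern has to match the entire term, which is why “jo?n” matches “john” and “joan” but not the three-letter “jon”.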

Using the Prefix Query

The prefix query is designed for the narrower case of finding terms that start with a given value, so no wildcard character is needed. Here’s an example of how to use the prefix query:

GET /_search
{
  "query": {
    "bool": {
      "should": [
        {
          "prefix": {
            "first_name": {
              "value": "joh"
            }
          }
        },
        {
          "prefix": {
            "last_name": {
              "value": "joh"
            }
          }
        }
      ]
    }
  }
}

In this example, we are searching for documents where the `first_name` or `last_name` fields contain a term that starts with “joh”.
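The bool query bodies for the wildcard and prefix queries share the same shape: one clause per field inside `should`. As a minimal sketch, the body can be generated from a list of fields in Python. The function `build_multi_field_query` is a hypothetical helper written for this article, not an Elasticsearch client API.

```python
import json

def build_multi_field_query(fields, value, query_type="prefix"):
    """Build a bool/should body with one `query_type` clause per field.

    Hypothetical helper; only the two term-level queries discussed here
    are accepted.
    """
    if query_type not in ("prefix", "wildcard"):
        raise ValueError(f"unsupported query type: {query_type}")
    if not fields:
        raise ValueError("at least one field is required")
    # Each clause has the form {"prefix": {"<field>": {"value": "<value>"}}}.
    should = [{query_type: {field: {"value": value}}} for field in fields]
    return {"query": {"bool": {"should": should}}}

prefix_body = build_multi_field_query(["first_name", "last_name"], "joh")
print(json.dumps(prefix_body, indent=2))
```

Because the bool query contains only `should` clauses, a document has to match at least one of them to be returned, which gives the “either field” behavior described above. Passing `query_type="wildcard"` with a value such as “joh*” produces the body from the wildcard section.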

Performance Considerations

While wildcard searches can be powerful and flexible, they can also be resource-intensive, especially when used on large datasets or multiple fields. To optimize the performance of wildcard searches, consider the following best practices:

  1. Use leading wildcards sparingly: Leading wildcards (e.g., ‘*joh’) can be particularly resource-intensive, as they require Elasticsearch to scan all terms in the index. Whenever possible, avoid using leading wildcards in your queries.
  2. Limit the number of fields: Searching across a large number of fields can increase the complexity of the query and the resources required to execute it. Limit the number of fields in your wildcard queries to only those that are necessary for your use case.
  3. Use n-grams: N-grams are a technique for breaking down text into smaller, overlapping substrings. By indexing your data using n-grams, you can perform partial matching without the need for wildcard or prefix queries, which can improve search performance.
  4. Monitor and optimize your cluster: Regularly monitor the performance of your Elasticsearch cluster and make adjustments as needed to ensure optimal performance. This may include adjusting the number of shards, replicas, or hardware resources allocated to your cluster.
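The n-gram idea from the list above can be illustrated with a few lines of Python. This sketch uses edge n-grams (prefixes only), a variant suited to “starts with” matching, and a toy in-memory index; the function name, the sample names, and the document IDs are invented for the example. In Elasticsearch itself this work would be done at index time by an analyzer, for instance one based on the `edge_ngram` tokenizer with its `min_gram` and `max_gram` settings.

```python
def edge_ngrams(term, min_gram=1, max_gram=10):
    """Return the prefixes of `term` with length between min_gram and max_gram."""
    upper = min(len(term), max_gram)
    return [term[:n] for n in range(min_gram, upper + 1)]

# Toy inverted index: every prefix of a name points to the documents containing it.
index = {}
for doc_id, name in [(1, "john"), (2, "johnson"), (3, "joan")]:
    for gram in edge_ngrams(name, min_gram=2):
        index.setdefault(gram, set()).add(doc_id)

# A plain exact lookup on "joh" now finds the documents whose name starts
# with "joh", without scanning or pattern-matching any terms at search time.
print(sorted(index.get("joh", set())))
```

The trade-off is a larger index, since every prefix is stored as its own term, in exchange for search-time lookups that are as cheap as an exact term match.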

Conclusion 

Elasticsearch provides several ways to perform wildcard searches on multiple fields, including the query_string query, the simple_query_string query, the wildcard query, and the prefix query. By following the best practices above and optimizing your cluster, you can ensure that your wildcard searches are both powerful and efficient.