Optimizing Vector Search in OpenSearch

By Opster Team

Updated: Aug 2, 2023

1. Indexing Vectors in OpenSearch

To perform vector search in OpenSearch, you need to index the vector representations of your data. OpenSearch stores vectors in the knn_vector field type, which the k-NN plugin provides. The dense_vector type used in Elasticsearch is not available in OpenSearch. Here’s an example of how to create an index with a knn_vector field:

PUT /my_vector_index
{
  "mappings": {
    "properties": {
      "my_vector": {
        "type": "knn_vector",
        "dimension": 128
      }
    }
  }
}

In this example, we create an index called “my_vector_index” with a knn_vector field named “my_vector” and a dimension of 128.
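If you create the index from application code rather than from a console, a minimal sketch using only Python's standard library could build the same body and prepare the PUT request. It assumes an unsecured cluster at http://localhost:9200 (a hypothetical address). build_index_body and create_index_request are illustrative helpers, not part of any OpenSearch client. The request is only prepared, not sent, so the snippet runs without a cluster.

```python
import json
import urllib.request

# Hypothetical cluster address: assumes a local OpenSearch node without security enabled.
OPENSEARCH_URL = "http://localhost:9200"


def build_index_body(field: str, dimension: int) -> dict:
    """Return a create-index body with a single knn_vector field (k-NN plugin type)."""
    if dimension <= 0:
        raise ValueError("dimension must be a positive integer")
    return {
        "mappings": {
            "properties": {
                field: {"type": "knn_vector", "dimension": dimension}
            }
        }
    }


def create_index_request(index: str, body: dict) -> urllib.request.Request:
    """Prepare (but do not send) a PUT /<index> request carrying the body as JSON."""
    return urllib.request.Request(
        url=f"{OPENSEARCH_URL}/{index}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


req = create_index_request("my_vector_index", build_index_body("my_vector", 128))
# Sending the request requires a running cluster:
# with urllib.request.urlopen(req) as resp:
#     print(resp.status, resp.read().decode())
```

Rejecting a non-positive dimension up front is a design choice of this sketch; the cluster would also reject an invalid mapping, but only after a round trip.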

2. Indexing Documents with Vectors

After creating the index, you can index documents containing vector representations. Here’s an example of how to index a document with a vector:

PUT /my_vector_index/_doc/1
{
  "my_vector": [0.1, 0.2, 0.3, ..., 0.128]
}

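In this example, we index a document with ID 1 and a vector in the “my_vector” field. The vector is abbreviated here; in practice it must contain exactly 128 values to match the dimension declared in the mapping, or OpenSearch rejects the document.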
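When indexing many documents, sending them in one bulk request is usually cheaper than one request per document. The following is a minimal sketch, assuming vectors are already computed in Python, that builds a bulk body in the newline-delimited JSON format and checks each vector's length against the mapped dimension before anything is sent. The helper name to_bulk_ndjson is hypothetical, and the 4-dimensional vector is a toy value for illustration only.

```python
import json


def to_bulk_ndjson(index: str, docs: dict, field: str, dimension: int) -> str:
    """Build a _bulk request body (NDJSON) that indexes one vector per document.

    docs maps document IDs to vectors. Every vector must contain exactly
    `dimension` numbers, otherwise the knn_vector mapping would reject it.
    """
    lines = []
    for doc_id, vector in docs.items():
        if len(vector) != dimension:
            raise ValueError(
                f"document {doc_id}: expected {dimension} values, got {len(vector)}"
            )
        # Each document needs an action line followed by its source line.
        lines.append(json.dumps({"index": {"_index": index, "_id": str(doc_id)}}))
        lines.append(json.dumps({field: [float(x) for x in vector]}))
    # The bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"


# Toy 4-dimensional vector; the 128-dimension mapping above would need 128 values.
body = to_bulk_ndjson("my_vector_index", {1: [0.1, 0.2, 0.3, 0.4]}, "my_vector", 4)
print(body)
```

The resulting string can be sent to POST /_bulk with the Content-Type header set to application/x-ndjson.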

3. Performing Vector Search

To perform vector search in OpenSearch, you can use the script_score query, which runs a script to compute a custom score for every document matched by its inner query. This is exact (brute-force) search: the script is evaluated against each matching document. The k-NN plugin adds Painless functions such as cosineSimilarity, l2Squared, and l1Norm for comparing a query vector with a knn_vector field. Here’s an example of a vector search query using cosine similarity:

GET /my_vector_index/_search
{
  "query": {
    "script_score": {
      "query": {
        "match_all": {}
      },
      "script": {
        "source": "cosineSimilarity(params.query_vector, doc['my_vector']) + 1.0",
        "params": {
          "query_vector": [0.1, 0.2, 0.3, ..., 0.128]
        }
      }
    }
  }
}

In this example, we search for documents whose vectors are similar to the query vector using cosine similarity. Cosine similarity ranges from -1 to 1, and script_score does not accept negative scores, so adding 1.0 shifts every score into the range 0 to 2.
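To make the scoring concrete, here is a pure-Python sketch of the arithmetic the example script performs. It is an illustration only, not OpenSearch's implementation: every document is scored with cosine similarity plus 1.0, and the hits are sorted by score. The function names and the toy 2-dimensional vectors are hypothetical. Raising an error for zero vectors is a choice made in this sketch, since cosine similarity is undefined for them.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b, in the range -1 to 1."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


def exact_search(query, docs, size=10):
    """Score every document like the example script, then return the top `size` hits."""
    scored = [(doc_id, cosine_similarity(query, vec) + 1.0) for doc_id, vec in docs.items()]
    scored.sort(key=lambda hit: hit[1], reverse=True)
    return scored[:size]


docs = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [-1.0, 0.0]}
print(exact_search([1.0, 0.0], docs, size=3))
# → [('a', 2.0), ('b', 1.0), ('c', 0.0)]
```

The vector pointing the opposite way ("c") has a cosine similarity of -1 and ends up with a score of exactly 0, which shows why the 1.0 offset keeps scores non-negative. The loop over every document also mirrors why exact search gets slower as the index grows.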

4. Optimizing Vector Search Performance

To optimize vector search performance in OpenSearch, consider the following tips:

  • Use smaller vector dimensions: Reducing the dimensionality of your vectors can improve search performance. However, this may also affect the quality of search results. Experiment with different dimensions to find the best trade-off between performance and search quality.
  • Use filtering: Because the script runs on every document matched by the inner query of script_score, filtering out irrelevant documents before scoring can significantly improve performance. For example, replace the match_all query inside script_score with a bool query whose filter clause restricts the candidate documents.
  • Use pagination: Keep the “size” parameter small and use “from” and “size” to page through the results. This reduces the amount of data returned, but the script still scores every document matched by the inner query, so pagination does not reduce the scoring work itself.
  • Optimize hardware resources: Ensure that your OpenSearch cluster has sufficient resources, such as CPU, memory, and disk space, to handle the vector search workload. Monitor the performance of your cluster and adjust resources as needed.
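  • Use caching: OpenSearch caches the results of frequently used filter clauses in the node query cache and, by default, caches only requests with “size” set to 0 in the shard request cache, so the scored hits of a script_score query are not cached by default. Putting reusable restrictions in filter context and configuring the cache settings appropriately lets repeated searches benefit from caching.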
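The filtering and pagination tips can be combined in a single request body. The following minimal sketch, assuming the same “my_vector” field, builds a script_score query whose inner query is a bool filter, so that only documents passing the filter are scored. It uses the doc[params.field] form of the script, in which the field name is passed as a parameter. The helper build_vector_query and the “category” keyword field are hypothetical and used only for illustration.

```python
import json


def build_vector_query(field, query_vector, filter_clause=None, size=10, from_=0):
    """Build a script_score request body that scores only filtered documents.

    filter_clause is any query to run in filter context (e.g. a term query).
    When it is None, the inner query is match_all and every document is scored.
    """
    if filter_clause is not None:
        inner = {"bool": {"filter": [filter_clause]}}
    else:
        inner = {"match_all": {}}
    return {
        "from": from_,  # pagination offset
        "size": size,   # number of hits to return
        "query": {
            "script_score": {
                "query": inner,
                "script": {
                    # The + 1.0 keeps scores non-negative, as in the example above.
                    "source": "cosineSimilarity(params.query_vector, doc[params.field]) + 1.0",
                    "params": {"field": field, "query_vector": query_vector},
                },
            }
        },
    }


# "category" is a hypothetical keyword field; the 4-value vector is a toy example.
body = build_vector_query(
    "my_vector", [0.1, 0.2, 0.3, 0.4], filter_clause={"term": {"category": "books"}}, size=5
)
print(json.dumps(body, indent=2))
```

Placing the restriction in filter context means it does not affect scores, and frequently used filters of this kind are candidates for the node query cache.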

Conclusion

Vector search is a powerful technique for finding similar items in large datasets based on their vector representations. By following the steps and optimization tips outlined in this article, you can effectively perform and optimize vector search in OpenSearch. Keep in mind that the performance of vector search depends on various factors, such as the dimensionality of vectors, the size of the dataset, and the hardware resources available. Experiment with different configurations to find the best balance between search performance and search quality for your specific use case.