OpenSearch Point-in-Time (PIT) API: Advanced Usage and Best Practices

By Opster Team

Updated: Nov 14, 2023

Introduction

What is the OpenSearch Point-in-Time (PIT) API?

The Point-in-Time (PIT) API is a powerful feature in OpenSearch that allows users to maintain a consistent view of the data while performing search operations. This is particularly useful when dealing with large datasets or when executing multiple search requests that need to be consistent with each other.

In this article, we will discuss advanced usage of the PIT API and share some best practices for optimizing its performance. To learn which OpenSearch pagination technique fits your use case, check out this guide.

Understanding Point-in-Time API

The PIT API works by creating a lightweight reference to the state of the index at the time of the request. This reference, called a point-in-time ID, can be used in subsequent search requests to ensure that the results are consistent with that initial state. This is especially useful when paginating through large result sets, because it prevents `search_after` pagination from returning duplicate documents or skipping documents when the index changes between requests.
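
To make the duplicate and missing document problem concrete, here is a small, self-contained Python simulation. It does not talk to OpenSearch at all: the in-memory `index` dictionary, the `paginate` helper, and the `bump_a` update are all hypothetical stand-ins, and the frozen list only imitates what a point-in-time provides.

```python
from typing import Callable, Dict, List, Tuple

Doc = Tuple[str, int]  # (document ID, sort value)


def paginate(read_docs: Callable[[], List[Doc]], page_size: int,
             between_pages: Callable[[], None] = lambda: None) -> List[str]:
    """Page through docs ordered by (sort value, ID), search_after style."""
    seen: List[str] = []
    cursor = None  # sort key of the last hit returned so far
    while True:
        ordered = sorted(read_docs(), key=lambda d: (d[1], d[0]))
        page = [d for d in ordered
                if cursor is None or (d[1], d[0]) > cursor][:page_size]
        if not page:
            return seen
        seen.extend(doc_id for doc_id, _ in page)
        cursor = (page[-1][1], page[-1][0])
        between_pages()  # simulate a write arriving between two requests


index: Dict[str, int] = {"a": 1, "b": 2, "c": 3, "d": 4}


def bump_a() -> None:
    index["a"] = 10  # an update moves "a" behind the current cursor


# Re-reading the live index on every page returns "a" twice.
live = paginate(lambda: list(index.items()), 2, bump_a)
print(live)  # ['a', 'b', 'c', 'd', 'a']

# Reading a frozen copy, as a PIT would, returns each document once.
index["a"] = 1
snapshot = list(index.items())
frozen = paginate(lambda: snapshot, 2, bump_a)
print(frozen)  # ['a', 'b', 'c', 'd']
```

The same mechanism works in the other direction: an update that moves a document in front of the cursor makes it disappear from the remaining pages.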

Creating a Point-in-Time

To create a point-in-time in OpenSearch, send a POST request to the `_search/point_in_time` endpoint of the target index (Elasticsearch exposes the equivalent API at `_pit` and returns the identifier as `id`). The request must include the `keep_alive` parameter, which specifies how long the point-in-time should be maintained. The `keep_alive` duration doesn’t need to be long enough to process all the data, only long enough to last until the next query. Here’s an example:

POST /my-index/_search/point_in_time?keep_alive=1m

This request creates a point-in-time for the `my-index` index and keeps it alive for 1 minute, which means you have 1 minute to send the next request for the following batch of data. The response includes, among other fields, a `pit_id` that you can use in subsequent search requests:

{
  "pit_id": "some_pit_id"
}

Using a Point-in-Time in Search Requests

To use the point-in-time ID in a search request, include it in the `pit` object of the request body. You can optionally add a `keep_alive` duration there to extend the time to live of the point-in-time. Here’s an example:

POST /_search
{
  "pit": {
    "id": "some_pit_id",
    "keep_alive": "1m"
  },
  "query": {
    "match": {
      "field": "value"
    }
  }
}

This request searches using the point-in-time ID, ensuring that the results are consistent with the initial state. Note that the request path names no index: the point-in-time already identifies the indexes being searched.
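
For paging through a whole result set, the request above is typically repeated with `search_after`. The following Python sketch shows one way to structure that loop. It is not an official client: `send` is a hypothetical transport function that takes a method, a path, and a JSON body and returns the parsed response, and `iter_hits` is an illustrative name. The sort definition is supplied by the caller, because `search_after` needs an explicit sort whose values identify each hit's position.

```python
from typing import Any, Callable, Dict, Iterator, List, Optional

# Hypothetical transport: (method, path, json_body) -> parsed JSON response.
Send = Callable[[str, str, Optional[Dict[str, Any]]], Dict[str, Any]]


def iter_hits(send: Send, pit_id: str, query: Dict[str, Any],
              sort: List[Dict[str, Any]], page_size: int = 1000,
              keep_alive: str = "1m") -> Iterator[Dict[str, Any]]:
    """Yield every hit for `query`, paging with search_after inside one PIT."""
    body: Dict[str, Any] = {
        "size": page_size,
        "query": query,
        "sort": sort,  # search_after needs an explicit, stable sort
        "pit": {"id": pit_id, "keep_alive": keep_alive},
    }
    while True:
        response = send("POST", "/_search", body)
        hits = response["hits"]["hits"]
        if not hits:
            return
        yield from hits
        # Continue after the sort values of the last hit on this page.
        body["search_after"] = hits[-1]["sort"]
        # Prefer the PIT ID echoed by the response, if there is one.
        body["pit"]["id"] = response.get("pit_id", body["pit"]["id"])
```

Because each request carries `keep_alive`, the point-in-time only has to survive the gap between two consecutive pages, not the whole export.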

Closing a Point-in-Time

When you no longer need a point-in-time, it’s important to close it to free up resources. To close a point-in-time in OpenSearch, send a DELETE request to the `_search/point_in_time` endpoint with the point-in-time ID listed under `pit_id` in the request body:

DELETE /_search/point_in_time
{
  "pit_id": ["some_pit_id"]
}
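
In application code, one way to make sure the point-in-time is always closed is to tie its lifetime to a `with` block. The sketch below is a minimal illustration under stated assumptions: `send` is a hypothetical transport function (method, path, JSON body) returning the parsed response, `open_pit` is an illustrative name, and the paths and the `pit_id` field follow the OpenSearch API.

```python
from contextlib import contextmanager
from typing import Any, Callable, Dict, Iterator, Optional

# Hypothetical transport: (method, path, json_body) -> parsed JSON response.
Send = Callable[[str, str, Optional[Dict[str, Any]]], Dict[str, Any]]

# OpenSearch path; Elasticsearch uses /_pit and an "id" field instead.
PIT_PATH = "/_search/point_in_time"


@contextmanager
def open_pit(send: Send, index: str, keep_alive: str = "1m") -> Iterator[str]:
    """Create a PIT for `index` and delete it when the block exits."""
    created = send("POST", f"/{index}{PIT_PATH}?keep_alive={keep_alive}", None)
    pit_id = created["pit_id"]
    try:
        yield pit_id
    finally:
        # Runs on success and on error, so the PIT never outlives the block.
        send("DELETE", PIT_PATH, {"pit_id": [pit_id]})
```

A caller would write `with open_pit(send, "my-index") as pit_id:` and run its searches inside the block. The delete request is issued even if one of those searches raises.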

Best Practices for Using the PIT API

  1. Set an appropriate `keep_alive` duration: The `keep_alive` parameter determines how long the point-in-time is maintained until the request for the next batch. A longer duration holds resources for longer, while a shorter duration may cause the point-in-time to expire before you send your next search request. Choose a duration that balances resource usage against the expected time to your next search operation.
  2. Close the point-in-time when done: Always close the point-in-time when you no longer need it to free up resources. Failing to close a point-in-time can lead to increased resource usage and potential performance issues.
  3. Use the PIT API for large result sets: The PIT API is particularly useful when paginating through large result sets, as it ensures that the results are consistent across multiple search requests. If you’re dealing with small result sets or single search requests, you may not need to use the PIT API.
  4. Combine with `search_after`: When paginating through large result sets, combine the PIT API with the `search_after` parameter to efficiently retrieve subsequent pages of results. This approach avoids the performance issues associated with deep pagination using the `from` and `size` parameters.
  5. Monitor resource usage: Keep an eye on the resource usage of your OpenSearch cluster, especially when using the PIT API for long durations or with large datasets. If you notice increased resource usage or performance issues, consider shortening the `keep_alive` duration or closing unused point-in-time references.
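
As a sketch of the monitoring advice above, the following Python helper picks out point-in-time references that have been open longer than a chosen threshold. It assumes the response shape of OpenSearch's list-all-PITs request (`GET /_search/point_in_time/_all`), namely a `pits` array whose entries carry `pit_id` and a `creation_time` in epoch milliseconds. The function name `stale_pits`, the threshold, and the sample data are hypothetical.

```python
from typing import Any, Dict, List


def stale_pits(list_response: Dict[str, Any], now_ms: int,
               max_age_ms: int) -> List[str]:
    """Return the IDs of PITs created more than `max_age_ms` ago."""
    return [
        pit["pit_id"]
        for pit in list_response.get("pits", [])
        if now_ms - pit["creation_time"] > max_age_ms
    ]


# Hypothetical sample shaped like the assumed list-all-PITs response.
sample = {"pits": [
    {"pit_id": "old", "creation_time": 1_000, "keep_alive": 60_000},
    {"pit_id": "new", "creation_time": 590_000, "keep_alive": 60_000},
]}
print(stale_pits(sample, now_ms=600_000, max_age_ms=300_000))  # ['old']
```

The returned IDs can then be reviewed or passed to the delete request, depending on how strictly you want to enforce the threshold.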

Conclusion

The OpenSearch Point-in-Time API is a powerful tool for maintaining a consistent view of your data during search operations. By following the best practices outlined in this article, you can optimize the performance of your search requests and ensure that your results are accurate and consistent. Remember to create, use, and close point-in-time references responsibly to minimize resource usage and maintain the health of your OpenSearch cluster.