Efficiently Deleting Documents in Elasticsearch
This article covers the main ways to delete documents in Elasticsearch: by ID, by query, and by dropping a whole index. It also goes through best practices and considerations that help keep deletions fast and your data intact.
Method 1: Delete by ID
To delete a document by its ID, you can use the DELETE API. This method is useful when you know the exact ID of the document you want to delete. Here’s an example:
DELETE /index_name/_doc/document_id
Replace `index_name` with the name of your index and `document_id` with the ID of the document you want to delete.
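From application code, the same call is a plain HTTP DELETE. The following is a minimal sketch using only the Python standard library rather than an official client. It assumes an unsecured cluster at `http://localhost:9200`, and the names `ES_URL`, `build_delete_request`, and `delete_document` are illustrative, not part of Elasticsearch.

```python
import json
import urllib.error
import urllib.parse
import urllib.request

ES_URL = "http://localhost:9200"  # assumption: local cluster without authentication


def build_delete_request(index_name, document_id, base_url=ES_URL):
    """Build (but do not send) a DELETE request for a single document."""
    # Percent-encode the ID so characters such as "/" stay inside one path segment.
    doc_id = urllib.parse.quote(str(document_id), safe="")
    url = f"{base_url}/{index_name}/_doc/{doc_id}"
    return urllib.request.Request(url, method="DELETE")


def delete_document(index_name, document_id, base_url=ES_URL):
    """Send the request; return True if deleted, False if the ID was not found."""
    request = build_delete_request(index_name, document_id, base_url)
    try:
        with urllib.request.urlopen(request) as response:
            return json.load(response).get("result") == "deleted"
    except urllib.error.HTTPError as error:
        if error.code == 404:  # returned when the document does not exist
            return False
        raise
```

Calling `delete_document("index_name", "document_id")` against a running cluster returns `True` when the document existed and was removed.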
Method 2: Delete by Query
To delete every document that matches a condition, use the Delete By Query API. It takes a regular search query and deletes the documents that query matches. Here’s an example:
POST /index_name/_delete_by_query
{
  "query": {
    "match": {
      "field_name": "value"
    }
  }
}
Replace `index_name` with the name of your index, `field_name` with the name of the field you want to filter by, and `value` with the value you want to match.
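As a rough illustration of how this request could be issued from a script, here is a sketch that builds the same body with the Python standard library. The cluster address and the function names (`build_delete_by_query`, `delete_matching`) are assumptions made for the example.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumption: local cluster without authentication


def build_delete_by_query(index_name, field_name, value, base_url=ES_URL):
    """Build (but do not send) a Delete By Query request with a match query."""
    body = {"query": {"match": {field_name: value}}}
    return urllib.request.Request(
        f"{base_url}/{index_name}/_delete_by_query",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def delete_matching(index_name, field_name, value, base_url=ES_URL):
    """Send the request and return the number of documents that were deleted."""
    request = build_delete_by_query(index_name, field_name, value, base_url)
    with urllib.request.urlopen(request) as response:
        return json.load(response)["deleted"]
```

The response body reports a `deleted` count, which is a convenient value to log after each run.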
Best Practices and Considerations
1. Bulk deletion:
When deleting a large number of documents, it’s recommended to use the Delete By Query API with the `slices` parameter to improve performance. This will divide the deletion process into multiple parallel tasks. For example:
POST /index_name/_delete_by_query?conflicts=proceed&slices=5
{
  "query": {
    "range": {
      "timestamp": {
        "lt": "now-30d"
      }
    }
  }
}
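This example deletes all documents whose `timestamp` is more than 30 days in the past, using 5 slices for parallel processing. The `conflicts=proceed` parameter tells Elasticsearch to keep going when it hits a version conflict (a document that changed while the operation was running) instead of aborting. The right number of slices mainly depends on how many primary shards the index has. In most cases it is simplest to set `slices=auto` and let Elasticsearch choose, which typically results in one slice per shard.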
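A scheduled cleanup job might wrap this request as follows. This is only a sketch under stated assumptions: the cluster address, the `timestamp` field name (taken from the example above), and the helper names `build_purge_request` and `summarize` are all illustrative.

```python
import json
import urllib.parse
import urllib.request

ES_URL = "http://localhost:9200"  # assumption: local cluster without authentication


def build_purge_request(index_name, max_age="30d", slices="auto", base_url=ES_URL):
    """Build a sliced Delete By Query request for documents older than max_age."""
    params = urllib.parse.urlencode({"conflicts": "proceed", "slices": slices})
    body = {"query": {"range": {"timestamp": {"lt": f"now-{max_age}"}}}}
    return urllib.request.Request(
        f"{base_url}/{index_name}/_delete_by_query?{params}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def summarize(response_body):
    """Pick the counters worth logging out of a Delete By Query response."""
    return {
        "deleted": response_body.get("deleted", 0),
        "version_conflicts": response_body.get("version_conflicts", 0),
        "failures": len(response_body.get("failures", [])),
    }
```

Because `conflicts=proceed` lets the operation continue past conflicts, checking `version_conflicts` and `failures` in the response is the way to find out whether anything was skipped.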
2. Versioning:
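By default, the DELETE API removes the document whatever its current version is. To avoid deleting a document that has changed since you last read it, pass the `if_seq_no` and `if_primary_term` parameters with the values returned when you fetched the document; Elasticsearch then rejects the delete with a version conflict (HTTP 409) if the document has been modified in the meantime. Older releases offered a `version_type` of `force` to bypass version checks, but that option was deprecated and later removed because it could lead to data loss, so do not rely on it.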
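In current Elasticsearch versions, a conditional delete is usually expressed with the `if_seq_no` and `if_primary_term` query parameters, whose values come from the `_seq_no` and `_primary_term` fields of an earlier read of the document. The sketch below shows one way such a request could be built; the cluster address and the name `build_conditional_delete` are assumptions for illustration.

```python
import urllib.parse
import urllib.request

ES_URL = "http://localhost:9200"  # assumption: local cluster without authentication


def build_conditional_delete(index_name, document_id, seq_no, primary_term,
                             base_url=ES_URL):
    """Build a DELETE that only succeeds if the document is unchanged.

    seq_no and primary_term are the _seq_no and _primary_term values
    returned by an earlier GET of the same document.
    """
    doc_id = urllib.parse.quote(str(document_id), safe="")
    params = urllib.parse.urlencode(
        {"if_seq_no": seq_no, "if_primary_term": primary_term}
    )
    url = f"{base_url}/{index_name}/_doc/{doc_id}?{params}"
    return urllib.request.Request(url, method="DELETE")
```

If the document was modified after it was read, Elasticsearch answers with HTTP 409, which `urllib` surfaces as an `HTTPError`; the caller can then re-read the document and decide whether the delete still applies.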
3. Refresh interval:
Deleting documents does not change how often Elasticsearch refreshes an index, but a deletion only becomes visible to search after the next refresh, and frequent refreshes add overhead during heavy delete or indexing workloads. You can control the refresh interval by updating the index settings. For example, to set the refresh interval to 30 seconds:
PUT /index_name/_settings
{
  "index": {
    "refresh_interval": "30s"
  }
}
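If the longer interval is only wanted for the duration of a large cleanup, the setting can be restored afterwards; sending `null` for an index setting resets it to its default. A minimal sketch of both requests, assuming a local unsecured cluster and a hypothetical helper named `build_refresh_interval_request`:

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumption: local cluster without authentication


def build_refresh_interval_request(index_name, interval, base_url=ES_URL):
    """Build a settings update for refresh_interval.

    interval is a time value such as "30s"; None is serialized as JSON null,
    which resets the setting to its default.
    """
    body = {"index": {"refresh_interval": interval}}
    return urllib.request.Request(
        f"{base_url}/{index_name}/_settings",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


# Typical order of operations (each request would be sent with urllib.request.urlopen):
slow_down = build_refresh_interval_request("index_name", "30s")  # before the cleanup
restore = build_refresh_interval_request("index_name", None)     # after the cleanup
```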
4. Delete index:
If you want to delete all documents in an index, it’s more efficient to delete the entire index using the DELETE API:
DELETE /index_name
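This is much faster than deleting documents individually or with the Delete By Query API, because Elasticsearch removes the index as a whole instead of marking each document as deleted. Keep in mind that it also removes the index's mappings and settings, so you need to recreate the index before writing to it again.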
By following these best practices and considerations, you can efficiently delete documents in Elasticsearch while maintaining the performance and integrity of your data.