Reducing costs of search operations has never been more critical. Optimizing your OpenSearch system can lead to significant savings in hardware expenses.
Here are 12 tips from Opster’s expert team, based on real experiences with Opster customers.
How to reduce costs in OpenSearch:
- Plan data retention
Carefully adjust your ISM policies and move old data to UltraWarm storage to reduce the amount of data stored on expensive hot nodes, using cheaper storage for infrequently accessed data. This can help reduce your hardware costs and is especially useful for logs and time-series data.
Skipping data tiers, such as going directly from hot to cold or hot to snapshot, can also be an effective strategy for reducing the cost of running OpenSearch.
Maintaining hot, warm, and cold tiers at the same time can be excessive, since data has to move through every tier. The line between warm and cold is very narrow, and unless you are using searchable snapshots, a cold tier is usually not worth having unless it truly runs on cheaper hardware.
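As a rough sketch, an ISM policy that skips the cold tier entirely might look like the following (the index pattern, the ages, and the `warm_migration` action are illustrative assumptions, not a drop-in policy; self-managed clusters typically use allocation filtering instead of `warm_migration`):

```python
import json

# Illustrative ISM policy: keep indices hot for 7 days, migrate them to
# warm storage, then delete them after 30 days -- no cold tier in between.
ism_policy = {
    "policy": {
        "description": "Hot -> warm -> delete, skipping the cold tier",
        "default_state": "hot",
        "states": [
            {
                "name": "hot",
                "actions": [],
                "transitions": [
                    {"state_name": "warm", "conditions": {"min_index_age": "7d"}}
                ],
            },
            {
                "name": "warm",
                # warm_migration targets UltraWarm on the managed service.
                "actions": [{"warm_migration": {}}],
                "transitions": [
                    {"state_name": "delete", "conditions": {"min_index_age": "30d"}}
                ],
            },
            {
                "name": "delete",
                "actions": [{"delete": {}}],
                "transitions": [],
            },
        ],
        "ism_template": [{"index_patterns": ["logs-*"], "priority": 100}],
    }
}

print(json.dumps(ism_policy, indent=2))
```

The key cost lever is the `transitions` chain: each tier you drop is one less copy-and-rebalance step the cluster pays for.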
In addition, hot nodes do not need to be the most expensive hardware available. Although the newest and most advanced hardware is faster, regular SSDs may well be good enough for your ingest/search SLAs.
- Try to stick to the latest OpenSearch versions
Every new version includes optimizations which could help reduce costs. Upgrading to a newer version can provide various performance improvements, bug fixes, and new features, which can improve the cluster’s performance and lower hardware requirements. This, in turn, can help to reduce the cost of running the cluster by avoiding the need to invest in more hardware and reducing the time spent on maintenance and troubleshooting.
- Optimize indices mappings and templates
By optimizing indices mappings and templates, you can reduce the amount of data stored and indexed, which can help to reduce storage requirements and the associated costs. For example, you can avoid indexing or storing fields that are not needed, remove complex analyzers to simplify indexing as much as possible, and use keywords instead of full-text search where applicable. Additionally, by avoiding the storage of the _source field, you can further reduce the storage requirements and costs.
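As an illustration, a slimmed-down index body might combine several of these techniques (the field names are assumptions, and note the caveat on `_source`):

```python
import json

# Illustrative lean mapping for a logging index.
index_body = {
    "settings": {"index": {"codec": "best_compression"}},
    "mappings": {
        # Disabling _source saves disk but breaks reindex, update-by-query,
        # and highlighting -- only do this if you can re-ingest from upstream.
        "_source": {"enabled": False},
        "properties": {
            # Exact-match filtering only: keyword skips full-text analysis.
            "status": {"type": "keyword"},
            # Present in documents but never searched: skip the inverted index.
            "session_id": {"type": "keyword", "index": False},
            # Aggregated on, never full-text searched.
            "latency_ms": {"type": "float"},
        },
    },
}

print(json.dumps(index_body, indent=2))
```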
- Denormalize your data
Denormalizing your data in OpenSearch can impact the cost of running the cluster by reducing the complexity of data relationships and improving query performance. In particular, avoid parent-child (join) and nested relationships, which create complex data structures.
Denormalization involves storing redundant data in multiple documents to avoid complex relationships between documents. This can improve query performance by reducing the number of queries needed to retrieve data and avoiding expensive join operations.
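A minimal before/after sketch of the idea (the document shape and field names are made up for illustration):

```python
# Relational-style modeling: items as nested sub-documents forces the
# nested query machinery and hidden per-item sub-documents at index time.
nested_doc = {
    "order_id": 1001,
    "items": [  # would be mapped as "type": "nested"
        {"sku": "A-1", "price": 25.0},
        {"sku": "B-2", "price": 40.0},
    ],
}

# Denormalized: one flat document per order line. The order_id is
# duplicated across documents, but every query becomes a cheap,
# cacheable single-document match with no join at search time.
flat_docs = [
    {"order_id": 1001, "sku": "A-1", "price": 25.0},
    {"order_id": 1001, "sku": "B-2", "price": 40.0},
]
```

The trade-off is extra storage for duplicated fields in exchange for simpler, faster queries; for search workloads that trade is usually worth it.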
Additionally, treating OpenSearch as a relational database can create unnecessary complexity and may require more resources to execute queries. OpenSearch is designed to be a distributed search engine, and its performance and scalability benefit from a document-oriented approach that avoids complex relationships between documents.
- Optimize queries
Avoiding certain expensive operations in OpenSearch, such as nested queries and nested aggregations, looking back at too much data, and running aggregations over irrelevant data, reduces the CPU/RAM needed to execute queries and therefore the cost of running the cluster. Opster’s Search Gateway tracks all of these and more and automatically improves search performance.
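As one concrete example of not looking back at too much data, bounding the time range in filter context lets OpenSearch skip whole shards and cache the filter (the field names and the 24-hour window are illustrative assumptions):

```python
# Unbounded: scans every segment of every matching index.
expensive_query = {"query": {"match": {"message": "error"}}}

# Bounded: the range filter runs in filter context, so it is cacheable
# and lets OpenSearch skip shards/segments outside the window entirely.
cheaper_query = {
    "query": {
        "bool": {
            "must": [{"match": {"message": "error"}}],
            "filter": [
                {"range": {"@timestamp": {"gte": "now-24h", "lte": "now"}}}
            ],
        }
    }
}
```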
- Optimize the shard size
Optimizing the shard size in OpenSearch can impact the cost of running the cluster by improving query and indexing speed and preventing hot spots in the cluster. Shards should generally be between 10 and 50 GB. Shards that are too large or too small can create hot spots, causing some nodes to work harder than others. This slows down querying and indexing, which can result in the need to add more nodes to the cluster; that is only a temporary fix, and it comes with additional costs.
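A back-of-the-envelope helper for picking a primary shard count from the 10-50 GB guideline (the 30 GB midpoint target is an assumed rule of thumb, not an OpenSearch default):

```python
import math

def recommended_primary_shards(total_index_gb: float,
                               target_shard_gb: float = 30.0) -> int:
    """Rough primary-shard count that keeps each shard near the
    10-50 GB sweet spot; 30 GB is an assumed midpoint target."""
    return max(1, math.ceil(total_index_gb / target_shard_gb))

print(recommended_primary_shards(240))  # 240 GB -> 8 primaries of ~30 GB
print(recommended_primary_shards(5))    # small index -> a single shard
```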
Shard sizes can be optimized to minimize costs and leverage the power of the distributed system.
- Monitor resource utilization and adjust hardware accordingly
Reduce costs by ensuring efficient resource usage and avoiding unnecessary expenses on hardware. For example, if you notice that some nodes are consistently idle or underutilized, you may be able to reduce hardware resources for those nodes without impacting performance. Make sure your disk utilization is efficient (not empty, but not too full), that CPUs and cluster load are showing that the nodes are working, and so on. If the cluster is underutilized, you can scale down the nodes’ hardware or remove some nodes.
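A minimal sketch of this kind of check over `_cat/nodes`-style statistics (the thresholds and the sample numbers are assumptions for illustration; in practice you would pull these from the cluster's stats APIs or your monitoring stack):

```python
# Sample per-node stats, as you might collect from _cat/nodes over time.
sample_nodes = [
    {"name": "data-1", "cpu_pct": 55, "disk_used_pct": 62},
    {"name": "data-2", "cpu_pct": 4,  "disk_used_pct": 11},
    {"name": "data-3", "cpu_pct": 48, "disk_used_pct": 58},
]

def underutilized(nodes, cpu_floor=10, disk_floor=20):
    """Nodes whose CPU and disk usage both sit below the floors are
    candidates for downsizing or removal."""
    return [n["name"] for n in nodes
            if n["cpu_pct"] < cpu_floor and n["disk_used_pct"] < disk_floor]

print(underutilized(sample_nodes))  # ['data-2']
```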
- Remove node types not in use
Removing node types that are not in use in OpenSearch can impact the cost of running the cluster by reducing the number of nodes required and the associated hardware costs. Not every use case requires separate master nodes, ML nodes, etc.
The client node roles, i.e. `ingest` and `coordinating`, are overused and very often unnecessary. The `ingest` role is for heavy ingestion use cases where users are using ingest pipelines instead of Logstash. Beyond this, every node acts as a coordinating node anyway. One case where dedicated coordinating nodes may be needed is heavy nested bucket aggregations, and even then this is usually not a problem.
A basic cluster of 3 nodes should be everyone’s starting point for HA, but all roles should be shared until a need arises to separate them.
- Keep an eye on the deleted documents ratio
Using force merge with expunge deletes in OpenSearch can help to reduce the size of the index, meaning less disk space would be required to store the data, which in turn can reduce the cost of running the cluster. Eventually, this could allow for the reduction of data nodes, so long as this wouldn’t impact the reliability and availability of the data.
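A small sketch of the bookkeeping: computing the deleted-documents ratio from the numbers `_stats` or `_cat/indices` reports, and triggering an expunge when it crosses a threshold (the 20% threshold, the sample counts, and the index name are assumptions):

```python
def deleted_docs_ratio(docs_count: int, docs_deleted: int) -> float:
    """Fraction of an index occupied by soft-deleted documents,
    from the counts reported by _stats / _cat/indices."""
    total = docs_count + docs_deleted
    return docs_deleted / total if total else 0.0

# Illustrative numbers: ~29% of this index is dead weight.
ratio = deleted_docs_ratio(docs_count=5_000_000, docs_deleted=2_000_000)
print(f"{ratio:.0%}")  # 29%

if ratio > 0.2:  # threshold is an assumed rule of thumb
    # Reclaims space from deletes without forcing a full merge:
    request = "POST /my-index/_forcemerge?only_expunge_deletes=true"
    print(request)
```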
- Minimize inter-node communication
Inter-node communication is the process of data exchange between the nodes in an OpenSearch cluster. This communication is important for ensuring the consistency and reliability of the cluster’s data. However, excessive inter-node communication can lead to increased network traffic, potential performance issues and increased data transfer costs.
These costs can vary depending on the cloud provider or hosting service used, the volume of data transferred, and the location of the data transfer.
To minimize inter-node communication and reduce data transfer costs in OpenSearch, you can move to a single-AZ deployment: if compliance allows and you have a separate disaster recovery process available, deploying OpenSearch in a single availability zone can reduce inter-node communication and data transfer costs. This is also feasible if your cluster is not mission-critical.
- Summarize older data using rollups
The rollup APIs in OpenSearch allow you to summarize large amounts of data and store it more compactly. By “rolling up” the data into a single summary document, you can reduce the amount of storage space required to store the data. This can help to free up disk space, reduce hardware requirements, and lower the associated costs of running the cluster.
Rolling up older data can also help to improve query performance by reducing the amount of data that needs to be searched. Since rollups provide a summary of the data, they can be queried more quickly than the original data, which can help to improve query response times and reduce hardware requirements.
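As a sketch, a rollup job that summarizes raw logs into hourly buckets might be shaped like the following (the indices, field names, schedule, and metrics are illustrative assumptions; consult the OpenSearch index rollup API documentation for the exact schema before use):

```python
import json

# Illustrative index rollup job: hourly latency summaries of logs-*.
rollup_job = {
    "rollup": {
        "enabled": True,
        "schedule": {"interval": {"period": 1, "unit": "Days"}},
        "description": "Hourly summary of request latency",
        "source_index": "logs-*",
        "target_index": "logs-rollup-hourly",
        "page_size": 1000,
        "dimensions": [
            # Group raw documents into 1-hour buckets on the timestamp.
            {"date_histogram": {"source_field": "@timestamp",
                                "fixed_interval": "1h"}}
        ],
        "metrics": [
            # Keep only avg and max latency per bucket; raw docs can then
            # be deleted once the summary exists.
            {"source_field": "latency_ms",
             "metrics": [{"avg": {}}, {"max": {}}]}
        ],
    }
}

print(json.dumps(rollup_job, indent=2))
```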
Once the data has been summarized and rolled up, you can archive or delete the original data to free up storage space and reduce costs. This can be particularly useful for time-series data, where older data may no longer be needed for analysis or reporting.
- Apply for long-term discounts
Cloud storage providers often offer long-term discounts for customers who sign up for extended contracts. By signing up for long-term contracts, you can get significant discounts and lock in lower rates, reducing the overall cost of cloud storage services. This can be particularly useful for customers who have long-term needs for cloud storage and are confident with their preferred provider.