Defining Efficient Mapping in OpenSearch, Mapping Types & More

By Opster Team

Updated: Jun 27, 2023

Mapping in OpenSearch

Mapping is the core element of index creation. It acts as the skeleton of the document: it defines each field and how that field will be indexed and searched. A mapping is a set of key-value pairs, where the key is the field name and the value holds the field type plus other parameters such as the index and store options.

OpenSearch doesn't impose a strict structure on documents; any document can be stored. A document is roughly equivalent to a row or record in a relational database. When you index a document, OpenSearch creates a mapping for any new fields based on the data. Each field in a mapping is either a metadata field or a custom field that you add, and each custom field is assigned one of the available field types.
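To make the key-value structure concrete, here is a minimal sketch of a mapping written as a Python dictionary. The index name "employees" and every field name are hypothetical, chosen only for illustration; the dictionary is the body you would send when creating the index.

```python
import json

# Hypothetical mapping for an illustrative "employees" index.
# Each key under "properties" is a field name; each value holds the
# field type plus optional parameters.
employee_mapping = {
    "mappings": {
        "properties": {
            "name": {"type": "text"},
            "employee_id": {"type": "keyword"},
            "joined_on": {"type": "date"},
            # "index": False keeps the value in _source but makes it unsearchable.
            "internal_notes": {"type": "text", "index": False},
        }
    }
}

# Body for a create-index request such as: PUT /employees
print(json.dumps(employee_mapping, indent=2))
```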

Dynamic mapping VS. static mapping

There are two different options for mapping in OpenSearch.

Dynamic mapping: With dynamic mapping, you index documents without defining the mapping first, and OpenSearch adds each new field to the mapping automatically, both at the top level and inside objects. The following configurations will help optimize your indexing operations:

Enable date_detection and set dynamic_date_formats.

By default, date_detection is enabled, so a string value such as "created_on": "2021/01/01" is mapped as a date using the default formats ["strict_date_optional_time", "yyyy/MM/dd HH:mm:ss Z||yyyy/MM/dd Z"]. To recognize other patterns, set a custom format, for example "dynamic_date_formats": ["MM/dd/yyyy"]. If date_detection is disabled, date strings are mapped as text.
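The sketch below shows where these two options sit in a create-index body. The helper function is only a rough Python analogue of the "MM/dd/yyyy" pattern (using strptime, which is more lenient about leading zeros) and is an assumption added for illustration, not how OpenSearch parses dates.

```python
import json
from datetime import datetime

# Both options belong at the root of the "mappings" object.
dynamic_date_settings = {
    "mappings": {
        "date_detection": True,
        "dynamic_date_formats": ["MM/dd/yyyy"],
    }
}

def matches_custom_format(value: str) -> bool:
    """Rough stand-in for the MM/dd/yyyy pattern (illustrative only)."""
    try:
        datetime.strptime(value, "%m/%d/%Y")
        return True
    except ValueError:
        return False

print(json.dumps(dynamic_date_settings, indent=2))
print(matches_custom_format("01/31/2021"))
print(matches_custom_format("2021/01/01"))
```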

Enable numeric detection with numeric_detection: true.

By default, numeric_detection is disabled, so "experience_in_years": "10" is mapped as text. With numeric_detection enabled, numeric strings are mapped as float (for example "1.0") or long (for example "1").
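As a rough illustration, the sketch below pairs the setting with a small Python function that imitates the outcome described above. The function is a simplification written for this example and does not reproduce OpenSearch's actual detection logic.

```python
# Root-level mapping option that turns numeric detection on.
numeric_detection_settings = {"mappings": {"numeric_detection": True}}

def detect_dynamic_type(value: str, numeric_detection: bool) -> str:
    """Imitate which type a new string field would get (simplified)."""
    if numeric_detection:
        try:
            int(value)
            return "long"
        except ValueError:
            pass
        try:
            float(value)
            return "float"
        except ValueError:
            pass
    return "text"

print(detect_dynamic_type("10", numeric_detection=False))
print(detect_dynamic_type("10", numeric_detection=True))
print(detect_dynamic_type("1.0", numeric_detection=True))
```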

Custom analyzers should be tested with the Analyze API (_analyze) before going into production. Built-in language analyzers are available out of the box for many natural languages.

Coerce – by default, OpenSearch tries to clean up dirty values so that they fit the field's type. For example, the string "10" sent to an integer field is indexed as the number 10, and a floating-point value sent to an integer field is truncated. To make a field strict, set coerce to false at the index level or the field level; indexing operations will then fail when such dirty values are sent.
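A minimal sketch of both levels follows: coercion is switched off for the whole index with the index.mapping.coerce setting and switched back on for one field. The field names are hypothetical.

```python
import json

strict_number_index = {
    # Index-level default: do not convert "10" or 10.5 to fit numeric fields.
    "settings": {"index.mapping.coerce": False},
    "mappings": {
        "properties": {
            # Inherits the index-level setting, so dirty values are rejected.
            "experience_in_years": {"type": "integer"},
            # Field-level override: this field still accepts coerced values.
            "rating": {"type": "integer", "coerce": True},
        }
    },
}

print(json.dumps(strict_number_index, indent=2))
```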

You can prevent a keyword field value from being indexed when the string exceeds a given number of characters by setting ignore_above according to your requirements. You can also skip a field whose value does not fit its type, instead of rejecting the whole document, by setting "ignore_malformed": true.
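Here is a short sketch of both parameters in a mapping. The field names and the limit of 256 characters are assumptions for the example, and the helper only mirrors the length check in plain Python.

```python
tolerant_mapping = {
    "mappings": {
        "properties": {
            # Strings longer than 256 characters are not indexed for this field.
            "tag": {"type": "keyword", "ignore_above": 256},
            # A non-numeric value is skipped rather than failing the document.
            "age": {"type": "integer", "ignore_malformed": True},
        }
    }
}

def would_index_keyword(value: str, ignore_above: int) -> bool:
    """Mirror the ignore_above length check (illustrative only)."""
    return len(value) <= ignore_above

limit = tolerant_mapping["mappings"]["properties"]["tag"]["ignore_above"]
print(would_index_keyword("short-tag", limit))
print(would_index_keyword("x" * 300, limit))
```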

Dynamic mapping can produce poor results when you index documents without knowing which fields they contain. This behavior can be disabled, both at the document level and at the object level, by setting the dynamic parameter to false (to ignore new fields) or strict (to throw an exception if an unknown field is encountered). The default is true.
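The sketch below sets dynamic to strict at the document level and to false on one object. The field names are hypothetical, and the helper is a plain-Python approximation that lists the top-level fields a strict mapping would reject.

```python
strict_mapping = {
    "mappings": {
        "dynamic": "strict",  # unknown top-level fields raise an exception
        "properties": {
            "user": {"type": "keyword"},
            "metadata": {
                "type": "object",
                "dynamic": False,  # unknown sub-fields are ignored, not indexed
                "properties": {"source": {"type": "keyword"}},
            },
        },
    }
}

def unmapped_fields(document: dict, mapping: dict) -> list:
    """Return top-level document fields missing from the mapping."""
    known = mapping["mappings"]["properties"]
    return sorted(name for name in document if name not in known)

print(unmapped_fields({"user": "alice", "age": 30}, strict_mapping))
```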

Static mapping: With static mapping, the mapping is defined before any document is indexed. New fields are added later with the PUT mapping API.
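As a sketch of adding a field to an existing static mapping, the body below would be sent to the mapping endpoint of an index. The index name "employees" and the field "department" are hypothetical.

```python
import json

# Body for a request such as: PUT /employees/_mapping
# It adds one new field to a mapping that already exists.
add_field_body = {
    "properties": {
        "department": {"type": "keyword"},
    }
}

print(json.dumps(add_field_body, indent=2))
```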

Text VS. keyword

Text: The field of type text will go through text analysis and then will be indexed into an inverted index. By default, OpenSearch uses a standard analyzer.

For example,

GET /test_tokenizer/_analyze?pretty
{
  "analyzer": "standard",
  "text": "The Lorem Ipsum, unknown typesetter in the 15th century"
}

The text is split into 9 tokens. Each token is returned in the following JSON format; the example below shows the third token:

 {
      "token" : "ipsum",  # Word from the string, emitted as a lowercased token
      "start_offset" : 10, # Starting pointer position of the token 
      "end_offset" : 15, # Ending pointer position of the token
      "type" : "<ALPHANUM>", # Type of data, default is AlphaNumeric
      "position" : 2 # Word position in the string starting from 0
 }
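To see where the offsets and positions come from, the following sketch approximates the standard analyzer with a simple regular expression. It is only an approximation that happens to agree for this ASCII sentence; the real analyzer follows Unicode text segmentation rules.

```python
import re

def approximate_standard_tokens(text: str) -> list:
    """Split on word characters and lowercase (rough approximation)."""
    tokens = []
    for position, match in enumerate(re.finditer(r"\w+", text)):
        tokens.append({
            "token": match.group().lower(),
            "start_offset": match.start(),
            "end_offset": match.end(),
            "position": position,
        })
    return tokens

tokens = approximate_standard_tokens(
    "The Lorem Ipsum, unknown typesetter in the 15th century"
)
print(len(tokens))
print(tokens[2])
```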

OpenSearch allows you to create custom analyzers by combining character filters, a tokenizer, and token filters.

Keyword: A field of type keyword is indexed exactly as supplied by the client, and it is stored as a single term in the inverted index.

Use Case:

Keyword fields suit exact-match queries, whereas text fields suit full-text search or autocomplete. Because one is analyzed and the other is not, the same query string can return different results against each.
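The difference can be sketched in plain Python. The mapping below indexes a hypothetical "city" field both ways using a keyword sub-field, and the two helper functions are simplified stand-ins for an exact-match lookup on a keyword and a full-text match on analyzed text; they are not OpenSearch's query implementation.

```python
import re

# Hypothetical field indexed as text, with a keyword sub-field "city.raw".
city_mapping = {
    "mappings": {
        "properties": {
            "city": {"type": "text", "fields": {"raw": {"type": "keyword"}}}
        }
    }
}

def analyze(text: str) -> list:
    """Simplified analysis: split into words and lowercase them."""
    return [token.lower() for token in re.findall(r"\w+", text)]

def keyword_match(stored: str, query: str) -> bool:
    """Keyword: the stored value is one term and must match exactly."""
    return stored == query

def text_match(stored: str, query: str) -> bool:
    """Text: any analyzed query token found among the analyzed stored tokens."""
    stored_tokens = analyze(stored)
    return any(token in stored_tokens for token in analyze(query))

print(keyword_match("New York", "new york"))
print(text_match("New York", "new york"))
```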

Index per document type

In older versions of Elasticsearch, the project OpenSearch was forked from, an index could contain multiple mapping types. This led to ambiguity when the same field name appeared in different mapping types, because both were backed by the same Lucene field internally.

Indexing documents of different mapping types into one index (for example, customer_order and customer_reviews) leads to redundancy when multiple reviews refer to the same customer_order, and it leaves the data sparse, with many documents populating only a fraction of the fields. It is therefore recommended to keep one mapping type per index. From Elasticsearch version 6.0, a newly created index no longer allows multiple types. Each index should have a single responsibility: holding one document type.

Prevent mapping explosion – limit settings

An unbounded number of mapped fields can lead to a mapping explosion, which can cause out-of-memory errors. Set limits on the index mapping to prevent this. The recommended settings are described below.

Limit total fields

Set the total number of fields with index.mapping.total_fields.limit – the default is 1000.

Define max field depth

The depth is 1 when all fields are at the root level.
The depth is 2 when a field holds an object mapping, and so on for each further level of nesting.

Use index.mapping.depth.limit to set the desired limit. The default value is 20.
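To illustrate how depth is counted, the sketch below walks the "properties" of a mapping and returns its deepest level. The function and the sample mappings are written for this example; OpenSearch performs its own check internally.

```python
def mapping_depth(properties: dict, level: int = 1) -> int:
    """Return the deepest level of nesting in a "properties" dictionary."""
    deepest = 0
    for field in properties.values():
        deepest = max(deepest, level)
        children = field.get("properties")
        if children:
            deepest = max(deepest, mapping_depth(children, level + 1))
    return deepest

# All fields at the root: depth 1.
flat = {"name": {"type": "text"}, "age": {"type": "integer"}}

# One field holds an object mapping: depth 2.
with_object = {
    "name": {"type": "text"},
    "address": {"properties": {"city": {"type": "keyword"}}},
}

print(mapping_depth(flat))
print(mapping_depth(with_object))
```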

Define max nested field limit

Nested fields are used when arrays of objects need to be indexed and queried as independent objects. As a best practice, limit the number of distinct nested mappings in an index with index.mapping.nested_fields.limit. The default is 50.

Define max nested objects limit

By default, a single document can contain a maximum of 10000 nested objects across all of its nested fields. Set index.mapping.nested_objects.limit to change this limit.

All of the above limits help prevent a mapping explosion. You can lower or raise them as required, but monitor the cluster closely if you exceed the defaults.
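The four limits can be set together in the index settings. The sketch below uses the default values quoted above, so it changes nothing by itself; adjust the numbers to your own needs.

```python
import json

# Settings body for a create-index request such as: PUT /my-index
# ("my-index" is a placeholder name).
limit_settings = {
    "settings": {
        "index.mapping.total_fields.limit": 1000,
        "index.mapping.depth.limit": 20,
        "index.mapping.nested_fields.limit": 50,
        "index.mapping.nested_objects.limit": 10000,
    }
}

print(json.dumps(limit_settings, indent=2))
```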

Query-time boost

Boost: To improve relevance, you can boost a particular field so that matches in it count more toward the score. Index-time boost was deprecated in favor of query-time boosting in Elasticsearch version 5.0.0.
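A minimal sketch of a query-time boost follows. The field names "title" and "body" and the search text are hypothetical; the "^2" suffix doubles the weight of matches in the title field.

```python
import json

# Body for a search request such as: GET /my-index/_search
boosted_query = {
    "query": {
        "multi_match": {
            "query": "efficient mapping",
            # "title^2" weighs title matches twice as much as body matches.
            "fields": ["title^2", "body"],
        }
    }
}

print(json.dumps(boosted_query, indent=2))
```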

Conclusion

OpenSearch sets sensible defaults for the basic requirements of a deployment, allowing you to begin indexing and searching without changing any of the mapping settings above. To tune OpenSearch to better suit specific needs, apply the small adjustments described here to achieve better performance.

To optimize your system, improve your configuration and easily resolve issues, we recommend you try AutoOps for OpenSearch. AutoOps diagnoses problems by analyzing hundreds of metrics collected by a lightweight agent and offers guidance for resolving them.
