OpenSearch Data Types: A Comprehensive Guide

By Opster Team

Updated: Jun 28, 2023

Introduction

OpenSearch is a community-driven, open-source search and analytics suite built on Apache Lucene. Every field in an index has a data type, and that type determines how the field's values are indexed, stored, and queried. In this article, we will go through the data types supported by OpenSearch and how to use them effectively. If you want to learn about OpenSearch mapping, check out this guide.

Core Data Types

OpenSearch supports several core data types, which can be broadly categorized into the following groups:

  1. Text and Keyword
  2. Numeric
  3. Date
  4. Boolean
  5. Binary
  6. Range
  7. Geo
  8. IP
  9. Specialized

Text and Keyword Data Types

Text data types are used for full-text search, where the text is analyzed and indexed into tokens. The ‘text’ data type is suitable for searching large bodies of text, such as articles or blog posts. On the other hand, the ‘keyword’ data type is used for exact value searches and is suitable for filtering, sorting, and aggregations.
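As a minimal sketch, the mapping below declares one field of each kind, plus a multi-field so the same string can be searched as ‘text’ and sorted or aggregated as ‘keyword’. The field names (title, status, raw) and the query values are illustrative assumptions. The bodies are built as plain Python dictionaries, in the shape that would be sent to a create-index or search request.

```python
import json

# Hypothetical field names; only the "type" values come from the text above.
mapping = {
    "properties": {
        # Analyzed into tokens for full-text search.
        "title": {
            "type": "text",
            # Multi-field: the same value indexed unanalyzed for sorting and aggregations.
            "fields": {"raw": {"type": "keyword"}},
        },
        # Exact-value field for filtering, sorting, and aggregations.
        "status": {"type": "keyword"},
    }
}

# Full-text match against the analyzed field.
full_text_query = {"query": {"match": {"title": "open source search"}}}

# Exact-value filter against the keyword field.
exact_query = {"query": {"term": {"status": "published"}}}

print(json.dumps({"mappings": mapping}, indent=2))
```

With this layout, a sort or terms aggregation would target title.raw rather than title, since the analyzed field is not meant for those operations.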

Numeric Data Types

OpenSearch supports various numeric data types, including:

1. ‘integer’: A 32-bit signed integer with a minimum value of -2^31 and a maximum value of 2^31-1.

2. ‘long’: A 64-bit signed integer with a minimum value of -2^63 and a maximum value of 2^63-1.

3. ‘short’: A 16-bit signed integer with a minimum value of -2^15 and a maximum value of 2^15-1.

4. ‘byte’: An 8-bit signed integer with a minimum value of -2^7 and a maximum value of 2^7-1.

5. ‘double’: A 64-bit double-precision floating-point number.

6. ‘float’: A 32-bit single-precision floating-point number.

7. ‘half_float’: A 16-bit half-precision floating-point number.

8. ‘scaled_float’: A floating-point value that is multiplied by a fixed scaling factor (a double set in the mapping as ‘scaling_factor’), rounded, and stored as a long value.

9. ‘unsigned_long’: An unsigned 64-bit integer with a minimum value of 0 and a maximum value of 2^64-1.
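The bounds listed above can be turned into a small helper for picking the narrowest integer type that fits a field's expected values. This is a sketch only: the function names are hypothetical, and the ‘scaled_float’ helper approximates the scale-and-round step with Python's round(), which may differ from the engine's rounding on exact halves.

```python
# Inclusive bounds for each integer type, ordered from narrowest to widest.
INTEGER_TYPES = [
    ("byte", -2**7, 2**7 - 1),
    ("short", -2**15, 2**15 - 1),
    ("integer", -2**31, 2**31 - 1),
    ("long", -2**63, 2**63 - 1),
    ("unsigned_long", 0, 2**64 - 1),
]


def smallest_integer_type(lowest, highest):
    """Return the narrowest listed type whose range covers [lowest, highest]."""
    for name, type_min, type_max in INTEGER_TYPES:
        if type_min <= lowest and highest <= type_max:
            return name
    raise ValueError(f"no integer type covers [{lowest}, {highest}]")


def scaled_float_encode(value, scaling_factor):
    """Approximate the stored form of a scaled_float: scale, round, keep as an integer."""
    return round(value * scaling_factor)


print(smallest_integer_type(0, 200))      # an HTTP status code needs more than a byte
print(scaled_float_encode(19.99, 100))    # a price kept to two decimal places
```

Choosing the narrowest type that covers the data is the usual reason to prefer ‘short’ or ‘byte’ over ‘long’, and ‘scaled_float’ over ‘double’ when a fixed number of decimal places is enough.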

Date Data Types

The ‘date’ and ‘date_nanos’ data types are used to store dates and times in OpenSearch; ‘date’ has millisecond resolution, while ‘date_nanos’ has nanosecond resolution. Dates can be represented in various formats, such as epoch milliseconds, ISO 8601, or custom formats. OpenSearch also supports date math expressions for date range queries.
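A short sketch of how formats and date math fit together, assuming a hypothetical created_at field: the mapping accepts either ISO 8601 strings or epoch milliseconds, and the range query uses a date math expression relative to the current time.

```python
from datetime import datetime, timezone

# Hypothetical field name. Multiple accepted formats are separated by "||".
mapping = {
    "properties": {
        "created_at": {
            "type": "date",
            "format": "strict_date_optional_time||epoch_millis",
        }
    }
}

moment = datetime(2023, 6, 28, 12, 0, tzinfo=timezone.utc)

# The same instant in the two representations the format above allows.
as_iso8601 = moment.isoformat()
as_epoch_millis = int(moment.timestamp() * 1000)

# Date math: "now-7d/d" means seven days ago, rounded down to the start of that day.
last_week_query = {
    "query": {"range": {"created_at": {"gte": "now-7d/d", "lte": "now"}}}
}

print(as_iso8601, as_epoch_millis)
```

Either representation could be used as the value of created_at in a document; both describe the same point in time.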

Boolean Data Types

The ‘boolean’ data type is used to store true or false values. It is useful for filtering and aggregations based on binary conditions.

Binary Data Types

The ‘binary’ data type is used to store binary data, such as images or files, encoded as Base64 strings. It is not analyzed or indexed and should be used sparingly due to its impact on storage and performance.
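The Base64 step can be sketched with Python's standard library. The field name thumbnail and the sample bytes are assumptions made for illustration.

```python
import base64

# Hypothetical field name.
mapping = {"properties": {"thumbnail": {"type": "binary"}}}

# Sample data: the eight-byte signature that starts every PNG file.
raw_bytes = b"\x89PNG\r\n\x1a\n"

# A binary field takes a Base64-encoded string, not raw bytes.
document = {"thumbnail": base64.b64encode(raw_bytes).decode("ascii")}

# Decoding the stored string recovers the original bytes.
restored = base64.b64decode(document["thumbnail"])

print(document["thumbnail"], len(raw_bytes), len(document["thumbnail"]))
```

Base64 encodes every 3 bytes as 4 characters, so the encoded value is roughly a third larger than the original data, which is part of the storage cost mentioned above.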

Range Data Types

OpenSearch supports range data types for efficient querying of ranges. The supported range data types include:

1. ‘integer_range’: Range of 32-bit signed integers.

2. ‘long_range’: Range of 64-bit signed integers.

3. ‘float_range’: Range of 32-bit single-precision floating-point numbers.

4. ‘double_range’: Range of 64-bit double-precision floating-point numbers.

5. ‘date_range’: Range of dates.

6. ‘ip_range’: Range of IP addresses.
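As an illustration of how range fields are written and queried, the sketch below uses two hypothetical fields, age_limit and booking. Range values are given as objects with gt/gte and lt/lte bounds. The helper range_contains is a local stand-in for the bound semantics only and is not part of any OpenSearch client.

```python
import operator

# Hypothetical field names.
mapping = {
    "properties": {
        "age_limit": {"type": "integer_range"},
        "booking": {"type": "date_range", "format": "yyyy-MM-dd"},
    }
}

# A range value is an object with lower and upper bounds.
document = {
    "age_limit": {"gte": 18, "lte": 65},
    "booking": {"gte": "2023-06-01", "lt": "2023-06-15"},
}

# A term query on a range field matches documents whose range contains the value.
contains_value_query = {"query": {"term": {"age_limit": 30}}}

# A range query can state how the query range must relate to the stored range.
overlap_query = {
    "query": {
        "range": {
            "booking": {"gte": "2023-06-10", "lte": "2023-06-20", "relation": "intersects"}
        }
    }
}

_BOUND_CHECKS = {"gt": operator.gt, "gte": operator.ge, "lt": operator.lt, "lte": operator.le}


def range_contains(stored, value):
    """Local check: does the value satisfy every bound of the stored range?"""
    return all(_BOUND_CHECKS[name](value, bound) for name, bound in stored.items())


print(range_contains(document["age_limit"], 30))
print(range_contains(document["booking"], "2023-06-15"))
```

Storing the interval in one field avoids keeping separate minimum and maximum fields and combining two conditions in every query.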

Geo Data Types

OpenSearch provides geo data types for storing and querying geospatial data:

1. ‘geo_point’: Stores a latitude and longitude pair as a single point.

2. ‘geo_shape’: Stores complex shapes, such as polygons or multi-polygons, for spatial queries.
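The two geo types might be used together as follows. The field names and coordinates are made up for the example. Note that a ‘geo_point’ object names its parts explicitly, whereas GeoJSON-style shape coordinates are ordered longitude first, then latitude.

```python
# Hypothetical field names.
mapping = {
    "properties": {
        "location": {"type": "geo_point"},
        "delivery_area": {"type": "geo_shape"},
    }
}

document = {
    # A single point given as an object with explicit lat and lon keys.
    "location": {"lat": 40.7128, "lon": -74.0060},
    # A polygon: each coordinate is [lon, lat], and the ring ends on its first point.
    "delivery_area": {
        "type": "polygon",
        "coordinates": [
            [[-74.1, 40.6], [-73.9, 40.6], [-73.9, 40.8], [-74.1, 40.8], [-74.1, 40.6]]
        ],
    },
}

# Filter for documents whose point lies within 10 km of a given location.
nearby_query = {
    "query": {
        "bool": {
            "filter": {
                "geo_distance": {
                    "distance": "10km",
                    "location": {"lat": 40.73, "lon": -73.99},
                }
            }
        }
    }
}

outer_ring = document["delivery_area"]["coordinates"][0]
print(outer_ring[0] == outer_ring[-1])  # a valid polygon ring is closed
```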

IP Data Types

The ‘ip’ data type is used to store IPv4 and IPv6 addresses. It supports CIDR notation for IP range queries and aggregations.
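To make the CIDR behaviour concrete, the sketch below pairs a term query that uses CIDR notation with a local check built on Python's ipaddress module. The field name client_ip and the sample addresses are assumptions; matches_cidr only mirrors the membership test locally and does not call OpenSearch.

```python
import ipaddress

# Hypothetical field name.
mapping = {"properties": {"client_ip": {"type": "ip"}}}

documents = [
    {"client_ip": "192.168.1.10"},
    {"client_ip": "10.0.0.5"},
    {"client_ip": "2001:db8::1"},
]

# A term query on an ip field accepts a CIDR block as its value.
cidr_query = {"query": {"term": {"client_ip": "192.168.0.0/16"}}}


def matches_cidr(address, cidr):
    """Local equivalent of the membership test: is the address inside the block?"""
    ip = ipaddress.ip_address(address)
    network = ipaddress.ip_network(cidr)
    # An IPv4 address can never belong to an IPv6 block, and vice versa.
    return ip.version == network.version and ip in network


matched = [d["client_ip"] for d in documents if matches_cidr(d["client_ip"], "192.168.0.0/16")]
print(matched)
```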

Specialized Data Types

OpenSearch also supports specialized data types for specific use cases:

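1. ‘nested’: Indexes each object in an array as a separate hidden document, so the fields of one object can be queried together without being mixed with those of other objects in the array.

2. ‘object’: Holds an inner JSON object whose fields are indexed as part of the parent document, useful for storing hierarchical data.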

3. ‘join’: Enables parent-child relationships between documents for complex data modeling.
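The practical difference between ‘object’ and ‘nested’ shows up with arrays of objects. The sketch below imitates, in plain Python, how an array of ordinary objects ends up as one multi-valued field per key, which loses the pairing between values. The function flatten_object_array and the comments data are hypothetical, written only to illustrate the behaviour.

```python
def flatten_object_array(field, objects):
    """Mimic indexing an array of plain objects: one multi-valued field per key."""
    flattened = {}
    for obj in objects:
        for key, value in obj.items():
            flattened.setdefault(f"{field}.{key}", []).append(value)
    return flattened


comments = [
    {"author": "alice", "rating": 5},
    {"author": "bob", "rating": 1},
]

flat = flatten_object_array("comments", comments)

# Flattened view: "alice" and rating 1 both occur somewhere, so the pair wrongly matches.
object_match = "alice" in flat["comments.author"] and 1 in flat["comments.rating"]

# Nested view: each object is checked on its own, so the pair does not match.
nested_match = any(c["author"] == "alice" and c["rating"] == 1 for c in comments)

# Declaring the field as nested and querying it through a nested query.
nested_mapping = {
    "properties": {
        "comments": {
            "type": "nested",
            "properties": {
                "author": {"type": "keyword"},
                "rating": {"type": "integer"},
            },
        }
    }
}
nested_query = {
    "query": {
        "nested": {
            "path": "comments",
            "query": {
                "bool": {
                    "must": [
                        {"term": {"comments.author": "alice"}},
                        {"term": {"comments.rating": 1}},
                    ]
                }
            },
        }
    }
}

print(flat, object_match, nested_match)
```

The trade-off is that nested objects add hidden documents to the index, so ‘nested’ is worth choosing mainly when queries need to match several fields of the same array element.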

Conclusion

Understanding the various data types supported by OpenSearch is crucial for designing efficient and effective search and analytics solutions. By choosing the appropriate data types for your use case, you can optimize storage, indexing, and querying performance, ensuring a seamless and powerful search experience.