How to Model Relationships Between Documents in OpenSearch Using Object

By Opster Expert Team

Updated: Oct 16, 2023

This article is part 1 of a 3-part series on modeling relationships between documents in OpenSearch.

Overview and background

OpenSearch is a document database, a type of NoSQL database that stores data as documents. “Documents” in OpenSearch are JSON documents, and because data is stored as JSON, it is simple to preserve the hierarchy and structure of complex data. As an added benefit, a single JSON document can hold properties of different data types.

Unlike SQL databases, OpenSearch does not perform SQL-style join operations at query time, since joins can slow queries down. It instead aims for real-time results with millisecond query response times.

OpenSearch offers several ways to define relationships between documents: object types, nested documents, parent-child relationships, and denormalization.

Hierarchical objects (objects contained in other objects) are expressed naturally in JSON. OpenSearch provides a dedicated field type, the object type, to represent this hierarchy.

Uses of the object field type

The object field type allows users to have an object (with its own fields and values) as the field value in a document. 

For example, an event’s address field may be an object with its own fields for region, city, street, and so forth. If the same event takes place in several cities, the address field can hold multiple address objects.

The object type is the simplest way to represent a relationship such as an interest group and its associated events, because a field value can be a single JSON object or an array of JSON objects.

How to use object field type

JSON documents have a hierarchical structure and may contain inner objects, which may themselves contain other inner objects, as in the example below:

PUT books/_doc/1
{
  "title": "Machine Learning",
  "author": {
    "age": 30,
    "name": {
      "first": "Elie",
      "last": "John"
    }
  }
}

In this example, the books index contains one document per book, and each document is a JSON object whose properties describe the book. Here those properties are title and author. The author field is an inner object of the document with two properties, age and name. The name field is in turn an inner object of author with two properties, first and last, which hold the author’s first and last name.

Lucene, the library underneath OpenSearch, has no concept of inner objects, so OpenSearch indexes the fields and values of each object without preserving the object’s structure.

The document above is internally indexed as a flat list of key-value pairs, similar to this:

{
  "title": "Machine Learning",
  "author.age": 30,
  "author.name.first": "Elie",
  "author.name.last": "John"
}
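
The flattening shown above can be imitated with a few lines of Python. This is only an illustrative sketch of the idea, not how OpenSearch or Lucene implement it; the flatten helper is a hypothetical name introduced here.

```python
def flatten(obj, prefix=""):
    """Flatten nested dicts into a single dict keyed by dot-separated paths.

    Illustration only: a hypothetical helper, not an OpenSearch API.
    """
    flat = {}
    for key, value in obj.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            # Inner object: recurse and merge its leaves under the longer path.
            flat.update(flatten(value, path))
        else:
            # Leaf value: store it under its full dot-notation path.
            flat[path] = value
    return flat


book = {
    "title": "Machine Learning",
    "author": {"age": 30, "name": {"first": "Elie", "last": "John"}},
}

flat_book = flatten(book)
print(flat_book)
```

Each key in the result is the full path to a leaf value, which is the same dot notation used later to query fields inside an object.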

An explicit mapping for this document would look like this:

PUT books
{
  "mappings": {
    "properties": {
      "title": {
        "type": "text"
      },
      "author": {
        "properties": {
          "age": {
            "type": "float"
          },
          "name": {
            "properties": {
              "first": {
                "type": "text"
              },
              "last": {
                "type": "text"
              }
            }
          }
        }
      }
    }
  }
}

As shown above, author is an inner object field and author.name is an inner object field inside it. Because object is the default type for a field that has properties, the type does not need to be assigned explicitly. The following mapping, which sets the type to object explicitly, is equivalent:

PUT books
{
  "mappings": {
    "properties": {
      "title": {
        "type": "text"
      },
      "author": {
        "type": "object",
        "properties": {
          "age": {
            "type": "float"
          },
          "name": {
            "type": "object",
            "properties": {
              "first": {
                "type": "text"
              },
              "last": {
                "type": "text"
              }
            }
          }
        }
      }
    }
  }
}
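
One way to check that the two mappings above describe the same fields is to walk their properties and list every leaf field with its full path. The sketch below does this on plain Python dictionaries. The leaf_fields helper is a hypothetical name introduced for illustration, and the code does not contact a cluster.

```python
def leaf_fields(properties, prefix=""):
    """Return {dot.path: type} for every leaf field under a mapping's properties.

    Illustration only: a hypothetical helper, not part of any OpenSearch client.
    """
    leaves = {}
    for name, spec in properties.items():
        path = f"{prefix}.{name}" if prefix else name
        if "properties" in spec:
            # Object field, whether or not "type": "object" is spelled out.
            leaves.update(leaf_fields(spec["properties"], path))
        else:
            leaves[path] = spec["type"]
    return leaves


name_props = {"first": {"type": "text"}, "last": {"type": "text"}}

# Mapping without an explicit object type.
implicit = {
    "title": {"type": "text"},
    "author": {"properties": {
        "age": {"type": "float"},
        "name": {"properties": name_props},
    }},
}

# Same mapping with "type": "object" written out.
explicit = {
    "title": {"type": "text"},
    "author": {"type": "object", "properties": {
        "age": {"type": "float"},
        "name": {"type": "object", "properties": name_props},
    }},
}

print(leaf_fields(implicit))
print(leaf_fields(implicit) == leaf_fields(explicit))
```

Both mappings yield the same set of leaf paths and types, which is why the explicit object type can be omitted.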

The inner object mapping discussed above works as long as the root object and the inner object have a one-to-one relationship, so a book with a single author is not a problem. Problems can appear when a field holds an array of objects. Suppose the “Machine Learning” book has two authors and we add another book, “Artificial Intelligence,” with one author (here each author object has first_name and last_name fields):

{
  "title": "Machine Learning",
  "author": [
    {
      "first_name": "John",
      "last_name": "Stefan"
    },
    {
      "first_name": "Sandy",
      "last_name": "Naily"
    }
  ]
}
{
  "title": "Artificial Intelligence",
  "author": [
    {
      "first_name": "John",
      "last_name": "Naily"
    }
  ]
}

Suppose users search for the book written by “John Naily” using this query, shown in query string syntax:

author.first_name:John AND author.last_name:Naily

They expect to get only the document for the “Artificial Intelligence” book. However, when they run the query, both documents are returned. Because OpenSearch internally flattens the array of inner objects into multi-valued fields, the “Machine Learning” book entry actually looks like this:

{
  "title": "Machine Learning",
  "author.first_name": [
    "John",
    "Sandy"
  ],
  "author.last_name": [
    "Stefan",
    "Naily"
  ]
}

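That explains why it is returned: the flattened document contains “John” in author.first_name and “Naily” in author.last_name, and the information that these values belong to different authors has been lost. Lucene works with flat fields, so documents are represented internally as flattened fields.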
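
The cross-object match described above can be reproduced with a small simulation. This is a sketch under simplifying assumptions: values are compared exactly (real text fields are analyzed, for example lowercased), and flatten_with_arrays and matches_all are hypothetical helpers, not OpenSearch functions.

```python
def flatten_with_arrays(doc, prefix=""):
    """Flatten a document; values from arrays of objects are pooled per field path."""
    flat = {}
    for key, value in doc.items():
        path = f"{prefix}.{key}" if prefix else key
        items = value if isinstance(value, list) else [value]
        for item in items:
            if isinstance(item, dict):
                # Every object in the array contributes to the same field paths.
                for sub_path, sub_values in flatten_with_arrays(item, path).items():
                    flat.setdefault(sub_path, []).extend(sub_values)
            else:
                flat.setdefault(path, []).append(item)
    return flat


def matches_all(doc, criteria):
    """True if every field in criteria holds the wanted value somewhere in the flat doc."""
    flat = flatten_with_arrays(doc)
    return all(value in flat.get(field, []) for field, value in criteria.items())


books = [
    {"title": "Machine Learning",
     "author": [{"first_name": "John", "last_name": "Stefan"},
                {"first_name": "Sandy", "last_name": "Naily"}]},
    {"title": "Artificial Intelligence",
     "author": [{"first_name": "John", "last_name": "Naily"}]},
]

# Equivalent of: author.first_name is John AND author.last_name is Naily
criteria = {"author.first_name": "John", "author.last_name": "Naily"}
hits = [book["title"] for book in books if matches_all(book, criteria)]
print(hits)
```

Both titles end up in hits, because after flattening nothing records which first name belonged with which last name.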

The object field accepts the following parameters:

  • Dynamic: this parameter determines whether new properties can be added dynamically to an existing object. It can be set to true (the default), false, or strict. The runtime value comes from Elasticsearch and may not be accepted by OpenSearch, so check the documentation for your version.
  • Enabled: this parameter determines if the object field’s JSON value should be indexed and parsed (true by default) or entirely ignored (false).
  • Subobjects: this parameter determines whether the object can contain subobjects (true by default) or not (false). If not, sub-fields with dots in their names are handled as leaves; otherwise, their field names are expanded to the matching object structure.
  • Properties: this parameter represents the fields inside the object that can be of any data type, even object. New properties can be added to an existing object.
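
As a rough sketch of how some of these parameters appear in a mapping, the snippet below builds a request body as a Python dictionary and prints it as JSON. The raw_payload field is a hypothetical name added for illustration, and the body is only printed, not sent to a cluster. It could be used as the body of a PUT books request on a new index.

```python
import json

# Assumed example: "raw_payload" is a hypothetical field, not part of the article's index.
mapping_body = {
    "mappings": {
        "properties": {
            "title": {"type": "text"},
            "author": {
                # strict: reject documents that add unmapped fields under author.
                "dynamic": "strict",
                "properties": {"age": {"type": "float"}},
            },
            # enabled false: keep the JSON in _source but do not parse or index it.
            "raw_payload": {"type": "object", "enabled": False},
        }
    }
}

print(json.dumps(mapping_body, indent=2))
```

With this mapping, a document that adds an unmapped field under author would be rejected, while anything placed under raw_payload would be stored but not searchable.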

Object field type – advantages and disadvantages

Advantages:

  • Objects are simple to use. In most cases, users don’t need to do anything up front to index objects because OpenSearch detects them automatically by default.
  • Users can run queries and aggregations on objects the same way they would on flat documents, because at the Lucene level they are flat documents.
  • No joins are involved.

Disadvantages:

  • There are no boundaries between the objects in an array. If users need such boundaries, they should consider alternative approaches, such as nested documents, parent-child relationships, and denormalization, and combine them with objects if the use case calls for it.
  • The entire document will be reindexed if any object is updated.

Summary

  • The object field type is easy to use and performs well. However, it is reliable only when one-to-one relationships are maintained.
  • OpenSearch does not require that the field data type be explicitly defined as object in the mapping. When it comes across fields with hierarchical data, it dynamically sets the field’s data type to object.
  • Users can search a field inside an object by giving its full path in dot notation, e.g. author.name.first.
  • Because of how the object type is indexed, objects work well when users query just one field of the object at a time (typically one-to-one relationships). When users query multiple fields of objects stored in an array (as is typically the case with one-to-many relationships), they may get unexpected results.
  • The object field type flattens inner objects instead of storing them as individual documents. As a result, the relationship between the fields of each object in an array is lost, which is a drawback. However, the nested field type can be used to overcome this issue.
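
To make the dot-notation point concrete, here is a minimal sketch that builds two search request bodies as Python dictionaries using the standard match and bool queries. The match_on_path helper is a hypothetical name, and the bodies are only printed. They could be sent to GET books/_search.

```python
import json


def match_on_path(path, text):
    """Build a match clause for a field addressed by its full dot-notation path."""
    return {"match": {path: text}}


# Query a single field inside nested objects: author -> name -> first.
single_field = {"query": match_on_path("author.name.first", "Elie")}

# Query two fields of the author objects at once. On plain object fields this
# can match values that come from different authors in the same document.
two_fields = {
    "query": {
        "bool": {
            "must": [
                match_on_path("author.first_name", "John"),
                match_on_path("author.last_name", "Naily"),
            ]
        }
    }
}

print(json.dumps(single_field, indent=2))
print(json.dumps(two_fields, indent=2))
```

The first body is the safe single-field case, while the second is the multi-field case that can return unexpected documents when author is an array of objects.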