Opster's Search Gateway - Configuration and Customization

This page describes the configuration of Opster’s Search Gateway. It shows an example of the configuration, which is customized for each individual user, and explains the different terms and parameters.

The default configuration

Default.conf:
{
  opster.mclb: {
    port: 9200,
    healthChecker: {
      threadPool: 1,
      periodInMilliSeconds: 1000
    }
    route: {
      userParameter:"X-User-Id"
    }

    backends: [
      {
        id: 1,
        url: "http://localhost:9200",
        //authInfo: {
        //  type: "BASIC"
        //  credentials: {
        //    user: "shak"
        //    password: "shaked"
        //  }
        //}
        default: true
      }
    ],

  "searchGateway": {
    "heavySearchCostThreshold": 1000,
    "slowSearchTimeInMilliThreshold": 1000,
    "features": {
      "regex": {
        "cost": 100,
        "factors": {
          "HEAVY": 3,
          "MEDIUM": 2,
          "LIGHT": 1
        },
        "classifier": {
          "type": "string",
          "config": {
            "contains": {
              "cost": 10,
              "values": [
                "*",
                ".*"
              ]
            },
            "startsWith": {
              "cost": 100,
              "values": [
                ".*",
                "*"
              ]
            },
            "pattern": {
              "cost": 100,
              "values": [
                ":\\s?\\*[^\\s\"]"
              ]
            }
          },
          "thresholds": {
            "HEAVY": {
              "gte": 100
            },
            "MEDIUM": {
              "gt": 10,
              "lt": 100
            },
            "LIGHT": {
              "lte": 10
            }
          }
        }
      },
      "data": {
        "cost": 1,
        "factors": {
          "HEAVY": 3,
          "MEDIUM": 2,
          "LIGHT": 1
        },
        "classifier": {
          "type": "math",
          "config": {
            "expression": "{{size}}"
          },
          "thresholds": {
            "HEAVY": {
              "gt": 1000
            },
            "MEDIUM": {
              "gt": 100,
              "lte": 1000
            },
            "LIGHT": {
              "lte": 100
            }
          }
        }
      },
      "range": {
        "cost": 1,
        "factors": {
          "HEAVY": 1001,
          "MEDIUM": 500,
          "LIGHT": 1
        },
        "classifier": {
          "type": "math",
          "config": {
            "expression": "{{duration}}"
          },
          "thresholds": {
            "HEAVY": {
              "gte": 86400000
            },
            "MEDIUM": {
              "gt": 21600000,
              "lt": 86400000
            },
            "LIGHT": {
              "lte": 21600000
            }
          }
        }
      },
      "aggregation": {
        "cost": 100,
        "factors": {
          "HEAVY": 3,
          "MEDIUM": 2,
          "LIGHT": 1
        },
        "classifier": {
          "type": "math",
          "config": {
            "expression": "({{level}} * {{size}}) * (1 * (1 + {{hasScripts}}))"
          },
          "thresholds": {
            "HEAVY": {
              "gt": 100000
            },
            "MEDIUM": {
              "gt": 1000,
              "lte": 100000
            },
            "LIGHT": {
              "lte": 1000
            }
          }
        }
      },
      "script": {
        "cost": 100,
        "factors": {
          "HEAVY": 3
        },
        "classifier": {
          "type": "math",
          "config": {
            "expression": "{{hasScript}}"
          },
          "thresholds": {
            "HEAVY": {
              "gte": 1
            }
          }
        }
      }
    }
  }
  }
}

Explaining the default configuration

The table below breaks down each parameter of the configuration. Parameters marked with an asterisk (*) are customized per user.


Param	Mandatory/Optional	Type	Explanation
opster.mclb.port *	Mandatory	int	The port the application listens on
opster.mclb.healthChecker.threadPool	Mandatory	int	The number of threads used to monitor the health of the backend clusters. Usually, no more than 1 is needed
opster.mclb.healthChecker.periodInMilliSeconds	Mandatory	int	Health check interval, in milliseconds
opster.mclb.route.userParameter *	Mandatory	String	The name of the header used to tag user params
opster.mclb.cacheConfiguration.expensiveQueriesCacheSize	Optional	int	Number of expensive queries to cache. Default value: 1
opster.mclb.cacheConfiguration.slowQueriesCacheSize	Optional	int	Number of slow queries to cache. Default value: 1
opster.mclb.cacheConfiguration.cacheFetchFromTimeInHours	Optional	int	Cache loads from time to time. Default value: Integer.MAX_VALUE
opster.mclb.cacheConfiguration.cacheBulkFetchSize	Optional	int	Cache query size param. Default value: 0. When left at the default, no cache loading will occur
opster.mclb.cacheConfiguration.maxFetching	Optional	int	Scrolling amount (how many times to scroll to get more cache results). Default value: 0. When left at the default, no cache loading will occur
opster.mclb.backends.id	Mandatory	int	Unique number. Indicates the ID of the user. There must be at least one backend with ID 1 that is set as the default
opster.mclb.backends.url *	Mandatory	String	Full Elasticsearch URL. For example, http://localhost:9200
opster.mclb.backends.authInfo *	Optional	Object	This section represents Elasticsearch authentication
opster.mclb.backends.authInfo.type *	Mandatory	enum	Available values: BASIC. BASIC represents basic HTTP authentication
opster.mclb.backends.authInfo.credentials.user *	Mandatory	String	Username for authentication
opster.mclb.backends.authInfo.credentials.password *	Mandatory	String	Password for authentication
opster.mclb.tenants	Optional	Object	This section represents the available tenants. If not set, all requests will be routed to the default backend
opster.mclb.tenants. *	Mandatory	Object	Dynamic names for the tenants
opster.mclb.tenants..patterns *	Mandatory	String	Regex pattern to identify index patterns to classify per tenant
opster.mclb.tenants..leader	Mandatory	int	Backend ID to route to by default
opster.mclb.tenants..follower	Optional	int	Backend ID to route to in case the default is not available
opster.mclb.errorHandling	Optional	Object	This section represents error handling functionality. Default is no error handling
opster.mclb.errorHandling.enabled	Mandatory	Boolean	Default is false. When set to default, there is no error handling
opster.mclb.errorHandling.kafka	Mandatory	Object	This section represents error handling by Kafka
opster.mclb.errorHandling.kafka.bootstrapServers *	Mandatory	String	Kafka server URL
opster.mclb.errorHandling.kafka.groupId	Mandatory	String	Kafka group ID to identify with
opster.mclb.errorHandling.kafka.topic	Mandatory	String	Kafka topic prefix. Topics will be created if necessary with the given prefix and backend IDs
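
The tenant parameters above can be read as a small routing rule. The following Python sketch is only an illustration of that reading, not the Search Gateway's actual implementation: the names Tenant and pick_backend, the use of a full regex match against the index name, and the set of healthy backend IDs are all assumptions made for the example.

```python
import re
from dataclasses import dataclass
from typing import Dict, Optional, Set


@dataclass
class Tenant:
    patterns: str                   # regex matched against the index name (assumed)
    leader: int                     # backend ID to route to by default
    follower: Optional[int] = None  # backend ID used when the leader is unavailable


def pick_backend(index: str, tenants: Dict[str, Tenant],
                 healthy: Set[int], default_backend: int = 1) -> Optional[int]:
    """Return the backend ID for an index, or None if no listed backend is healthy."""
    for tenant in tenants.values():
        if re.fullmatch(tenant.patterns, index):
            if tenant.leader in healthy:
                return tenant.leader
            if tenant.follower is not None and tenant.follower in healthy:
                return tenant.follower
            # Leader and follower both unavailable: the documentation does not
            # describe this case, so the sketch simply reports "no backend".
            return None
    # No tenant matched (or no tenants configured): use the default backend.
    return default_backend


# Hypothetical tenant named "logs", routed to backend 2 with backend 3 as fallback.
tenants = {"logs": Tenant(patterns=r"logs-.*", leader=2, follower=3)}
```

With this setup, pick_backend("logs-2024", tenants, {1, 2, 3}) selects backend 2, the same call with only backends 1 and 3 healthy selects backend 3, and an index that matches no tenant goes to the default backend 1.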

Explaining the log configuration

Logback.xml

<host>localhost</host> – The host of the Elasticsearch instance that receives the logs.

<port>9200</port> – The port of the Elasticsearch instance that receives the logs.

<index>opster-sg</index> – The Elasticsearch index that the logs are written to.

<batchSize>10</batchSize> – The log bulk size.

<batchTimeoutInMilliseconds>5000</batchTimeoutInMilliseconds> – The timeout for bulk insert.

<retry>3</retry> – The number of bulk insert retries before the log is discarded.

<username>admin</username> – Optional – if basic authentication is needed, you can set the username here.

<password>admin</password> – Optional – if basic authentication is needed, you can set the password here.


Explaining the search configuration

After installation, each user configures the Search_config individually, with the help of the team, according to their use case and requirements.

Defining the relevant terms

Query – a single search execution.

Pattern – the query structure built from its terms and aggregations.

Expensive query – any query above ‘heavySearchCostThreshold’.

Slow query – any query above ‘slowSearchTimeInMilliThreshold’.

Heavy query – any pattern that was always expensive and slow until that point in time. This means that the first time this pattern runs below the slow query threshold it will not be considered heavy anymore.
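
The definitions above can be sketched as a small tracker. This is a minimal Python illustration, assuming that "above" means strictly greater than the threshold and that a pattern which has lost its heavy status does not regain it. The class HeavyPatternTracker and its method names are hypothetical and are not part of the Search Gateway.

```python
class HeavyPatternTracker:
    """Tracks, per pattern, whether every run so far was both expensive and slow."""

    def __init__(self, heavy_cost_threshold=1000, slow_time_ms_threshold=1000):
        # Defaults mirror heavySearchCostThreshold and slowSearchTimeInMilliThreshold.
        self.heavy_cost_threshold = heavy_cost_threshold
        self.slow_time_ms_threshold = slow_time_ms_threshold
        self._always_heavy = {}

    def record(self, pattern, cost, took_ms):
        """Record one query execution and return whether its pattern is still heavy."""
        expensive = cost > self.heavy_cost_threshold
        slow = took_ms > self.slow_time_ms_threshold
        previous = self._always_heavy.get(pattern, True)
        # One run that is not both expensive and slow clears the heavy status.
        self._always_heavy[pattern] = previous and expensive and slow
        return self._always_heavy[pattern]

    def is_heavy(self, pattern):
        # A pattern that has never been seen is not considered heavy.
        return self._always_heavy.get(pattern, False)


tracker = HeavyPatternTracker()
first = tracker.record("pattern-a", cost=1500, took_ms=2000)   # expensive and slow
second = tracker.record("pattern-a", cost=1500, took_ms=500)   # ran below the slow threshold
third = tracker.record("pattern-a", cost=1500, took_ms=3000)   # slow again, but no longer "always"
```

Here the pattern is heavy after the first run and stops being heavy from the second run onward.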

Calculations of each parameter

This is how an expensive query is calculated.

There are 5 features taken into consideration:

Regex – represents the cost of the query's regex terms. Calculated by:
1. contains
2. startsWith
3. pattern (regex)

The Search Gateway will only match representative regex terms. For example, a term with analyze_wildcard:false will be ignored, but a prefix term with no wildcard will be calculated as a leading wildcard (startsWith).
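
As an illustration of the three regex checks, the Python sketch below scores a single term using the default values from the configuration. It assumes that each rule contributes its cost once when any of its values matches, and that the rule costs are summed before being compared to the thresholds. The function names regex_score and regex_bucket are hypothetical, and the real classifier may work differently.

```python
import re

# Values copied from the default "regex" classifier configuration.
REGEX_RULES = {
    "contains":   {"cost": 10,  "values": ["*", ".*"]},
    "startsWith": {"cost": 100, "values": [".*", "*"]},
    "pattern":    {"cost": 100, "values": [r':\s?\*[^\s"]']},
}


def regex_score(term):
    """Sum the cost of every rule that matches the term (assumed behavior)."""
    score = 0
    if any(value in term for value in REGEX_RULES["contains"]["values"]):
        score += REGEX_RULES["contains"]["cost"]
    if any(term.startswith(value) for value in REGEX_RULES["startsWith"]["values"]):
        score += REGEX_RULES["startsWith"]["cost"]
    if any(re.search(pattern, term) for pattern in REGEX_RULES["pattern"]["values"]):
        score += REGEX_RULES["pattern"]["cost"]
    return score


def regex_bucket(score):
    """Apply the default thresholds: HEAVY gte 100, MEDIUM gt 10 and lt 100, LIGHT lte 10."""
    if score >= 100:
        return "HEAVY"
    if score > 10:
        return "MEDIUM"
    return "LIGHT"
```

Under these assumptions a trailing wildcard such as foo* scores 10 and lands in LIGHT, while a leading wildcard such as *foo scores 110 and lands in HEAVY.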

Range – represents the cost of the query's range terms. Calculated by the range duration.
Data – represents the estimated amount of data that the search needs to process. The available params are:
1. size – the index size
2. shardsCount – the number of shards that need to be queried
3. docCount – how many documents are in the index searched
Script – represents whether the query contains script fields that are not under the aggregation.
Aggregation – represents the aggregation part of the search. The available params are:
1. level – max number of nested aggregations
2. size – estimated bucket size that will be returned
3. hasScripts – whether the aggregation terms contain a script field

Each feature has its own classifiers.

All classifiers under a feature are calculated by the formula in the expression field, summed and aggregated by threshold (the thresholds section under the classifier section) into 3 buckets: HEAVY, MEDIUM and LIGHT.

Each bucket has its own factor, which is multiplied by the cost field directly under the feature name.

Finally, the costs of all features are summed and compared to the ‘heavySearchCostThreshold’.
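
The calculation described above can be followed with a short worked example. The Python sketch below uses the default thresholds, factors and costs of the data, range and aggregation features. The function names, the input values, and the reading of "above" as strictly greater than are assumptions made for illustration, not the Search Gateway's own code.

```python
import operator

OPS = {"gt": operator.gt, "gte": operator.ge, "lt": operator.lt, "lte": operator.le}

HEAVY_SEARCH_COST_THRESHOLD = 1000

# Costs, factors and thresholds copied from the default configuration.
FEATURES = {
    "data": {
        "cost": 1,
        "factors": {"HEAVY": 3, "MEDIUM": 2, "LIGHT": 1},
        "thresholds": {"HEAVY": {"gt": 1000},
                       "MEDIUM": {"gt": 100, "lte": 1000},
                       "LIGHT": {"lte": 100}},
    },
    "range": {
        "cost": 1,
        "factors": {"HEAVY": 1001, "MEDIUM": 500, "LIGHT": 1},
        "thresholds": {"HEAVY": {"gte": 86400000},
                       "MEDIUM": {"gt": 21600000, "lt": 86400000},
                       "LIGHT": {"lte": 21600000}},
    },
    "aggregation": {
        "cost": 100,
        "factors": {"HEAVY": 3, "MEDIUM": 2, "LIGHT": 1},
        "thresholds": {"HEAVY": {"gt": 100000},
                       "MEDIUM": {"gt": 1000, "lte": 100000},
                       "LIGHT": {"lte": 1000}},
    },
}


def classify(score, thresholds):
    """Return the first bucket whose bounds all hold for the score, or None."""
    for bucket, bounds in thresholds.items():
        if all(OPS[op](score, limit) for op, limit in bounds.items()):
            return bucket
    return None


def feature_cost(name, score):
    """Bucket factor multiplied by the cost field of the feature."""
    feature = FEATURES[name]
    bucket = classify(score, feature["thresholds"])
    return 0 if bucket is None else feature["factors"][bucket] * feature["cost"]


def aggregation_score(level, size, has_scripts):
    # Default expression: ({{level}} * {{size}}) * (1 * (1 + {{hasScripts}}))
    return (level * size) * (1 * (1 + has_scripts))


def total_cost(scores):
    return sum(feature_cost(name, score) for name, score in scores.items())


def is_expensive(scores):
    return total_cost(scores) > HEAVY_SEARCH_COST_THRESHOLD


# Hypothetical query: data size 500, a 12-hour range, a 2-level scripted aggregation.
half_day = {"data": 500, "range": 12 * 60 * 60 * 1000,
            "aggregation": aggregation_score(level=2, size=1000, has_scripts=1)}
# Same query with the range widened to 24 hours.
full_day = dict(half_day, range=24 * 60 * 60 * 1000)
```

For the 12-hour query every feature falls in MEDIUM, giving 2 + 500 + 200 = 702, which is below the threshold. Widening the range to 24 hours moves it to HEAVY, and its factor of 1001 alone pushes the total to 1203, so the query counts as expensive.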

The default configuration for customization

{
    "heavySearchCostThreshold": 1000, 
    "slowSearchTimeInMilliThreshold": 1000, 
    "features": {
      "regex": {
        "cost": 100,
        "factors": {
          "HEAVY": 3,
          "MEDIUM": 2,
          "LIGHT": 1
        },
        "classifier": {
          "type": "string",
          "config": {
            "contains": {
              "cost": 10,
              "values": [
                "*",
                ".*"
              ]
            },
            "startsWith": {
              "cost": 100,
              "values": [
                ".*",
                "*"
              ]
            },
            "pattern": {
              "cost": 100,
              "values": [
                ":\\s?\\*[^\\s\"]"
              ]
            }
          },
          "thresholds": {
            "HEAVY": {
              "gte": 100
            },
            "MEDIUM": {
              "gt": 10,
              "lt": 100
            },
            "LIGHT": {
              "lte": 10
            }
          }
        }
      },
      "data": {
        "cost": 1,
        "factors": {
          "HEAVY": 3,
          "MEDIUM": 2,
          "LIGHT": 1
        },
        "classifier": {
          "type": "math",
          "config": {
            "expression": "{{size}}"
          },
          "thresholds": {
            "HEAVY": {
              "gt": 1000
            },
            "MEDIUM": {
              "gt": 100,
              "lte": 1000
            },
            "LIGHT": {
              "lte": 100
            }
          }
        }
      },
      "range": {
        "cost": 1,
        "factors": {
          "HEAVY": 1001,
          "MEDIUM": 500,
          "LIGHT": 1
        },
        "classifier": {
          "type": "math",
          "config": {
            "expression": "{{duration}}"
          },
          "thresholds": {
            "HEAVY": {
              "gte": 86400000
            },
            "MEDIUM": {
              "gt": 21600000,
              "lt": 86400000
            },
            "LIGHT": {
              "lte": 21600000
            }
          }
        }
      },
      "aggregation": {
        "cost": 100,
        "factors": {
          "HEAVY": 3,
          "MEDIUM": 2,
          "LIGHT": 1
        },
        "classifier": {
          "type": "math",
          "config": {
            "expression": "({{level}} * {{size}}) * (1 * (1 + {{hasScripts}}))"
          },
          "thresholds": {
            "HEAVY": {
              "gt": 100000
            },
            "MEDIUM": {
              "gt": 1000,
              "lte": 100000
            },
            "LIGHT": {
              "lte": 1000
            }
          }
        }
      },
      "script": {
        "cost": 100,
        "factors": {
          "HEAVY": 3
        },
        "classifier": {
          "type": "math",
          "config": {
            "expression": "{{hasScript}}"
          },
          "thresholds": {
            "HEAVY": {
              "gte": 1
            }
          }
        }
      }
    }
}
