This page describes the configuration of Opster’s Search Gateway. It includes an example configuration, which is customized for each individual user, and explains the different terms and parameters it uses.
The default configuration
Default.conf:
{
  opster.mclb: {
    port: 9200,
    healthChecker: {
      threadPool: 1,
      periodInMilliSeconds: 1000
    }
    route: {
      userParameter: "X-User-Id"
    }
    backends: [
      {
        id: 1,
        url: "http://localhost:9200",
        //authInfo: {
        //  type: "BASIC"
        //  credentials: {
        //    user: "shak"
        //    password: "shaked"
        //  }
        //}
        default: true
      }
    ],
    "searchGateway": {
      "heavySearchCostThreshold": 1000,
      "slowSearchTimeInMilliThreshold": 1000,
      "features": {
        "regex": {
          "cost": 100,
          "factors": { "HEAVY": 3, "MEDIUM": 2, "LIGHT": 1 },
          "classifier": {
            "type": "string",
            "config": {
              "contains": { "cost": 10, "values": [ "*", ".*" ] },
              "startsWith": { "cost": 100, "values": [ ".*", "*" ] },
              "pattern": { "cost": 100, "values": [ ":\\s?\\*[^\\s\"]" ] }
            },
            "thresholds": {
              "HEAVY": { "gte": 100 },
              "MEDIUM": { "gt": 10, "lt": 100 },
              "LIGHT": { "lte": 10 }
            }
          }
        },
        "data": {
          "cost": 1,
          "factors": { "HEAVY": 3, "MEDIUM": 2, "LIGHT": 1 },
          "classifier": {
            "type": "math",
            "config": { "expression": "{{size}}" },
            "thresholds": {
              "HEAVY": { "gt": 1000 },
              "MEDIUM": { "gt": 100, "lte": 1000 },
              "LIGHT": { "lte": 100 }
            }
          }
        },
        "range": {
          "cost": 1,
          "factors": { "HEAVY": 1001, "MEDIUM": 500, "LIGHT": 1 },
          "classifier": {
            "type": "math",
            "config": { "expression": "{{duration}}" },
            "thresholds": {
              "HEAVY": { "gte": 86400000 },
              "MEDIUM": { "gt": 21600000, "lt": 86400000 },
              "LIGHT": { "lte": 21600000 }
            }
          }
        },
        "aggregation": {
          "cost": 100,
          "factors": { "HEAVY": 3, "MEDIUM": 2, "LIGHT": 1 },
          "classifier": {
            "type": "math",
            "config": { "expression": "({{level}} * {{size}}) * (1 * (1 + {{hasScripts}}))" },
            "thresholds": {
              "HEAVY": { "gt": 100000 },
              "MEDIUM": { "gt": 1000, "lte": 100000 },
              "LIGHT": { "lte": 1000 }
            }
          }
        },
        "script": {
          "cost": 100,
          "factors": { "HEAVY": 3 },
          "classifier": {
            "type": "math",
            "config": { "expression": "{{hasScript}}" },
            "thresholds": {
              "HEAVY": { "gte": 1 }
            }
          }
        }
      }
    }
  }
}
Explaining the default configuration
A breakdown of each configuration parameter. Parameters marked with an asterisk (*) are customized per user.
Param | Mandatory/Optional | Type | Explanation |
opster.mclb.port * | Mandatory | int | Port the Search Gateway listens on |
opster.mclb.healthChecker.threadPool | Mandatory | int | Number of threads used to monitor the health of the backend clusters. Usually no more than 1 is needed |
opster.mclb.healthChecker.periodInMilliSeconds | Mandatory | int | Interval between health checks, in milliseconds |
opster.mclb.route.userParameter * | Mandatory | String | Name of the request header used to tag user parameters (X-User-Id in the default configuration) |
opster.mclb.cacheConfiguration.expensiveQueriesCacheSize | Optional | int | Number of expensive queries to cache. Default value: 1 |
opster.mclb.cacheConfiguration.slowQueriesCacheSize | Optional | int | Number of slow queries to cache. Default value: 1 |
opster.mclb.cacheConfiguration.cacheFetchFromTimeInHours | Optional | int | How far back in time, in hours, the cache loads from. Default value: Integer.MAX_VALUE |
opster.mclb.cacheConfiguration.cacheBulkFetchSize | Optional | int | Size parameter of the queries used to load the cache. Default value: 0. When left at the default, no cache loading occurs |
opster.mclb.cacheConfiguration.maxFetching | Optional | int | Maximum number of scrolls performed to fetch more cache results. Default value: 0. When left at the default, no cache loading occurs |
opster.mclb.backends.id | Mandatory | int | Unique backend ID. There must be at least one backend with ID 1 that is set as the default (default: true) |
opster.mclb.backends.url * | Mandatory | String | Full Elasticsearch URL. For example, http://localhost:9200 |
opster.mclb.backends.authInfo * | Optional | Object | Elasticsearch authentication settings for the backend |
opster.mclb.backends.authInfo.type * | Mandatory | enum | Authentication type. Available values: BASIC (basic HTTP authentication) |
opster.mclb.backends.authInfo.credentials.user * | Mandatory | String | Username for authentication |
opster.mclb.backends.authInfo.credentials.password * | Mandatory | String | Password for authentication |
opster.mclb.tenants | Optional | Object | The available tenants. If not set, all requests are routed to the default backend |
opster.mclb.tenants. | Mandatory | Object | Dynamic names for the tenants |
opster.mclb.tenants. | Mandatory | String | Regex pattern that identifies the index patterns belonging to the tenant |
opster.mclb.tenants. | Mandatory | int | Backend ID to route to by default |
opster.mclb.tenants. | Optional | int | Backend ID to route to in case the default is not available |
opster.mclb.errorHandling | Optional | Object | Error handling settings. By default, there is no error handling |
opster.mclb.errorHandling.enabled | Mandatory | Boolean | Enables error handling. Default value: false (no error handling) |
opster.mclb.errorHandling.kafka | Mandatory | Object | Settings for error handling through Kafka |
opster.mclb.errorHandling.kafka.bootstrapServers * | Mandatory | String | Kafka server URL |
opster.mclb.errorHandling.kafka.groupId | Mandatory | String | Kafka group ID that the Search Gateway uses |
opster.mclb.errorHandling.kafka.topic | Mandatory | String | Kafka topic prefix. Topics are created as needed, named from this prefix and the backend IDs |
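The tenant routing described in the table can be pictured with the sketch below. It is an illustration under assumptions, not the Search Gateway's implementation: the tenant sub-key names are not shown in the table, so the fields `pattern`, `default_backend` and `fallback_backend` are hypothetical; backend health is passed in as a set of available backend IDs; the tenant regex is assumed to match the whole index name; and an index that matches no tenant is assumed to go to the default backend (ID 1).

```python
import re
from dataclasses import dataclass
from typing import Optional

DEFAULT_BACKEND_ID = 1  # the backend with id 1 and default: true


@dataclass
class Tenant:
    # Hypothetical field names: the real tenant sub-keys are not shown in the table.
    pattern: str                            # regex identifying the tenant's indices
    default_backend: int                    # backend ID to route to by default
    fallback_backend: Optional[int] = None  # backend ID used if the default is unavailable


def route(index: str, tenants: dict[str, Tenant], available: set[int]) -> int:
    """Return the backend ID a request for `index` would be routed to (sketch)."""
    for tenant in tenants.values():
        # Assumption: the pattern must match the whole index name.
        if re.fullmatch(tenant.pattern, index):
            if tenant.default_backend in available:
                return tenant.default_backend
            if tenant.fallback_backend in available:
                return tenant.fallback_backend
            # Assumption: with neither backend available, keep the tenant's default.
            return tenant.default_backend
    # No tenants configured, or no pattern matched: use the default backend.
    return DEFAULT_BACKEND_ID


tenants = {"tenantA": Tenant(pattern=r"logs-.*", default_backend=2, fallback_backend=3)}
print(route("logs-2024.01", tenants, available={1, 2, 3}))  # 2
print(route("logs-2024.01", tenants, available={1, 3}))     # 3 (fallback)
print(route("metrics-1", tenants, available={1, 2, 3}))     # 1 (no tenant matched)
```

Keeping the health state outside the routing function mirrors the separate health checker in the configuration: one component decides which backends are available, and routing only reads that result.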
Explaining the log configuration
Logback.xml
<appender name="elasticsearch" class="com.opster.mclb.infrastructure.logs.ElasticSearchAppender">
  <!-- The protocol used to reach the destination logs Elasticsearch -->
  <protocol>http</protocol>
  <!-- The host of the destination logs Elasticsearch -->
  <host>localhost</host>
  <!-- The port of the destination logs Elasticsearch -->
  <port>9200</port>
  <!-- The index of the destination logs Elasticsearch -->
  <index>opster-sg</index>
  <!-- The log bulk size -->
  <batchSize>10</batchSize>
  <!-- The timeout for bulk insert -->
  <batchTimeoutInMilliseconds>5000</batchTimeoutInMilliseconds>
  <!-- The number of bulk insert retries before the logs are thrown away -->
  <retry>3</retry>
  <!-- Optional: the username, if basic authentication is needed -->
  <username>admin</username>
  <!-- Optional: the password, if basic authentication is needed -->
  <password>admin</password>
</appender>
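The interplay of batchSize, batchTimeoutInMilliseconds and retry can be modeled roughly as follows. This is a simplified sketch written for illustration, not the code of com.opster.mclb.infrastructure.logs.ElasticSearchAppender. It assumes that batchTimeoutInMilliseconds is the longest a partial batch waits before it is sent (it may instead be a request timeout), that retry counts retries after the first attempt, and that `send_bulk` is a hypothetical callable standing in for the bulk request to the destination Elasticsearch. Time is passed in explicitly so the behavior is easy to follow.

```python
from typing import Callable, Optional


class BatchingLogSender:
    """Simplified model of a batching log appender (illustrative, not Opster's code)."""

    def __init__(self, send_bulk: Callable[[list[str]], bool], batch_size: int = 10,
                 batch_timeout_ms: int = 5000, retry: int = 3) -> None:
        self.send_bulk = send_bulk  # hypothetical: returns True if the bulk insert succeeded
        self.batch_size = batch_size
        self.batch_timeout_ms = batch_timeout_ms
        self.retry = retry
        self.batch: list[str] = []
        self.first_event_ms: Optional[int] = None
        self.dropped = 0  # events thrown away after every attempt failed

    def append(self, event: str, now_ms: int) -> None:
        if not self.batch:
            self.first_event_ms = now_ms
        self.batch.append(event)
        if len(self.batch) >= self.batch_size:  # batchSize reached: send now
            self.flush()

    def tick(self, now_ms: int) -> None:
        # Assumption: a partial batch is sent once it has waited batch_timeout_ms.
        if self.batch and now_ms - self.first_event_ms >= self.batch_timeout_ms:
            self.flush()

    def flush(self) -> None:
        batch, self.batch, self.first_event_ms = self.batch, [], None
        # Assumption: one initial attempt plus `retry` retries.
        for _ in range(1 + self.retry):
            if self.send_bulk(batch):
                return
        self.dropped += len(batch)


sender = BatchingLogSender(send_bulk=lambda b: print("bulk:", b) or True, batch_size=2)
sender.append("event 1", now_ms=0)
sender.append("event 2", now_ms=10)  # batch is full: prints bulk: ['event 1', 'event 2']
```

Under this model, a larger batchSize means fewer bulk requests but a longer delay before logs appear in the opster-sg index, and a failing destination costs at most 1 + retry attempts per batch before the batch is discarded.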
Explaining the search configuration
After installation, the search configuration (Search_config) is set up individually for each user, with the help of the Opster team, according to the user's use case and requirements.
Defining the relevant terms
- Query – a single search execution.
- Pattern – the query structure, built from its terms and aggregations.
- Expensive query – any query whose calculated cost is above ‘heavySearchCostThreshold’.
- Slow query – any query whose execution time, in milliseconds, is above ‘slowSearchTimeInMilliThreshold’.
- Heavy query – a pattern that, up to that point in time, has always been both expensive and slow. The first time a query with this pattern runs below the slow query threshold, the pattern is no longer considered heavy.
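The heavy-pattern rule can be expressed as a small piece of state per pattern. The sketch below is an illustration, not the Search Gateway's code: how a query is reduced to its pattern is not described here, so the caller passes a hypothetical pattern key, "above" is read as strictly greater than the threshold, and the two thresholds use the default values of 1000.

```python
HEAVY_SEARCH_COST_THRESHOLD = 1000    # heavySearchCostThreshold
SLOW_SEARCH_TIME_MS_THRESHOLD = 1000  # slowSearchTimeInMilliThreshold


def is_expensive(cost: float) -> bool:
    return cost > HEAVY_SEARCH_COST_THRESHOLD


def is_slow(took_ms: float) -> bool:
    return took_ms > SLOW_SEARCH_TIME_MS_THRESHOLD


class HeavyPatternTracker:
    def __init__(self) -> None:
        # pattern key -> True while every run so far was both expensive and slow
        self._heavy: dict[str, bool] = {}

    def record(self, pattern: str, cost: float, took_ms: float) -> bool:
        """Record one query run and return whether its pattern is still heavy."""
        heavy_now = is_expensive(cost) and is_slow(took_ms)
        # Once a single run is not both expensive and slow, the pattern stays non-heavy.
        self._heavy[pattern] = self._heavy.get(pattern, True) and heavy_now
        return self._heavy[pattern]


tracker = HeavyPatternTracker()
print(tracker.record("p", cost=1500, took_ms=2000))  # True
print(tracker.record("p", cost=1500, took_ms=500))   # False: ran below the slow threshold
print(tracker.record("p", cost=1500, took_ms=2000))  # False: no longer heavy
```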
Calculations of each parameter
This is how the cost of a query is calculated, which determines whether the query is expensive.
There are 5 features taken into consideration:
- Regex – represents the cost of the query's regex terms. Calculated by:
  - contains
  - startsWith
  - pattern (regex)
  The Search Gateway only matches representative regex terms. For example, a term with analyze_wildcard:false is ignored, but a prefix term with no wildcard is calculated as a leading wildcard (startsWith).
- Range – represents the cost of the query's range terms. Calculated from the range duration.
- Data – represents the estimated amount of data that the search needs to process. The available params are:
  - size – the index size
  - shardsCount – the number of shards that need to be queried
  - docCount – the number of documents in the searched index
- Script – represents whether the query contains script fields outside the aggregation.
- Aggregation – represents the aggregation part of the search. The available params are:
  - level – the maximum number of nested aggregation levels
  - size – the estimated bucket size that will be returned
  - hasScripts – whether the aggregation terms contain a script field
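The string classifier of the regex feature can be sketched as follows, using the costs, values and thresholds of the default configuration. The matching rules are assumptions made for illustration: contains is read as a substring check, startsWith as a prefix check, pattern as a regular-expression search, each classifier adds its cost at most once per term, and the scores of all regex terms in a query are summed before the thresholds are applied. The JSON value ":\\s?\\*[^\\s\"]" becomes the Python raw string r':\s?\*[^\s"]'.

```python
import re

# Values copied from the default "regex" feature; the matching rules are assumptions.
REGEX_CLASSIFIERS = {
    "contains":   {"cost": 10,  "values": ["*", ".*"]},
    "startsWith": {"cost": 100, "values": [".*", "*"]},
    "pattern":    {"cost": 100, "values": [r':\s?\*[^\s"]']},
}


def term_score(term: str) -> int:
    """Add each classifier's cost once if any of its values matches the term."""
    score = 0
    for name, rule in REGEX_CLASSIFIERS.items():
        if name == "contains":
            hit = any(v in term for v in rule["values"])
        elif name == "startsWith":
            hit = any(term.startswith(v) for v in rule["values"])
        else:  # "pattern": values are regular expressions
            hit = any(re.search(v, term) for v in rule["values"])
        if hit:
            score += rule["cost"]
    return score


def regex_bucket(terms: list[str]) -> str:
    """Sum the scores of all regex terms and map the total onto the default thresholds."""
    total = sum(term_score(t) for t in terms)
    if total >= 100:         # HEAVY: gte 100
        return "HEAVY"
    if 10 < total < 100:     # MEDIUM: gt 10, lt 100
        return "MEDIUM"
    return "LIGHT"           # LIGHT: lte 10


print(regex_bucket(["*son"]))           # HEAVY: contains (10) + startsWith (100)
print(regex_bucket(["jo*n"]))           # LIGHT: contains only (10)
print(regex_bucket(["jo*n", "sm*th"]))  # MEDIUM: 10 + 10
```

With these default costs, a single leading wildcard is enough to reach HEAVY, while a wildcard in the middle of a term only counts as LIGHT unless several such terms appear in the same query.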
Each feature has its own classifiers.
All classifiers under a feature are calculated (using the formula in the expression field), summed, and classified according to the thresholds section inside the classifier section into one of 3 buckets: HEAVY, MEDIUM or LIGHT.
Each bucket has its own factor, which is multiplied by the cost field directly under the feature name.
Finally, the costs of all features are summed and compared to ‘heavySearchCostThreshold’.
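The calculation steps above can be sketched for the four math features of the default configuration. This is a minimal model, not the Search Gateway's implementation. It assumes that each {{name}} placeholder is replaced by a numeric value supplied per feature, that hasScripts and hasScript are 0 or 1, and that a value falling into no bucket (for example hasScript = 0, since the script feature only defines HEAVY) adds no cost. The string-type regex feature is left out; its cost would be added to the total in the same way. For brevity, expression and thresholds are lifted out of the classifier section.

```python
import ast
import operator
import re

HEAVY_SEARCH_COST_THRESHOLD = 1000  # heavySearchCostThreshold

# Math features from the default configuration (expression and thresholds
# lifted out of "classifier" for brevity).
FEATURES = {
    "data": {"cost": 1, "factors": {"HEAVY": 3, "MEDIUM": 2, "LIGHT": 1},
             "expression": "{{size}}",
             "thresholds": {"HEAVY": {"gt": 1000}, "MEDIUM": {"gt": 100, "lte": 1000},
                            "LIGHT": {"lte": 100}}},
    "range": {"cost": 1, "factors": {"HEAVY": 1001, "MEDIUM": 500, "LIGHT": 1},
              "expression": "{{duration}}",
              "thresholds": {"HEAVY": {"gte": 86400000},
                             "MEDIUM": {"gt": 21600000, "lt": 86400000},
                             "LIGHT": {"lte": 21600000}}},
    "aggregation": {"cost": 100, "factors": {"HEAVY": 3, "MEDIUM": 2, "LIGHT": 1},
                    "expression": "({{level}} * {{size}}) * (1 * (1 + {{hasScripts}}))",
                    "thresholds": {"HEAVY": {"gt": 100000},
                                   "MEDIUM": {"gt": 1000, "lte": 100000},
                                   "LIGHT": {"lte": 1000}}},
    "script": {"cost": 100, "factors": {"HEAVY": 3},
               "expression": "{{hasScript}}",
               "thresholds": {"HEAVY": {"gte": 1}}},
}

_BIN_OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
            ast.Mult: operator.mul, ast.Div: operator.truediv}
_CMP = {"gt": operator.gt, "gte": operator.ge, "lt": operator.lt, "lte": operator.le}


def evaluate(expression: str, variables: dict[str, float]) -> float:
    """Substitute {{name}} placeholders, then evaluate the arithmetic safely."""
    text = re.sub(r"\{\{(\w+)\}\}", lambda m: str(variables[m.group(1)]), expression)

    def ev(node: ast.AST) -> float:
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -ev(node.operand)
        if isinstance(node, ast.BinOp) and type(node.op) in _BIN_OPS:
            return _BIN_OPS[type(node.op)](ev(node.left), ev(node.right))
        raise ValueError(f"unsupported expression: {expression!r}")

    return ev(ast.parse(text, mode="eval").body)


def feature_cost(feature: dict, variables: dict[str, float]) -> float:
    value = evaluate(feature["expression"], variables)
    for bucket, bounds in feature["thresholds"].items():
        if all(_CMP[op](value, limit) for op, limit in bounds.items()):
            # The bucket's factor multiplies the cost directly under the feature name.
            return feature["factors"][bucket] * feature["cost"]
    return 0  # assumption: a value that falls in no bucket adds no cost


def query_cost(variables_per_feature: dict[str, dict[str, float]]) -> float:
    return sum(feature_cost(FEATURES[name], variables)
               for name, variables in variables_per_feature.items())


cost = query_cost({
    "data": {"size": 500},                                       # MEDIUM: 2 x 1 = 2
    "range": {"duration": 604800000},                            # HEAVY: 1001 x 1 = 1001
    "aggregation": {"level": 2, "size": 1000, "hasScripts": 1},  # 4000 -> MEDIUM: 2 x 100 = 200
    "script": {"hasScript": 0},                                  # no bucket: 0
})
print(cost, cost > HEAVY_SEARCH_COST_THRESHOLD)  # 1203 True
```

One consequence of the default values is visible in this arithmetic: the range feature's HEAVY bucket alone gives 1001 × 1 = 1001, which already exceeds heavySearchCostThreshold (1000). Any query whose range duration reaches 86400000 (24 hours, if the duration is measured in milliseconds) is therefore expensive regardless of the other features.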
The default configuration for customization
{
  "heavySearchCostThreshold": 1000,
  "slowSearchTimeInMilliThreshold": 1000,
  "features": {
    "regex": {
      "cost": 100,
      "factors": { "HEAVY": 3, "MEDIUM": 2, "LIGHT": 1 },
      "classifier": {
        "type": "string",
        "config": {
          "contains": { "cost": 10, "values": [ "*", ".*" ] },
          "startsWith": { "cost": 100, "values": [ ".*", "*" ] },
          "pattern": { "cost": 100, "values": [ ":\\s?\\*[^\\s\"]" ] }
        },
        "thresholds": {
          "HEAVY": { "gte": 100 },
          "MEDIUM": { "gt": 10, "lt": 100 },
          "LIGHT": { "lte": 10 }
        }
      }
    },
    "data": {
      "cost": 1,
      "factors": { "HEAVY": 3, "MEDIUM": 2, "LIGHT": 1 },
      "classifier": {
        "type": "math",
        "config": { "expression": "{{size}}" },
        "thresholds": {
          "HEAVY": { "gt": 1000 },
          "MEDIUM": { "gt": 100, "lte": 1000 },
          "LIGHT": { "lte": 100 }
        }
      }
    },
    "range": {
      "cost": 1,
      "factors": { "HEAVY": 1001, "MEDIUM": 500, "LIGHT": 1 },
      "classifier": {
        "type": "math",
        "config": { "expression": "{{duration}}" },
        "thresholds": {
          "HEAVY": { "gte": 86400000 },
          "MEDIUM": { "gt": 21600000, "lt": 86400000 },
          "LIGHT": { "lte": 21600000 }
        }
      }
    },
    "aggregation": {
      "cost": 100,
      "factors": { "HEAVY": 3, "MEDIUM": 2, "LIGHT": 1 },
      "classifier": {
        "type": "math",
        "config": { "expression": "({{level}} * {{size}}) * (1 * (1 + {{hasScripts}}))" },
        "thresholds": {
          "HEAVY": { "gt": 100000 },
          "MEDIUM": { "gt": 1000, "lte": 100000 },
          "LIGHT": { "lte": 1000 }
        }
      }
    },
    "script": {
      "cost": 100,
      "factors": { "HEAVY": 3 },
      "classifier": {
        "type": "math",
        "config": { "expression": "{{hasScript}}" },
        "thresholds": {
          "HEAVY": { "gte": 1 }
        }
      }
    }
  }
}