Elasticsearch OpenSearch Search by Relevance Using SQL and PPL

By Opster Expert Team - Gustavo

Updated: Jun 28, 2023

| 5 min read

Quick links

Introduction

OpenSearch allows you to query your data using three different query languages: 

  • DSL (Domain-Specific Language)
  • SQL (Structured Query Language)
  • PPL (Piped Processing Language) 

The Query DSL offers impressive capabilities and speed, but its complexity can make it challenging for users to quickly craft spontaneous queries and examine their data.

SQL is a widely known language, so having it available gives a lot of advantages for transitioning users. 

Having PPL available can reduce the learning curve for new users, allowing them to create queries from day one.

In this article, we will focus on SQL and PPL, showing how OpenSearch includes full-text capabilities that enable them to return documents in a way that takes account of the relevance of each one.

Preparing data

First, let’s create some documents to query. Go to DevTools and run the following:

PUT people/_bulk?refresh
{"index":{"_id":"1"}}
{"firstname":"Gustavo","lastname":"Llermaly","age":33,"gender":"M","email":"gustavo@llermaly.com","city":"Santiago","Country":"Chile"}
{"index":{"_id":"2"}}
{"firstname":"Peter","lastname":"Smith","age":42,"gender":"M","email":"peter@smith.com","city":"Valparaíso","Country":"Chile"}
{"index":{"_id":"3"}}
{"firstname":"John","lastname":"Doe","age":28,"gender":"M","email":"john@doe.com","city":"Madrid","Country":"Spain"}
{"index":{"_id":"4"}}
{"firstname":"Amanda","lastname":"Johnson","age":35,"gender":"F","email":"amanda@johnson.com","city":"Barcelona","Country":"Spain"}
{"index":{"_id":"5"}}
{"firstname":"Cinthia","lastname":"Martinez","age":29,"gender":"F","email":"cinthia@martinez.com","city":"Santiago","Country":"Chile"}

Query tools 

There are two visual tools we can use to run our SQL and PPL queries, and we will try both in this hands-on article: 

  • Query Workbench
  • SQL and PPL CLI

Query workbench

For this tool, let’s achieve the following tasks:

  • Return all the documents
  • Older than 30 years, sort by last name
  • Return the average age
  • Group by country and show counts

The Query Workbench within OpenSearch Dashboards offers a specialized interface for querying your OpenSearch data utilizing SQL or PPL syntax. This user-friendly dashboard allows you to export query results in various formats such as JSON, JDBC, CSV, and plain text.

Query workbench tab on OpenSearch Dashboards.

In the top right, we can select which language to use. 

Return all the documents

Let’s start with SQL: 

SELECT * from people; 
Query editor, query workbench.

Perfect. The SQL we all know works, and we can even download the results in different formats.

Let’s try with PPL now:

PPL tryout.

Now, let’s look at all the other tasks that we wanted to complete: conditions and sorting, return the average age, and group by country. The examples below show you how to do each of them using SQL and PPL.  

Conditions and sorting

SQL:

SELECT * FROM people WHERE age > 30
ORDER BY lastname;

PPL:

search source=people
| where age > 30
| sort lastname

Return the average age

SQL:

SELECT avg(age) from people;

PPL:

search source=people
| stats avg(age)

Group by country

SQL:

search source=people
| stats avg(age)

PPL:

search source=people
| stats count(Country) by Country;

What’s important to note is that OpenSearch queries are being triggered under the hood in all of these examples. If you click the Explain button when grouping documents, you will see the underlying “terms” aggregation:

“terms” aggregation.

SQL and PPL CLI

Now we have a better understanding of the SQL and PPL basics, let’s solve some more interesting puzzles which involve search relevance:

  • Search for “gustabo” and return the document with name “Gustavo” without using SQL %LIKE
  • Search in the fields name and lastname for “Llermaly” or “John” or “Amanda” and boost the lastname by 2 
  • Search for “Pe” using a bool_prefix query 

The SQL and PPL command line interface (CLI) is a standalone Python application which you can launch with the opensearchsql command.

To install it, you must run:

pip3 install opensearchsql

Then to start it, run:

opensearchsql https://localhost:9200 --username admin --password admin

This will use the default url:port and credentials:

url:port and credentials.

It also includes a syntax helper:

Syntax helper.

To use PPL, you must run:

opensearchsql -l ppl --username admin --password admin https://localhost:9200
PPL usage.

You can open a new terminal, or exit openseasrchsql using CTRL+D.

Now, let’s solve the tasks using the full-text search capabilities in both SQL and PPL.

Full-text search with fuzziness

Let’s start with the following task: search for “gustabo” and return the document with name “Gustavo” without using SQL %LIKE.

For this one, we will use fuzziness. Here’s how to do it in SQL and PPL.

SQL:

SELECT *
FROM people
WHERE match(firstname, "gustabo", fuzziness="AUTO");

PPL:

SOURCE=people | WHERE match(firstname, "gustabo", fuzziness="AUTO");

Note the fuzziness parameter. Without it, no results would return.

Search in multiple fields with score boosting

Now, let’s turn to the next task: search in the fields name and lastname for “Llermaly” or “John” or “Amanda” and boost the lastname by 2. 

For this one, we can use the multi_match query, which is available for both SQL and PPL: 

SQL:

SELECT firstname, lastname
FROM people
WHERE multi_match(["firstname", "lastname" ^ 2], "Llermaly John Amanda", operator="or");

PPL:

SOURCE=people | WHERE multi_match(["firstname", "lastname" ^ 2], "Llermaly John Amanda", operator="or") | fields firstname, lastname;

Note, “Llermaly” is the first result because it was a match in the boosted field (i.e., lastname), and we use | fields to only return the first name and last name.

Now, let’s look at our final task: search for “Pe” using a bool_prefix query. 

We can use a bool_prefix query to get Peter when searching for “Pe” using both SQL and PPL:

SQL:

SELECT firstname
FROM people
WHERE match_bool_prefix(firstname, 'Pe');

PPL:

source=people | where match_bool_prefix(firstname, 'Pe') | fields firstname;

Conclusion

OpenSearch provides users with three query languages: DSL, SQL, and PPL. While DSL offers great capabilities and speed, it can be complex for users to quickly craft spontaneous queries. 

SQL is a widely known language, making it easy for transitioning users, while PPL reduces the learning curve for new users. 

OpenSearch includes full-text capabilities for both SQL and PPL, allowing users to find relevant document results. The Query Workbench and CLI provide visual tools to run SQL and PPL queries. 

OpenSearch’s full-text search capabilities enable users to solve relevance-related tasks by prioritizing results based on the quality of the match. OpenSearch features that can be used to achieve that include fuzziness, bool_query, match/multi_match, and boosting. 

Overall, OpenSearch provides a user-friendly interface and many alternative ways for users to easily query, examine, and analyze their data.