Understanding and Managing Elasticsearch Machine Learning Jobs

By Opster Team

Updated: Jul 23, 2023 | 2 min read

Introduction 

Elasticsearch machine learning jobs are a crucial part of the Elasticsearch ecosystem. They are used to analyze data, most commonly time series data, and to detect anomalies, trends, and other patterns. This article will delve into the intricacies of these jobs, how to create and manage them, and how to interpret their results.

Types of Elasticsearch Machine Learning Jobs

There are two main types of machine learning jobs in Elasticsearch: anomaly detection jobs and data frame analytics jobs.

  1. Anomaly Detection Jobs: These jobs identify unusual patterns or behaviors in your time series data. They use machine learning algorithms to detect anomalies that could indicate problems such as system performance degradation, unusual user behavior, or cyber threats.
  2. Data Frame Analytics Jobs: These jobs perform more complex analyses on your data, such as classification, regression, and outlier detection. They are particularly useful for making predictions based on your data (see the sketch after this list).
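The rest of this article uses anomaly detection jobs in its examples, but for comparison, here is a minimal sketch of creating a data frame analytics job for outlier detection. The job name and the index names `my-source-index` and `my-dest-index` are placeholders you would replace with your own:

json
PUT _ml/data_frame/analytics/outlier_detection_job
{
  "source": { "index": "my-source-index" },
  "dest": { "index": "my-dest-index" },
  "analysis": { "outlier_detection": {} }
}

When started, this job copies the source documents into the destination index and annotates each one with an outlier score.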

Creating Elasticsearch Machine Learning Jobs

Creating an Elasticsearch machine learning job involves defining the job configuration and then opening the job so that it can analyze data. The job configuration specifies the type of analysis, the data to analyze, and other settings.

Here is an example of how to create an anomaly detection job:

json
PUT _ml/anomaly_detectors/cpu_usage_job
{
  "analysis_config" : {
    "bucket_span":"5m",
    "detectors":[
      {
        "detector_description":"High CPU usage",
        "function":"high_mean",
        "field_name":"cpu_usage"
      }
    ]
  },
  "data_description" : {
    "time_field":"timestamp"
  }
}

In this example, the job analyzes the `cpu_usage` field in the data and flags buckets where its mean value is unusually high. The `bucket_span` setting specifies that the data is analyzed in 5-minute intervals.
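This request only defines the job; it does not tell Elasticsearch where the data comes from. Typically you also create a datafeed that pulls documents from an index into the job. The sketch below assumes the CPU metrics are stored in indices matching `metrics-*`; replace that pattern with your own:

json
PUT _ml/datafeeds/datafeed-cpu_usage_job
{
  "job_id": "cpu_usage_job",
  "indices": ["metrics-*"]
}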

Managing Elasticsearch Machine Learning Jobs

Once you have created an Elasticsearch machine learning job, you can manage it using various API endpoints. For example, you can open and close jobs, retrieve their configuration and statistics, and delete them.

Here is an example of how to open a job so that it is ready to analyze data:

json
POST _ml/anomaly_detectors/cpu_usage_job/_open
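If the job is fed by a datafeed, as in the earlier sketch, you would then start the datafeed so that data actually flows into the job. The datafeed ID below assumes the naming used in that sketch:

json
POST _ml/datafeeds/datafeed-cpu_usage_job/_start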

And here is how to close the job when the analysis is finished (if a datafeed is running, stop it first with the stop datafeed API):

json
POST _ml/anomaly_detectors/cpu_usage_job/_close
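You can also inspect a job's configuration and processing statistics, or delete it once it is no longer needed. Note that a job normally has to be closed, and any associated datafeed deleted, before it can be deleted:

json
GET _ml/anomaly_detectors/cpu_usage_job
GET _ml/anomaly_detectors/cpu_usage_job/_stats
DELETE _ml/anomaly_detectors/cpu_usage_job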

Interpreting Job Results

The results of an Elasticsearch machine learning job are stored in a set of result indices. These indices contain bucket, record, and influencer results for the analyzed data, along with statistics about the model used by the job.
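Internally, these results live in indices whose names begin with `.ml-anomalies-`. You can search them directly if needed; the query below is a sketch that filters for record-level results of our example job, although the dedicated results APIs shown next are usually the better choice:

json
GET .ml-anomalies-*/_search
{
  "query": {
    "bool": {
      "filter": [
        { "term": { "job_id": "cpu_usage_job" } },
        { "term": { "result_type": "record" } }
      ]
    }
  }
}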

You can retrieve the anomaly records of a job using the get records API. For example, to get the records of the `cpu_usage_job`, you would use the following command:

json
GET _ml/anomaly_detectors/cpu_usage_job/results/records

The results include information such as the time of the anomaly, the actual and typical (expected) values, and a `record_score` that indicates the severity of the anomaly.
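In practice, you usually only care about the most severe anomalies. The get records API accepts optional parameters for this; the request below is a sketch that returns only records with a score of at least 75, sorted by score in descending order:

json
GET _ml/anomaly_detectors/cpu_usage_job/results/records
{
  "record_score": 75,
  "sort": "record_score",
  "desc": true
}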

Conclusion 

Elasticsearch machine learning jobs are a powerful tool for analyzing your data, detecting anomalies, and making predictions. By understanding how to create, manage, and interpret these jobs, you can gain valuable insights into your data and make more informed decisions.