SLM in Elasticsearch vs Snapshot Management in OpenSearch

By Opster Expert Team - Gustavo

Updated: Feb 21, 2024

Introduction

Snapshot Lifecycle Management (SLM) in Elasticsearch and Snapshot Management (SM) in OpenSearch fulfill the same purpose: handling the automatic creation and deletion of snapshots.

Both SLM and SM can schedule the creation and deletion of snapshots based on elapsed time or the number of snapshots taken.

The table below outlines the similarities and differences between SLM and SM.

SLM vs. SM comparison table – similarities and differences

| Feature | Elasticsearch Snapshot Lifecycle Management (SLM) | OpenSearch Snapshot Management (SM) |
| --- | --- | --- |
| Installation | Native | Plugin |
| Scheduling | Yes (cron-like syntax) | Yes (cron-like syntax) |
| Time Limit | No | Yes |
| Retention rules | Yes | Yes |
| Notifications | Yes | Yes |
| Partial Snapshots | Yes | Yes |
| Ignore Unavailable Indices | Yes | Yes |
| Global State | Yes | Yes |
| Feature States | Yes | No |
| Policies Info | Stats API | Explain API |

Installation

While snapshot lifecycle management is native to Elasticsearch, in OpenSearch Snapshot Management is delivered by a plugin (it is part of the Index Management plugin), so that plugin must be installed to enable this functionality.

Time limit

The time_limit parameter in OpenSearch sets the maximum time a snapshot operation is allowed to take. If the time_limit is longer than the scheduled interval between snapshots, the system skips subsequent scheduled snapshots until the time_limit has elapsed.

For instance, if the time_limit is set to 35 minutes and snapshots are scheduled every 30 minutes starting at midnight, the system will take the snapshots at 00:00 and 01:00. However, the snapshot scheduled for 00:30 will be skipped, because the 35-minute time_limit that started at 00:00 has not yet elapsed.
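
The skipping behavior in the example above can be sketched as a small simulation. This is only a simplified model, not OpenSearch's actual scheduler: it assumes that once a snapshot starts, no other scheduled snapshot is taken until time_limit has elapsed. The function name simulate_schedule and the date used are illustrative.

```python
from datetime import datetime, timedelta


def simulate_schedule(start, interval, time_limit, runs):
    """Return (taken, skipped) scheduled times under a simplified time_limit model.

    Assumption: after a snapshot starts, no other scheduled snapshot is taken
    until time_limit has elapsed, which mirrors the worked example above.
    """
    taken, skipped = [], []
    blocked_until = start  # earliest time at which the next snapshot may start
    for i in range(runs):
        scheduled = start + i * interval
        if scheduled >= blocked_until:
            taken.append(scheduled)
            blocked_until = scheduled + time_limit
        else:
            skipped.append(scheduled)
    return taken, skipped


midnight = datetime(2024, 1, 1, 0, 0)  # arbitrary illustrative date
taken, skipped = simulate_schedule(
    start=midnight,
    interval=timedelta(minutes=30),
    time_limit=timedelta(minutes=35),
    runs=4,
)
print("taken:  ", [t.strftime("%H:%M") for t in taken])    # ['00:00', '01:00']
print("skipped:", [t.strftime("%H:%M") for t in skipped])  # ['00:30', '01:30']
```

With a 30-minute interval and a 35-minute time_limit, every second scheduled snapshot is skipped under this model, which is why the time_limit is usually kept shorter than the schedule interval.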

Notifications

OpenSearch sends notifications if certain events occur: 

  • When a snapshot is created
  • When a snapshot is deleted
  • When the creation or deletion of a snapshot fails
  • When a snapshot operation takes more than the defined time_limit
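
As a rough sketch, the four events above correspond to four boolean conditions in the notification section of an SM policy. The field names used here (channel.id, conditions.creation, deletion, failure, time_limit_exceeded) reflect the OpenSearch SM API documentation as far as we know; verify them against your OpenSearch version. The helper name build_sm_notification and the channel ID are hypothetical.

```python
import json


def build_sm_notification(channel_id, creation=True, deletion=False,
                          failure=False, time_limit_exceeded=False):
    """Build the `notification` section of an SM policy body.

    Each flag switches notifications on or off for one of the four events
    listed above. Field names are assumed from the OpenSearch SM API docs.
    """
    return {
        "channel": {"id": channel_id},
        "conditions": {
            "creation": creation,
            "deletion": deletion,
            "failure": failure,
            "time_limit_exceeded": time_limit_exceeded,
        },
    }


notification = build_sm_notification(
    channel_id="my-channel-id",  # hypothetical ID of an existing notification channel
    failure=True,
    time_limit_exceeded=True,
)
print(json.dumps({"notification": notification}, indent=2))
```

The resulting object would sit next to the creation, deletion and snapshot_config sections in the body of the policy create or update request.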

You can also set up notifications in Elasticsearch. Because the outcome of every snapshot lifecycle operation is stored in a data stream, you can use the Alerting feature in Kibana to configure any alerts you need based on the documents in that data stream. Concretely, go to Kibana > Observability > Alerts and create a new alerting rule based on the content of the .slm-history-* data stream, as you would for any other index.

Feature states

Feature states are a newer addition to SLM in Elasticsearch.

Feature states allow users to handle system indices in a granular way, so that they can take or restore snapshots of a subset of features. The available features are a mix of built-in features and features defined by plugins.

This can be useful for security: for example, users may want to create a separate repository that contains only certain feature states, such as security, and exclude the cluster state from the main repository.

Run the following command to retrieve cluster features:

GET /_features

Example Output:

{
  "features": [
    {
      "name": "security",
      "description": "Manages configuration for Security features, such as users and roles"
    },
    {
      "name": "logstash_management",
      "description": "Enables Logstash Central Management pipeline storage"
    },
    {
      "name": "geoip",
      "description": "Manages data related to GeoIP database downloader"
    },
    {
      "name": "async_search",
      "description": "Manages results of async searches"
    },
    {
      "name": "fleet",
      "description": "Manages configuration for Fleet"
    },
    {
      "name": "enrich",
      "description": "Manages data related to Enrich policies"
    },
    {
      "name": "searchable_snapshots",
      "description": "Manages caches and configuration for searchable snapshots"
    },
    {
      "name": "tasks",
      "description": "Manages task results"
    },
    {
      "name": "machine_learning",
      "description": "Provides anomaly detection and forecasting functionality"
    },
    {
      "name": "transform",
      "description": "Manages configuration and state for transforms"
    },
    {
      "name": "watcher",
      "description": "Manages Watch definitions and state"
    },
    {
      "name": "kibana",
      "description": "Manages Kibana configuration and reports"
    }
  ]
}

In Elasticsearch, users can specify which feature states they want to save by listing them in the SLM policy creation API request (or via the Kibana interface):

PUT _slm/policy/my-snapshots
{
  "schedule": "0 50 2 * * ?",
  "name": "<my-snapshot-{now/d}>",
  "repository": "my_repo",
  "config": {
    "indices": "*",
    "include_global_state": true,
    "feature_states": [
      "kibana",
      "security"
    ]
  },
  "retention": {
    "expire_after": "7d",
    "min_count": 5,
    "max_count": 10
  }
}
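
Before submitting a policy like the one above, it can help to check that every name in feature_states actually exists in the cluster. The following is a minimal sketch that compares the requested names against a GET /_features response; the function name unknown_feature_states is illustrative, and the response is a trimmed copy of the example output shown earlier.

```python
def unknown_feature_states(requested, features_response):
    """Return the requested feature names missing from a GET /_features response."""
    available = {feature["name"] for feature in features_response["features"]}
    return [name for name in requested if name not in available]


# Trimmed copy of the example GET /_features output shown earlier.
features_response = {
    "features": [
        {"name": "security",
         "description": "Manages configuration for Security features, such as users and roles"},
        {"name": "watcher",
         "description": "Manages Watch definitions and state"},
        {"name": "kibana",
         "description": "Manages Kibana configuration and reports"},
    ]
}

print(unknown_feature_states(["kibana", "security"], features_response))  # []
print(unknown_feature_states(["kibana", "secuirty"], features_response))  # ['secuirty']
```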

Policies info

SLM and SM each have their own way of retrieving the current status of a snapshot policy. Let’s review both below.

Elasticsearch (Stats API)

The following command returns global statistics about SLM as well as statistics per policy:

GET /_slm/stats

Example output:

{
  "retention_runs": 1649,
  "retention_failed": 0,
  "retention_timed_out": 0,
  "retention_deletion_time": "3.7h",
  "retention_deletion_time_millis": 13439803,
  "total_snapshots_taken": 1650,
  "total_snapshots_failed": 1,
  "total_snapshots_deleted": 1550,
  "total_snapshot_deletion_failures": 0,
  "policy_stats": [
    {
      "policy": "my-snapshot-policy",
      "snapshots_taken": 1650,
      "snapshots_failed": 1,
      "snapshots_deleted": 1550,
      "snapshot_deletion_failures": 0
    }
  ]
}
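
A response like this is easy to post-process. As a minimal sketch, the helper below (the name policies_with_failures is illustrative) lists the policies that report at least one failed snapshot or failed deletion, using a trimmed copy of the example response above.

```python
def policies_with_failures(stats):
    """Map policy name -> (snapshots_failed, snapshot_deletion_failures).

    Only policies that report at least one failure are included.
    `stats` is a parsed GET /_slm/stats response body.
    """
    flagged = {}
    for policy in stats.get("policy_stats", []):
        failed = policy.get("snapshots_failed", 0)
        deletion_failures = policy.get("snapshot_deletion_failures", 0)
        if failed or deletion_failures:
            flagged[policy["policy"]] = (failed, deletion_failures)
    return flagged


# Trimmed copy of the example response above.
stats = {
    "total_snapshots_taken": 1650,
    "total_snapshots_failed": 1,
    "policy_stats": [
        {
            "policy": "my-snapshot-policy",
            "snapshots_taken": 1650,
            "snapshots_failed": 1,
            "snapshots_deleted": 1550,
            "snapshot_deletion_failures": 0,
        }
    ],
}

print(policies_with_failures(stats))  # {'my-snapshot-policy': (1, 0)}
```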

You can get more information about the latest successful and failed executions of a policy by running:

GET _slm/policy/my-snapshot-policy

{
  "my-snapshot-policy": {
    "version": 1,
    "modified_date": "2023-03-08T18:23:43.418Z",
    "modified_date_millis": 1678299823418,
    "policy": {
      "name": "<my-snapshot-{now/d}>",
      "schedule": "0 */30 * * * ?",
      "repository": "snapshots",
      "config": {
        "partial": true
      },
      "retention": {
        "expire_after": "259200s",
        "min_count": 10,
        "max_count": 100
      }
    },
    "last_success": {
      "snapshot_name": "my-snapshot-2023.04.12-7b63taketbumqiajdr8weg",
      "start_time_string": "2023-04-12T03:29:59.810Z",
      "start_time": 1681270199810,
      "time_string": "2023-04-12T03:30:12.310Z",
      "time": 1681270212310
    },
    "last_failure": {
      "snapshot_name": "my-snapshot-2023.03.25-enngoweiqbqo1rlosc5_bg",
      "time_string": "2023-03-25T03:00:12.051Z",
      "time": 1679713212051,
      "details": """{"type":"snapshot_exception","reason":"[snapshots:my-snapshot-2023.03.25-enngoweiqbqo1rlosc5_bg] failed to create snapshot successfully, 9 out of 95 total shards failed"}"""
    },
    "next_execution": "2023-04-12T04:00:00.000Z",
    "next_execution_millis": 1681272000000,
    "stats": {
      "policy": "my-snapshot-policy",
      "snapshots_taken": 1650,
      "snapshots_failed": 1,
      "snapshots_deleted": 1550,
      "snapshot_deletion_failures": 0
    }
  }
}
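
One simple health check that can be built on this response is to compare the timestamps of last_success and last_failure. The sketch below assumes the time fields are epoch milliseconds, as in the example; the function name last_run_failed is illustrative.

```python
def last_run_failed(policy_info):
    """Return True if the most recent recorded run of the policy was a failure.

    `policy_info` is the object stored under the policy name in a
    GET _slm/policy/<policy> response.
    """
    success = policy_info.get("last_success", {}).get("time")
    failure = policy_info.get("last_failure", {}).get("time")
    if failure is None:
        return False  # no failure has been recorded
    if success is None:
        return True   # only failures have been recorded
    return failure > success


# Trimmed copy of the example response above.
response = {
    "my-snapshot-policy": {
        "last_success": {"time": 1681270212310},
        "last_failure": {"time": 1679713212051},
    }
}

print(last_run_failed(response["my-snapshot-policy"]))  # False
```

In the example response the last failure (2023-03-25) is older than the last success (2023-04-12), so the check reports that the most recent run succeeded.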

OpenSearch (Explain API)

OpenSearch exposes an explain API that focuses on the current state of the specified policies:

GET _plugins/_sm/policies/<policy_names>/_explain
{
  "policies" : [
    {
      "name" : "daily-policy",
      "creation" : {
        "current_state" : "CREATION_START",
        "trigger" : {
          "time" : 1656403200000
        }
      },
      "deletion" : {
        "current_state" : "DELETION_START",
        "trigger" : {
          "time" : 1656403200000
        }
      },
      "policy_seq_no" : 44696,
      "policy_primary_term" : 19,
      "enabled" : true
    }
  ]
}
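
To make an explain response easier to scan, it can be condensed to one line per policy. This is a minimal sketch using the example response above; the function name summarize_explain is illustrative, and the trigger time is assumed to be epoch milliseconds.

```python
from datetime import datetime, timezone


def summarize_explain(explain_response):
    """Return one summary line per policy in an SM explain response."""
    lines = []
    for policy in explain_response.get("policies", []):
        creation = policy.get("creation", {})
        deletion = policy.get("deletion", {})
        trigger_ms = creation.get("trigger", {}).get("time")
        # Convert the creation trigger time (assumed epoch millis) to UTC.
        trigger = (
            datetime.fromtimestamp(trigger_ms / 1000, tz=timezone.utc).isoformat()
            if trigger_ms is not None
            else "n/a"
        )
        lines.append(
            f"{policy['name']}: "
            f"creation={creation.get('current_state', 'n/a')}, "
            f"deletion={deletion.get('current_state', 'n/a')}, "
            f"creation_trigger={trigger}, "
            f"enabled={policy.get('enabled')}"
        )
    return lines


# Trimmed copy of the example response above.
explain_response = {
    "policies": [
        {
            "name": "daily-policy",
            "creation": {"current_state": "CREATION_START",
                         "trigger": {"time": 1656403200000}},
            "deletion": {"current_state": "DELETION_START",
                         "trigger": {"time": 1656403200000}},
            "enabled": True,
        }
    ]
}

for line in summarize_explain(explain_response):
    print(line)
```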

You can get more information about a given SM policy by running: 

GET _plugins/_sm/policies/<policy_name>
{
  "_id" : "daily-policy-sm-policy",
  "_version" : 6,
  "_seq_no" : 44696,
  "_primary_term" : 19,
  "sm_policy" : {
    "name" : "daily-policy",
    "description" : "Daily snapshot policy",
    "schema_version" : 15,
    "creation" : {
      "schedule" : {
        "cron" : {
          "expression" : "0 8 * * *",
          "timezone" : "UTC"
        }
      },
      "time_limit" : "1h"
    },
    "deletion" : {
      "schedule" : {
        "cron" : {
          "expression" : "0 1 * * *",
          "timezone" : "America/Los_Angeles"
        }
      },
      "condition" : {
        "max_age" : "7d",
        "min_count" : 7,
        "max_count" : 21
      },
      "time_limit" : "1h"
    },
    "snapshot_config" : {
      "metadata" : {
        "any_key" : "any_value"
      },
      "ignore_unavailable" : "true",
      "include_global_state" : "false",
      "date_format" : "yyyy-MM-dd-HH:mm",
      "repository" : "s3-repo",
      "partial" : "true"
    },
    "schedule" : {
      "interval" : {
        "start_time" : 1656341042874,
        "period" : 1,
        "unit" : "Minutes"
      }
    },
    "enabled" : true,
    "last_updated_time" : 1656341042874,
    "enabled_time" : 1656341042874
  }
}

Conclusion

Both Snapshot Lifecycle Management (SLM) in Elasticsearch and Snapshot Management (SM) in OpenSearch serve the same primary purpose: automating the creation and deletion of snapshots. They share several common features, such as scheduling, notifications, snapshot retention, global state, and ignoring unavailable indices. However, there are key differences between the two.

OpenSearch’s Snapshot Management offers a time limit feature, which caps how long a snapshot operation is allowed to take. This can help avoid conflicts with subsequent snapshots when a snapshot takes longer than the scheduled interval. The feature can be a blessing or a curse, however: as your cluster grows, more indices need to be snapshotted and snapshots can take longer to execute, so you will need to revisit this parameter regularly.

On the other hand, Elasticsearch’s Snapshot Lifecycle Management offers feature states, which give more granular control over system indices when taking or restoring snapshots than the global state option does. This lets users manage specific features and their associated data in a more targeted way.

Both SLM and SM let you track the state of a policy: OpenSearch focuses more on the policy’s current state, while Elasticsearch focuses on the snapshots the policy has generated.