Snapshot support in Ozone

Learn about different scenarios where you can use snapshots, the snapshot APIs that are available for use, and the snapshot architecture.

The snapshot feature for the Apache Ozone object store enables you to take a point-in-time consistent image of a given bucket. Snapshots support various use cases, including:
  • Backup and restore

    Create hourly, daily, weekly, or monthly snapshots for backup and recovery.

  • Archival and compliance

    Take snapshots for compliance purposes and archive them.

  • Replication and disaster recovery (DR)

    Snapshots provide frozen immutable images of the bucket on the source Ozone cluster. Snapshots can be used for replicating these immutable bucket images to remote DR sites.

  • Incremental replication

    DistCp with SnapshotDiff offers an efficient way to incrementally sync up source and destination buckets.
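The idea behind incremental replication can be illustrated with a toy model. This is only a sketch under stated assumptions, not how DistCp or Ozone implement it: snapshots are modeled as plain dictionaries mapping a key name to a content version, and the function names `snapshot_diff` and `apply_diff` as well as the `CREATE`, `DELETE`, and `MODIFY` labels are hypothetical. A real diff may report other change types, such as renames, which this model ignores.

```python
def snapshot_diff(from_snap, to_snap):
    """Compare two {key: content_version} maps and list the changes.

    Hypothetical toy model; real snapshots are OM metadata, not dicts.
    """
    diff = []
    for key in sorted(from_snap.keys() | to_snap.keys()):
        if key not in from_snap:
            diff.append(("CREATE", key))
        elif key not in to_snap:
            diff.append(("DELETE", key))
        elif from_snap[key] != to_snap[key]:
            diff.append(("MODIFY", key))
    return diff


def apply_diff(destination, diff, source_snap):
    """Bring the destination up to date by touching only the changed keys.

    Returns the number of keys that had to be copied from the source.
    """
    copied = 0
    for op, key in diff:
        if op == "DELETE":
            destination.pop(key, None)
        else:
            # CREATE and MODIFY both copy the key from the newer snapshot.
            destination[key] = source_snap[key]
            copied += 1
    return copied


# The destination already matches the older snapshot from the previous sync.
snap1 = {"a": 1, "b": 1, "c": 1}
snap2 = {"a": 1, "b": 2, "d": 1}
destination = dict(snap1)

changes = snapshot_diff(snap1, snap2)
copied = apply_diff(destination, changes, snap2)
```

In this model the unchanged key `a` is never copied, which is the saving that a diff-driven sync aims for compared with copying the whole bucket again.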

Snapshot APIs

The snapshot feature is available through the ozone fs and ozone sh CLIs. You can also access it programmatically from the Ozone ObjectStore Java client. The feature provides the following functionality:
  • Create an instantaneous snapshot for a given bucket.
    ozone sh snapshot create [-hV] <bucket> [<snapshotName>]
  • List all snapshots of a given bucket.
    ozone sh snapshot list [-hV] <bucket>
  • Delete a specific snapshot for a given bucket.
    ozone sh snapshot delete [-hV] <bucket> <snapshotName>
  • Given two snapshots, list all the keys that are different between them (SnapshotDiff).
    ozone sh snapshot diff [-chV] [-p=<pageSize>] [-t=<continuation-token>] <bucket> <fromSnapshot> <toSnapshot>

The SnapshotDiff functionality in the CLI and API is asynchronous. The first time the API is invoked, Ozone Manager (OM) starts a background thread to calculate the SnapshotDiff and returns a Retry response with a suggested wait duration before the next attempt. After the SnapshotDiff is computed, the API returns the differences in multiple pages. With each SnapshotDiff response, OM also returns a continuation token that the client uses to continue from the last batch of SnapshotDiff results. The API is safe to call multiple times for a given source and destination snapshot pair. Internally, each OM computes the SnapshotDiff only once and stores it for future invocations of the same SnapshotDiff API.
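The retry-then-paginate flow described above can be sketched as a small client loop. This is a minimal illustration that runs against an in-memory stand-in, not the real Ozone client: `DiffResponse`, `FakeOzoneManager`, `fetch_full_diff`, the `RETRY` and `DONE` status strings, and the bucket and snapshot names are all hypothetical, and the real response types and token format may differ.

```python
import time
from dataclasses import dataclass, field


@dataclass
class DiffResponse:
    """Hypothetical response shape; not the real Ozone client type."""
    status: str                 # "RETRY" or "DONE"
    retry_after_s: float = 0.0  # suggested wait when status == "RETRY"
    entries: list = field(default_factory=list)
    next_token: str = ""        # empty when there are no more pages


class FakeOzoneManager:
    """In-memory stand-in that mimics the asynchronous behaviour."""

    def __init__(self, all_entries, polls_until_ready=2):
        self._entries = all_entries
        self._polls_left = polls_until_ready
        self.computations = 0  # how many times the diff was computed

    def snapshot_diff(self, bucket, from_snap, to_snap, page_size, token=""):
        if self._polls_left > 0:
            # The background job is still running: ask the client to retry.
            self._polls_left -= 1
            if self._polls_left == 0:
                self.computations += 1  # job finishes; result is stored
            return DiffResponse(status="RETRY", retry_after_s=0.01)
        # Serve one page of the stored result, starting at the token offset.
        start = int(token) if token else 0
        end = start + page_size
        next_token = str(end) if end < len(self._entries) else ""
        return DiffResponse("DONE", entries=self._entries[start:end],
                            next_token=next_token)


def fetch_full_diff(om, bucket, from_snap, to_snap, page_size=2,
                    sleep=time.sleep):
    """Poll until the diff is ready, then follow continuation tokens."""
    entries, token = [], ""
    while True:
        resp = om.snapshot_diff(bucket, from_snap, to_snap, page_size, token)
        if resp.status == "RETRY":
            sleep(resp.retry_after_s)  # honour the suggested wait
            continue
        entries.extend(resp.entries)
        if not resp.next_token:
            return entries
        token = resp.next_token


om = FakeOzoneManager(["+ a", "- b", "M c", "+ d", "- e"])
result = fetch_full_diff(om, "bucket1", "snap1", "snap2", page_size=2,
                         sleep=lambda seconds: None)
```

Because the client only passes the token back unchanged, repeating a call with the same arguments is harmless in this model, which mirrors the statement that the API is safe to call multiple times.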

Snapshot architecture

The Ozone snapshot architecture leverages the fact that data blocks, once written, remain immutable for their lifetime. These data blocks are reclaimed only when the object key metadata that references them is deleted from the Ozone namespace. All of this Ozone metadata is stored on the OM nodes in the Ozone cluster.

When you take a snapshot of an Ozone bucket, the system internally takes a snapshot of the Ozone metadata on the OM nodes. Because Ozone does not allow updates to DataNode blocks, the data blocks referenced by the metadata snapshot remain intact.

The Ozone key deletion service is also aware of snapshots. It does not reclaim a key as long as the key is referenced by the active object store bucket or any of its snapshots. When snapshots are deleted, a background garbage collection service reclaims any key that is no longer part of any snapshot or the active object store.

Ozone also provides the SnapshotDiff API. When a user issues a SnapshotDiff between two snapshots, Ozone efficiently calculates all the keys that differ between the two snapshots and returns the result as a paginated SnapshotDiff list.
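The key reclamation rule in this section can be expressed as a simple reference check. The following is a toy sketch under simplifying assumptions, not the actual deletion service: keys are modeled as plain strings, each snapshot as a set of key names, and the function name `reclaimable_keys` and the sample data are hypothetical.

```python
def reclaimable_keys(deleted_keys, active_bucket_keys, snapshots):
    """Return the deleted keys that nothing references any more.

    deleted_keys:       keys waiting for reclamation
    active_bucket_keys: keys visible in the active bucket
    snapshots:          {snapshot_name: set of keys captured by it}
    """
    referenced = set(active_bucket_keys)
    for snapshot_keys in snapshots.values():
        referenced |= set(snapshot_keys)
    # A key can be reclaimed only if no snapshot and no active bucket holds it.
    return {key for key in deleted_keys if key not in referenced}


active = {"a"}
snapshots = {"snap1": {"a", "b"}, "snap2": {"a", "c"}}
pending = {"b", "c", "d"}

# While both snapshots exist, only "d" is unreferenced.
before = reclaimable_keys(pending, active, snapshots)

# Deleting snap1 releases the last reference to "b".
del snapshots["snap1"]
after = reclaimable_keys(pending, active, snapshots)
```

In this model, deleting a snapshot frees space only for keys that the snapshot alone was holding, so `c` stays protected until `snap2` is also deleted.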