Snapshots and snapshot policies

You can create HDFS, HBase, and Ozone snapshots using Replication Manager in CDP Private Cloud Base for data replication. Learn what data is backed up during replication and the methods available for replication.

What HDFS, HBase, and Ozone snapshots are

HDFS, HBase, and Ozone snapshots are point-in-time backups of HDFS directories, HBase tables, and Ozone buckets respectively. You can create HDFS, HBase, or Ozone snapshots in Cloudera Manager or from the command line, as required. You can also create them at regular intervals using snapshot policies in CDP Private Cloud Base Replication Manager. HDFS and Hive replication policies leverage HDFS snapshots, and Ozone replication policies leverage Ozone snapshots, to implement incremental data replication. You can improve the reliability of replication policies by using snapshots.

HBase snapshots for tables and Ozone snapshots for buckets are enabled by default. However, you must enable HDFS snapshots for the required HDFS directories and subdirectories in Cloudera Manager.
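
As a rough illustration of the command-line route for HDFS, the following sketch builds the two standard HDFS CLI invocations that mark a directory as snapshottable and then take a named snapshot. The directory path, the snapshot name, and the helper function are hypothetical. The sketch only prints the commands; it assumes you would run them on a host with an HDFS client and sufficient privileges, for example by passing each list to subprocess.run.

```python
import shlex


def hdfs_snapshot_commands(directory, snapshot_name):
    """Return the HDFS CLI invocations that enable and then take a snapshot.

    Hypothetical helper: it only assembles argument lists and runs nothing.
    """
    return [
        # Mark the directory as snapshottable (needs HDFS superuser privileges).
        ["hdfs", "dfsadmin", "-allowSnapshot", directory],
        # Create a named point-in-time snapshot of that directory.
        ["hdfs", "dfs", "-createSnapshot", directory, snapshot_name],
    ]


# Placeholder directory and snapshot name, for illustration only.
commands = hdfs_snapshot_commands("/data/sales", "sales-snap-1")
for argv in commands:
    print(shlex.join(argv))
```

Keeping the commands as argument lists rather than one shell string avoids quoting problems if a path contains spaces.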

Replication methods used by Replication Manager

The first HDFS, Hive, or Ozone replication policy job is a bootstrap job; that is, the replication policy replicates all the data in the specified HDFS directories, Hive/Impala tables, or Ozone buckets respectively. Subsequent replication jobs use one of the following methods to replicate data:

Incremental replication method
In this method, Replication Manager uses a snapshot diff report to replicate data. The snapshot diff feature compares snapshots to generate the diff report, which identifies the changed or new data in the chosen directories or buckets in the source cluster. This method optimizes replication jobs because they take less time and use fewer resources.
Non-incremental method
Replication Manager uses this method if the snapshot diff fails. In this method, Replication Manager performs the following high-level steps:
  1. Lists all the files.
  2. Performs a checksum and metadata check on them to identify the relevant files to copy. This step depends on the advanced options you choose during the replication policy creation process. During this identification process, some unchanged files are skipped if they do not meet the criteria set by the chosen advanced options.
  3. Copies the identified files from the source cluster to the target cluster.
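
The two methods can be modelled with a small, self-contained sketch. It is not Replication Manager's implementation: local directories stand in for the source and target clusters, a snapshot is simplified to a mapping of relative path to SHA-256 checksum, and a missing earlier snapshot is the assumed reason for the snapshot diff to fail. All function names are hypothetical.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def file_checksum(path):
    """Return the SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def take_snapshot(root):
    """Simplified snapshot: map each file's relative path to its checksum."""
    return {p.relative_to(root).as_posix(): file_checksum(p)
            for p in root.rglob("*") if p.is_file()}


def snapshot_diff(old_snapshot, new_snapshot):
    """Return the sorted paths that are new or changed since old_snapshot."""
    if old_snapshot is None:
        # Assumed failure mode: there is no earlier snapshot to compare with.
        raise ValueError("no previous snapshot available")
    return sorted(p for p, c in new_snapshot.items() if old_snapshot.get(p) != c)


def copy_file(source, target, rel):
    """Copy one file from source to target, creating parent directories."""
    dst = target / rel
    dst.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(source / rel, dst)


def full_listing_copy(source, target):
    """Non-incremental method: list all files, compare, copy what differs."""
    copied = []
    for src in sorted(p for p in source.rglob("*") if p.is_file()):  # 1. list
        rel = src.relative_to(source).as_posix()
        dst = target / rel
        # 2. Checksum check: skip files already identical on the target.
        if dst.is_file() and file_checksum(dst) == file_checksum(src):
            continue
        copy_file(source, target, rel)  # 3. copy the identified file
        copied.append(rel)
    return copied


def replicate(source, target, old_snapshot=None):
    """Try the incremental method first; fall back if the diff fails."""
    new_snapshot = take_snapshot(source)
    try:
        paths = snapshot_diff(old_snapshot, new_snapshot)
        for rel in paths:
            copy_file(source, target, rel)
        return "incremental", paths, new_snapshot
    except ValueError:
        return "non-incremental", full_listing_copy(source, target), new_snapshot


with tempfile.TemporaryDirectory() as tmp:
    source, target = Path(tmp, "source"), Path(tmp, "target")
    source.mkdir()
    target.mkdir()
    (source / "a.txt").write_text("alpha")
    (source / "b.txt").write_text("beta")
    first = replicate(source, target)             # no earlier snapshot yet
    (source / "b.txt").write_text("beta v2")      # change one file
    second = replicate(source, target, first[2])  # diff against first run
    target_b = (target / "b.txt").read_text()

print(first[0], first[1])
print(second[0], second[1])
```

In this model the first run copies both files through the full listing, while the second run copies only b.txt because the diff report names it as the only changed path. The sketch ignores deletions and the advanced options mentioned in step 2.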

You can create snapshot policies in CDP Private Cloud Base Replication Manager that define the HDFS directories, HBase tables, or Ozone buckets to be snapshotted, the intervals to take snapshots, and the number of snapshots to retain for each snapshot interval. For example, you can create a snapshot policy that takes daily and weekly snapshots, and also specify that only seven daily snapshots and five weekly snapshots must be maintained.
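
The retention rule in the example above can be sketched as follows. This is only an illustration of "keep the N most recent snapshots per interval", assuming a snapshot is represented as an (interval, date) pair; the function name and the sample dates are hypothetical and do not reflect how Replication Manager stores or expires snapshots.

```python
from datetime import date, timedelta


def apply_retention(snapshots, retain):
    """Split snapshots into (kept, expired) by a per-interval retention count.

    snapshots: list of (interval, date_taken) pairs.
    retain: mapping of interval name to the number of snapshots to keep.
    """
    by_interval = {}
    for interval, taken in snapshots:
        by_interval.setdefault(interval, []).append(taken)

    kept, expired = [], []
    for interval, dates in by_interval.items():
        dates.sort(reverse=True)           # newest first
        limit = retain.get(interval, 0)    # intervals without a rule keep none
        kept += [(interval, d) for d in dates[:limit]]
        expired += [(interval, d) for d in dates[limit:]]
    return kept, expired


# Sample data: ten daily and eight weekly snapshots ending on a fixed date.
today = date(2024, 3, 31)
snapshots = [("daily", today - timedelta(days=i)) for i in range(10)]
snapshots += [("weekly", today - timedelta(weeks=i)) for i in range(8)]

kept, expired = apply_retention(snapshots, {"daily": 7, "weekly": 5})
print(len(kept), len(expired))  # 12 6
```

With seven daily and five weekly snapshots retained, the three oldest snapshots of each interval fall out of the retention window.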

Minimum Required Role: Replication Administrator (also provided by Full Administrator)