Snapshot guidelines and considerations
Before enabling snapshots on data to be replicated, consider the following items.
- If the logged-in Infra Admin user is an HDFS superuser, that user can mark the source directory as snapshot-enabled (snapshottable).
- For each replication run, DLM Engine creates a new snapshot on the source. DistCp then compares this snapshot with the previous one to determine which files to copy. Only the changed files are copied, not the snapshots themselves.
- DLM Engine also creates a snapshot on the destination HDFS after each replication run. If a replication fails, these snapshots are used to restore the destination HDFS to a consistent state.
- DLM Engine also manages snapshot retention on both the source and the destination. The retention settings are configurable through Ambari.
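The snapshot comparison described above can be illustrated with a small in-memory model. This is a minimal sketch, not DLM Engine's or DistCp's implementation: it assumes a hypothetical representation in which each snapshot is a mapping from file path to a content fingerprint, and the function name `plan_incremental_copy` is illustrative only. In a real cluster, HDFS computes the snapshot difference itself.

```python
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical model: a snapshot maps each file path to a content
# fingerprint (for example, file length plus modification time).
Snapshot = Dict[str, str]


@dataclass
class CopyPlan:
    to_copy: List[str]    # files created or modified since the previous snapshot
    to_delete: List[str]  # files removed since the previous snapshot


def plan_incremental_copy(previous: Snapshot, current: Snapshot) -> CopyPlan:
    """Compare two snapshots and return only the files that changed.

    Only these differences would be transferred to the destination;
    the snapshots themselves are never copied.
    """
    # A path is copied if it is new or its fingerprint changed.
    to_copy = sorted(
        path for path, fingerprint in current.items()
        if previous.get(path) != fingerprint
    )
    # A path is deleted on the destination if it no longer exists in the source.
    to_delete = sorted(path for path in previous if path not in current)
    return CopyPlan(to_copy=to_copy, to_delete=to_delete)


if __name__ == "__main__":
    s1 = {"/data/a.csv": "v1", "/data/b.csv": "v1", "/data/c.csv": "v1"}
    s2 = {"/data/a.csv": "v1", "/data/b.csv": "v2", "/data/d.csv": "v1"}
    plan = plan_incremental_copy(s1, s2)
    print(plan.to_copy)    # ['/data/b.csv', '/data/d.csv']
    print(plan.to_delete)  # ['/data/c.csv']
```

Unchanged files such as `/data/a.csv` are skipped entirely, which is why snapshot-based replication only moves the changes. For simplicity, this sketch treats a rename as a deletion plus a creation.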
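Snapshot retention can be pictured as a pruning step that runs after each replication. The sketch below assumes a simple count-based policy with a hypothetical `keep_count` setting. The actual retention options exposed in Ambari may be expressed differently. The policy always keeps at least the newest snapshot, because the next incremental run compares against it.

```python
from datetime import datetime
from typing import List, Tuple

# Hypothetical representation of a snapshot: (snapshot name, creation time).
SnapshotInfo = Tuple[str, datetime]


def snapshots_to_prune(snapshots: List[SnapshotInfo], keep_count: int) -> List[str]:
    """Return the names of snapshots outside a keep-newest-N policy.

    The newest `keep_count` snapshots are retained; all older ones are
    returned (newest first) as candidates for deletion.
    """
    if keep_count < 1:
        # The latest snapshot is the base for the next snapshot diff.
        raise ValueError("keep_count must be at least 1")
    newest_first = sorted(snapshots, key=lambda s: s[1], reverse=True)
    return [name for name, _ in newest_first[keep_count:]]


if __name__ == "__main__":
    snaps = [
        ("s1", datetime(2024, 1, 1)),
        ("s3", datetime(2024, 1, 3)),
        ("s2", datetime(2024, 1, 2)),
    ]
    print(snapshots_to_prune(snaps, keep_count=2))  # ['s1']
```

The same policy could be applied independently to the source and the destination, since each side keeps its own set of snapshots.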