HDFS replication policy process overview
- The DLM App submits the replication policy to the DLM Engine on the destination cluster. The DLM Engine then schedules replication jobs at the specified frequency.
- At the specified frequency, the DLM Engine submits a DistCp job. The job runs on the destination cluster's YARN, reads data from the source HDFS, and writes it to the destination HDFS.
- File lengths and checksums are used to identify changed files and to validate that the data was copied correctly.
- The Ranger policies for the HDFS directory are exported from the source Ranger service and replicated to the destination Ranger service.
- The DLM Engine also adds a deny policy on the destination Ranger service for the target directory, so that the target is not writable.
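The length-and-checksum comparison described above can be sketched as follows. This is a minimal illustration that models both file systems as in-memory dicts of path to bytes. It uses plain MD5 as a stand-in checksum, whereas HDFS uses a block-level composite checksum (MD5-of-MD5-of-CRC32C by default). The function names `files_to_copy` and `copy_and_validate` are hypothetical and are not DLM or DistCp APIs.

```python
from __future__ import annotations

import hashlib


def checksum(data: bytes) -> str:
    # Stand-in checksum (assumption): plain MD5 of the whole file.
    # HDFS computes a block-level composite checksum instead.
    return hashlib.md5(data).hexdigest()


def files_to_copy(source: dict[str, bytes], target: dict[str, bytes]) -> list[str]:
    """Return source paths that are missing or changed on the target (hypothetical helper)."""
    changed = []
    for path, data in sorted(source.items()):
        if path not in target:
            changed.append(path)            # not yet replicated
        elif len(data) != len(target[path]):
            changed.append(path)            # cheap check: lengths differ
        elif checksum(data) != checksum(target[path]):
            changed.append(path)            # same length, different content
    return changed


def copy_and_validate(source: dict[str, bytes], target: dict[str, bytes], paths: list[str]) -> None:
    """Copy each path, then verify the copy by comparing checksums (hypothetical helper)."""
    for path in paths:
        target[path] = source[path]
        if checksum(target[path]) != checksum(source[path]):
            raise RuntimeError(f"checksum mismatch after copying {path}")


if __name__ == "__main__":
    source = {"/data/a.csv": b"1,2,3\n", "/data/b.csv": b"x\n", "/data/c.csv": b"new\n"}
    target = {"/data/a.csv": b"1,2,3\n", "/data/b.csv": b"y\n"}
    pending = files_to_copy(source, target)
    print(pending)  # ['/data/b.csv', '/data/c.csv']
    copy_and_validate(source, target, pending)
    print(files_to_copy(source, target))  # []
```

The length comparison runs first because it is cheap. A checksum is computed only when the lengths match, which is the only case where it can reveal a change such as `/data/b.csv` above.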
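The deny policy on the target directory might look roughly like the following. This sketch uses Ranger's policy JSON model (`resources`, `denyPolicyItems`, `denyExceptions`), but the exact policy the DLM Engine creates is not documented here. The service name, the policy naming scheme, and the `replication_user` exception are all hypothetical. The exception reflects an assumption: without one, a deny on writes would presumably also block the replication job itself. This section does not say how DLM handles that.

```python
import json


def build_deny_write_policy(service: str, target_dir: str, replication_user: str) -> dict:
    """Sketch of a Ranger HDFS policy denying writes to target_dir (hypothetical helper)."""
    return {
        "service": service,  # hypothetical destination Ranger HDFS service name
        "name": "dlm-deny-write" + target_dir.replace("/", "-"),  # hypothetical naming scheme
        "isEnabled": True,
        "resources": {
            "path": {"values": [target_dir], "isRecursive": True, "isExcludes": False}
        },
        # Deny write access to every user via Ranger's built-in "public" group.
        "denyPolicyItems": [
            {"groups": ["public"], "accesses": [{"type": "write", "isAllowed": True}]}
        ],
        # Assumption: exempt the user that runs replication so DistCp can still write.
        "denyExceptions": [
            {"users": [replication_user], "accesses": [{"type": "write", "isAllowed": True}]}
        ],
    }


if __name__ == "__main__":
    policy = build_deny_write_policy("dest_hadoop", "/apps/finance", "hive")
    print(json.dumps(policy, indent=2))
```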