HDFS data replication involves several entities working together.
The Replication Manager submits the replication policy to the DLM Engine on the on-premise cluster. The DLM Engine then schedules replication jobs at the frequency specified in the policy.
- At each scheduled run, the DLM Engine submits a DistCp job that runs on the on-premise YARN cluster, reads data from the source HDFS, and writes it to the destination HDFS.
- File lengths and checksums are used to identify changed files and to validate that the data was copied correctly.
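The change-detection and validation steps above can be modeled roughly as follows. This is a simplified sketch, not the code used by DistCp or the DLM Engine: `FileStatus`, `files_to_copy`, and `copy_is_valid` are hypothetical names, and file metadata is held in in-memory dictionaries instead of being read from HDFS.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileStatus:
    """Hypothetical, simplified view of one file's metadata."""
    length: int
    checksum: str


def files_to_copy(source: dict[str, FileStatus],
                  destination: dict[str, FileStatus]) -> list[str]:
    """Return source paths that are missing at the destination or whose
    length or checksum differs from the destination copy."""
    changed = []
    for path, src in sorted(source.items()):
        dst = destination.get(path)
        # Short-circuit: a missing file or a length mismatch alone marks the
        # file as changed, so the checksum comparison is skipped in that case.
        if dst is None or dst.length != src.length or dst.checksum != src.checksum:
            changed.append(path)
    return changed


def copy_is_valid(src: FileStatus, dst: FileStatus) -> bool:
    """A copy is treated as correct only if length and checksum both match."""
    return src.length == dst.length and src.checksum == dst.checksum


if __name__ == "__main__":
    source = {
        "/data/a.csv": FileStatus(10, "c1"),
        "/data/b.csv": FileStatus(20, "c2"),
        "/data/c.csv": FileStatus(5, "c3"),
    }
    destination = {
        "/data/a.csv": FileStatus(10, "c1"),   # unchanged
        "/data/b.csv": FileStatus(20, "old"),  # same length, different checksum
    }
    print(files_to_copy(source, destination))  # ['/data/b.csv', '/data/c.csv']
```

Only `/data/b.csv` (checksum differs) and `/data/c.csv` (absent at the destination) are selected; the unchanged `/data/a.csv` is skipped, which is what keeps repeated scheduled runs incremental.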