HDFS Replication

Replicating HDFS data involves several components: the Replication Manager, the DLM Engine, YARN, and DistCp.

The Replication Manager submits the replication policy to the DLM Engine on the on-premise cluster. The DLM Engine then schedules replication jobs at the frequency specified in the policy.

  • At the specified frequency, the DLM Engine submits a DistCp job that runs on the on-premise YARN, reads data from the source HDFS, and writes to the destination HDFS.
  • File length and checksums are used to determine changed files and validate that the data is copied correctly.
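
The change-detection rule in the last bullet can be illustrated with a minimal Python sketch. It is not the DistCp implementation: the names FileMeta, describe, needs_copy, and plan_copy are hypothetical, the file listings are plain in-memory dictionaries, and SHA-256 stands in for the checksum that HDFS itself computes. The sketch only shows the decision logic, assuming a file is copied when it is missing on the destination or when its length or checksum differs.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class FileMeta:
    """Hypothetical per-file metadata: byte length plus a content checksum."""
    length: int
    checksum: str


def describe(data: bytes) -> FileMeta:
    # SHA-256 is an assumption for illustration; HDFS uses its own checksum format.
    return FileMeta(length=len(data), checksum=hashlib.sha256(data).hexdigest())


def needs_copy(src: FileMeta, dst: Optional[FileMeta]) -> bool:
    """Return True when the source file must be copied to the destination."""
    if dst is None:
        return True                      # file does not exist on the destination
    if src.length != dst.length:
        return True                      # cheap check first: lengths differ
    return src.checksum != dst.checksum  # same length, so compare checksums


def plan_copy(source: Dict[str, FileMeta], dest: Dict[str, FileMeta]) -> List[str]:
    """List the source paths that are new or changed, in sorted order."""
    return sorted(path for path, meta in source.items()
                  if needs_copy(meta, dest.get(path)))


if __name__ == "__main__":
    source = {
        "/data/a": describe(b"hello"),
        "/data/b": describe(b"world"),
        "/data/c": describe(b"new file"),
    }
    dest = {
        "/data/a": describe(b"hello"),   # identical, skipped
        "/data/b": describe(b"w0rld"),   # same length, different content
    }
    print(plan_copy(source, dest))
```

Comparing lengths before checksums keeps the common case cheap. The same comparison, run again after the copy, is one way to confirm that the destination file matches the source.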