HDFS data migration from CDH to CDP

To create a HDFS replication policy from on-premises to the cloud account, you must register your cloud account credentials with Replication Manager service, so that Replication Manager can access your cloud storage.

Before performing HDFS replication using CDH clusters, see Working with cloud credentials.

Replication Manager supports replication of HDFS data from cluster to cloud storage. The replication policy runs on the cluster and pushes the data from cloud storage. The cluster can be an on-premises or IaaS cluster with data on local HDFS. The cluster requires HDFS, YARN, Ranger, and Knox services to perform replication.

You can schedule taking HDFS snapshots for replication in the Replication Manager. HDFS snapshots are read-only point-in-time copies of the filesystem. You can enable snapshots on the entire filesystem, or on a subtree of the filesystem. In Replication Manager, you take snapshots at a dataset level. Understanding how snapshots work and some of the benefits and costs involved can help you to decide whether or not to enable snapshots.

For more information, see HDFS snapshots.