Replicating HDFS data from on-premises to the cloud

To replicate HDFS data from an on-premises cluster to cloud storage, you must create a replication policy.

Before you create a replication policy, you must register your cloud account with the Replication Manager service. Then complete the following steps:
  1. In the Management Console, go to Replication Manager > Policies and click Add Policy.
  2. On the Create Replication Policy page, select HDFS as the service.
  3. Enter a Policy Name and Description for the HDFS replication policy, and click Next.
  4. Select the Source Cluster from the drop-down list.
  5. In Source Path, enter the path where the source data resides.
  6. Enter the Source User.
  7. Click Next.
  8. The destination Type is listed as either S3 or ABFS.
  9. Select the Cloud Credential from the drop-down list.
  10. For S3 cloud storage, provide a folder path in the form bucket_name/path.

    When you select ABFS as your target cloud storage, you must provide the storage container and the file system. For example:


  11. Click Validate Policy.
    Replication Manager validates the source and destination information of the policy and displays the validation status.
  12. Click Next to proceed to scheduling the replication policy.
    The schedule page provides two options:
    • Run Now (Default) - The replication policy is immediately submitted and processed.
    • Schedule Run - The replication policy runs at a specified time interval.
  13. Click Next.
    The Additional Settings page appears. On this page, you can enter values for:
    • YARN Queue Name
    • Maximum Map Slots
    • Maximum Bandwidth
  14. Click Create.
    After the replication policy is created successfully, view the status of the replication job on the Policies page and verify that the job starts and runs as expected.
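The destination values entered in step 10 identify a location in cloud storage. As a rough illustration of how such values map to the storage URIs used by Hadoop's cloud connectors, the following Python sketch builds them. It assumes the S3A scheme (s3a://bucket/key) for S3 and the secure ABFS scheme (abfss://<container>@<account>.dfs.core.windows.net/<path>) for ABFS; the function names and example values are hypothetical, and Replication Manager may accept or display these paths in a different form.

```python
"""Hypothetical helpers that turn destination values into Hadoop-style
cloud storage URIs.

Assumptions (not taken from Replication Manager itself):
  * S3 paths use the Hadoop S3A scheme, s3a://bucket/key.
  * ABFS paths use the secure ABFS scheme,
    abfss://<container>@<account>.dfs.core.windows.net/<path>.
"""


def s3_destination_uri(folder_path: str) -> str:
    """Convert an S3 folder path of the form 'bucket_name/path' to an s3a:// URI."""
    # Drop leading/trailing slashes, then split off the bucket name.
    bucket, _, key = folder_path.strip("/").partition("/")
    if not bucket:
        raise ValueError("expected a folder path of the form bucket_name/path")
    return f"s3a://{bucket}/{key}"


def abfs_destination_uri(container: str, account: str, path: str = "") -> str:
    """Build an abfss:// URI from a container (file system) name, a storage account, and a path."""
    if not container or not account:
        raise ValueError("both the container and the storage account are required")
    return f"abfss://{container}@{account}.dfs.core.windows.net/{path.strip('/')}"


if __name__ == "__main__":
    # Example values are hypothetical.
    print(s3_destination_uri("my-bucket/replicated/sales"))
    # s3a://my-bucket/replicated/sales
    print(abfs_destination_uri("backups", "mystorageaccount", "/replicated/sales"))
    # abfss://backups@mystorageaccount.dfs.core.windows.net/replicated/sales
```

Validating the path shape before submitting the policy catches a missing bucket or container early, rather than at the Validate Policy step.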
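The three Additional Settings in step 13 control where and how fast the copy job runs. To illustrate what they limit, the following Python sketch builds a comparable manual Apache Hadoop DistCp command line. It assumes that YARN Queue Name, Maximum Map Slots, and Maximum Bandwidth behave like DistCp's mapreduce.job.queuename property, -m option, and per-map -bandwidth option (in MB/s); the function name, paths, and values are hypothetical, and Replication Manager submits the job for you.

```python
from typing import List, Optional


def distcp_args(source: str, target: str,
                queue: Optional[str] = None,
                max_maps: Optional[int] = None,
                bandwidth_mb_per_map: Optional[int] = None) -> List[str]:
    """Build a DistCp argument list that mirrors the Additional Settings (hypothetical helper)."""
    args = ["hadoop", "distcp"]
    # Generic -D options must come before DistCp's own options.
    if queue:
        args.append(f"-Dmapreduce.job.queuename={queue}")
    if max_maps is not None:
        if max_maps < 1:
            raise ValueError("max_maps must be at least 1")
        args += ["-m", str(max_maps)]  # maximum number of simultaneous copy tasks
    if bandwidth_mb_per_map is not None:
        if bandwidth_mb_per_map <= 0:
            raise ValueError("bandwidth must be positive")
        args += ["-bandwidth", str(bandwidth_mb_per_map)]  # MB/s per map task
    args += [source, target]
    return args


if __name__ == "__main__":
    # Example values are hypothetical.
    cmd = distcp_args("hdfs:///data/sales", "s3a://my-bucket/replicated/sales",
                      queue="replication", max_maps=20, bandwidth_mb_per_map=100)
    print(" ".join(cmd))
    # hadoop distcp -Dmapreduce.job.queuename=replication -m 20 -bandwidth 100 hdfs:///data/sales s3a://my-bucket/replicated/sales
```

Because DistCp applies -bandwidth to each map task, the aggregate transfer rate can approach the number of maps multiplied by the per-map limit; with the example values, that is about 20 × 100 = 2,000 MB/s. If Maximum Bandwidth follows the same per-map semantics, lower either setting to reduce the load on the network link to the cloud.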