Replicating HDFS data from On-premise to Cloud
You must create a new replication policy to replicate data from on-premise to cloud.
- Go to Management Console > Replication Manager > Policies and click Add Policy.
- Select HDFS as the service on the Create Replication Policy page.
- Enter the HDFS replication Policy Name and Description. Click Next.
- Select Source Cluster from the drop-down.
- Enter the value for Source Path where the source data resides.
- Enter the Source User.
- Click Next.
- The destination Type is listed as S3 or ABFS.
- Select Cloud Credential from the drop-down.
- Provide a folder path in the form bucket_name/path for S3 cloud storage.
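The bucket_name/path form can be checked before it is typed into the wizard. The following is a minimal sketch, not part of Replication Manager: the function name split_s3_folder_path is hypothetical, and it assumes the first path segment is the bucket name and everything after it is the folder path.

```python
from typing import Tuple


def split_s3_folder_path(folder_path: str) -> Tuple[str, str]:
    """Split a 'bucket_name/path' string into (bucket_name, path).

    Hypothetical helper for illustration only; it assumes the text before
    the first '/' is the bucket name.
    """
    # Ignore surrounding whitespace and stray leading or trailing slashes.
    cleaned = folder_path.strip().strip("/")
    bucket, _, path = cleaned.partition("/")
    if not bucket:
        raise ValueError("folder path must start with a bucket name")
    return bucket, path
```

For instance, split_s3_folder_path("bucket_name/path") yields the bucket name and the folder path as two separate values, and an empty string raises ValueError.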
When you select ABFS as your target cloud storage, you must provide the storage container and the file system. For example:
Click Validate Policy.
Replication Manager validates the source and destination information and displays the validation status.
Click Next to proceed to the schedule page.
The replication policy schedule page provides two options:
- Run Now (Default) - The replication policy is immediately submitted and processed.
- Schedule Run - The replication policy can be scheduled to run at a specified time interval.
The Additional Settings page appears. On this page you can enter values for:
- YARN Queue Name
- Maximum Maps Slots
- Maximum Bandwidth
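The values collected across the wizard pages can be written down and sanity-checked before you start. The following is a minimal sketch under stated assumptions: the PolicyDraft class, its field names, and the rule that numeric settings must be positive are all hypothetical and are not a Replication Manager API. Only the S3 or ABFS destination types and the Run Now default come from the steps above.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PolicyDraft:
    """Hypothetical record of the values entered in the policy wizard."""
    policy_name: str
    source_cluster: str
    source_path: str
    source_user: str
    destination_type: str                     # "S3" or "ABFS"
    cloud_credential: str
    destination_path: str
    run_now: bool = True                      # Run Now is the default option
    yarn_queue_name: Optional[str] = None     # optional additional setting
    maximum_map_slots: Optional[int] = None   # optional additional setting
    maximum_bandwidth: Optional[int] = None   # optional additional setting

    def problems(self) -> List[str]:
        """Return human-readable issues; an empty list means none were found."""
        issues = []
        required = {
            "policy_name": self.policy_name,
            "source_cluster": self.source_cluster,
            "source_path": self.source_path,
            "source_user": self.source_user,
            "cloud_credential": self.cloud_credential,
            "destination_path": self.destination_path,
        }
        for field_name, value in required.items():
            if not value.strip():
                issues.append(f"{field_name} is empty")
        if self.destination_type not in ("S3", "ABFS"):
            issues.append("destination_type must be S3 or ABFS")
        # Assumption: numeric settings, when given, should be positive.
        numeric = (
            ("maximum_map_slots", self.maximum_map_slots),
            ("maximum_bandwidth", self.maximum_bandwidth),
        )
        for field_name, value in numeric:
            if value is not None and value <= 0:
                issues.append(f"{field_name} must be positive")
        return issues
```

Calling problems() on a draft returns a list of issues to resolve before entering the values in the wizard. It does not replace the Validate Policy step.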
After the replication policy is created successfully, view the status of its replication job on the Policies page. Verify that the job starts and runs as expected.