Replicating HDFS data from On-premises to Amazon S3
You must create a new replication policy to replicate data from an on-premises cluster to Amazon S3.
- Navigate to Management Console > Replication Manager > Policies and click Add Policy.
- On the Create Replication Policy page, select HDFS as the service.
- Enter the HDFS replication Policy Name and Description. Click Next.
- Select Source Cluster from the drop-down.
- Enter the value for Source Path where the source data resides.
- Enter the Source User.
- Click Next.
- The destination Type is listed as S3. Select Cloud Credential from the drop-down.
- Provide a folder path bucket_name/path for S3 cloud storage.
- Click Validate Policy.
Replication Manager validates the source and destination information for the policy and displays the validation status.
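The bucket_name/path form of the destination can be checked before it is entered. The following is a minimal Python sketch and is not part of Replication Manager: the helper name split_s3_destination is hypothetical, and the bucket-name check is a simplified version of the general S3 naming rules (3 to 63 characters; lowercase letters, digits, dots, and hyphens; starting and ending with a letter or digit).

```python
import re
from typing import Tuple

# Simplified S3 bucket-name rule (assumption: the full S3 rules are stricter).
_BUCKET_RE = re.compile(r"^[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]$")


def split_s3_destination(folder_path: str) -> Tuple[str, str]:
    """Split 'bucket_name/path' into (bucket, path). Hypothetical helper."""
    # Drop surrounding whitespace and slashes, then split at the first slash.
    bucket, _, path = folder_path.strip().strip("/").partition("/")
    if not _BUCKET_RE.match(bucket):
        raise ValueError(f"invalid bucket name: {bucket!r}")
    return bucket, path


print(split_s3_destination("my-bucket/warehouse/sales"))
```

The value my-bucket/warehouse/sales is a placeholder. The policy validation in the UI remains the authoritative check.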
Click Next to proceed to the schedule page.
The replication policy schedule page provides two options:
- Run Now (Default) - The replication policy is immediately submitted and processed.
- Schedule Run - The replication policy is scheduled to run at a specified time interval.
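To illustrate what running at a specified time interval means, the following Python sketch lists the run times of a fixed-interval schedule. It is an illustration only: the function next_runs is hypothetical, and how Replication Manager computes run times internally is not described here.

```python
from datetime import datetime, timedelta
from typing import List


def next_runs(start: datetime, interval: timedelta, count: int) -> List[datetime]:
    """Return the first `count` run times of a fixed-interval schedule."""
    if interval <= timedelta(0):
        raise ValueError("interval must be positive")
    # The first run is at `start`; each later run is one interval after the last.
    return [start + i * interval for i in range(count)]


# Placeholder values: start at 02:00 on 1 January 2024, repeat every 6 hours.
for run in next_runs(datetime(2024, 1, 1, 2, 0), timedelta(hours=6), 3):
    print(run.isoformat())
```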
The Additional Settings page appears. On this page you can enter values for:
- YARN Queue Name
- Maximum Maps Slots
- Maximum Bandwidth
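When recording the values you plan to enter on this page, a small structure with basic checks can help catch mistakes early. The following Python sketch is hypothetical: the class and field names are illustrative, the numeric values are placeholders rather than recommended defaults, and the unit of Maximum Bandwidth is assumed to be whatever the UI field specifies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AdditionalSettings:
    """Hypothetical container for the three Additional Settings fields."""
    yarn_queue_name: str
    maximum_maps_slots: int
    maximum_bandwidth: int  # unit not stated in this document; check the UI label

    def __post_init__(self) -> None:
        # Assumption: both numeric limits must be positive integers.
        if self.maximum_maps_slots < 1:
            raise ValueError("maximum_maps_slots must be at least 1")
        if self.maximum_bandwidth < 1:
            raise ValueError("maximum_bandwidth must be at least 1")


settings = AdditionalSettings("default", 20, 100)
print(settings)
```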
After the replication policy is created successfully, view the status of the new replication job on the Policies page. Verify that the job starts and runs as expected.