Replicating HDFS data from on-premises to on-premises clusters
You must create a replication policy that specifies the data to replicate, the replication schedule, and other settings.
You must have the Infra Admin or DLM Admin role to perform this set of tasks.
- Pair clusters for replication. Select the two clusters to use for replication and pair them, so they can communicate with each other. For more information, see Pair clusters for replication.
- Create a replication policy.
- Select Policies and click Add Policy. By default, HDFS is selected as the service in the Create Replication Policy page.
- Enter the replication policy name and description.
- Click SELECT SOURCE, then select the source type and the source cluster from the drop-down lists.
- Provide the data replication folder path and click SELECT DESTINATION.
Select the destination type from the drop-down.
You must select another cluster available in the DLM App as your destination.
Important: If the target dataset is not empty, the following warning appears: Target dataset directory /xxxx/xxx is not empty. To proceed, select the supressWarnings check box. Selecting the check box overwrites the target location, considering the conflict resolution between the HDFS location and the Hive external table base location directory.
- Select the path and click VALIDATE.
- After the validation succeeds, click SCHEDULE.
- Configure the job settings for the replication policy.
- Click ADVANCED SETTINGS to set up the policy queue.
- Click CREATE POLICY.
The data replication process is enabled.
View the job status on the Policies page and verify that the job starts and runs as expected.
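
The warning about a non-empty target dataset, described in the destination step, can be anticipated before you create the policy. The following is a minimal sketch, not part of DLM, that shells out to the standard `hdfs dfs -count` command and inspects its output. It assumes the `hdfs` client is on the PATH of the machine where the script runs, that the output columns are DIR_COUNT, FILE_COUNT, CONTENT_SIZE, and PATHNAME, and that the directory count includes the target directory itself. The function names and the path `/data/target` are hypothetical and used only for illustration.

```python
import subprocess
from typing import Tuple


def parse_count_line(line: str) -> Tuple[int, int, int, str]:
    """Parse one line of `hdfs dfs -count` output.

    Assumed column order: DIR_COUNT FILE_COUNT CONTENT_SIZE PATHNAME.
    """
    dir_count, file_count, content_size, path = line.split(None, 3)
    return int(dir_count), int(file_count), int(content_size), path.strip()


def is_effectively_empty(count_line: str) -> bool:
    """Return True if the counted directory holds no files or subdirectories.

    Assumption: DIR_COUNT includes the directory itself, so an empty
    directory reports a directory count of 1 and a file count of 0.
    """
    dir_count, file_count, _size, _path = parse_count_line(count_line)
    return dir_count <= 1 and file_count == 0


def target_is_empty(path: str) -> bool:
    """Run `hdfs dfs -count` against the target path (hypothetical helper)."""
    result = subprocess.run(
        ["hdfs", "dfs", "-count", path],
        capture_output=True,
        text=True,
        check=True,  # raise if the path does not exist or the command fails
    )
    return is_effectively_empty(result.stdout.strip().splitlines()[0])


if __name__ == "__main__":
    # Hypothetical target path; replace with the destination folder path.
    if not target_is_empty("/data/target"):
        print("Target is not empty; expect the warning when validating the policy.")
```

If the check reports a non-empty target, decide whether overwriting the location is acceptable before selecting the supressWarnings check box.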