You must create a new data replication policy to replicate data from on-premises to
Amazon S3. You must set up the target cluster before you start the replication process.
Before you create a new replication policy, you must register your Amazon S3 cloud account
with the DLP App. For more information, see Register cloud credentials. You must have
the Infra Admin or DLM Admin role to perform
this set of tasks.
| Note |
---|
You can replicate data from on-premises to Amazon S3 with a single cluster. The metastore
must be running in the cloud. HiveServer2 does not need to run in the cloud
environment. |
-
Select Policies and click Add Policy.
-
Select HIVE as the service on the Create Replication
Policy page.
-
Enter the replication policy name and description.
-
Click SELECT SOURCE and choose Type,
Source Cluster, and Select Database.
-
Click SELECT DESTINATION and choose Type
and Destination Cluster.
-
Enter the Destination Database.
-
Provide the Hive External Table Base Directory path:
S3://bucket_name/path
The external table base directory path cannot be changed after the
policy is created.
-
Select Cloud Credential from the drop-down.
| Important |
---|
If the target dataset is not empty, a warning message appears:
Target dataset directory /xxxx/xxx is not empty. To proceed, select the
supressWarnings checkbox. Selecting the checkbox
overwrites the target location, considering the conflict resolution between the HDFS
location and the Hive External Table base location directory. |
-
Click VALIDATE.
-
Once the validation is successful, click SCHEDULE.
-
Configure the job settings for the replication policy.
-
Click ADVANCED SETTINGS to set up the policy queue.
-
Click CREATE POLICY.
The data replication process is enabled.
View the job status on the Policies page. Verify that the job starts and runs as
expected.
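
Because the external table base directory path cannot be changed after the policy is created, and a non-empty target triggers the overwrite warning, it can help to check the path before you enter it. The following is a minimal Python sketch, not part of the product: the helper names `split_s3_path` and `target_is_empty` are hypothetical, and it assumes you pass in a boto3 S3 client (or any object with a compatible `list_objects_v2` method) that uses the same credentials you registered.

```python
from urllib.parse import urlsplit


def split_s3_path(path):
    """Split a path such as S3://bucket_name/path into (bucket, prefix).

    Hypothetical helper for illustration only.
    """
    parts = urlsplit(path)
    # urlsplit lowercases the scheme, so "S3://" and "s3://" both match.
    if parts.scheme != "s3":
        raise ValueError(f"not an S3 path: {path!r}")
    if not parts.netloc:
        raise ValueError(f"missing bucket name: {path!r}")
    return parts.netloc, parts.path.lstrip("/")


def target_is_empty(s3_client, path):
    """Return True if no object key starts with the given S3 path.

    Assumption: s3_client is a boto3 S3 client or a compatible object.
    """
    bucket, prefix = split_s3_path(path)
    # MaxKeys=1 is enough to learn whether at least one object exists.
    response = s3_client.list_objects_v2(Bucket=bucket, Prefix=prefix, MaxKeys=1)
    return response.get("KeyCount", 0) == 0
```

For example, `target_is_empty(boto3.client("s3"), "S3://bucket_name/path")` returns `False` when the target location already holds data, which is the situation in which the warning appears. The prefix match is purely textual, so a prefix of `path` also matches keys under `path2/`; append a trailing slash to the path if you need a stricter check.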