Cloud replication guidelines and considerations
DLM supports replication of HDFS and Hive data between the underlying HDFS and AWS S3 cloud storage.
Access to AWS Services for Amazon S3
- Cloud bucket requirements
- You need a cloud bucket with user credentials that you can enter in DLM, so DLM can access the bucket.
- The bucket must have enough space for the replicated data and must allow writes so that the data can be copied.
- The bucket must support the cloud storage encryption types that DLM supports (SSE-S3 and SSE-KMS).
- Authentication
- DLM supports access key and secret key authentication with AWS S3.
- Unregistered credentials in DLM are credentials associated with a cluster node that does
not have updated credentials.
For example, this can happen if a node was down when the credentials on a bucket were changed: when the node is brought back up, it still has the old credentials.
- Impact of bucket changes
- Changes made to a bucket configuration (secret/access keys, bucket name/endpoint,
encryption type) can affect execution of the DLM policy and might require an update to DLM
cloud credentials.
Credential changes are picked up by the next run of the policy. Any policy run that is in progress when the credentials are changed might fail, but subsequent runs pick up the changes.
- Users can delete cloud credentials, but doing so causes any policies based on the
deleted cloud credentials to fail.
You must delete the DLM cloud policies associated with the deleted credentials and recreate the policies with the new credentials. You can view a list of policies associated with specific credentials on the Cloud Credentials page.
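Before deleting a cloud credential, it can help to work out which policies would have to be deleted and recreated. The following Python sketch illustrates that bookkeeping only. It does not call DLM. The function name, the dictionary layout, and the policy and credential names are all hypothetical, and in practice the Cloud Credentials page provides the same list.

```python
from typing import Dict, List


def policies_to_recreate(policy_credentials: Dict[str, str],
                         deleted_credential: str) -> List[str]:
    """Return the names of policies that use the deleted credential.

    policy_credentials maps a policy name to the name of the cloud
    credential it uses. This mapping is a hypothetical stand-in for the
    list shown on the Cloud Credentials page.
    """
    return sorted(
        policy
        for policy, credential in policy_credentials.items()
        if credential == deleted_credential
    )


# Hypothetical policies and credentials, for illustration only.
policies = {
    "hive-to-s3-daily": "s3-prod",
    "hdfs-to-s3-hourly": "s3-prod",
    "hdfs-archive": "s3-archive",
}

# Policies that would fail if the "s3-prod" credential were deleted.
affected = policies_to_recreate(policies, "s3-prod")
print(affected)
```

Each policy in the returned list would need to be deleted and then recreated with the new credentials.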
- Cloud encryption
- When replicating data from cloud storage, the encryption algorithm specified by the user is used to validate the replication policy.
- When replicating data to cloud storage, the encryption algorithm and encryption key
specified by the user are used for all the data written to the cloud storage.
This overrides any bucket-level encryption set in the cloud provider.
- DLM does not allow replication of encrypted data to an unencrypted destination.
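The encryption rules above can be summarized as a small pre-check. The following Python sketch is an illustration under stated assumptions, not DLM's actual validation logic. The function name, its parameters, and the message text are hypothetical. Only the two supported encryption types and the rule about encrypted sources come from the guidelines above.

```python
from typing import List, Optional

# Cloud storage encryption types that the guidelines list as supported.
SUPPORTED_CLOUD_ENCRYPTION = {"SSE-S3", "SSE-KMS"}


def check_cloud_destination(source_encrypted: bool,
                            dest_encryption: Optional[str]) -> List[str]:
    """Return a list of problems with a planned replication to cloud storage.

    source_encrypted: whether the source data is encrypted.
    dest_encryption:  encryption type chosen for the destination,
                      or None for an unencrypted destination.
    An empty list means neither rule is violated.
    """
    problems = []
    if (dest_encryption is not None
            and dest_encryption not in SUPPORTED_CLOUD_ENCRYPTION):
        problems.append(
            f"unsupported destination encryption type: {dest_encryption}")
    if source_encrypted and dest_encryption is None:
        problems.append(
            "encrypted data cannot be replicated to an unencrypted destination")
    return problems


# An encrypted source with an SSE-KMS destination violates neither rule.
print(check_cloud_destination(True, "SSE-KMS"))

# An encrypted source with an unencrypted destination is reported.
print(check_cloud_destination(True, None))
```

Returning a list of messages rather than raising on the first failure lets a caller report every issue in one pass.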