HDFS data migration from HDP to Cloudera

Use the DistCp tool to migrate the data.

An example DistCp command:

$ sudo -u hdfs hadoop distcp -Dfs.s3a.access.key=###########
          -Dfs.s3a.secret.key=######### -Dfs.s3a.server-side-encryption-algorithm=SSE-KMS
          -Dfs.s3a.server-side-encryption.key=arn:aws:kms:<region>:########### /data/datafiles/
          s3a://demo2-cdp-private-default-saas/cdp-saas/hdp-data/

Additional configuration:

The following AWS S3 specific configurations are recommended, apart from configuring the default settings like the numbers of mappers and bandwidth per mapper.

  1. fs.s3a.fast.upload=true, this configuration allows to speed up the data transfer
  2. Setting up s3 proxy server configurationsfs.s3a.proxy.* configs
  3. fs.s3a.connection.ssl.enabled=true, this configuration enables SSL connection to the S3 endpoint.