Using DistCp with Amazon S3
You can use DistCp to copy HDFS files to and from an Amazon S3 bucket. You must first provision an S3 bucket using Amazon Web Services and obtain its access key and secret key. You can pass these credentials on the distcp command line, or you can reference a credential store to "hide" sensitive credentials so that they do not appear in the console output, configuration files, or log files.
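As a sketch of the credential-store approach, assuming a JCEKS keystore at the hypothetical HDFS path jceks://hdfs/user/admin/s3.jceks, you might create the store with the hadoop credential CLI and then point DistCp at it (the bucket and data paths below are illustrative):

```shell
# Store the S3 keys in a Hadoop credential provider (a JCEKS file on HDFS);
# the CLI prompts for each value, so the secrets never appear on the command line.
hadoop credential create fs.s3a.access.key -provider jceks://hdfs/user/admin/s3.jceks
hadoop credential create fs.s3a.secret.key -provider jceks://hdfs/user/admin/s3.jceks

# Reference the credential store when running DistCp; the keys are read from
# the keystore instead of being passed as plain-text -D options.
hadoop distcp \
  -Dhadoop.security.credential.provider.path=jceks://hdfs/user/admin/s3.jceks \
  /user/hdfs/mydata s3a://myBucket/mydata_backup
```

These commands must run on a host with Hadoop configured, since both the keystore and the copy job live on the cluster.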
Amazon S3 block and native filesystems are supported with the s3a:// protocol.

Example of an Amazon S3 block filesystem URI:

```
s3a://bucket_name/path/to/file
```
You can set the S3 credentials in the HDFS configuration file (core-site.xml):

```xml
<property>
  <name>fs.s3a.access.key</name>
  <value>...</value>
</property>
<property>
  <name>fs.s3a.secret.key</name>
  <value>...</value>
</property>
```
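With the credentials stored in core-site.xml, the distcp invocation no longer needs -D options for the keys; a minimal sketch (the HDFS path and bucket name are illustrative):

```shell
# Credentials are picked up from core-site.xml, so none appear on the command line.
hadoop distcp /user/hdfs/mydata s3a://myBucket/mydata_backup
```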
You can also enter these properties in the Advanced Configuration Snippet for core-site.xml, which allows Cloudera Manager to manage this configuration.
Alternatively, you can pass the credentials on the distcp command line:

```shell
hadoop distcp -Dfs.s3a.access.key=... -Dfs.s3a.secret.key=... s3a://
```

For example:

```shell
hadoop distcp -Dfs.s3a.access.key=myAccessKey -Dfs.s3a.secret.key=mySecretKey /user/hdfs/mydata s3a://myBucket/mydata_backup
```
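Since DistCp copies in both directions, the same command restores data from S3 to HDFS by swapping the source and destination; a sketch, with an illustrative destination path:

```shell
# Copy from Amazon S3 back into HDFS; source and destination are simply reversed.
hadoop distcp -Dfs.s3a.access.key=myAccessKey -Dfs.s3a.secret.key=mySecretKey \
  s3a://myBucket/mydata_backup /user/hdfs/mydata_restored
```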