Using DistCp with Amazon S3
You can copy HDFS files to and from an Amazon S3 bucket. You must provision the S3 bucket using Amazon Web Services and obtain its access key and secret key. You can pass these credentials on the distcp command line, or you can reference a credential store to "hide" sensitive credentials so that they do not appear in the console output, configuration files, or log files.
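One way to keep the keys out of the command line and configuration files is Hadoop's CredentialProvider API, which stores them in a Java keystore (JCEKS) file. The following is a sketch; the store path jceks://hdfs/user/alice/s3.jceks is an illustrative example, not a value from this document:

```shell
# Add the S3A access and secret keys to a JCEKS credential store on HDFS.
# The hadoop credential CLI prompts for each value interactively, so the
# secrets never appear in the command line, shell history, or logs.
hadoop credential create fs.s3a.access.key \
    -provider jceks://hdfs/user/alice/s3.jceks
hadoop credential create fs.s3a.secret.key \
    -provider jceks://hdfs/user/alice/s3.jceks

# Verify which aliases the store now holds.
hadoop credential list -provider jceks://hdfs/user/alice/s3.jceks
```

Commands that need the credentials can then reference the store through the hadoop.security.credential.provider.path property instead of receiving the raw keys.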
Amazon S3 block and native filesystems are supported with the s3a:// protocol.

Example of an Amazon S3 block filesystem URI: s3a://bucket_name/path/to/file
S3 credentials can be provided in a configuration file (for example, core-site.xml):

<property>
  <name>fs.s3a.access.key</name>
  <value>...</value>
</property>
<property>
  <name>fs.s3a.secret.key</name>
  <value>...</value>
</property>
You can also enter these configurations in the Advanced Configuration Snippet for core-site.xml, which allows Cloudera Manager to manage this configuration.

Alternatively, you can provide the credentials on the command line:
hadoop distcp -Dfs.s3a.access.key=... -Dfs.s3a.secret.key=... s3a://
For example:

hadoop distcp -Dfs.s3a.access.key=myAccessKey -Dfs.s3a.secret.key=mySecretKey /user/hdfs/mydata s3a://myBucket/mydata_backup
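If the credentials have been placed in a credential store, as mentioned earlier, the same copy can be run without exposing the keys at all. This is a sketch; the JCEKS path is an illustrative example, and the bucket and paths reuse the ones from the example above:

```shell
# Run DistCp with the S3A keys resolved from a JCEKS credential store
# rather than passed in plain text on the command line. The store path
# jceks://hdfs/user/alice/s3.jceks is an illustrative example.
hadoop distcp \
    -Dhadoop.security.credential.provider.path=jceks://hdfs/user/alice/s3.jceks \
    /user/hdfs/mydata s3a://myBucket/mydata_backup
```

Because the keys are looked up from the store at runtime, they do not appear in the console output or in logs that record the submitted command.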