Using a credential provider to secure S3 credentials
You can run the distcp command without having to enter the access key and secret key on the command line. This prevents these credentials from being exposed in console output, log files, configuration files, and other artifacts.
Running the distcp command in this way requires that you provision a credential store to hold the access key and secret key securely. The credential store file is saved in HDFS. After you provision the credential store and set its path in core-site.xml, you can run the distcp command using the following syntax:
hadoop distcp source_path s3a://destination_path
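The provisioning step is not shown in this section. One way to script it is sketched below in Python. It assumes the standard `hadoop credential create <alias> -provider <path>` subcommand and uses the S3A property names fs.s3a.access.key and fs.s3a.secret.key as the aliases. The store path `jceks://hdfs/user/example/s3.jceks` and the function names are hypothetical. Because no `-value` argument is passed, the CLI prompts for each key, so the keys never appear on the command line or in shell history.

```python
import subprocess
from typing import List

# Hypothetical location of the credential store; replace with your own HDFS path.
PROVIDER_PATH = "jceks://hdfs/user/example/s3.jceks"

# Property names the S3A connector looks up for the access key and secret key.
S3A_ALIASES = ("fs.s3a.access.key", "fs.s3a.secret.key")


def credential_create_commands(provider_path: str) -> List[List[str]]:
    """Build one `hadoop credential create` argv per S3A alias.

    No -value argument is included, so the CLI prompts for each secret
    interactively instead of receiving it on the command line.
    """
    return [
        ["hadoop", "credential", "create", alias, "-provider", provider_path]
        for alias in S3A_ALIASES
    ]


def provision(provider_path: str = PROVIDER_PATH) -> None:
    """Run each command in turn; requires a configured Hadoop client on PATH."""
    for argv in credential_create_commands(provider_path):
        # A list argv runs without a shell; check=True raises on a non-zero exit.
        subprocess.run(argv, check=True)


# Example (hypothetical path):
# provision("jceks://hdfs/user/example/s3.jceks")
```

The commands are built separately from running them so the argument lists can be inspected or logged safely: they contain only the alias names and the store path, never the keys.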
You can also reference the credential store on the command line, without having to add its path to a copy of the core-site.xml file. You also do not need to set a value for HADOOP_CONF_DIR. Use the following syntax, placing the -D option before the source and destination paths, because Hadoop parses generic options only when they precede the command arguments:
hadoop distcp -Dhadoop.security.credential.provider.path=jceks://hdfs/path_to_credential_store_file source_path s3a://bucket_name/destination_path
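When the copy is launched from a script, the command-line form can be assembled programmatically. The Python sketch below uses a hypothetical helper named `distcp_command` that builds the argument list with the `-D` generic option ahead of the source and destination paths, then passes it to `subprocess` without a shell. The paths in the example comment are placeholders, not values from this document.

```python
import subprocess
from typing import List, Optional

PROVIDER_PROPERTY = "hadoop.security.credential.provider.path"


def distcp_command(source: str, destination: str,
                   provider_path: Optional[str] = None) -> List[str]:
    """Build a `hadoop distcp` argv, optionally naming a credential store.

    The -D generic option is placed before the source and destination
    paths so that Hadoop's generic option parser picks it up.
    """
    argv = ["hadoop", "distcp"]
    if provider_path is not None:
        argv.append(f"-D{PROVIDER_PROPERTY}={provider_path}")
    argv.extend([source, destination])
    return argv


def run_distcp(source: str, destination: str,
               provider_path: Optional[str] = None) -> None:
    """Run the copy; requires a configured Hadoop client on PATH."""
    # A list argv avoids the shell, so the paths need no extra quoting.
    subprocess.run(distcp_command(source, destination, provider_path), check=True)


# Example (hypothetical paths):
# run_distcp("hdfs:///data/in", "s3a://example-bucket/out",
#            "jceks://hdfs/user/example/s3.jceks")
```

Only the location of the credential store appears in the argument list. The access key and secret key stay inside the store, which keeps them out of process listings and job logs.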