Using a credential provider to secure S3 credentials
You can run the distcp command without having to enter the access key and secret key on the command line. This prevents these credentials from being exposed in console output, log files, configuration files, and other artifacts. Running the distcp command in this way requires that you provision a credential store to securely store the access key and secret key. The credential store file is saved in HDFS.
- Provision the credentials by running the following commands:
hadoop credential create fs.s3a.access.key -value access_key -provider jceks://hdfs/path_to_credential_store_file
hadoop credential create fs.s3a.secret.key -value secret_key -provider jceks://hdfs/path_to_credential_store_file
For example:
hadoop credential create fs.s3a.access.key -value foobar -provider jceks://hdfs/user/alice/home/keystores/aws.jceks
hadoop credential create fs.s3a.secret.key -value barfoo -provider jceks://hdfs/user/alice/home/keystores/aws.jceks
If you omit the -value option and its value, the command prompts you to enter the value interactively.
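For example, the following sketch assumes the same aws.jceks store as above; the exact prompt wording can vary by Hadoop version:
hadoop credential create fs.s3a.access.key -provider jceks://hdfs/user/alice/home/keystores/aws.jceks
# Prompts for the value (twice, for confirmation) instead of reading it from the command line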
For more details on the hadoop credential command, see Credential Management (Apache Software Foundation).
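To confirm that both aliases were written to the credential store, you can list its contents; this sketch assumes the example aws.jceks path used above:
hadoop credential list -provider jceks://hdfs/user/alice/home/keystores/aws.jceks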
- Copy the contents of the /etc/hadoop/conf directory to a working directory.
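A minimal sketch of this step, assuming ~/distcp-conf as the working directory (the directory name is illustrative):
cp -r /etc/hadoop/conf ~/distcp-conf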
- Add the following to the core-site.xml file in the working directory:
<property>
  <name>hadoop.security.credential.provider.path</name>
  <value>jceks://hdfs/path_to_credential_store_file</value>
</property>
- Set the HADOOP_CONF_DIR environment variable to the location of the working directory:
export HADOOP_CONF_DIR=path_to_working_directory
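For example, assuming the ~/distcp-conf working directory from the earlier sketch:
export HADOOP_CONF_DIR=~/distcp-conf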
After completing these steps, you can run the
distcp command using the following syntax:
hadoop distcp source_path s3a://destination_path
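For example, the following invocation assumes an HDFS source directory and a bucket named mybucket; both names are illustrative:
hadoop distcp /user/alice/data s3a://mybucket/data_backup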
Alternatively, you can reference the credential store directly on the command line, without adding it to a copy of the core-site.xml file. You also do not need to set a value for HADOOP_CONF_DIR. Use the following syntax, placing the -D option before the path arguments so it is recognized as a generic option:
hadoop distcp -Dhadoop.security.credential.provider.path=jceks://hdfs/path_to_credential_store_file source_path s3a://bucket_name/destination_path
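For example, using the aws.jceks store and the illustrative paths from above:
hadoop distcp -Dhadoop.security.credential.provider.path=jceks://hdfs/user/alice/home/keystores/aws.jceks /user/alice/data s3a://mybucket/data_backup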