Protecting S3 Credentials with Credential Providers
The Hadoop credential provider framework allows secure credential providers to keep secrets outside Hadoop configuration files, storing them in encrypted files in local or Hadoop filesystems, and including them in requests.
The S3A configuration options with sensitive data (fs.s3a.secret.key, fs.s3a.access.key, and fs.s3a.session.token) can have their data saved to a binary file, with the values being read in when the S3A filesystem URL is used for data access. The reference to this credential provider is all that is passed as a direct configuration option.
To protect your credentials with credential providers, first create a credential file and then configure the Hadoop security credential provider path property, as described in the following sections.
In addition, if you are using per-bucket credentials, refer to Customizing Per-Bucket Secrets Held in Credential Files.
Creating a Credential File
You can create a credential file on any Hadoop filesystem. When you create one on HDFS or a UNIX filesystem, the permissions are automatically set to keep the file private to the reader — though as directory permissions are not touched, you should verify that the directory containing the file is readable only by the current user. For example:
hadoop credential create fs.s3a.access.key -value 123 \
    -provider jceks://hdfs@nn1.example.com:9001/user/backup/s3.jceks

hadoop credential create fs.s3a.secret.key -value 456 \
    -provider jceks://hdfs@nn1.example.com:9001/user/backup/s3.jceks
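Passing secrets with -value places them in your shell history and in the process listing while the command runs. If that is a concern, omitting -value should cause the credential command to prompt for the secret interactively. A minimal sketch against the same provider:

hadoop credential create fs.s3a.access.key \
    -provider jceks://hdfs@nn1.example.com:9001/user/backup/s3.jceks

hadoop credential create fs.s3a.secret.key \
    -provider jceks://hdfs@nn1.example.com:9001/user/backup/s3.jceks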
After creating the credential file, you can list it to see what entries are kept inside it. For example:
hadoop credential list -provider jceks://hdfs@nn1.example.com:9001/user/backup/s3.jceks

Listing aliases for CredentialProvider: jceks://hdfs@nn1.example.com:9001/user/backup/s3.jceks
fs.s3a.secret.key
fs.s3a.access.key
After performing these steps, the credentials are ready for use.
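If a secret later needs to be rotated, the existing alias can be deleted and recreated. A sketch, assuming the same provider path; the -f flag skips the confirmation prompt, and 789 is a placeholder for the new secret value:

hadoop credential delete fs.s3a.secret.key -f \
    -provider jceks://hdfs@nn1.example.com:9001/user/backup/s3.jceks

hadoop credential create fs.s3a.secret.key -value 789 \
    -provider jceks://hdfs@nn1.example.com:9001/user/backup/s3.jceks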
Configuring the Hadoop Security Credential Provider Path Property
The URL to the provider must be set in the configuration property hadoop.security.credential.provider.path, either in the core-site.xml configuration file or on the command line:
Example: Setting via Configuration File
<property>
  <name>hadoop.security.credential.provider.path</name>
  <value>jceks://hdfs@nn1.example.com:9001/user/backup/s3.jceks</value>
</property>
Because this property only supplies the path to the secrets file, the configuration option itself is no longer a sensitive item.
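The property also accepts a comma-separated list of providers, which are consulted in order until a matching alias is found. A sketch, assuming a second, hypothetical keystore user.jceks on the local filesystem alongside the HDFS one:

<property>
  <name>hadoop.security.credential.provider.path</name>
  <value>jceks://hdfs@nn1.example.com:9001/user/backup/s3.jceks,jceks://file/home/user/user.jceks</value>
</property>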
Example: Setting via Command Line
hadoop distcp \
    -D hadoop.security.credential.provider.path=jceks://hdfs@nn1.example.com:9001/user/backup/s3.jceks \
    hdfs://nn1.example.com:9001/user/backup/007020615 s3a://glacier1/

hadoop fs \
    -D hadoop.security.credential.provider.path=jceks://hdfs@nn1.example.com:9001/user/backup/s3.jceks \
    -ls s3a://glacier1/
Because the provider path is not itself a sensitive secret, there is no risk from placing its declaration on the command line.
Once the provider is set in the Hadoop configuration, hadoop commands work exactly as if the secrets were in an XML file. For example:
hadoop distcp hdfs://nn1.example.com:9001/user/backup/007020615 s3a://glacier1/

hadoop fs -ls s3a://glacier1/
Customizing Per-Bucket Secrets Held in Credential Files
Although most properties which are set per-bucket are automatically propagated from their fs.s3a.bucket. custom entry to that of the base fs.s3a. option, supporting secrets kept in Hadoop credential files is slightly more complex: property values are kept in these files, and they cannot be dynamically patched. Instead, callers need to create a different credential file for each bucket, setting the base secrets, then declare the path to the appropriate credential file in a bucket-specific version of the property fs.s3a.security.credential.provider.path.
Example
1. Set base properties for fs.s3a.secret.key and fs.s3a.access.key in core-site.xml or in your job submission.
2. Set similar properties per-bucket for a bucket called "frankfurt-1". These will override the base properties when talking to the bucket "frankfurt-1".
3. When setting properties in a JCEKS file, you must use the base property names, even if you only intend to use them for a specific bucket. For example, in the JCEKS file called hdfs://users/steve/frankfurt.jceks, set the base parameters fs.s3a.secret.key and fs.s3a.access.key to your "frankfurt-1" values from step 2.
4. Next, set the path to the JCEKS file as a per-bucket option. For example, fs.s3a.bucket.frankfurt-1.security.credential.provider.path should be set to hdfs://users/steve/frankfurt.jceks (see the configuration sketch after these steps).
5. When the credentials for "frankfurt-1" are set up, the property fs.s3a.bucket.frankfurt-1.security.credential.provider.path will be read, and the secrets from that file used to set the options to access the bucket.
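Expressed as configuration, the per-bucket option from step 4 might look like the following core-site.xml sketch; the property name and value are taken directly from the steps above:

<property>
  <name>fs.s3a.bucket.frankfurt-1.security.credential.provider.path</name>
  <value>hdfs://users/steve/frankfurt.jceks</value>
</property>

With this entry in place, access to s3a://frankfurt-1/ resolves its secrets from frankfurt.jceks, while buckets without a per-bucket override continue to use the base provider path.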