Cloud Data Access

Protecting S3 Credentials with Credential Providers

The Hadoop credential provider framework keeps secrets outside Hadoop configuration files: secure credential providers store them in encrypted files on local or Hadoop filesystems and supply them when requests are made.

The values of the S3A configuration options that hold sensitive data (fs.s3a.secret.key, fs.s3a.access.key, and fs.s3a.session.token) can be saved to a binary credential file and are read in when an S3A filesystem URL is used for data access. Only the reference to the credential provider is passed as a direct configuration option.

To protect your credentials with credential providers:

  1. Create a credential file.

  2. Configure the hadoop.security.credential.provider.path property.

In addition, if you are using per-bucket credentials, refer to Customizing Per-Bucket Secrets Held in Credential Files.

Creating a Credential File

You can create a credential file on any Hadoop filesystem. When you create one on HDFS or a UNIX filesystem, the file permissions are automatically set to keep the file private to its owner. Directory permissions are not touched, however, so you should verify that the directory containing the file is readable only by the current user. For example:

hadoop credential create fs.s3a.access.key -value 123 \
    -provider jceks://hdfs@nn1.example.com:9001/user/backup/s3.jceks

hadoop credential create fs.s3a.secret.key -value 456 \
    -provider jceks://hdfs@nn1.example.com:9001/user/backup/s3.jceks
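
The commands above pass each secret with -value, which can leave it in shell history and in the process list. As a hedged alternative, the sketch below omits -value, in which case hadoop credential create prompts for the secret instead. The function name store_s3a_secrets is hypothetical and is only used here for illustration; the provider URL is supplied by the caller.

```shell
# Hypothetical helper: store both S3A secrets in one credential file.
# Because -value is omitted, "hadoop credential create" prompts for
# each secret rather than taking it from the command line.
store_s3a_secrets() {
    provider="$1"   # e.g. jceks://hdfs@nn1.example.com:9001/user/backup/s3.jceks
    for key in fs.s3a.access.key fs.s3a.secret.key; do
        hadoop credential create "$key" -provider "$provider" || return 1
    done
}
```

It could be called as store_s3a_secrets jceks://hdfs@nn1.example.com:9001/user/backup/s3.jceks, entering the access key and then the secret key when prompted.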

After creating the credential file, you can list it to see what entries are kept inside it. For example:

hadoop credential list -provider jceks://hdfs@nn1.example.com:9001/user/backup/s3.jceks

Listing aliases for CredentialProvider: jceks://hdfs@nn1.example.com:9001/user/backup/s3.jceks
fs.s3a.secret.key
fs.s3a.access.key

After performing these steps, credentials are ready for use.
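
Since the directory permissions are left unchanged when the credential file is created, it may be worth checking them explicitly. The following is a minimal sketch, assuming the directory holds nothing that other users need to read; the function name lock_down_credential_dir is hypothetical.

```shell
# Hypothetical helper: show, then restrict, the directory that holds the
# credential file so that only its owner can read or list it.
lock_down_credential_dir() {
    dir="$1"   # e.g. hdfs://nn1.example.com:9001/user/backup
    hadoop fs -ls -d "$dir" || return 1       # print the current permissions
    hadoop fs -chmod 700 "$dir" || return 1   # owner-only access
    hadoop fs -ls -d "$dir"                   # print the permissions again to confirm
}
```

Note that chmod 700 affects access to everything in that directory, so a dedicated directory for credential files is probably the safer choice.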

Configuring the Hadoop Security Credential Provider Path Property

The URL to the provider must be set in the configuration property hadoop.security.credential.provider.path, either in the core-site.xml configuration file or on the command line:

Example: Setting via Configuration File

<property>
  <name>hadoop.security.credential.provider.path</name>
  <value>jceks://hdfs@nn1.example.com:9001/user/backup/s3.jceks</value>
</property>

Because this property only supplies the path to the secrets file, the configuration option itself is no longer a sensitive item.
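
To confirm that the value in core-site.xml is the one actually being picked up, the effective setting can be read back with hdfs getconf. This is a sketch only; the function name check_provider_path is hypothetical, and it assumes the hdfs command runs against the same configuration directory as your jobs.

```shell
# Hypothetical helper: compare the configured provider path with the
# value you expect, and report a mismatch on stderr.
check_provider_path() {
    expected="$1"
    actual=$(hdfs getconf -confKey hadoop.security.credential.provider.path) || return 1
    if [ "$actual" = "$expected" ]; then
        echo "provider path OK: $actual"
    else
        echo "provider path mismatch: got '$actual', expected '$expected'" >&2
        return 1
    fi
}
```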

Example: Setting via Command Line

hadoop distcp \
  -D hadoop.security.credential.provider.path=jceks://hdfs@nn1.example.com:9001/user/backup/s3.jceks \
  hdfs://nn1.example.com:9001/user/backup/007020615 s3a://glacier1/

hadoop fs \
  -D hadoop.security.credential.provider.path=jceks://hdfs@nn1.example.com:9001/user/backup/s3.jceks \
  -ls s3a://glacier1/

Because the provider path is not itself a sensitive secret, there is no risk from placing its declaration on the command line.

Once the provider is set in the Hadoop configuration, hadoop commands work exactly as if the secrets were in an XML file. For example:

hadoop distcp hdfs://nn1.example.com:9001/user/backup/007020615 s3a://glacier1/
hadoop fs -ls s3a://glacier1/

Customizing Per-Bucket Secrets Held in Credential Files

Most properties that are set per-bucket are automatically propagated from their fs.s3a.bucket. custom entry to the base fs.s3a. option. Supporting secrets kept in Hadoop credential files is slightly more complex: the property values are stored in the files themselves, and they cannot be dynamically patched.

Instead, callers need to create a separate credential file for each bucket, set the base secrets in it, and then declare the path to the appropriate credential file in a bucket-specific version of the property fs.s3a.security.credential.provider.path.

Example

  1. Set base properties for fs.s3a.secret.key and fs.s3a.access.key in core-site.xml or in your job submission.

  2. Set similar properties per-bucket for a bucket called "frankfurt-1". These will override the base properties when talking to the bucket "frankfurt-1".

  3. When setting properties in a JCEKS file, you must use the base property names — even if you only intend to use them for a specific bucket.

    For example, in the JCEKS file called hdfs://users/steve/frankfurt.jceks, set the base parameters fs.s3a.secret.key and fs.s3a.access.key to your "frankfurt-1" values from step 2.

  4. Next, set the path to the JCEKS file as a per-bucket option.

    For example, fs.s3a.bucket.frankfurt-1.security.credential.provider.path should be set to hdfs://users/steve/frankfurt.jceks.

  5. When the credentials for "frankfurt-1" are set up, the property fs.s3a.bucket.frankfurt-1.security.credential.provider.path will be read, and the secrets from that file used to set the options to access the bucket.
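
Steps 3 and 4 above can be sketched on the command line as follows. This is an illustration under stated assumptions, not a definitive procedure: the function name use_bucket_credential_file is hypothetical, the provider URL is passed in by the caller in whatever form your cluster uses, and the per-bucket option is set with -D here rather than in core-site.xml.

```shell
# Hypothetical helper: keep the secrets for one bucket in their own
# credential file, then list the bucket using that file.
use_bucket_credential_file() {
    bucket="$1"     # e.g. frankfurt-1
    provider="$2"   # provider URL of the credential file for this bucket
    # Step 3: entries inside the JCEKS file use the base property names.
    for key in fs.s3a.access.key fs.s3a.secret.key; do
        hadoop credential create "$key" -provider "$provider" || return 1
    done
    # Step 4: point the per-bucket option at that file.
    hadoop fs \
        -D "fs.s3a.bucket.$bucket.security.credential.provider.path=$provider" \
        -ls "s3a://$bucket/"
}
```

Because -value is omitted, each credential create call prompts for the secret to store.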

Related Links

Using Per-Bucket Credentials to Authenticate

Credential Provider API