Referencing S3 Credentials for YARN, MapReduce, or Spark Clients

If you have selected IAM authentication, no additional steps are needed. If you are not using IAM authentication, use one of the following three options to provide Amazon S3 credentials to clients.

Programmatic
Specify the credentials in the configuration for the job. This option is most useful for Spark jobs.
Make a modified copy of the configuration files
Make a copy of the configuration files and add the S3 credentials:
  1. For YARN and MapReduce jobs, copy the contents of the /etc/hadoop/conf directory to a local working directory under the home directory of the host where you will submit the job. For Spark jobs, copy /etc/spark/conf to a local directory under the home directory of the host where you will submit the job.
  2. Set the permissions for the configuration files appropriately for your environment and ensure that unauthorized users cannot access sensitive configurations in these files.
  3. Add the following to the core-site.xml file within the <configuration> element:
    <property>
        <name>fs.s3a.access.key</name>
        <value>Amazon S3 Access Key</value>
    </property>
    
    <property>
        <name>fs.s3a.secret.key</name>
        <value>Amazon S3 Secret Key</value>
    </property>
  4. Reference these versions of the configuration files when submitting jobs by running the following command:
    • YARN or MapReduce:
      export HADOOP_CONF_DIR=path to local configuration directory
    • Spark:
      export SPARK_CONF_DIR=path to local configuration directory
Reference the managed configuration files and add AWS credentials
This option allows you to continue to use the configuration files managed by Cloudera Manager. If you deploy new configuration files, the new values are included by reference in your copy of the configuration files while also maintaining a version of the configuration that contains the Amazon S3 credentials:
  1. Create a local directory under your home directory.
  2. Copy the configuration files from /etc/hadoop/conf to the new directory.
  3. Set the permissions for the configuration files appropriately for your environment.
  4. Edit each configuration file:
    1. Remove all elements within the <configuration> element.
    2. Add an XML <include> element within the <configuration> element to reference the configuration files managed by Cloudera Manager. For example:
      
      <include xmlns="http://www.w3.org/2001/XInclude"
               href="/etc/hadoop/conf/hdfs-site.xml">
         <fallback />
      </include>
  5. Add the following to the core-site.xml file within the <configuration> element:
    <property>
        <name>fs.s3a.access.key</name>
        <value>Amazon S3 Access Key</value>
    </property>
    
    <property>
        <name>fs.s3a.secret.key</name>
        <value>Amazon S3 Secret Key</value>
    </property>
  6. Reference these versions of the configuration files when submitting jobs by running the following command:
    • YARN or MapReduce:
      export HADOOP_CONF_DIR=path to local configuration directory
    • Spark:
      export SPARK_CONF_DIR=path to local configuration directory
Example core-site.xml file:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <include xmlns="http://www.w3.org/2001/XInclude"
    href="/etc/hadoop/conf/core-site.xml">
    <fallback />
  </include>

  <property>
    <name>fs.s3a.access.key</name>
    <value>Amazon S3 Access Key</value>
  </property>

  <property>
    <name>fs.s3a.secret.key</name>
    <value>Amazon S3 Secret Key</value>
  </property>
</configuration>