Referencing S3 Credentials for YARN, MapReduce, or Spark Clients
If you have selected IAM authentication, no additional steps are needed. If you are not using IAM authentication, use one of the following three options to provide Amazon S3 credentials to clients.
- Programmatic
- Specify the credentials in the configuration for the job. This option is most useful for Spark jobs.
- Make a modified copy of the configuration files
- Make a copy of the configuration files and add the S3
credentials:
- For YARN and MapReduce jobs, copy the contents of the
/etc/hadoop/conf
directory to a local working directory under the home directory of the host where you will submit the job. For Spark jobs, copy/etc/spark/conf
to a local directory under the home directory of the host where you will submit the job. - Set the permissions for the configuration files appropriately for your environment and ensure that unauthorized users cannot access sensitive configurations in these files.
- Add the following to the
core-site.xml
file within the<configuration>
element:<property> <name>fs.s3a.access.key</name> <value>Amazon S3 Access Key</value> </property> <property> <name>fs.s3a.secret.key</name> <value>Amazon S3 Secret Key</value> </property>
- Reference these versions of the configuration
files when submitting jobs by running the following command:
-
YARN or MapReduce:
export HADOOP_CONF_DIR=path to local configuration directory
-
Spark:
export SPARK_CONF_DIR=path to local configuration directory
-
YARN or MapReduce:
- For YARN and MapReduce jobs, copy the contents of the
- Reference the managed configuration files and add AWS credentials
- This option allows you to continue to use the configuration files
managed by Cloudera Manager. If you deploy new configuration files, the new values are
included by reference in your copy of the configuration files while also maintaining a
version of the configuration that contains the Amazon S3 credentials:
- Create a local directory under your home directory.
- Copy the configuration files from
/etc/hadoop/conf
to the new directory. - Set the permissions for the configuration files appropriately for your environment.
- Edit each configuration file:
- Remove all elements within the
<configuration>
element. - Add an XML
<include>
element within the<configuration>
element to reference the configuration files managed by Cloudera Manager. For example:<include xmlns="http://www.w3.org/2001/XInclude" href="/etc/hadoop/conf/hdfs-site.xml"> <fallback /> </include>
- Remove all elements within the
- Add the following to the
core-site.xml
file within the<configuration>
element:<property> <name>fs.s3a.access.key</name> <value>Amazon S3 Access Key</value> </property> <property> <name>fs.s3a.secret.key</name> <value>Amazon S3 Secret Key</value> </property>
- Reference these versions of the configuration
files when submitting jobs by running the following command:
-
YARN or MapReduce:
export HADOOP_CONF_DIR=path to local configuration directory
-
Spark:
export SPARK_CONF_DIR=path to local configuration directory
-
YARN or MapReduce:
Examplecore-site.xml
file:<?xml version="1.0"?> <?xml-stylesheet type="text/xsl" href="configuration.xsl"?> <configuration> <include xmlns="http://www.w3.org/2001/XInclude" href="/etc/hadoop/conf/core-site.xml"> <fallback /> </include> <property> <name>fs.s3a.access.key</name> <value>Amazon S3 Access Key</value> </property> <property> <name>fs.s3a.secret.key</name> <value>Amazon S3 Secret Key</value> </property> </configuration>