Configuring Oozie to Enable MapReduce Jobs To Read/Write from Amazon S3

Starting with CDH 5.9, MapReduce jobs controlled by Oozie as part of a workflow can read from and write to Amazon S3. The steps below show you how to enable this capability. Before you begin, you will need your AWS credentials (the appropriate Access key ID and Secret access key obtained from Amazon Web Services for your Amazon S3 bucket). After storing these credentials in the keystore (the JCEKS file), specify the path to this keystore in the Oozie workflow configuration.

In the steps below, replace path/to/file with the HDFS path where the .jceks file is stored, and replace access_key_id and secret_access_key with your AWS credentials.

  1. Create the credential store (.jceks) and add your AWS access key to it as follows:
    hadoop credential create fs.s3a.access.key -provider \
    jceks://hdfs/path/to/file.jceks -value access_key_id
    
    For example:
    hadoop credential create fs.s3a.access.key -provider \
    jceks://hdfs/user/root/awskeyfile.jceks -value AKIAIPVYH....
    
  2. Add the AWS secret access key to the same keystore:
    hadoop credential create fs.s3a.secret.key -provider \
    jceks://hdfs/path/to/file.jceks -value secret_access_key
    
  3. In the <configuration> section of the MapReduce action in Oozie's workflow.xml file, set hadoop.security.credential.provider.path to the path of the .jceks file so that the MapReduce framework can load the AWS credentials that grant access to Amazon S3. (Sketches for verifying the keystore and for supplying the workflow parameters at submission time follow this list.)
    <action name="S3job">
        <map-reduce>
            <job-tracker>${jobtracker}</job-tracker>
            <name-node>${namenode}</name-node>
            <configuration>
                <property>
                    <name>hadoop.security.credential.provider.path</name>
                    <value>jceks://hdfs/path/to/file.jceks</value>
                </property>
                ....
                ....
            </configuration>
        </map-reduce>
        ....
    </action>
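
Before running the workflow, you can confirm that the keystore resolves correctly from HDFS. The sketch below lists the stored aliases and then reads an S3 bucket through the same credential provider; your-bucket is a placeholder for a bucket your credentials can access.

    # List the aliases in the keystore; expect fs.s3a.access.key and fs.s3a.secret.key
    hadoop credential list -provider jceks://hdfs/path/to/file.jceks

    # Optional smoke test: list a bucket using the credentials from the keystore
    # (your-bucket is a placeholder)
    hadoop fs -D hadoop.security.credential.provider.path=jceks://hdfs/path/to/file.jceks \
        -ls s3a://your-bucket/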
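
The ${jobtracker} and ${namenode} variables in the action above are normally supplied through the job.properties file used when submitting the workflow. The following is a minimal sketch only; the host names, ports, application path, and Oozie server URL are placeholders for your own cluster values.

    # job.properties (all values below are placeholders)
    jobtracker=resourcemanager-host:8032
    namenode=hdfs://namenode-host:8020
    oozie.wf.application.path=${namenode}/user/root/apps/s3-mr-workflow

    # Submit the workflow with the Oozie command-line tool
    oozie job -oozie http://oozie-host:11000/oozie -config job.properties -run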

For more information about Amazon Web Services (AWS) credentials and Amazon S3, see How To Configure Security for Amazon S3.