Configuring the Amazon S3 Connector

You can securely configure your cluster to authenticate with Amazon Simple Storage Service (S3) using the Cloudera S3 Connector Service. This configuration enables Impala queries to access data in S3 and also enables the Hue S3 Browser. Impala and Hue are automatically configured to authenticate with S3, but applications such as YARN, MapReduce, or Spark must provide their own AWS credentials when submitting jobs. You can define only one Amazon S3 service for each cluster.

Cloudera Manager stores the AWS credentials securely and does not store them in world-readable locations. The credentials are masked in the Cloudera Manager Admin Console, encrypted in the configurations passed to processes managed by Cloudera Manager, and redacted from the logs.

To access S3, first define AWS credentials in Cloudera Manager, and then add the S3 Connector service and configure it to use those credentials.
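Because YARN, MapReduce, and Spark applications must provide their own AWS credentials, the following minimal sketch shows one way a Spark job submission might carry them. It assumes the job reads S3 through the Hadoop S3A filesystem (the fs.s3a.access.key and fs.s3a.secret.key properties, prefixed with spark.hadoop. for Spark) and that the keys are available in the AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY environment variables. The helper names s3a_spark_conf and build_spark_submit are hypothetical and are not part of Cloudera Manager.

```python
import os


def s3a_spark_conf(access_key, secret_key):
    """Return spark-submit --conf arguments that carry S3A credentials.

    Assumption: the job accesses S3 through the Hadoop S3A filesystem,
    so the credentials are passed as spark.hadoop.fs.s3a.* properties.
    """
    props = {
        "spark.hadoop.fs.s3a.access.key": access_key,
        "spark.hadoop.fs.s3a.secret.key": secret_key,
    }
    args = []
    for name, value in props.items():
        args += ["--conf", f"{name}={value}"]
    return args


def build_spark_submit(app, env=None):
    """Build a spark-submit argument list for the application `app`.

    Keys are read from the environment (hypothetical convention for this
    sketch) so that they are not hard-coded in the script.
    """
    env = os.environ if env is None else env
    access_key = env["AWS_ACCESS_KEY_ID"]
    secret_key = env["AWS_SECRET_ACCESS_KEY"]
    return ["spark-submit", *s3a_spark_conf(access_key, secret_key), app]
```

Calling build_spark_submit("job.py") returns an argument list that can be handed to subprocess.run. Arguments passed on a command line can be visible to other users on the host, so a properties file or a Hadoop credential provider may be preferable where that matters.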

Adding AWS Credentials

Minimum Required Role: User Administrator (also provided by Full Administrator)

To connect to Amazon S3, obtain an Access Key and Secret Key from Amazon Web Services, and then add AWS credentials in Cloudera Manager. These keys should permit access to all data in S3 that you want to query with Impala or browse with Hue.
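As an illustration of keys that permit access to the data you want to query or browse, the following minimal sketch builds a read-only IAM policy document for a single bucket. The action list (s3:ListBucket and s3:GetObject) is an assumption chosen for read-only access, not a statement of what Impala or Hue require; writing data would need additional actions such as s3:PutObject and s3:DeleteObject. The function name read_only_s3_policy and the bucket name in the usage note are hypothetical.

```python
import json


def read_only_s3_policy(bucket):
    """Return an IAM policy document granting read-only access to `bucket`.

    Assumption: listing the bucket and reading its objects is sufficient
    for the intended workload; adjust the actions for your own needs.
    """
    bucket_arn = f"arn:aws:s3:::{bucket}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Listing applies to the bucket itself.
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": [bucket_arn],
            },
            {
                # Reading applies to the objects inside the bucket.
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": [f"{bucket_arn}/*"],
            },
        ],
    }


def policy_json(bucket):
    """Serialize the policy so it can be pasted into the IAM console."""
    return json.dumps(read_only_s3_policy(bucket), indent=2)
```

For example, policy_json("example-impala-data") produces a JSON document that could be attached to the IAM user whose Access Key and Secret Key you add in Cloudera Manager.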

Adding the S3 Connector Service

Minimum Required Role: Cluster Administrator (also provided by Full Administrator)

To add the S3 Connector service using the Cloudera Manager Admin Console:
  1. If you have not yet defined AWS credentials, add them in Cloudera Manager.
  2. Go to the cluster where you want to add the Amazon S3 service.
  3. Click Actions > Add Service.
  4. Select S3 Connector.
  5. Click Continue.

    The Add S3 Connector Service to Cluster Name wizard displays.

    The wizard checks your configuration for compatibility with S3 and reports any issues. The wizard does not allow you to continue if you have an invalid configuration. Fix any issues, and then repeat these steps to add the S3 Connector service.

  6. Click Continue.
  7. Select the previously defined AWS credentials from the Name drop-down list.
  8. Click Continue.

    The Restart Dependent Services page displays and indicates the dependent services that need to be restarted.

  9. Select Restart Now to restart these services immediately, or leave it unselected to restart them later. Impala and Hue cannot authenticate with S3 until the services are restarted.
  10. Click Continue to complete the addition of the Amazon S3 service. If Restart Now is selected, the dependent services are restarted.