Configuring the Amazon S3 Connector

You can securely configure your cluster to authenticate with Amazon Simple Storage Service (S3) using the Cloudera S3 Connector Service. This configuration enables Impala queries to access data in S3 and also enables the Hue S3 Browser. Impala and Hue are automatically configured to authenticate with S3, but applications such as YARN, MapReduce, or Spark must provide their own AWS credentials when submitting jobs. You can define only one Amazon S3 service for each cluster.

Cloudera Manager stores the AWS credentials securely and does not store them in world-readable locations. The credentials are masked in the Cloudera Manager Admin Console, encrypted in the configurations passed to processes managed by Cloudera Manager, and redacted from the logs.

To access S3, you first define AWS Credentials in Cloudera Manager, and then you add the S3 Connector service and configure it to use those credentials.
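
Because the S3 Connector configures only Impala and Hue, a job submitted through YARN, MapReduce, or Spark has to carry its own credentials. The following is a minimal sketch of one way to do that for Spark, not a Cloudera-provided tool. It assumes the job reads S3 through the Hadoop S3A file system, whose fs.s3a.access.key and fs.s3a.secret.key properties can be passed to spark-submit with the spark.hadoop. prefix. The helper names, the environment variable names, and the job.py path are hypothetical.

```python
import os


def s3a_credential_conf(access_key, secret_key):
    """Return spark-submit --conf arguments that carry S3A credentials."""
    if not access_key or not secret_key:
        raise ValueError("both an access key and a secret key are required")
    return [
        "--conf", f"spark.hadoop.fs.s3a.access.key={access_key}",
        "--conf", f"spark.hadoop.fs.s3a.secret.key={secret_key}",
    ]


def build_spark_submit(app_path, env=os.environ):
    """Build a spark-submit argument list using keys read from env.

    Assumption: the keys are exported as AWS_ACCESS_KEY_ID and
    AWS_SECRET_ACCESS_KEY; adjust to however your site supplies them.
    """
    args = s3a_credential_conf(
        env.get("AWS_ACCESS_KEY_ID"),
        env.get("AWS_SECRET_ACCESS_KEY"),
    )
    return ["spark-submit", *args, app_path]
```

Calling build_spark_submit("job.py") returns an argument list that can be handed to subprocess.run. Arguments passed on the command line can be visible to other users in process listings, so on a shared host a protected configuration file or a credential provider may be the better choice.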

Consider using the S3Guard feature to address possible issues with the "eventual consistency" guarantee that Amazon provides for data stored in S3. To use S3Guard, you provision an Amazon DynamoDB database as an additional metadata store, which improves performance and guarantees that your queries return the most current data. See Configuring and Managing S3Guard.

Adding AWS Credentials

Minimum Required Role: User Administrator (also provided by Full Administrator)

To connect to Amazon S3, obtain an Access Key and Secret Key from Amazon Web Services, and then add AWS credentials in Cloudera Manager. These keys should permit access to all data in S3 that you want to query with Impala or browse with Hue.
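
The access the keys need is granted by the IAM policy attached to the AWS user that owns them. As an illustration only, the sketch below builds a read-only policy document for a single bucket. The bucket name is hypothetical, and the list of actions is an assumption: this page does not state which S3 actions Impala and Hue require, and workloads that write to S3 would also need actions such as s3:PutObject and s3:DeleteObject.

```python
import json


def read_only_bucket_policy(bucket):
    """Build an IAM policy document granting list and read access to one bucket."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Bucket-level actions apply to the bucket ARN itself.
                "Effect": "Allow",
                "Action": ["s3:ListBucket", "s3:GetBucketLocation"],
                "Resource": [f"arn:aws:s3:::{bucket}"],
            },
            {
                # Object-level actions apply to the keys under the bucket.
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": [f"arn:aws:s3:::{bucket}/*"],
            },
        ],
    }


# "example-analytics-bucket" is a placeholder, not a real bucket.
print(json.dumps(read_only_bucket_policy("example-analytics-bucket"), indent=2))
```

If the data you query or browse spans several buckets, each bucket needs its own pair of resource entries.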

Adding the S3 Connector Service

Minimum Required Role: Cluster Administrator (also provided by Full Administrator)

To add the S3 Connector service using the Cloudera Manager Admin Console:
  1. If you have not defined AWS Credentials, add AWS credentials in Cloudera Manager.
  2. Go to the cluster where you want to add the Amazon S3 service.
  3. Click Actions > Add Service.
  4. Select S3 Connector.
  5. Click Continue.

    The Add S3 Connector Service to Cluster Name wizard displays.

    The wizard checks your configuration for compatibility with S3 and reports any issues. The wizard does not allow you to continue if you have an invalid configuration. Fix any issues, and then repeat these steps to add the S3 Connector service.

  6. Select a Credentials Protection Policy. (Not applicable when IAM Role-Based Authentication is used.)
    Choose one of the following:
    • Less Secure

      Credentials can be stored in plain text in some configuration files for specific services (currently Hive, Impala, and Hue) in the cluster.

      This configuration is appropriate for unsecured, single-tenant clusters that provide fine-grained access control for data stored in S3.

    • More Secure

      Cloudera Manager distributes secrets to a limited set of services (currently Impala and Hue) and enables those services to access S3. It does not distribute these credentials to any other clients or services. See S3 Credentials Security.

    Other configurations that are not sensitive, such as the S3Guard configuration, are included in the configuration of all services and clients as needed.

  7. Click Continue.
  8. Select the previously defined AWS credentials from the Name drop-down list.
  9. Click Continue.

    The Restart Dependent Services page displays and indicates the dependent services that need to be restarted.

  10. Select Restart Now to restart these services. You can also restart these services later. Impala and Hue will not be able to authenticate with S3 until you restart the services.
  11. Click Continue to complete the addition of the Amazon S3 service. If Restart Now is selected, the dependent services are restarted.