Adding access to external S3 buckets for Cloudera Data Warehouse clusters on AWS

You can add S3 buckets to Cloudera Data Warehouse (CDW) service clusters running on AWS environments and then query the data stored in them.

When you create a Virtual Warehouse in the CDW service, a cluster is created in your AWS account. This cluster uses two S3 buckets: one for managed data and one for external data. The CDW service names these two buckets according to the following convention:

  • <s3-bucket-name>-<random-string>-dwx-managed

    For example, if you specified the bucket name dwx-data when you registered your environment with the Management Console, the managed data S3 bucket might be named: dwx-data-t8hq-dwx-managed

  • <s3-bucket-name>-<random-string>-dwx-external

    Continuing the example above, where you specified dwx-data as the bucket name during environment registration, the external S3 bucket might be named: dwx-data-8nhs-dwx-external
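The naming convention above can be checked programmatically, for example when you want to tell the CDW-created buckets apart from buckets you added yourself. The following is a minimal sketch that assumes the random string contains only lowercase letters and digits (no hyphens); the function name parse_cdw_bucket_name is hypothetical and not part of any Cloudera tool.

```python
import re
from typing import Dict, Optional

# Matches the two CDW bucket name patterns described above:
#   <s3-bucket-name>-<random-string>-dwx-managed
#   <s3-bucket-name>-<random-string>-dwx-external
# Assumption: <random-string> is lowercase alphanumeric with no hyphens.
CDW_BUCKET_RE = re.compile(
    r"^(?P<base>.+)-(?P<rand>[a-z0-9]+)-dwx-(?P<kind>managed|external)$"
)


def parse_cdw_bucket_name(name: str) -> Optional[Dict[str, str]]:
    """Split a CDW-created bucket name into its parts, or return None."""
    match = CDW_BUCKET_RE.match(name)
    if match is None:
        # Not a name that follows the CDW convention (e.g. a bucket you added).
        return None
    return match.groupdict()


if __name__ == "__main__":
    for bucket in ("dwx-data-t8hq-dwx-managed", "dwx-data-8nhs-dwx-external", "my-other-bucket"):
        print(bucket, "->", parse_cdw_bucket_name(bucket))
```

With the example names from the list, the managed bucket parses to base dwx-data, random string t8hq, and kind managed; a name that does not end in -dwx-managed or -dwx-external returns None.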

Access to these two buckets is controlled by AWS instance profiles. To access additional S3 buckets from your CDW service cluster, you must edit the instance profile to grant read/write permissions on those buckets.
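As an illustration of what "read/write permissions" might look like, the sketch below builds an IAM policy document for one or more additional buckets. It is a sketch under stated assumptions, not Cloudera's required policy: the exact list of S3 actions CDW needs is an assumption here, the helper build_bucket_access_policy is hypothetical, and my-extra-bucket is a placeholder bucket name.

```python
import json
from typing import Iterable

# Assumed action lists; confirm the permissions CDW actually requires
# against the Cloudera documentation before using this policy.
BUCKET_ACTIONS = ["s3:ListBucket", "s3:GetBucketLocation"]
OBJECT_ACTIONS = ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"]


def build_bucket_access_policy(bucket_names: Iterable[str]) -> dict:
    """Hypothetical helper: return an IAM policy granting read/write on the buckets."""
    buckets = list(bucket_names)
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Bucket-level actions are granted on the bucket ARN itself.
                "Sid": "ListAdditionalBuckets",
                "Effect": "Allow",
                "Action": BUCKET_ACTIONS,
                "Resource": [f"arn:aws:s3:::{b}" for b in buckets],
            },
            {
                # Object-level actions are granted on every key in each bucket.
                "Sid": "ReadWriteAdditionalBucketObjects",
                "Effect": "Allow",
                "Action": OBJECT_ACTIONS,
                "Resource": [f"arn:aws:s3:::{b}/*" for b in buckets],
            },
        ],
    }


if __name__ == "__main__":
    # "my-extra-bucket" is a placeholder; substitute your own bucket name.
    print(json.dumps(build_bucket_access_policy(["my-extra-bucket"]), indent=2))
```

Bucket-level and object-level actions are kept in separate statements because S3 evaluates them against different resource ARNs. Because an instance profile is a container for an IAM role, a policy like this would be attached to the role behind the instance profile, for example in the IAM console or with the boto3 IAM client's put_role_policy call; the role name depends on your environment.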