Adding access to external S3 buckets for Cloudera Data Warehouse clusters on AWS
You can add external S3 buckets to Cloudera Data Warehouse (CDW) service clusters running in AWS environments and query the data in those buckets.
When you create a Virtual Warehouse in the CDW service, a cluster is created in your AWS account. The CDW service creates two S3 buckets for this cluster: one for managed data and one for external data. The naming convention for these two buckets is:
- <s3-bucket-name>-<random-string>-dwx-managed
  For example, if you specified the bucket name dwx-data when you registered your environment with Management Console, the managed data S3 bucket might be named something like: dwx-data-t8hq-dwx-managed
- <s3-bucket-name>-<random-string>-dwx-external
  Continuing the above scenario, where you specified dwx-data as the bucket name during environment registration, the external S3 bucket might be named: dwx-data-8nhs-dwx-external
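The naming convention above can be checked programmatically, for example to tell CDW-created buckets apart from buckets you add yourself. The following Python sketch is illustrative only: the function name classify_bucket is hypothetical, and the pattern assumes that the random string consists of lowercase letters and digits, as in the two examples. The actual generator may produce other strings.

```python
import re

# Assumption: <random-string> is a run of lowercase letters and digits,
# as in the documented examples (t8hq, 8nhs). Adjust if your names differ.
BUCKET_PATTERN = re.compile(
    r"^(?P<base>.+)-(?P<rand>[a-z0-9]+)-dwx-(?P<kind>managed|external)$"
)


def classify_bucket(name):
    """Return (base_name, random_string, kind), or None if the name
    does not follow the CDW naming convention."""
    match = BUCKET_PATTERN.match(name)
    if match is None:
        return None
    return match.group("base"), match.group("rand"), match.group("kind")


print(classify_bucket("dwx-data-t8hq-dwx-managed"))
print(classify_bucket("dwx-data-8nhs-dwx-external"))
print(classify_bucket("my-additional-bucket"))  # hypothetical user-added bucket
```

A name that does not match, such as the hypothetical my-additional-bucket, returns None. Buckets like this are the ones that need extra permissions in the instance profile.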
Access to these two buckets is controlled by AWS instance profiles. To access additional S3 buckets that you add to your CDW service cluster, you must edit the instance profile to grant read/write permissions on those buckets.
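As a rough illustration of what read/write permissions on an additional bucket can look like, the following Python sketch builds an IAM policy document that could be attached to the IAM role associated with the instance profile. The bucket name my-additional-bucket and the function name build_s3_access_policy are hypothetical, and the list of S3 actions is an assumption about what read/write access typically requires, not a statement of the exact permissions that CDW needs.

```python
import json


def build_s3_access_policy(bucket_names):
    """Build an IAM policy document granting read/write access to the
    given S3 buckets. The action list is an assumption, not CDW's
    official requirement."""
    bucket_arns = [f"arn:aws:s3:::{name}" for name in bucket_names]
    object_arns = [f"{arn}/*" for arn in bucket_arns]
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Bucket-level actions apply to the bucket ARN itself.
                "Sid": "ListAdditionalBuckets",
                "Effect": "Allow",
                "Action": ["s3:ListBucket", "s3:GetBucketLocation"],
                "Resource": bucket_arns,
            },
            {
                # Object-level actions apply to the keys inside the bucket.
                "Sid": "ReadWriteAdditionalObjects",
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
                "Resource": object_arns,
            },
        ],
    }


# Hypothetical bucket name; replace with the bucket you want to add.
policy = build_s3_access_policy(["my-additional-bucket"])
print(json.dumps(policy, indent=2))
```

The printed JSON can be reviewed and then added to the role in the AWS IAM console. Splitting bucket-level and object-level actions into separate statements keeps each action paired with the resource type it applies to.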