Adding Cloudera Data Warehouse cluster access to S3 buckets in the same AWS account
In certain scenarios, you might need to interact with data that resides outside of the data lake S3 buckets. You can add a bucket to S3, enable access to the bucket, and then define external tables based on the data, such as a CSV file, that you put into the bucket.
The S3 bucket you add to hold the data outside your Data Lake must be in the same AWS account as your Cloudera Data Warehouse (CDW) service cluster.
- Required role: DWAdmin
- In your managed policy, locate the sid "putgetmybucketpaths" for editing.
Append resources to the resource section for the buckets you added.
For example, suppose you added a bucket named more-sales-data. To enable access to the more-sales-data bucket, append its resources to the end of the "Resource" section, as shown in the last two resource names:
"Resource":[ ... "arn:aws:s3:::roohi-dl-bucket/backup/*", "arn:aws:s3:::more-sales-data", "arn:aws:s3:::more-sales-data/*" ],
Click Review policy in the lower right corner of the page, and then click Save changes.
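If you keep a local copy of the policy document, the same append step can be sketched in Python before you paste the result into the policy editor. This is a minimal illustration, not part of the CDW or AWS tooling: the function name add_bucket_resources is hypothetical, and the Version, Effect, and Action values in the sample policy are placeholders, because the original policy is elided above. Only the Sid, the existing backup resource, and the two more-sales-data ARNs come from the example.

```python
import copy
import json


def add_bucket_resources(policy, bucket, sid="putgetmybucketpaths"):
    """Return a copy of `policy` with the bucket ARNs appended to the statement named `sid`."""
    updated = copy.deepcopy(policy)  # leave the caller's policy untouched
    statements = updated["Statement"]
    if isinstance(statements, dict):  # IAM allows a single statement object
        statements = [statements]
        updated["Statement"] = statements
    for statement in statements:
        if statement.get("Sid") != sid:
            continue
        resources = statement.get("Resource", [])
        if isinstance(resources, str):  # IAM allows a single resource string
            resources = [resources]
        # One ARN for the bucket itself and one for every object inside it.
        for arn in (f"arn:aws:s3:::{bucket}", f"arn:aws:s3:::{bucket}/*"):
            if arn not in resources:
                resources.append(arn)
        statement["Resource"] = resources
        return updated
    raise ValueError(f"no statement with Sid {sid!r} found")


# Placeholder policy: only the Sid and the first resource mirror the example above.
example_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "putgetmybucketpaths",
            "Effect": "Allow",
            "Action": ["s3:GetObject", "s3:PutObject"],  # assumed actions
            "Resource": ["arn:aws:s3:::roohi-dl-bucket/backup/*"],
        }
    ],
}

updated_policy = add_bucket_resources(example_policy, "more-sales-data")
print(json.dumps(updated_policy, indent=2))
```

Running the function a second time with the same bucket leaves the policy unchanged, so it is safe to reapply when you add several buckets in sequence.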
You can now access the more-sales-data bucket outside your data lake from Hue in your CDW service cluster. For example, you can create external Hive tables that point to the bucket, and join those external tables with tables already in your data lake. You can govern CDW user access to this external S3 bucket using Ranger Hadoop SQL Policies.
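As a rough sketch of the external table step, the following Python helper builds a HiveQL CREATE EXTERNAL TABLE statement that you could paste into Hue. Everything beyond the bucket name is an assumption made for illustration: the helper name external_table_ddl, the table name more_sales, the column list, the sales/ prefix, the comma-delimited text format, and the s3a:// URI scheme are not taken from the steps above, so adjust them to match your data.

```python
def external_table_ddl(table, columns, bucket, prefix=""):
    """Build a HiveQL statement for an external table over comma-delimited files in S3."""
    column_sql = ", ".join(f"`{name}` {hive_type}" for name, hive_type in columns)
    # Assumes the s3a:// scheme; drop any trailing slash from the location.
    location = f"s3a://{bucket}/{prefix}".rstrip("/")
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS `{table}` ({column_sql}) "
        "ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' "
        "STORED AS TEXTFILE "
        f"LOCATION '{location}'"
    )


# Hypothetical table, columns, and prefix for CSV files in the more-sales-data bucket.
ddl = external_table_ddl(
    "more_sales",
    [("sale_id", "INT"), ("amount", "DOUBLE")],
    "more-sales-data",
    "sales/",
)
print(ddl)
```

The helper only formats a string and does not validate identifiers, so use it with trusted table and column names.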