Cloud storage buckets

Cloudera Data Warehouse is integrated with the Data Lake Storage Cloud provider storage, such as AWS S3 or Azure Storage. During Data Lake creation, CDP creates storage locations for your data, logs, and backups.

For more information about AWS storage buckets, see S3 bucket and IAM roles and policies for logs, backup, and data storage. For information about which logs are stored in which directories, see Locations of Impala log files in S3.

Typically, during cluster creation, a managed policy is created automatically by CDW and attached to a node instance role. Alternatively, if you need different permissions than those specified during cluster creation, you can create or modify the managed policy and attach it to the node instance role. Whether you create the policy manually, or CDW creates the policy automatically, the policy must specify the paths to the log, backup, and data buckets in the Resources array of the s3readwriteownbuckets object in the managed policy JSON.
"arn:aws:s3:::${LogBucket}/clusters",
"arn:aws:s3:::${LogBucket}/clusters/*",
"arn:aws:s3:::${LogBucket}/<Your configured log path>",
"arn:aws:s3:::${LogBucket}/<Your configured log path>/*",
"arn:aws:s3:::${BackupBucket}/<Your configured backup path>",
"arn:aws:s3:::${BackupBucket}/<Your configured backup path>/*",
"arn:aws:s3:::${DataBucket}/<Your configured data path>",
"arn:aws:s3:::${DataBucket}/<Your configured data path>/*"
"arn:aws:s3:::${DataBucket}/backup",
"arn:aws:s3:::${DataBucket}/backup/*",

You get the path and name of the bucket, which was specified during Data Lake creation. To get the paths and names of the buckets, navigate to Environments > Data Lake. Click Summary.

The name and paths of your logs and backup Data Lake buckets appear:

One S3 bucket with a sub-directory named after your data lake such as s3a://my-bucket/my-dl is created when your Data Lake is created. This bucket is called the Storage Location Base and is intended for your data.

Node instance roles need access to data, logs, and backup buckets in your Data Lake. To configure a node instance role to access these buckets, you attach the managed policy to the node instance role.