Cloud storage buckets
Cloudera Data Warehouse (CDW) integrates with the cloud provider storage used by the Data Lake, such as AWS S3 or Azure Storage. During Data Lake creation, CDP creates storage locations for your data, logs, and backups.
For more information about AWS storage buckets, see S3 bucket and IAM roles and policies for logs, backup, and data storage. For information about which logs are stored in which directories, see Locations of Impala log files in S3.
Typically, during cluster creation, CDW automatically creates a managed policy and attaches it to a node instance role. Alternatively, if you need different permissions than those specified during cluster creation, you can create or modify the managed policy and attach it to the node instance role. Whether you create the policy manually or CDW creates it automatically, the policy must specify the paths to the log, backup, and data buckets in the Resources array of the s3readwriteownbuckets object in the managed policy JSON:
"arn:aws:s3:::${LogBucket}/clusters",
"arn:aws:s3:::${LogBucket}/clusters/*",
"arn:aws:s3:::${LogBucket}/<Your configured log path>",
"arn:aws:s3:::${LogBucket}/<Your configured log path>/*",
"arn:aws:s3:::${BackupBucket}/<Your configured backup path>",
"arn:aws:s3:::${BackupBucket}/<Your configured backup path>/*",
"arn:aws:s3:::${DataBucket}/<Your configured data path>",
"arn:aws:s3:::${DataBucket}/<Your configured data path>/*",
"arn:aws:s3:::${DataBucket}/backup",
"arn:aws:s3:::${DataBucket}/backup/*"
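If you maintain the policy manually, it can help to generate these entries rather than type them by hand. The following is a minimal sketch, assuming you already know the bucket names and paths configured for your Data Lake. The function names (s3_resource_arns, build_resources) and the example bucket names are hypothetical and are not part of CDW or any AWS SDK. The script only builds strings in the same order as the list above.

```python
import json


def s3_resource_arns(bucket, path):
    """Return the two ARNs for a path: the prefix itself and everything under it."""
    prefix = path.strip("/")  # tolerate leading or trailing slashes in the configured path
    base = f"arn:aws:s3:::{bucket}/{prefix}"
    return [base, f"{base}/*"]


def build_resources(log_bucket, log_path, backup_bucket, backup_path,
                    data_bucket, data_path):
    """Build the resource entries in the same order as the list above."""
    resources = []
    resources += s3_resource_arns(log_bucket, "clusters")
    resources += s3_resource_arns(log_bucket, log_path)
    resources += s3_resource_arns(backup_bucket, backup_path)
    resources += s3_resource_arns(data_bucket, data_path)
    resources += s3_resource_arns(data_bucket, "backup")
    return resources


if __name__ == "__main__":
    # Hypothetical bucket names and paths; substitute the values for your Data Lake.
    entries = build_resources(
        "my-logs", "logs",
        "my-backups", "backups",
        "my-data", "warehouse",
    )
    print(json.dumps(entries, indent=2))
```

The printed JSON array can be compared against, or pasted into, the resource list of the managed policy. Review the result before attaching the policy to the node instance role.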
The bucket names and paths are the ones specified during Data Lake creation. To look them up, navigate to
. Click Summary. The names and paths of your logs and backup Data Lake buckets appear: