Using S3 Express One Zone for data storage
You can use S3 Express One Zone (S3 Express) with CDP.
If you have additional data buckets that you want to use with CDP and you do not need zone redundancy, you can use S3 Express buckets, for example, to speed up processing of temporary data.
The following limitations apply when using S3 Express buckets:
- You can only use S3 Express buckets with Data Hubs running Runtime 7.2.18 or newer. Data services do not currently support S3 Express buckets.
- S3 Express buckets may not be used for logs and backups.
If you would like to use an S3 Express bucket for running Data Hub workloads, you should:
- Add the required permissions in AWS
- Set bucket region in core-site.xml
Permissions
To use an S3 Express bucket, add the following permissions to the DATALAKE_ADMIN_ROLE role described in Minimal setup for AWS cloud storage, replacing region, account-id, and the bucket name in the Resource ARN with your own values:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "VisualEditor2",
      "Effect": "Allow",
      "Action": "s3express:CreateSession",
      "Resource": "arn:aws:s3express:region:account-id:bucket/base-bucket-name--azid--x-s3"
    }
  ]
}
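If you manage several buckets, it can help to generate this policy rather than edit the ARN by hand. The following is a minimal sketch, not an official tool: the function name s3express_policy and the sample region, account ID, and bucket name are hypothetical and only illustrate how the placeholders in the Resource ARN are filled in.

```python
import json


def s3express_policy(region: str, account_id: str, bucket_name: str) -> dict:
    """Build the CreateSession policy for one S3 Express bucket (illustrative helper)."""
    # S3 Express bucket names follow the pattern base-bucket-name--azid--x-s3,
    # so reject names that do not carry the --x-s3 suffix.
    if not bucket_name.endswith("--x-s3"):
        raise ValueError(f"not an S3 Express bucket name: {bucket_name}")
    resource = f"arn:aws:s3express:{region}:{account_id}:bucket/{bucket_name}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "VisualEditor2",
                "Effect": "Allow",
                "Action": "s3express:CreateSession",
                "Resource": resource,
            }
        ],
    }


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    policy = s3express_policy("us-west-2", "123456789012", "my-data--usw2-az1--x-s3")
    print(json.dumps(policy, indent=2))
```

The printed JSON can then be attached to the role in the same way as the policy shown above.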
Set bucket region in core-site.xml
Set the AWS region of the S3 Express bucket in core-site.xml as follows, replacing <bucket-name> with the name of your bucket:
<property>
  <name>fs.s3a.bucket.<bucket-name>.endpoint.region</name>
  <value>us-west-2</value>
</property>
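Because the property name embeds the full bucket name, it is easy to mistype. As a hedged illustration, the sketch below renders the property for a given bucket and region; the helper name region_property and the sample bucket name are hypothetical, and the output is only the XML fragment, not a complete core-site.xml.

```python
import xml.etree.ElementTree as ET


def region_property(bucket_name: str, region: str) -> str:
    """Render the per-bucket endpoint.region property as an XML string (illustrative helper)."""
    prop = ET.Element("property")
    # The bucket name is substituted for <bucket-name> in the property key.
    ET.SubElement(prop, "name").text = f"fs.s3a.bucket.{bucket_name}.endpoint.region"
    ET.SubElement(prop, "value").text = region
    return ET.tostring(prop, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical bucket name, for illustration only.
    print(region_property("my-data--usw2-az1--x-s3", "us-west-2"))
```

The region passed in should match the region used in the Resource ARN of the policy.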
The following screenshot illustrates how to perform this configuration in Cloudera
Manager: