Using S3 Express One Zone for data storage

You can use S3 Express One Zone (S3 Express) with CDP.

If you want to use additional data buckets with CDP and do not need zone redundancy, you can use S3 Express buckets, for example to speed up processing of temporary data.

The following limitations apply when using S3 Express buckets:

  • You can use S3 Express buckets only with Data Hubs running Runtime 7.2.18 or newer. Data services do not currently support them.
  • S3 Express buckets cannot be used for logs or backups.

To use an S3 Express bucket for running Data Hub workloads, complete the following steps:

  1. Add the required permissions in AWS
  2. Set bucket region in core-site.xml

Permissions

To use an S3 Express bucket, add the following permissions to the DATALAKE_ADMIN_ROLE role described in Minimal setup for AWS cloud storage, replacing region, account-id, base-bucket-name, and azid with the values for your bucket:
{
	"Version": "2012-10-17",
	"Statement": [
		{
			"Sid": "VisualEditor2",
			"Effect": "Allow",
			"Action": "s3express:CreateSession",
			"Resource": "arn:aws:s3express:region:account-id:bucket/base-bucket-name--azid--x-s3"
		}
	]
}
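
The placeholders in the Resource ARN can be easy to get wrong, so the following is a minimal Python sketch of how the policy above could be assembled for a specific bucket. The helper names (s3express_bucket_name, create_session_policy), the Sid, and all example values (my-temp-data, usw2-az1, us-west-2, 111122223333) are assumptions made for illustration and are not part of CDP or AWS tooling.

```python
import json


def s3express_bucket_name(base_name: str, az_id: str) -> str:
    """Build a bucket name in the base-bucket-name--azid--x-s3 form."""
    return f"{base_name}--{az_id}--x-s3"


def create_session_policy(region: str, account_id: str, bucket_name: str) -> dict:
    """Return a policy document allowing s3express:CreateSession on one bucket."""
    resource = f"arn:aws:s3express:{region}:{account_id}:bucket/{bucket_name}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowS3ExpressCreateSession",  # hypothetical Sid
                "Effect": "Allow",
                "Action": "s3express:CreateSession",
                "Resource": resource,
            }
        ],
    }


# Hypothetical example values; substitute your own.
bucket = s3express_bucket_name("my-temp-data", "usw2-az1")
policy = create_session_policy("us-west-2", "111122223333", bucket)
print(json.dumps(policy, indent=2))
```

The printed JSON has the same shape as the policy shown above, with the Resource placeholders filled in.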

Set bucket region in core-site.xml

Set the AWS region of the S3 Express bucket in core-site.xml as follows, replacing <bucket-name> with the full name of your bucket:
<property>
  <name>fs.s3a.bucket.<bucket-name>.endpoint.region</name>
  <value>us-west-2</value>
</property>
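
As an illustration of how the property name is formed from the full bucket name, the following Python sketch renders the property element for a given bucket and region. The function name region_property and the example bucket name are hypothetical; the property name pattern is the one shown above.

```python
import xml.etree.ElementTree as ET


def region_property(bucket_name: str, region: str) -> str:
    """Render the per-bucket endpoint region property as an XML string."""
    prop = ET.Element("property")
    ET.SubElement(prop, "name").text = (
        f"fs.s3a.bucket.{bucket_name}.endpoint.region"
    )
    ET.SubElement(prop, "value").text = region
    ET.indent(prop, space="  ")  # pretty-print; requires Python 3.9 or newer
    return ET.tostring(prop, encoding="unicode")


# Hypothetical bucket name in the base-bucket-name--azid--x-s3 form.
snippet = region_property("my-temp-data--usw2-az1--x-s3", "us-west-2")
print(snippet)
```

Note that the whole bucket name, including the --azid--x-s3 suffix, goes into the property name.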
The following screenshot illustrates how to perform this configuration in Cloudera Manager: