Replicating Iceberg tables stored in OBS and FSO buckets
Learn how to replicate Iceberg tables stored in OBS and FSO buckets created using S3 gateway APIs.
-
Install the AWS CLI on only one host in the source cluster.
curl "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o awscliv2.zip
unzip awscliv2.zip
sudo ./aws/install
-
Install the AWS CLI on only one host of the target cluster.
-
If you are using secure clusters, import the certificate from the Cloudera Manager
global truststore (cm-auto-global_truststore.jks) into the default
Java truststore (cacerts) on all the hosts of the source
cluster.
- Search for ssl.client.truststore.location and ssl.client.truststore.password in the /etc/ozone/conf.cloudera.OZONE-1/ozone-site.xml file to get the Cloudera Manager global truststore location and password.
-
Get the alias name for the cluster certificate using the
/usr/java/default/bin/keytool -list -v -keystore [*** SSL.CLIENT.TRUSTSTORE.LOCATION ***] command. For example, the alias name cmrootca-0 in the cmrootca-/root/cert_cmrootca-0/[*** REMOTE HOST NAME ***] location would be available on all the Auto-TLS clusters for replication.
-
Import the certificate to the Java default truststore using the
/usr/java/default/bin/keytool -importkeystore -destkeystore /usr/java/default/lib/security/cacerts -srckeystore [*** SSL.CLIENT.TRUSTSTORE.LOCATION ***] -srcalias [*** ALIAS FOUND IN PREVIOUS COMMAND ***] command.
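When the truststore holds several entries, the verbose keytool listing can be long. As a minimal sketch, the aliases can be filtered out of that listing as shown below. It assumes that the listing prints each alias on a line that starts with "Alias name: "; the function name list_truststore_aliases is hypothetical and not part of any Cloudera tooling.

```shell
# Hypothetical helper: reads `keytool -list -v` output on stdin and
# prints one alias per line.
# Assumption: each alias appears on a line starting with "Alias name: ".
list_truststore_aliases() {
  sed -n 's/^Alias name: //p'
}
```

For example, piping the output of the /usr/java/default/bin/keytool -list -v -keystore [*** SSL.CLIENT.TRUSTSTORE.LOCATION ***] command into list_truststore_aliases should leave only the alias names, such as cmrootca-0.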
-
If you are using secure clusters, import the certificate from the Cloudera Manager
global truststore (cm-auto-global_truststore.jks) into the default
Java truststore (cacerts) on all the hosts of the target
cluster.
- Search for ssl.client.truststore.location and ssl.client.truststore.password in the /etc/ozone/conf.cloudera.OZONE-1/ozone-site.xml file to get the Cloudera Manager global truststore location and password.
-
Get the alias name for the cluster certificate using the
/usr/java/default/bin/keytool -list -v -keystore [*** SSL.CLIENT.TRUSTSTORE.LOCATION ***] command.
-
Import the certificate to the Java default truststore using the
/usr/java/default/bin/keytool -importkeystore -destkeystore /usr/java/default/lib/security/cacerts -srckeystore [*** SSL.CLIENT.TRUSTSTORE.LOCATION ***] -srcalias [*** ALIAS FOUND IN PREVIOUS COMMAND ***] command.
-
Generate and configure the S3 secrets on any one of the hosts of the source cluster to get the S3 secrets from Ozone Manager. Record the AWS access key and AWS secret key.
-
On a Kerberos-enabled cluster, run the
kinit -kt /cdep/keytabs/om.keytab om command.
-
Search for the om.service.id property using the
cat /etc/ozone/conf.cloudera.OZONE-1/ozone-site.xml command.
-
Get the secret using the
ozone s3 getsecret --om-service-id=[*** OM SERVICE ID ***] command.
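Reading the whole ozone-site.xml file to find one value is error-prone. The following is a minimal sketch of a lookup helper, assuming the file uses the usual Hadoop layout in which each name and value element sits on its own line. The function name get_hadoop_property is hypothetical, and the property name must be passed exactly as it appears in the file.

```shell
# Hypothetical helper: print the value of one property from a
# Hadoop-style XML configuration file.
#   $1 = exact property name as written in the file
#   $2 = path to the XML file
# Assumption: at most one <name>/<value> pair per line.
get_hadoop_property() {
  awk -v name="$1" '
    # Remember that the wanted <name> element has been seen.
    index($0, "<name>" name "</name>") { found = 1 }
    # Print the first <value> element that follows it, then stop.
    found && match($0, /<value>[^<]*<\/value>/) {
      print substr($0, RSTART + 7, RLENGTH - 15)
      exit
    }
  ' "$2"
}
```

The same helper can be used for the ssl.client.truststore.location and ssl.client.truststore.password lookups, for example get_hadoop_property ssl.client.truststore.location /etc/ozone/conf.cloudera.OZONE-1/ozone-site.xml. It prints nothing if the property is not present.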
-
Generate and configure the S3 secrets on any one of the hosts of the target cluster to
get the S3 secrets from Ozone Manager. Record the AWS access key and AWS secret key.
-
On a Kerberos-enabled cluster, run the
kinit -kt /cdep/keytabs/om.keytab om command.
-
Search for the om.service.id property using the
cat /etc/ozone/conf.cloudera.OZONE-1/ozone-site.xml command.
-
Get the secret using the
ozone s3 getsecret --om-service-id=[*** OM SERVICE ID ***] command.
-
Configure the AWS access key and AWS secret key on one of the source cluster hosts,
using one of the following sets of commands:
- To set the credentials using environment variables, use the following
commands:
export AWS_ACCESS_KEY_ID=[*** ACCESS KEY ***]
export AWS_SECRET_ACCESS_KEY=[*** SECRET ACCESS KEY ***]
- To set the credentials using the AWS CLI, use the following
commands:
aws configure set aws_access_key_id [*** ACCESS KEY ***]
aws configure set aws_secret_access_key [*** SECRET ACCESS KEY ***]
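Copying the keys by hand from the ozone s3 getsecret output invites typing mistakes. The sketch below converts that output into the two export commands. It assumes that the command prints the keys as awsAccessKey=... and awsSecret=... lines, which you should confirm against your Ozone version; the function name s3_secret_to_exports is hypothetical.

```shell
# Hypothetical helper: reads `ozone s3 getsecret` output on stdin and
# prints the matching export commands.
# Assumption: the output contains lines of the form
#   awsAccessKey=<access key>
#   awsSecret=<secret key>
s3_secret_to_exports() {
  # The "|| [ -n "$key" ]" part keeps a last line without a newline.
  while IFS='=' read -r key value || [ -n "$key" ]; do
    case "$key" in
      awsAccessKey) printf "export AWS_ACCESS_KEY_ID='%s'\n" "$value" ;;
      awsSecret)    printf "export AWS_SECRET_ACCESS_KEY='%s'\n" "$value" ;;
    esac
  done
}
```

The printed lines can be reviewed and then pasted into the shell on the host where the AWS CLI is installed. Lines that do not start with one of the two expected keys are ignored.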
-
Configure the AWS access key and AWS secret key on one of the target cluster hosts,
using one of the following sets of commands:
- To set the credentials using environment variables, use the following
commands:
export AWS_ACCESS_KEY_ID=[*** ACCESS KEY ***]
export AWS_SECRET_ACCESS_KEY=[*** SECRET ACCESS KEY ***]
- To set the credentials using the AWS CLI, use the following
commands:
aws configure set aws_access_key_id [*** ACCESS KEY ***]
aws configure set aws_secret_access_key [*** SECRET ACCESS KEY ***]
-
Retrieve the S3 gateway endpoint URL from Cloudera Manager.
- Go to Cloudera Manager > Clusters > OZONE service > Instances > S3 Gateway > S3 Gateway Web UI.
- Record the endpoint URL. On a secure cluster, the endpoint format is https://[*** HOST NAME ***]:9879. On an unsecure cluster, the format is http://[*** HOST NAME ***]:9878.
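If the endpoint is needed in a script, the two formats above can be captured in a small helper. This is only a sketch that uses the ports stated in this step; the function name s3g_endpoint and the "secure" keyword are hypothetical, and a cluster configured with different S3 Gateway ports would need different values.

```shell
# Hypothetical helper: build the S3 Gateway endpoint URL.
#   $1 = S3 Gateway host name
#   $2 = "secure" for a TLS-enabled cluster, anything else otherwise
# Assumption: the ports are the ones listed in this procedure.
s3g_endpoint() {
  if [ "$2" = "secure" ]; then
    printf 'https://%s:9879\n' "$1"
  else
    printf 'http://%s:9878\n' "$1"
  fi
}
```

The result can be stored in a variable and passed to the --endpoint option of the aws s3api commands used later in this procedure.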
-
Create Ozone buckets, and then provide access to Hive, Spark, and Impala to create
Iceberg databases and tables.
For secure clusters, you need the SSL client truststore location, which is available in the /etc/ozone/conf.cloudera.OZONE-1/ozone-site.xml file.
-
To create the OBS buckets using AWS CLI on non-Kerberos clusters, run the
aws s3api --endpoint [*** S3 GATEWAY ENDPOINT ***] create-bucket --bucket [*** BUCKET NAME ***] command.
-
To create the OBS buckets using AWS CLI on Kerberos-enabled clusters, run the
following commands:
kinit -kt /cdep/keytabs/om.keytab om
aws s3api --endpoint [*** S3 GATEWAY ENDPOINT ***] create-bucket --bucket [*** BUCKET NAME ***] --ca-bundle=[*** SSL.CLIENT.TRUSTSTORE.LOCATION ***]
-
To create OBS buckets using the Ozone shell, run the
ozone sh bucket create s3v/[*** BUCKET NAME ***] --layout OBJECT_STORE command.
-
To create FSO buckets using the Ozone shell, run the
ozone sh bucket create s3v/[*** BUCKET NAME ***] --layout FILE_SYSTEM_OPTIMIZED command.
-
Disable filesystem paths in the Ozone service configuration by adding the following
key-value pair to the Cloudera Manager > Clusters > OZONE service > Configuration > Ozone Service Advanced Configuration Snippet (Safety Valve) for
ozone-conf/ozone-site.xml property:
ozone.om.enable.filesystem.paths = false
-
Configure the S3A client properties to provide Hive, Spark, and Impala access to the
bucket.
- Go to Cloudera Manager > Clusters > HDFS service > Configuration > Cluster-wide Advanced Configuration Snippet (Safety Valve) for core-site.xml property.
-
Add the following key-value pairs:
fs.s3a.bucket.[*** BUCKET NAME ***].access.key = [*** AWS ACCESS KEY ***]
fs.s3a.bucket.[*** BUCKET NAME ***].secret.key = [*** AWS SECRET KEY ***]
fs.s3a.endpoint = [*** S3 ENDPOINT ***]
fs.s3a.bucket.probe = 0
fs.s3a.change.detection.version.required = false
fs.s3a.path.style.access = true
fs.s3a.change.detection.mode = none
fs.s3a.impl.disable.cache = true
- Save your changes and refresh the stale configurations.
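When several buckets need the same settings, typing the key-value pairs by hand makes it easy to misspell a property name. The following sketch prints the pairs listed in this step for one bucket so that they can be pasted into the safety valve. The function name s3a_bucket_properties is hypothetical, and the property names and values are the ones from this step only.

```shell
# Hypothetical helper: print the S3A key-value pairs from this step.
#   $1 = bucket name   $2 = AWS access key
#   $3 = AWS secret key   $4 = S3 Gateway endpoint URL
s3a_bucket_properties() {
  # Per-bucket credentials.
  printf 'fs.s3a.bucket.%s.access.key = %s\n' "$1" "$2"
  printf 'fs.s3a.bucket.%s.secret.key = %s\n' "$1" "$3"
  # Settings shared by all buckets.
  printf 'fs.s3a.endpoint = %s\n' "$4"
  printf '%s\n' \
    'fs.s3a.bucket.probe = 0' \
    'fs.s3a.change.detection.version.required = false' \
    'fs.s3a.path.style.access = true' \
    'fs.s3a.change.detection.mode = none' \
    'fs.s3a.impl.disable.cache = true'
}
```

The output contains the secret key in clear text, so avoid redirecting it to a file that other users can read.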
-
Create the Iceberg table.
create table tb1(id int, val int) stored by iceberg location 's3a://[*** BUCKET ***]/[*** KEY ***]';
- Enable the ‘Iceberg on Ozone replication’ feature flag.
- Add the source cluster as a peer for replication.
-
Create the Iceberg replication policy by providing the following mandatory details in
the Create Iceberg replication policy wizard:
- On the General tab, set the Source Storage Filter to S3.
- On the Advanced tab, set the Location Mapping field to s3a://[*** SOURCE BUCKET ***] ---> s3a://[*** TARGET BUCKET ***].
