Provisioning Iceberg Replication Data Hub
Before you replicate Iceberg tables between Data Lakes, you must deploy a source Data Hub in the source Data Lake and a target Data Hub in the target Data Lake. You can use the CDP CLI or the Cloudera Management Console to provision the source and target Iceberg Replication Data Hubs.
- Compute resources for Iceberg replication.
- Source and target data locations.
- Access control on source and target data.
- Iceberg replication policy metadata management.
- Use 7.3.2 and 7.13.2 and higher versions to create the Data Hub.
- The admin server port 2288 must be open on the Data Hub. To verify whether the port is open and available, perform the following steps:
  1. Get the security group ID of the Data Lake on the tab. For example, sg-0015f2ed3f497520ed.
  2. Go to the AWS Management Console and select the required region.
  3. Go to the tab.
  4. Search for the security group using the group ID obtained in Step 1.
  5. Add an inbound rule.
Figure 1. Inbound rule example
- Assign the DataHubCreator role to the user creating the Data Hub. For more information about roles, see Understanding account roles and resource roles.
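The port 2288 check described above can also be scripted. The following is a minimal sketch, not an official Cloudera or AWS tool. It assumes you already have the security group description as a Python dictionary in the shape that the AWS EC2 `DescribeSecurityGroups` API returns (for example, one entry of `SecurityGroups` from boto3's `describe_security_groups`). The function name `port_is_open` and the sample CIDR range are hypothetical and used only for illustration. The sketch only confirms that an inbound rule covers the port. It does not check which source addresses the rule allows.

```python
ADMIN_SERVER_PORT = 2288  # admin server port named in the prerequisites above


def port_is_open(security_group: dict, port: int = ADMIN_SERVER_PORT) -> bool:
    """Return True if any inbound rule of the security group covers `port` over TCP.

    `security_group` is assumed to follow the EC2 DescribeSecurityGroups
    response shape, where inbound rules are listed under "IpPermissions".
    """
    for rule in security_group.get("IpPermissions", []):
        protocol = rule.get("IpProtocol")
        if protocol == "-1":
            # "-1" means all protocols and all ports are allowed.
            return True
        if protocol != "tcp":
            continue
        from_port = rule.get("FromPort")
        to_port = rule.get("ToPort")
        if from_port is None or to_port is None:
            continue
        if from_port <= port <= to_port:
            return True
    return False


# Hypothetical sample data: the group ID is the example from the steps above,
# the CIDR range is an assumption made for illustration.
sample_group = {
    "GroupId": "sg-0015f2ed3f497520ed",
    "IpPermissions": [
        {"IpProtocol": "tcp", "FromPort": 22, "ToPort": 22,
         "IpRanges": [{"CidrIp": "10.0.0.0/16"}]},
        {"IpProtocol": "tcp", "FromPort": 2288, "ToPort": 2288,
         "IpRanges": [{"CidrIp": "10.0.0.0/16"}]},
    ],
}

print(port_is_open(sample_group))  # True
```

If the function returns False for your security group, add the inbound rule as described in the steps above.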
