Creating Iceberg tables

Apache Iceberg is an open, high-performance table format for organizing datasets that can contain petabytes of data. Iceberg can be used to add tables to computing engines, such as Apache Hive and Apache Flink, from which the data can be queried using SQL.

As Iceberg is integrated as a connector to Flink, you can use the table format the same way for SQL Stream Builder (SSB). The following table summarizes the Iceberg features that are supported in SSB:

Table 1. SQL feature support for the Iceberg and SSB integration
Feature	SSB
Create catalog	Supported
Create database	Supported
Create table	Supported
Alter table	Supported note Only the altering table properties is supported, which does not inclide column and partition changes.
Drop table	Supported
Select	Supported
Insert into	Supported
Insert overwrite	Supported
Upsert	Technical preview¹
Equality delete	Technical preview
Metadata tables	Not supported
Rewrite files action	Supported

You can use a Hive service that is located on the same cluster as SSB, and you can use a Hive service that is available in a Cloudera Data Warehouse (CDW) cluster.

Using Hive for Iceberg integration🔗

When using the Hive service located on your cluster, you can add it as a catalog on Streaming SQL Console. Before creating the Iceberg table, ensure that you have added Hive as a catalog using the steps described in Adding catalogs documentation.

After setting up Hive for SSB, you can define Iceberg as a connector in the

CREATE
          TABLE

statement as the example shows below:

CREATE TABLE  iceberg_hive_table (
  col_str STRING,
  col_int INT
) WITH (
  'connector' = 'iceberg',
  'catalog-database' = 'test_db',
  'catalog-type' = 'hive',
  'catalog-name' = 'iceberg_hive_catalog',
  'catalog-table' = 'iceberg_hive_table',
  'ssb-hive-catalog' = 'ssb_hive_catalog',
  'engine.hive.enabled' = 'true'
);

The following properties are mandatory when using the Iceberg connector:


Property	Example	Description
`catalog-database`	`test_db`	The Iceberg database name in the backend catalog, uses the current Flink database name by default. It will be created automatically if it does not exist when writing records into the Flink table
`catalog-type`	`hive`	Type of the catalog
`catalog-name`	`iceberg_hive_catalog`	User-specified catalog name. It is required as the connector does not have any default value.
`catalog-table`	`iceberg_hive_table`	Name of the Iceberg table in the backend catalog.
`ssb-hive-catalog`	`ssb_hive_catalog`	The name of the Hive catalog you have provided when adding Hive as a catalog.
`engine.hive.enabled`	`true`	The `engine.hive.enabled` configuration is required to enable Hive compatibility. The configuration is automatically set to `true` if there is no custom value specified.

Before using Iceberg table as a sink, ensure that the checkpointing is enabled in the Job Settings as the Iceberg connector only commits data on checkpointing.

Access the Job Settings page from the SQL Editor.
Enable Checkpointing using the toggle, and configure the checkpointing settings based on your requirement.

Using the Hive service from Cloudera Data Warehouse (CDW)🔗

To access Iceberg tables stored in Cloudera Data Warehouse, follow the configuration steps below.

Before you begin

Ensure that the ssb user has access to the all - database, table policy under the Hadoop SQL service.
For more informationn, see the Configuring Hive policies documentation.
Create ID Broker mapping for the workload user.
For more informationn, see the Creating IDBroker mapping documentation.

Steps

Access Cloudera Manager of the CDW environment.
Click Cluster > View Client Configuration URLs > Hive Metastore.
Download the hive-conf tarball.
The downloaded tarball needs to be uploaded to the manager node of the Flink cluster.

Copy the hive-conf to the manager node of the Flink cluster.

scp <location>/hive-conf <workload_username>@<manager_node_FQDN>:.
Password:<your_workload_password>

Extract the file under the /tmp directory that is accessible by the ssb user using the following commands:
```
mkdir /opt/hive-conf
cd /opt/hive-conf
tar -xvf /path/to/hive-conf.tar.gz
chmod a+x .
chmod -R a+r .
```

Access the Streaming SQL Console and create the table using the following example:

CREATE TABLE my_ssb_table (
    id   INT
) WITH (
    'connector' = 'iceberg',
    'catalog-name' = 'internal-use',
    'catalog-database' = 'default',
    'catalog-table' = 'my_iceberg_table',
    'hive-conf-dir' = '/opt/hive-conf',
    'engine.hive.enabled' = 'true'
);

¹ The Upsert feature of the Iceberg and SSB integration is in Technical Preview and not ready for production deployment. Cloudera encourages you to explore these features in non-production environments and provide feedback on your experiences through the Cloudera Community Forums.

Creating Iceberg tables

Using Hive for Iceberg integration🔗

Using the Hive service from Cloudera Data Warehouse (CDW)🔗

We want your opinion

How can we improve this page?