Creating Iceberg tables
Apache Iceberg is an open, high-performance table format for organizing datasets that can contain petabytes of data. Iceberg can be used to add tables to computing engines, such as Apache Hive and Apache Flink, from which the data can be queried using SQL.
Because Iceberg is integrated into Flink as a connector, you can use the table format in the same way in SQL Stream Builder (SSB). The Flink connector supports both the V1 and V2 versions of the Iceberg specification. For more information about the version changes, see the Apache Iceberg documentation.
Feature | SSB |
---|---|
Create catalog | Supported |
Create database | Supported |
Create table | Supported |
Alter table | Supported |
Drop table | Supported |
Select | Supported |
Insert into | Supported |
Insert overwrite | Supported |
Metadata tables | Supported |
Rewrite files action | Supported |
Upsert | Technical preview¹ |
Equality delete | Not supported |
Using Hive for Iceberg integration
When using the Hive service located on your cluster, you can add it as a catalog in Streaming SQL Console. Before creating the Iceberg table, ensure that you have added Hive as a catalog using the steps described in the documentation.
Create the Iceberg table with a CREATE TABLE statement, as the following example shows:
CREATE TABLE `ssb`.`ssb_default`.`iceberg_hive` (
  `column_int` INT,
  `column_str` STRING
) WITH (
  'connector' = 'iceberg',
  'catalog-database' = 'test_db',
  'catalog-type' = 'hive',
  'catalog-name' = 'iceberg_hive_catalog',
  'catalog-table' = 'iceberg_hive_table',
  'ssb-hive-catalog' = 'ssb_hive_catalog',
  'engine.hive.enabled' = 'true'
)
Property | Example | Description |
---|---|---|
catalog-database | test_db | The Iceberg database name in the backend catalog. Defaults to the current Flink database name. The database is created automatically if it does not exist when records are written into the Flink table. |
catalog-type | hive | Type of the catalog. |
catalog-name | iceberg_hive_catalog | User-specified catalog name. It is required because the connector does not have a default value. |
catalog-table | iceberg_hive_table | Name of the Iceberg table in the backend catalog. |
ssb-hive-catalog | ssb_hive_catalog | The name of the Hive catalog that you have registered in SSB. The configuration can be used when creating Iceberg catalogs as well. |
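The statements below are a minimal sketch of how the pieces above might be used together, not a verified SSB recipe. The first two write to and read from the `iceberg_hive` table from the example. The third illustrates how the ssb-hive-catalog property could appear in an Iceberg catalog definition. The catalog name `iceberg_catalog` is hypothetical, and the sketch assumes that no properties beyond the ones shown are required in your environment.

```sql
-- Write one row into the Iceberg table created in the example above.
INSERT INTO `ssb`.`ssb_default`.`iceberg_hive`
VALUES (1, 'first row');

-- Read the rows back.
SELECT `column_int`, `column_str`
FROM `ssb`.`ssb_default`.`iceberg_hive`;

-- Assumption: an Iceberg catalog that reuses the Hive catalog registered in SSB.
-- `iceberg_catalog` is a hypothetical name chosen for illustration.
CREATE CATALOG `iceberg_catalog` WITH (
  'type' = 'iceberg',
  'catalog-type' = 'hive',
  'ssb-hive-catalog' = 'ssb_hive_catalog'
);
```

Run each statement separately in Streaming SQL Console. The name ssb_hive_catalog must match the Hive catalog that you registered in SSB.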