Creating Iceberg tables
Apache Iceberg is an open, high-performance table format for organizing datasets that can contain petabytes of data. Iceberg tables can be added to compute engines such as Apache Hive and Apache Flink, from which the data can be queried using SQL.
Because Iceberg is integrated with Flink as a connector, you can use the table format in the same way in SQL Stream Builder (SSB). When creating an Iceberg table in SSB, you must also have a Hive catalog set up and registered.
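For reference, the sketch below shows how a Hive catalog can be created in plain Flink SQL. In SSB the Hive catalog is typically registered through the user interface rather than with a SQL statement, so treat this only as an illustration; the catalog name and the hive-conf-dir path are assumptions for the example.

```sql
-- Illustrative sketch only: registering a Hive catalog with standard Flink SQL.
-- The catalog name and configuration directory are assumptions; in SSB the
-- Hive catalog is normally registered through the user interface instead.
CREATE CATALOG ssb_hive_catalog WITH (
  'type' = 'hive',
  'hive-conf-dir' = '/etc/hive/conf'
);
```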
After setting up Hive for SSB, you can define Iceberg as a connector in the CREATE TABLE statement, as shown in the following example:

CREATE TABLE `ssb`.`ssb_default`.`iceberg_hive` (
  `column_int` INT,
  `column_str` STRING
) WITH (
  'connector' = 'iceberg',
  'catalog-database' = 'test_db',
  'catalog-type' = 'hive',
  'catalog-name' = 'iceberg_hive_catalog',
  'catalog-table' = 'iceberg_hive_table',
  'ssb-hive-catalog' = 'ssb_hive_catalog'
)
The following properties are mandatory when using the Iceberg connector:
Property | Example | Description |
---|---|---|
`catalog-database` | `test_db` | The Iceberg database name in the backend catalog. Defaults to the current Flink database name, and is created automatically if it does not exist when records are written into the Flink table. |
`catalog-type` | `hive` | The type of the catalog. For Iceberg, this must be `hive`. |
`catalog-name` | `iceberg_hive_catalog` | The name of the user-specified, internal catalog used by the connector. Required, as the connector has no default value for it. |
`catalog-table` | `iceberg_hive_table` | The name of the Iceberg table in the backend catalog. |
`ssb-hive-catalog` | `ssb_hive_catalog` | The name of the Hive catalog you provided when adding Hive as a catalog. |