Creating a new Iceberg table from Spark 3
In Cloudera Data Engineering (CDE), you can create a Spark job that creates a new Iceberg table or imports an existing Hive table. Once created, the table can be used for subsequent operations.
The following Spark SQL command creates a new Iceberg table:
spark.sql("""CREATE EXTERNAL TABLE ice_t (idx int, name string, state string)
USING iceberg
PARTITIONED BY (state)""")
For information about creating tables, see the Iceberg documentation.
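As a minimal sketch of the subsequent operations mentioned above, the following helper inserts a row into ice_t and reads it back. The function name and the sample row are illustrative assumptions, and `spark` is assumed to be an existing SparkSession already configured with an Iceberg catalog.

```python
def insert_and_read(spark, table="ice_t"):
    """Insert one sample row into an Iceberg table and return a query result.

    `spark` is assumed to be an existing SparkSession with an Iceberg
    catalog configured; the sample row values are hypothetical.
    """
    # Column order follows the CREATE statement: idx, name, state.
    spark.sql(f"INSERT INTO {table} VALUES (1, 'Alice', 'CA')")
    # Filtering on the partition column lets Iceberg prune data files.
    return spark.sql(f"SELECT idx, name FROM {table} WHERE state = 'CA'")
```

In a Spark job, the returned DataFrame could then be displayed with `insert_and_read(spark).show()`.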
Creating an Iceberg table with format v2
To use the Iceberg table format v2, set the format-version property to 2 as shown below:
CREATE TABLE logs (app string, lvl string, message string, event_ts timestamp) USING iceberg TBLPROPERTIES ('format-version' = '2')
You can specify <delete-mode>, <update-mode>, and <merge-mode> during table creation to set the mode of the respective operation. If unspecified, they default to merge-on-read.
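The sketch below shows one way the modes might be set at table creation. It assumes the three modes map to the Apache Iceberg table properties write.delete.mode, write.update.mode, and write.merge.mode, each accepting copy-on-write or merge-on-read; confirm this for your CDE release. The helper name `create_v2_table_sql` is hypothetical.

```python
def create_v2_table_sql(table, columns, delete_mode="merge-on-read",
                        update_mode="merge-on-read", merge_mode="merge-on-read"):
    """Build a CREATE TABLE statement for an Iceberg format v2 table.

    Assumption: the write.*.mode property names are taken from Apache
    Iceberg's table properties and are not confirmed by this document.
    """
    allowed = {"copy-on-write", "merge-on-read"}
    for mode in (delete_mode, update_mode, merge_mode):
        if mode not in allowed:
            raise ValueError(f"unsupported mode: {mode}")
    props = {
        "format-version": "2",
        "write.delete.mode": delete_mode,
        "write.update.mode": update_mode,
        "write.merge.mode": merge_mode,
    }
    prop_sql = ", ".join(f"'{k}' = '{v}'" for k, v in props.items())
    return f"CREATE TABLE {table} ({columns}) USING iceberg TBLPROPERTIES ({prop_sql})"


# Build the DDL for the logs table, overriding only the delete mode.
ddl = create_v2_table_sql(
    "logs",
    "app string, lvl string, message string, event_ts timestamp",
    delete_mode="copy-on-write",
)
# Pass the statement to an existing SparkSession: spark.sql(ddl)
```

Building the statement as a string keeps the mode validation separate from the Spark call, so a misspelled mode fails before the job submits any DDL.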
Unsupported Feature: Create table … like
The create table … like feature is not supported in Spark:
CREATE TABLE <target> LIKE <source> USING iceberg
Here, <source> is an existing Iceberg table. This operation may appear to succeed, displaying only warnings and no errors, but the resulting table is not usable.
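Because the statement can fail silently, a job might guard against it before submitting SQL. The following is a rough sketch under stated assumptions: the helper names are hypothetical, the regular expression is deliberately simple and will not catch every formatting of the statement, and `spark` is assumed to be an existing SparkSession.

```python
import re

# Matches CREATE TABLE <target> LIKE <source>, optionally with IF NOT EXISTS.
_CREATE_LIKE = re.compile(
    r"^\s*CREATE\s+TABLE\s+(IF\s+NOT\s+EXISTS\s+)?\S+\s+LIKE\s+\S+",
    re.IGNORECASE,
)


def is_create_table_like(statement):
    """Return True if the statement is a CREATE TABLE ... LIKE command."""
    return _CREATE_LIKE.match(statement) is not None


def run_checked(spark, statement):
    """Run a statement through spark.sql unless it uses CREATE TABLE ... LIKE."""
    if is_create_table_like(statement):
        raise ValueError("CREATE TABLE ... LIKE is not supported for Iceberg tables")
    return spark.sql(statement)
```

Raising an error up front turns a silent failure into a visible one, at the cost of maintaining the pattern by hand.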