Insert table data feature

From Hive and Impala, you can insert data into Iceberg tables using the standard INSERT INTO a single table. INSERT statements work for V1 and V2 tables.

You can replace data in the table with the result of a query. To replace data, Hive and Impala dynamically overwrite partitions that have rows returned by the SELECT query. Partitions that do not have rows returned by the SELECT query, are not replaced. Using INSERT OVERWRITE on tables that use the BUCKET partition transform is not recommended. Results are unpredictable because dynamic overwrite behavior would be too random in this case.

From Hive, CDP also supports inserting into multiple tables as a technical preview; however, this operation is not atomic, so data consistency of Iceberg tables is equivalent to that of Hive external tables. Changes within a single table will remain atomic.

Inserting, deleting, or updating table data generates a snapshot. A new snapshot corresponds to a new manifest list. Manifest lists are named snap-*.avro.

Iceberg specification defines sort orders. At this point, Hive doesn’t support defining sort orders. But if there are sort orders defined by using other engines Hive can utilize them on write operations. For more information about sorting, see sort orders specification.

Hive or Impala syntax

INSERT INTO TABLE tablename VALUES values_row [, values_row ...]

INSERT INTO TABLE tablename1 select_statement1 FROM tablename2

INSERT OVERWRITE TABLE tablename1 select_statement1 FROM tablename2

Hive or Impala examples

CREATE TABLE ice_10 (i INT, s STRING, b BOOLEAN) STORED BY ICEBERG;
INSERT INTO ice_10 VALUES (1, 'asf', true);

CREATE TABLE ice_11 (i INT, s STRING, b BOOLEAN) STORED BY ICEBERG;
INSERT INTO ice_11 VALUES (2, 'apache', false);
INSERT INTO ice_11 SELECT * FROM ice_10;
SELECT * FROM ice_11;

INSERT OVERWRITE ice_11 SELECT * FROM ice_10;

Hive example

FROM customers
    INSERT INTO target1 SELECT customer_id, first_name;
    INSERT INTO target2 SELECT last_name, customer_id;