Merge feature

From Hive and Impala, you can perform actions on an Iceberg table based on the results of a join between a target and source v2 Iceberg table.

The MERGE statement supports multiple WHEN clauses, where each clause can specify actions like UPDATE, DELETE, or INSERT. Actions are applied based on the join conditions defined between the source and target tables.

Hive and Impala syntax

MERGE INTO <target table> AS T USING <source expression/table> AS S
ON <boolean expression1>
WHEN MATCHED [AND <boolean expression2>] THEN UPDATE SET <set clause list>
WHEN MATCHED [AND <boolean expression3>] THEN DELETE
WHEN NOT MATCHED [AND <boolean expression4>] THEN INSERT VALUES <value list>

Hive and Impala example

Use the MERGE INTO statement to update an Iceberg table based on a staging table:

MERGE INTO customer USING new_customer_stage source ON source.id = customer.id 
    WHEN MATCHED THEN UPDATE SET name = source.name, state = source.new_state 
    WHEN NOT MATCHED THEN INSERT VALUES (source.id, source.name, source.state);

Create an Iceberg table and merge it with a non-Iceberg table.

create external table target_ice(a int, b string, c int) partitioned by spec (bucket(16, a), truncate(3, b)) stored by iceberg stored as orc tblproperties ('format-version'='2');
create table source(a int, b string, c int);

...

merge into target_ice as t using source src ON t.a = src.a
when matched and t.a > 100 THEN DELETE
when matched then update set b = 'Merged', c = t.c + 10
when not matched then insert values (src.a, src.b, src.c);
The Impala MERGE INTO statement supports the following capabilities:
  • WHEN NOT MATCHED BY SOURCE merge clause - Useful in situations when a source table's rows do not match the target table rows. For example:
    MERGE INTO target_ice t using source s on t.id = s.id
    WHEN NOT MATCHED BY SOURCE THEN UPDATE set t.column = "a";
  • Supports INSERT * syntax for the WHEN NOT MATCHED clause and UPDATE SET * syntax for the WHEN MATCHED clause.

    INSERT * enumerates all expressions from the source table or subquery to simplify inserting for target tables with large number of columns. The semantics is the same as the regular WHEN NOT MATCHED THEN INSERT clause.

    UPDATE SET * creates assignments for each target table column by enumerating the table columns and assigning source expressions by index.