Merge feature
From Hive and Impala, you can perform actions on an Iceberg table based on the results of a join between a target and source v2 Iceberg table.
The MERGE
statement supports multiple WHEN
clauses, where
each clause can specify actions like UPDATE
, DELETE
, or
INSERT
. Actions are applied based on the join conditions defined between the
source and target tables.
Hive and Impala syntax
MERGE INTO <target table> AS T USING <source expression/table> AS S
ON <boolean expression1>
WHEN MATCHED [AND <boolean expression2>] THEN UPDATE SET <set clause list>
WHEN MATCHED [AND <boolean expression3>] THEN DELETE
WHEN NOT MATCHED [AND <boolean expression4>] THEN INSERT VALUES <value list>
Hive and Impala example
Use the MERGE INTO
statement to update an Iceberg table based on a staging
table:
MERGE INTO customer USING new_customer_stage source ON source.id = customer.id
WHEN MATCHED THEN UPDATE SET name = source.name, state = source.new_state
WHEN NOT MATCHED THEN INSERT VALUES (source.id, source.name, source.state);
Create an Iceberg table and merge it with a non-Iceberg table.
create external table target_ice(a int, b string, c int) partitioned by spec (bucket(16, a), truncate(3, b)) stored by iceberg stored as orc tblproperties ('format-version'='2');
create table source(a int, b string, c int);
...
merge into target_ice as t using source src ON t.a = src.a
when matched and t.a > 100 THEN DELETE
when matched then update set b = 'Merged', c = t.c + 10
when not matched then insert values (src.a, src.b, src.c);
MERGE INTO
statement supports the following capabilities:WHEN NOT MATCHED BY SOURCE
merge clause - Useful in situations when a source table's rows do not match the target table rows. For example:MERGE INTO target_ice t using source s on t.id = s.id WHEN NOT MATCHED BY SOURCE THEN UPDATE set t.column = "a";
- Supports
INSERT *
syntax for theWHEN NOT MATCHED
clause andUPDATE SET *
syntax for theWHEN MATCHED
clause.INSERT *
enumerates all expressions from the source table or subquery to simplify inserting for target tables with large number of columns. The semantics is the same as the regularWHEN NOT MATCHED THEN INSERT
clause.UPDATE SET *
creates assignments for each target table column by enumerating the table columns and assigning source expressions by index.