Delete data feature
Cloudera supports row-level deletes in Iceberg V2 tables.
- position deletes
- equality deletes
- If you are using Apache Flink or Apache NiFi to write equality deletes, then ensure that you provide a PRIMARY KEY for the table. This is required for engines to know which columns to write into the equality delete files.
- If the table is partitioned then the partition columns have to be part of the PRIMARY KEY
- For Apache Flink, the table should be in 'upsert-mode' to write equality deletes
- Partition evolution is not allowed for Iceberg tables that have PRIMARY KEYs
- file_path, which is a full URI
- pos, the file position of the row
| Partition | Partition information | Delete file | 
|---|---|---|
| YEAR(ts)=2022 | ts_year=2022/data-abcd.parquet ts_year=2022/data-bcde.parquet | ts_year=2022/delete-wxyz.parquet | 
| YEAR(ts)=2023 | ts_year=2023/data-efgh.parquet | ts_year=2023/delete-hxkw.parquet | 
| MONTH(ts)=2023-06 | ts_month=2023-06/data-ijkl.parquet | ts_month=2023-06/delete-uzwd.parquet | 
| MONTH(ts)=2023-07 | ts_month=2023-07/data-mnop.parquet | ts_month=2023-07/delete-udqx.parquet | 
Inserting, deleting, or updating table data generates a snapshot.
You use a WHERE clause in your DELETE statement. For example:
delete from tbl_ice where a <= 2,1;Hive and Impala evaluate rows from one table against a WHERE clause, and delete all the rows that match WHERE conditions. If you want delete all rows, use the Truncate feature. The WHERE expression is similar to the WHERE expression used in SELECT. The conditions in the WHERE clause can refer to any columns.
Concurrent operations that include DELETE do not introduce inconsistent table states. Iceberg runs validation checks to check for concurrent modifications, such as DELETE+INSERT. Only one will succeed. On the other hand, DELETE+DELETE, and INSERT+INSERT can both succeed, but in the case of a concurrent DELETE+UPDATE, UPDATE+UPDATE, DELETE+INSERT, UPDATE+INSERT from Hive, only the first operation will succeed.
From joined tables, you can delete all matching rows from one of the tables. You can join tables of any kind, but the table from which the rows are deleted must be an Iceberg table. The FROM keyword is required in this case, to separate the name of the table whose rows are being deleted from the table names of the join clauses.
Hive or Impala syntax
delete from tablename [where expression]delete joined_tablename from [joined_tablename, joined_tablename2, ...] [ where expression ] Hive or Impala examples
create external table tbl_ice(a int, b string, c int) stored by iceberg tblproperties ('format-version'='2');
insert into tbl_ice values (1, 'one', 50), (2, 'two', 51), (3, 'three', 52), (4, 'four', 53), (5, 'five', 54), (111, 'one', 55), (333, 'two', 56);
delete from tbl_ice where a <= 2,1;The following example deletes 0, 1, or more rows of the table. If col1 is a primary key, 0 or 1 rows are deleted:
delete from ice_table where col1 = 100;