Delete data feature

You can delete data in a V2 Iceberg table.

Hive and Impala delete Iceberg V2 table data using position delete files, one type of formatting defined by the Iceberg Spec. Cloudera supports Row-level deletes. Position delete files contain the following information:
  • file_path, which is a full URI
  • pos, the file position of the row
Delete files are sorted by file_path and pos. The following table shows an example of delete files in a partitioned table:
Table 1.
Partition Partition information Delete file
YEAR(ts)=2022

ts_year=2022/data-abcd.parquet

ts_year=2022/data-bcde.parquet

ts_year=2022/delete-wxyz.parquet

YEAR(ts)=2023 ts_year=2023/data-efgh.parquet ts_year=2023/delete-hxkw.parquet
MONTH(ts)=2023-06 ts_month=2023-06/data-ijkl.parquet ts_month=2023-06/delete-uzwd.parquet
MONTH(ts)=2023-07 ts_month=2023-07/data-mnop.parquet ts_month=2023-07/delete-udqx.parquet
If you have a problem with deletes in the following situations, an equality delete file in the table is the likely cause:
  • In Change Data Capture (CDC) applications
  • In upserts from Apache Flink
  • From a third-party engine

Inserting, deleting, or updating table data generates a snapshot.

You use a WHERE clause in your DELETE statement. For example:

delete from tbl_ice where a <= 2,1;

Hive and Impala evaluate rows from one table against a WHERE clause, and delete all the rows that match WHERE conditions. If you want delete all rows, use the Truncate feature. The WHERE expression is similar to the WHERE expression used in SELECT. The conditions in the WHERE clause can refer to any columns.

Concurrent operations that include DELETE do not introduce inconsistent table states. Iceberg runs validation checks to check for concurrent modifications, such as DELETE+INSERT. Only one will succeed. On the other hand, DELETE+DELETE, and INSERT+INSERT can both succeed, but in the case of a concurrent DELETE+UPDATE, UPDATE+UPDATE, DELETE+INSERT, UPDATE+INSERT from Hive, only the first operation will succeed.

From joined tables, you can delete all matching rows from one of the tables. You can join tables of any kind, but the table from which the rows are deleted must be an Iceberg table. The FROM keyword is required in this case, to separate the name of the table whose rows are being deleted from the table names of the join clauses.

Hive or Impala syntax

delete from tablename [where expression]
delete joined_tablename from [joined_tablename, joined_tablename2, ...] [ where expression ] 

Hive or Impala examples

create external table tbl_ice(a int, b string, c int) stored by iceberg tblproperties ('format-version'='2');

insert into tbl_ice values (1, 'one', 50), (2, 'two', 51), (3, 'three', 52), (4, 'four', 53), (5, 'five', 54), (111, 'one', 55), (333, 'two', 56);

delete from tbl_ice where a <= 2,1;

The following example deletes 0, 1, or more rows of the table. If col1 is a primary key, 0 or 1 rows are deleted:

delete from ice_table where col1 = 100;

For more information, including examples, see Using Impala with Iceberg Tables.