Delete data feature
Cloudera supports row-level deletes in Iceberg V2 tables.
- position deletes
- equality deletes
From Hive and Impala, you can read/write position deletes, but you can only read equality deletes. You cannot write equality deletes.
- As a sink for streaming data from Flink to Cloudera Data Warehouse (CDW) Public Cloud
- As an Apache NiFi data flow to ingest data in Iceberg table format into CDW
If you cannot read equality deletes, the output of the read is incorrect with regard to any deletions.
- If you are using Apache Flink or Apache NiFi to write equality deletes, then ensure that you provide a PRIMARY KEY for the table. This is required for engines to know which columns to write into the equality delete files.
- If the table is partitioned then the partition columns have to be part of the PRIMARY KEY
- For Apache Flink, the table should be in 'upsert-mode' to write equality deletes
- Partition evolution is not allowed for Iceberg tables that have PRIMARY KEYs
- Creating Iceberg tables in SQL Stream Builder
- Ingesting data into CDW using Iceberg
- file_path, which is a full URI
- pos, the file position of the row
Partition | Partition information | Delete file |
---|---|---|
YEAR(ts)=2022 |
ts_year=2022/data-abcd.parquet ts_year=2022/data-bcde.parquet |
ts_year=2022/delete-wxyz.parquet |
YEAR(ts)=2023 | ts_year=2023/data-efgh.parquet | ts_year=2023/delete-hxkw.parquet |
MONTH(ts)=2023-06 | ts_month=2023-06/data-ijkl.parquet | ts_month=2023-06/delete-uzwd.parquet |
MONTH(ts)=2023-07 | ts_month=2023-07/data-mnop.parquet | ts_month=2023-07/delete-udqx.parquet |
Inserting, deleting, or updating table data generates a snapshot.
You use a WHERE clause in your DELETE statement. For example:
delete from tbl_ice where a <= 2,1;
Hive and Impala evaluate rows from one table against a WHERE clause, and delete all the rows that match WHERE conditions. If you want delete all rows, use the Truncate feature. The WHERE expression is similar to the WHERE expression used in SELECT. The conditions in the WHERE clause can refer to any columns.
Concurrent operations that include DELETE do not introduce inconsistent table states. Iceberg runs validation checks to check for concurrent modifications, such as DELETE+INSERT. Only one will succeed. On the other hand, DELETE+DELETE, and INSERT+INSERT can both succeed, but in the case of a concurrent DELETE+UPDATE, UPDATE+UPDATE, DELETE+INSERT, UPDATE+INSERT from Hive, only the first operation will succeed.
From joined tables, you can delete all matching rows from one of the tables. You can join tables of any kind, but the table from which the rows are deleted must be an Iceberg table. The FROM keyword is required in this case, to separate the name of the table whose rows are being deleted from the table names of the join clauses.
Hive or Impala syntax
delete from tablename [where expression]
delete joined_tablename from [joined_tablename, joined_tablename2, ...] [ where expression ]
Hive or Impala examples
create external table tbl_ice(a int, b string, c int) stored by iceberg tblproperties ('format-version'='2');
insert into tbl_ice values (1, 'one', 50), (2, 'two', 51), (3, 'three', 52), (4, 'four', 53), (5, 'five', 54), (111, 'one', 55), (333, 'two', 56);
delete from tbl_ice where a <= 2,1;
The following example deletes 0, 1, or more rows of the table. If col1 is a primary key, 0 or 1 rows are deleted:
delete from ice_table where col1 = 100;