Prerequisites and limitations for using Iceberg
Learn which versions of Cloudera Data Engineering, Spark, and Data Lake are supported for use with Apache Iceberg.
To use Apache Iceberg in Cloudera Data Engineering, you'll need the following prerequisites:
- Spark 3.2 or higher
- A compatible version of Data Lake, as listed in "Cloudera Data Engineering and Data Lake compatibility" (linked below)
- Cloudera Data Engineering 1.16 or higher
- AWS or Azure; both are supported starting in Cloudera Data Engineering 1.17-h1 (which supports Iceberg 0.14)
Iceberg table format version 2
Iceberg table format version 2 (v2) is available starting in Iceberg 0.14. Iceberg table format v2 supports row-level UPDATE and DELETE operations by adding delete files that encode the rows deleted from existing data files. The DELETE, UPDATE, and MERGE operations write delete files instead of rewriting the affected data files. When the data is read, the encoded deletes are applied to the affected rows. This functionality is called merge-on-read.
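The merge-on-read behavior described above can be illustrated with a small toy model. This is only a sketch of the idea, not the Iceberg API: the `ToyTable` class, its method names, and the file name `data-0` are hypothetical, and the delete entries are loosely modeled on position deletes (data file name plus row position).

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set, Tuple


@dataclass
class ToyTable:
    """Toy stand-in for a table; hypothetical, not the Iceberg API."""

    # Data files are only ever added, never rewritten in place.
    data_files: Dict[str, List[dict]] = field(default_factory=dict)
    # Each delete file is a set of (data file name, row position) entries.
    delete_files: List[Set[Tuple[str, int]]] = field(default_factory=list)

    def append(self, name: str, rows: List[dict]) -> None:
        """Add a new data file."""
        self.data_files[name] = list(rows)

    def delete_where(self, predicate: Callable[[dict], bool]) -> None:
        """Record matching rows in a new delete file; data files stay untouched."""
        already_deleted = set().union(*self.delete_files)
        new_deletes = {
            (name, pos)
            for name, rows in self.data_files.items()
            for pos, row in enumerate(rows)
            if predicate(row) and (name, pos) not in already_deleted
        }
        if new_deletes:
            self.delete_files.append(new_deletes)

    def scan(self) -> List[dict]:
        """Read all rows, applying the encoded deletes at read time."""
        deleted = set().union(*self.delete_files)
        return [
            row
            for name, rows in self.data_files.items()
            for pos, row in enumerate(rows)
            if (name, pos) not in deleted
        ]


table = ToyTable()
table.append("data-0", [{"id": 1}, {"id": 2}, {"id": 3}])
table.delete_where(lambda row: row["id"] == 2)
visible_rows = table.scan()
```

After the delete, the data file still holds all three rows, and the deleted row is filtered out only when `scan` runs. The trade-off in this model is that writes are cheap (a small delete file) while reads do the extra work of applying the deletes.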
To use Iceberg table format v2, you'll need the following prerequisites:
- Cloudera Data Engineering 1.17-h1 or higher
- Iceberg 0.14
- Spark 3.2 or higher
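With these prerequisites in place, a v2 table is typically requested through table properties at creation time. The following is a minimal sketch, assuming a Spark session already configured with an Iceberg catalog. The helper name `build_v2_create_table` and the table name `spark_catalog.default.events` are hypothetical. The `format-version` and `write.*.mode` keys are Iceberg table properties, but whether each write mode is honored depends on the Iceberg and Spark versions in your environment, so verify them against your release.

```python
from typing import Dict


def build_v2_create_table(table_name: str, columns: Dict[str, str]) -> str:
    """Build a Spark SQL CREATE TABLE statement for an Iceberg v2 table.

    Hypothetical helper for illustration; it only assembles a SQL string.
    """
    column_sql = ", ".join(f"{name} {sql_type}" for name, sql_type in columns.items())
    properties = {
        # Request table format version 2.
        "format-version": "2",
        # Assumption: ask for merge-on-read for row-level operations;
        # support for each mode depends on the Iceberg and Spark versions.
        "write.delete.mode": "merge-on-read",
        "write.update.mode": "merge-on-read",
        "write.merge.mode": "merge-on-read",
    }
    properties_sql = ", ".join(f"'{key}'='{value}'" for key, value in properties.items())
    return (
        f"CREATE TABLE {table_name} ({column_sql}) "
        f"USING iceberg TBLPROPERTIES ({properties_sql})"
    )


ddl = build_v2_create_table(
    "spark_catalog.default.events",  # hypothetical table name
    {"id": "BIGINT", "payload": "STRING"},
)
# In a job with an active SparkSession named `spark`, the statement
# could then be run with: spark.sql(ddl)
```

Building the statement as a string keeps the sketch runnable without a cluster. In a real job, the statement would be passed to the active Spark session.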