Apache Iceberg in Cloudera Data Platform
Apache Iceberg is a cloud-native, high-performance open table format for organizing petabyte-scale analytic datasets on a file system or object store. Combined with Cloudera Data Platform (CDP), it lets users build an open data lakehouse architecture for multi-function analytics and deploy large-scale end-to-end pipelines.
Open data lakehouse on CDP simplifies advanced analytics on all data with a unified platform for structured and unstructured data and integrated data services that enable any analytics use case, from ML and BI to stream analytics and real-time analytics. Apache Iceberg is the secret sauce of the open lakehouse.
The following table shows Iceberg support in CDP by release and engine. Iceberg versions v1 and v2 are defined below the table.
Release | Iceberg support level | Impala | Hive | Spark | NiFi | Flink |
---|---|---|---|---|---|---|
Public Cloud Data Services 1.7.1 2023.0.15.0-243 | GA | v1, v2: read, insert, and delete | v1, v2: read, insert, update, and delete | v1, v2: read, insert, update, and delete | v1, v2: read and insert | N/A |
Public Cloud Data Services other supported releases | GA | v1, v2: read and insert | v1, v2: read, insert, update, and delete | v1, v2: read, insert, update, and delete | v1, v2: read and insert | N/A |
Data Hub 7.2.16.2 | GA | v1, v2: read | v1: read, insert, update, and delete | v1, v2: read, insert, update, and delete | v1, v2: read and insert | v1: read and insert |
Data Hub 7.2.17 | GA | v1, v2: read | v1: read, insert, update, and delete | v1, v2: read, insert, update, and delete | v1, v2: read and insert | v1, v2: read, append, overwrite *** |
Private Cloud Data Services 1.5.1 2023.0.13.0-20 | Technical Preview (7.1.7 Base, 7.1.8 Base) | v1, v2: read | v1, v2: read, insert, update, and delete | v1, v2: read, insert, update, and delete | No Private Cloud support | No Private Cloud support |
Private Cloud Data Services 1.5.2 | GA (7.1.9 Base), Technical Preview (7.1.7 Base, 7.1.8 Base) | v1, v2: read, insert, and delete | v1, v2: read, insert, update, and delete | v1, v2: read, insert, update, and delete | v1, v2: read and insert (7.1.9 Base) | v1, v2: read and insert (7.1.9 Base) |
Base 7.1.7 SP2, 7.1.8 | No Iceberg support | | | | | |
Base 7.1.9 | GA | v1, v2: read and insert | No Iceberg support | v1, v2: read, insert, update, and delete | v1, v2: read and insert | v1, v2: read and insert |
** Except for Flink, the delete support shown in this table is limited to position deletes. In these releases, equality deletes are supported only from Flink.
*** Iceberg v2 updates and deletes from Flink are a technical preview in CDP Public Cloud 7.2.17.
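The support table can be read as a lookup from release, engine, Iceberg version, and operation to a yes/no answer. The following is a minimal sketch of that lookup, using cell text copied from three rows of the table. The `SUPPORT` dictionary and the `parse_cell` and `supports` helpers are hypothetical names introduced for illustration only; they are not part of any CDP or Iceberg API.

```python
import re

# Cell text copied from three rows of the support table above.
# The dictionary layout is an assumption made for this sketch.
SUPPORT = {
    "Public Cloud Data Services 1.7.1": {
        "Impala": "v1, v2: read, insert, and delete",
        "Hive": "v1, v2: read, insert, update, and delete",
        "Spark": "v1, v2: read, insert, update, and delete",
        "NiFi": "v1, v2: read and insert",
        "Flink": "N/A",
    },
    "Data Hub 7.2.16.2": {
        "Impala": "v1, v2: read",
        "Hive": "v1: read, insert, update, delete",
        "Spark": "v1, v2: read, insert, update, and delete",
        "NiFi": "v1, v2: read and insert",
        "Flink": "v1: read and insert",
    },
    "Base 7.1.9": {
        "Impala": "v1, v2: read and insert",
        "Hive": "No Iceberg support",
        "Spark": "v1, v2: read, insert, update, and delete",
        "NiFi": "v1, v2: read and insert",
        "Flink": "v1, v2: read and insert",
    },
}


def parse_cell(cell):
    """Turn a cell such as 'v1, v2: read and insert' into {version: {operations}}."""
    if ":" not in cell:
        # Cells like 'N/A' or 'No Iceberg support' list no operations.
        return {}
    versions_part, ops_part = cell.split(":", 1)
    versions = [v.strip() for v in versions_part.split(",") if v.strip()]
    # Operations are separated by commas and/or the word "and".
    ops = {tok.strip() for tok in re.split(r",|\band\b", ops_part) if tok.strip()}
    return {version: set(ops) for version in versions}


def supports(release, engine, version, operation):
    """Return True if the table lists the operation for that release, engine, and version."""
    cell = SUPPORT[release][engine]
    return operation in parse_cell(cell).get(version, set())


print(supports("Public Cloud Data Services 1.7.1", "Impala", "v2", "delete"))
print(supports("Data Hub 7.2.16.2", "Hive", "v2", "read"))
```

The sketch ignores the support level column and the footnotes, so a `True` result still needs to be checked against the position-delete and technical-preview notes above.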
- v1
Defines large analytic data tables using open format files.
- v2
Specifies ACID-compliant tables including row-level deletes and updates.
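The footnote to the support table distinguishes position deletes from equality deletes, the two kinds of row-level delete in Iceberg v2. A position delete identifies a deleted row by data file path and row position within that file; an equality delete identifies deleted rows by the values of one or more columns. The sketch below is a toy in-memory model of that difference, not how any engine reads delete files. The file name, the sample rows, and the function names are all made up for illustration.

```python
# Toy model of one data file: a list of rows addressed by position.
data_file = "data-0001.parquet"  # hypothetical file name
rows = [
    {"id": 1, "city": "Oslo"},
    {"id": 2, "city": "Lima"},
    {"id": 3, "city": "Oslo"},
]


def apply_position_deletes(file_path, rows, position_deletes):
    """Drop rows whose (file_path, position) pair appears in position_deletes."""
    deleted = {pos for path, pos in position_deletes if path == file_path}
    return [row for pos, row in enumerate(rows) if pos not in deleted]


def apply_equality_deletes(rows, equality_deletes):
    """Drop rows that match every column value of any delete record."""
    def matches(row, record):
        return all(row.get(col) == val for col, val in record.items())

    return [
        row for row in rows
        if not any(matches(row, record) for record in equality_deletes)
    ]


# Position delete: remove the row at position 1 of this specific file.
after_position = apply_position_deletes(data_file, rows, [(data_file, 1)])

# Equality delete: remove every row where city equals "Oslo".
after_equality = apply_equality_deletes(rows, [{"city": "Oslo"}])

print([row["id"] for row in after_position])
print([row["id"] for row in after_equality])
```

In this model the position delete removes exactly one row, while the equality delete removes every row that matches the given column value.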
Release | Docs | Iceberg Support Level |
---|---|---|
Open Data Lakehouse (Cloudera Private Cloud Base 7.1.9) | Iceberg in Open Data Lakehouse | GA |
| | Iceberg support for Atlas | GA |
| | SQL Stream Builder with Iceberg (CSA 1.11) and Flink with Iceberg (CSA 1.11); Iceberg replication policies | GA |
Data Engineering (CDE) Public Cloud | Using Iceberg | GA |
Data Warehouse (CDW) Public Cloud | Iceberg features | GA |
Data Engineering (CDE) Private Cloud | Using Iceberg | Technical Preview |
Data Warehouse (CDW) Private Cloud | Iceberg introduction | GA (7.1.9 Base), Technical Preview (7.1.7-7.1.8 Base) |
Public Cloud Data Hub 7.2.16 and later | Iceberg features | Technical Preview |
Public Cloud Data Hub 7.2.17 and later | Iceberg in Apache Atlas | Technical Preview |
| | Streaming Analytics Iceberg support in Flink | GA |
| | Flink/Iceberg connector | GA |
| | Using NiFi to ingest Iceberg data | |
DataFlow in CDP Public Cloud Data Hub | Using the PutIceberg processor | GA |
Flow Management for CDP Private Cloud | Technical preview features | Technical Preview |
Cloudera Machine Learning (CML) Public Cloud | Connection to Iceberg | GA |