What's New in Apache Iceberg

Learn about the new features of Iceberg in Cloudera Runtime 7.3.1.

Apache Iceberg support for Hive

Cloudera Data Platform (CDP) supports a Data Lakehouse architecture. It pre-integrates and unifies the capabilities of Data Warehouses and Data Lakes so that data engineering, business intelligence, and machine learning can all run on a single platform.

Starting with this release, CDP Private Cloud Base supports querying Iceberg tables from the Apache Hive compute engine. You can run SQL queries to create and query Iceberg tables. Hive queries are table-format agnostic, so you can run nested, correlated, or analytic queries on all supported table types. With Hive on Iceberg, you can use the following Apache Iceberg features:

ACID transactions with Iceberg V2 tables
Point-in-time queries using Iceberg time travel
Rollback table
Position deletes
Schema evolution
Flexible partitioning using partition evolution and partition transform
Support for materialized views
Snapshot expiry
Merge table
Multi-engine concurrent read and write
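The following is a minimal sketch of two of the features listed above: creating an Iceberg V2 table and running time-travel queries. It assumes the HiveQL forms `STORED BY ICEBERG TBLPROPERTIES ('format-version'='2')`, `FOR SYSTEM_TIME AS OF` and `FOR SYSTEM_VERSION AS OF`. The helper functions, the table name `db.orders`, and its columns are hypothetical. The helpers only build SQL strings so they can be checked offline. You would paste the generated statements into Beeline or another Hive client.

```python
import re

# Hypothetical helper: accept only simple unquoted names such as "orders"
# or "db.orders" so the generated SQL stays well-formed.
_IDENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*(\.[A-Za-z_][A-Za-z0-9_]*)?$")


def _check_ident(name: str) -> str:
    if not _IDENT.match(name):
        raise ValueError(f"unsupported table name: {name!r}")
    return name


def create_v2_table(name: str, columns: str) -> str:
    """Build a CREATE TABLE statement for an Iceberg V2 table (assumed syntax)."""
    return (
        f"CREATE TABLE {_check_ident(name)} ({columns}) "
        "STORED BY ICEBERG TBLPROPERTIES ('format-version'='2')"
    )


def select_as_of_time(name: str, timestamp: str) -> str:
    """Time travel: read the table as it was at a timestamp."""
    if "'" in timestamp:
        raise ValueError("timestamp must not contain quotes")
    return f"SELECT * FROM {_check_ident(name)} FOR SYSTEM_TIME AS OF '{timestamp}'"


def select_as_of_snapshot(name: str, snapshot_id: int) -> str:
    """Time travel: read a specific snapshot by its numeric ID."""
    return f"SELECT * FROM {_check_ident(name)} FOR SYSTEM_VERSION AS OF {int(snapshot_id)}"


if __name__ == "__main__":
    # Print example statements for a hypothetical table.
    print(create_v2_table("db.orders", "id BIGINT, amount DECIMAL(10,2)"))
    print(select_as_of_time("db.orders", "2024-01-15 10:00:00"))
    print(select_as_of_snapshot("db.orders", 123456789))
```

Snapshot IDs for `FOR SYSTEM_VERSION AS OF` come from the table's snapshot history. Check the Using Apache Iceberg topic for the exact syntax supported in your release.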

For more information about the Apache Iceberg features supported in CDP, see Using Apache Iceberg.

If you want to migrate your existing Hive tables to Iceberg tables, you can use the ALTER TABLE statement. For more information, see Migrate Hive table to Iceberg.
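The following is a minimal sketch of the migration statement. It assumes the in-place form that sets the `storage_handler` table property to `org.apache.iceberg.mr.hive.HiveIcebergStorageHandler`. The helper function and the table name `db.legacy_sales` are hypothetical. Check the Migrate Hive table to Iceberg topic for prerequisites, such as which table types can be migrated, before you run the generated statement in a Hive client.

```python
import re

# Hypothetical helper: accept only simple unquoted names such as
# "sales" or "db.sales".
_IDENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*(\.[A-Za-z_][A-Za-z0-9_]*)?$")

# Assumed storage handler class used for Hive-to-Iceberg migration.
STORAGE_HANDLER = "org.apache.iceberg.mr.hive.HiveIcebergStorageHandler"


def migrate_to_iceberg(name: str) -> str:
    """Build an ALTER TABLE statement that converts a Hive table to Iceberg."""
    if not _IDENT.match(name):
        raise ValueError(f"unsupported table name: {name!r}")
    return f"ALTER TABLE {name} SET TBLPROPERTIES ('storage_handler'='{STORAGE_HANDLER}')"


if __name__ == "__main__":
    print(migrate_to_iceberg("db.legacy_sales"))
```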

CDP supports integration between Iceberg and Atlas. This integration helps you identify the Iceberg tables to scan for data, and it provides lineage support. Learn how Atlas works with Iceberg, and see examples of schema evolution, partition specification, and partition evolution.