Iceberg overview

Apache Iceberg is a table format for huge analytics datasets on storage systems such as HDFS. You can efficiently query large Iceberg tables on object stores. Iceberg supports concurrent reads and writes on all storage media.

You create Iceberg tables and run queries from Hive or Impala in CDP. The Hive metastore stores Iceberg metadata, including the location of the table.

The Hive metastore (HMS) plays a lightweight role in catalog operations. Iceberg relieves pressure on HMS by storing partition information in metadata files on the file system or object store instead of in HMS. This architecture supports rapid scaling without performance hits.
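
The idea of keeping partition information in metadata files can be illustrated with a small toy model. The sketch below is not Iceberg's actual file layout or planning algorithm. The DataFile class, the plan_scan function, and the file paths are hypothetical names chosen for illustration. It only shows that a query planner can select data files from manifest-style entries without asking the metastore about each partition.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class DataFile:
    """One entry in a simplified, hypothetical manifest file."""
    path: str
    partition: str      # value of the partition column for this file
    record_count: int


def plan_scan(manifest: List[DataFile], wanted_partition: str) -> List[str]:
    """Return the paths of the data files that belong to one partition.

    The partition value is read from the manifest entries themselves,
    so no per-partition metastore lookup is involved in this model.
    """
    return [f.path for f in manifest if f.partition == wanted_partition]


# Hypothetical manifest contents, for illustration only.
manifest = [
    DataFile("data/dt=2023-01-01/a.parquet", "2023-01-01", 100),
    DataFile("data/dt=2023-01-01/b.parquet", "2023-01-01", 250),
    DataFile("data/dt=2023-01-02/c.parquet", "2023-01-02", 75),
]

files_to_read = plan_scan(manifest, "2023-01-01")
print(files_to_read)
```

In this model, adding partitions only adds entries to metadata files, which is the property the paragraph above attributes to the Iceberg architecture.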

By default, Hive and Impala use the Iceberg HiveCatalog. Cloudera recommends using the default HiveCatalog when you create an Iceberg table.
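
As a minimal sketch of what creating a table with the default catalog might look like, the following helper assembles a CREATE TABLE statement as a string. The function name create_iceberg_table_sql, the table name ice_t, and the columns are hypothetical. The STORED BY ICEBERG (Hive) and STORED AS ICEBERG (Impala) clauses are assumed to match the syntax accepted by your version, so verify them against the SQL reference for your release. Because no catalog properties are set, the statement relies on the default HiveCatalog.

```python
from typing import List, Tuple


def create_iceberg_table_sql(table: str,
                             columns: List[Tuple[str, str]],
                             engine: str = "hive") -> str:
    """Build a CREATE TABLE statement for an Iceberg table.

    No catalog-related table properties are added, so the engine's
    default catalog is used.
    """
    cols = ", ".join(f"{name} {col_type}" for name, col_type in columns)
    if engine == "hive":
        # Assumed Hive syntax: external table stored by the Iceberg handler.
        return f"CREATE EXTERNAL TABLE {table} ({cols}) STORED BY ICEBERG"
    if engine == "impala":
        # Assumed Impala syntax.
        return f"CREATE TABLE {table} ({cols}) STORED AS ICEBERG"
    raise ValueError(f"unsupported engine: {engine}")


# Hypothetical table definition, for illustration only.
columns = [("i", "INT"), ("s", "STRING")]
hive_stmt = create_iceberg_table_sql("ice_t", columns, engine="hive")
impala_stmt = create_iceberg_table_sql("ice_t", columns, engine="impala")
print(hive_stmt)
print(impala_stmt)
```

You would submit the resulting string through whatever client you already use to run Hive or Impala queries.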

You can use Iceberg for single tables that contain tens of petabytes of data and read these tables without compromising performance. You can query Iceberg tables from Apache Hive and Apache Impala. The following features are included:

Listing table snapshot and history
Expiring snapshots from Hive and Impala
Migrating external tables to Iceberg in Hive
Rolling back Iceberg tables from Hive
Creating an Iceberg table from Hive with a metadata location
Expiring and removing old snapshots
Performance and scalability enhancements
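
Several of the features listed above are driven by short SQL statements in Hive. The sketch below assembles a few of them as strings so that their general shape is visible. The function names, the table name ice_t, the snapshot ID, and the metadata path are hypothetical, and the statement syntax is an assumption that should be checked against the Hive SQL reference for your release.

```python
# Storage handler class name used when converting a Hive external table.
STORAGE_HANDLER = "org.apache.iceberg.mr.hive.HiveIcebergStorageHandler"


def history_sql(db: str, table: str) -> str:
    """List table history through the assumed 'history' metadata table."""
    return f"SELECT * FROM {db}.{table}.history"


def rollback_sql(table: str, snapshot_id: int) -> str:
    """Roll a table back to an earlier snapshot ID."""
    return f"ALTER TABLE {table} EXECUTE ROLLBACK({snapshot_id})"


def migrate_sql(table: str) -> str:
    """Migrate an existing external table to Iceberg in place."""
    return (f"ALTER TABLE {table} SET TBLPROPERTIES "
            f"('storage_handler'='{STORAGE_HANDLER}')")


def create_from_metadata_sql(table: str, metadata_location: str) -> str:
    """Create an Iceberg table that points at an existing metadata file."""
    return (f"CREATE EXTERNAL TABLE {table} STORED BY ICEBERG "
            f"TBLPROPERTIES ('metadata_location'='{metadata_location}')")


# Hypothetical values, for illustration only.
statements = [
    history_sql("default", "ice_t"),
    rollback_sql("ice_t", 1234567890),
    migrate_sql("legacy_t"),
    create_from_metadata_sql("ice_from_meta", "/path/to/metadata.json"),
]
for stmt in statements:
    print(stmt)
```

Keeping each statement in its own small function makes it easy to review the exact SQL before it is run against a production table.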

The following table lists Iceberg features you can access from Hive and Impala in CDW Private Cloud 1.5.0:


Feature name	Hive	Impala
Create table	✔	✔
Read Iceberg V1 tables	✔	✔
Schema evolution and partition evolution	✔	✔
Load data	✔	✔
Create table as select (CTAS)	✔	✔
Insert into select	✔	✔
Insert overwrite	✔	✔
Update, Delete, Merge with Iceberg V2 tables	✔	𐄂
Time travel using timestamps and snapshot IDs	✔	𐄂
Compaction	𐄂	𐄂
Snapshot expiration	𐄂	✔

Apache Iceberg integrates with Apache Ranger for security. You can use the Ranger integration with Hive and Impala to apply fine-grained access control to sensitive data in Iceberg tables. Iceberg is also integrated with Data Visualization, so you can create dashboards and other graphics from your Iceberg data.