Iceberg in Cloudera Data Warehouse

Cloudera Data Warehouse (CDW) supports Apache Iceberg, an open table format for huge analytic datasets in the cloud. You can work with large Iceberg tables stored on object stores. Iceberg supports concurrent reads and writes on all storage media.

You can use Iceberg when a single table contains tens of petabytes of data, and you can read tables of this size without compromising performance. The Cloudera Data Warehouse service provides Apache Hive (GA) and Apache Impala (GA) for querying Iceberg tables.

You create Iceberg tables and run queries from a CDW Virtual Warehouse, including a Unified Analytics-enabled Virtual Warehouse. The Hive metastore stores Iceberg metadata, including the location of the table.

The Hive metastore (HMS) plays a lightweight role in catalog operations. Iceberg relieves pressure on the HMS by storing partition information in metadata files on the file system or object store instead of in the HMS itself. Because the number of HMS records no longer grows with the number of partitions, tables can scale rapidly without a performance penalty.
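The following toy model illustrates why this design keeps the metastore small. It is a simplified sketch, not how HMS or Iceberg is implemented: the ToyMetastore class, the helper functions, and the s3a://bucket paths are hypothetical, and real Iceberg metadata files reference manifest files rather than listing partitions directly. The only detail taken from Iceberg itself is that HiveCatalog keeps a pointer to the current metadata file in the table's HMS entry.

```python
from dataclasses import dataclass, field


@dataclass
class ToyMetastore:
    """Hypothetical stand-in for HMS that only tracks the records it holds."""
    records: dict = field(default_factory=dict)


def register_hive_style(hms, table, partitions):
    # Classic Hive table: one metastore record for the table
    # plus one record for every partition.
    hms.records[table] = {"location": f"s3a://bucket/{table}"}
    for part in partitions:
        hms.records[f"{table}/{part}"] = {
            "location": f"s3a://bucket/{table}/{part}"
        }


def register_iceberg_style(hms, object_store, table, partitions):
    # Iceberg table: partition information is written to a metadata file
    # in the object store; the metastore keeps a single pointer to it.
    metadata_path = f"s3a://bucket/{table}/metadata/v1.metadata.json"
    object_store[metadata_path] = {"partitions": list(partitions)}
    hms.records[table] = {"metadata_location": metadata_path}


if __name__ == "__main__":
    parts = [f"day={d:04d}" for d in range(1000)]

    hive_hms = ToyMetastore()
    register_hive_style(hive_hms, "sales", parts)

    iceberg_hms, store = ToyMetastore(), {}
    register_iceberg_style(iceberg_hms, store, "sales", parts)

    print(len(hive_hms.records), len(iceberg_hms.records))
```

In this model, the Hive-style table needs a metastore record per partition, while the Iceberg-style table needs one record regardless of how many partitions it has.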

The Iceberg catalogs HiveCatalog, HadoopTables, and HadoopCatalog support the full range of SQL DDL commands. By default, Hive and Impala use the Iceberg HiveCatalog, and Cloudera recommends using this default catalog when you create an Iceberg table. You can set the Iceberg catalog for a table by setting the table property iceberg.catalog to one of the following values:
  • hive.catalog
  • hadoop.tables
  • hadoop.catalog
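
As an illustration of how one of these values might be applied, the sketch below builds a CREATE TABLE statement that sets the catalog through TBLPROPERTIES. It assumes the property is named iceberg.catalog, as in the Apache Impala Iceberg documentation, and uses the STORED BY ICEBERG clause. The function name, table name, and columns are hypothetical. The non-default catalogs typically need additional location settings that this sketch leaves out, so check the Hive or Impala reference before using the generated statement.

```python
# Catalog values listed above.
VALID_CATALOGS = ("hive.catalog", "hadoop.tables", "hadoop.catalog")


def iceberg_create_table(table, columns, catalog="hive.catalog"):
    """Render a CREATE TABLE statement for an Iceberg table.

    columns is a list of (name, sql_type) pairs. The catalog defaults to
    hive.catalog, which corresponds to the recommended HiveCatalog.
    """
    if catalog not in VALID_CATALOGS:
        raise ValueError(f"unsupported Iceberg catalog: {catalog!r}")
    cols = ", ".join(f"{name} {sql_type}" for name, sql_type in columns)
    return (
        f"CREATE TABLE {table} ({cols}) "
        f"STORED BY ICEBERG "
        f"TBLPROPERTIES ('iceberg.catalog'='{catalog}')"
    )


if __name__ == "__main__":
    print(iceberg_create_table("sales", [("id", "BIGINT"), ("amount", "DOUBLE")]))
```

Rejecting unknown values up front keeps a typo in the catalog name from reaching the query engine as a silently ignored or misapplied table property.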

For more information about Iceberg catalogs, see the Apache Software Foundation Iceberg documentation.

Apache Iceberg in CDW integrates with Apache Ranger for security. You can use the Ranger integration with Hive and Impala to apply fine-grained access control to sensitive data in Iceberg tables. Iceberg in CDW is also integrated with Data Visualization, so you can build dashboards and other visuals from your Iceberg data.