Migrating Hive tables to Iceberg tables

Using Iceberg tables facilitates multi-cloud open lakehouse implementations. You can move Iceberg-based workloads in Cloudera Data Platform (CDP) across deployment environments on AWS and Azure. You can migrate existing external Hive tables from Hive to Iceberg in Cloudera Data Warehouse (CDW) or from Spark to Iceberg in Cloudera Data Engineering (CDE).
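As a sketch of what an in-place migration can look like: in CDW (Hive), an external Hive table can be converted to Iceberg by switching its storage handler, and in CDE (Spark), the Iceberg `migrate` procedure performs the equivalent conversion. The database and table names below (`db.sales`) are placeholders; verify the exact syntax against your CDP release before running.

```sql
-- CDW (Hive): convert an existing external Hive table to Iceberg in place.
-- The data files are left where they are; only table metadata changes.
ALTER TABLE db.sales
SET TBLPROPERTIES ('storage_handler'='org.apache.iceberg.mr.hive.HiveIcebergStorageHandler');

-- CDE (Spark): the Iceberg migrate procedure does the same conversion.
-- 'spark_catalog' is the catalog name configured in the Spark session.
CALL spark_catalog.system.migrate('db.sales');
```

Because the migration is metadata-only, it avoids rewriting the underlying data files, which keeps the operation fast even for large tables.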

Cloudera has chosen Apache Iceberg as the foundation for an open lakehouse in CDP. Any compute engine can insert data into, update, and delete data from Iceberg tables, and any compute engine can read Iceberg tables.
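For illustration, the row-level operations mentioned above look like ordinary SQL once a table is in Iceberg format. The table names (`db.customers`, `db.staged_updates`) and columns are hypothetical, and updates, deletes, and merges generally assume the Iceberg v2 table format.

```sql
-- Row-level operations on a hypothetical Iceberg table (v2 format assumed).
UPDATE db.customers
SET email = 'new@example.com'
WHERE customer_id = 42;

DELETE FROM db.customers
WHERE active = false;

-- Upsert staged changes into the target table.
MERGE INTO db.customers AS t
USING db.staged_updates AS s
ON t.customer_id = s.customer_id
WHEN MATCHED THEN UPDATE SET t.email = s.email
WHEN NOT MATCHED THEN INSERT (customer_id, email, active)
  VALUES (s.customer_id, s.email, s.active);
```

Any supported engine, such as Hive in CDW or Spark in CDE, can run these statements against the same Iceberg table.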

The following CDP data services support Iceberg for performing multi-function analytics for the open lakehouse with Cloudera SDX shared security and governance:
  • Cloudera Data Warehouse (CDW): Batch ETL, SQL and BI analytic workloads, and row-level database operations such as updates, deletes, and merges
  • Cloudera Data Engineering (CDE): Batch ETL, row-level database operations, table maintenance
  • Cloudera Machine Learning (CML): Data Science through Python, R, and other languages, ML model training and inferencing, table maintenance
  • Cloudera Data Flow (CDF): NiFi streaming ingestion
  • Cloudera Stream Processing (CSP): Unified streaming ingestion with SQL