What is an Open Data Lakehouse?

CDP supports a Data Lakehouse architecture by pre-integrating and unifying the capabilities of Data Warehouses and Data Lakes, so that data engineering, business intelligence, and machine learning workloads all run on a single platform. Cloudera’s support for an open data lakehouse brings high-performance, self-service reporting and analytics to your business, and simplifies data management for both data practitioners and administrators.

Open Data Lakehouse components

  • Support for Apache Iceberg 1.3 access and processing in CDP Private Cloud Base 7.1.9 and later
  • Integration with compute engines (Hive, Impala, Spark, and Flink) for concurrent access to and processing of Iceberg datasets
  • SDX integration with Iceberg catalog
  • Iceberg table maintenance from Spark, and Iceberg table replication
  • Iceberg catalog set to HiveCatalog, so that the Hive Metastore manages Iceberg tables
  • Certified HDFS and Ozone storage
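The HiveCatalog and Spark maintenance items above can be illustrated with a minimal sketch. It assumes the upstream Apache Iceberg Spark property names and stored procedures (rewrite_data_files, expire_snapshots); on CDP these settings may already be applied by Cloudera Manager, and the procedures your release supports should be checked against its documentation. The helper name maintenance_statements, the table db.sample, and the timestamp are hypothetical placeholders. The code only builds the settings and SQL text, so it runs without a cluster.

```python
"""Sketch: Spark settings and maintenance SQL for Iceberg tables in a Hive Metastore.

Assumption: upstream Apache Iceberg property names and stored procedures;
on CDP these settings may already be configured for you.
"""

from typing import Dict, List

# Upstream Iceberg settings that turn Spark's built-in catalog into an
# Iceberg SparkSessionCatalog backed by the Hive Metastore (HiveCatalog).
ICEBERG_HIVE_CATALOG_CONF: Dict[str, str] = {
    "spark.sql.extensions":
        "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions",
    "spark.sql.catalog.spark_catalog": "org.apache.iceberg.spark.SparkSessionCatalog",
    "spark.sql.catalog.spark_catalog.type": "hive",
}


def maintenance_statements(table: str, older_than: str,
                           catalog: str = "spark_catalog") -> List[str]:
    """Return Iceberg CALL statements for routine maintenance of `table`.

    Hypothetical helper: `table` is "db.table" and `older_than` is a
    timestamp literal such as "2024-01-01 00:00:00".
    """
    return [
        # Compact small data files into larger ones.
        f"CALL {catalog}.system.rewrite_data_files(table => '{table}')",
        # Expire snapshots older than the cutoff and delete files that
        # no remaining snapshot references.
        f"CALL {catalog}.system.expire_snapshots("
        f"table => '{table}', older_than => TIMESTAMP '{older_than}')",
    ]


if __name__ == "__main__":
    for key, value in ICEBERG_HIVE_CATALOG_CONF.items():
        print(f"--conf {key}={value}")
    for stmt in maintenance_statements("db.sample", "2024-01-01 00:00:00"):
        print(stmt)
```

In a PySpark session, each statement would be passed to spark.sql(), and the settings could be supplied as --conf options to spark-submit. Keeping the catalog name as a parameter reflects that Iceberg procedures are always invoked through a named catalog.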