What is Open Data Lakehouse?
CDP supports a data lakehouse architecture by pre-integrating and unifying the capabilities of data warehouses and data lakes to support data engineering, business intelligence, and machine learning on a single platform. Cloudera’s support for an open data lakehouse brings high-performance, self-service reporting and analytics to your business, simplifying data management for both data practitioners and administrators.
Open Data Lakehouse components
- Support for Apache Iceberg 1.3 access and processing in CDP Private Cloud Base 7.1.9
- Integration of compute engines (Impala, Spark, Flink, NiFi) for accessing and processing Iceberg datasets concurrently
- SDX integration with Iceberg catalog
- Iceberg table maintenance from Spark and replication
- Iceberg catalog set to HiveCatalog, so that the Hive Metastore manages Iceberg tables
- Certified HDFS and Ozone storage
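
To make the HiveCatalog and Spark maintenance items above more concrete, here is a minimal sketch in Python. It only builds strings: the Spark properties that point an Iceberg catalog at the Hive Metastore, and the SQL for two common maintenance procedures. The property names, catalog classes, and procedure names follow the upstream Apache Iceberg documentation. Whether a given CDP cluster needs them set explicitly is an assumption, and the helper function names, the table `db.sample`, and the metastore host are hypothetical.

```python
from typing import Dict, List, Optional


def iceberg_hive_catalog_conf(
    catalog_name: str = "spark_catalog",
    metastore_uri: Optional[str] = None,
) -> Dict[str, str]:
    """Return Spark properties for an Iceberg catalog backed by the Hive Metastore."""
    prefix = f"spark.sql.catalog.{catalog_name}"
    # Upstream Iceberg wraps Spark's built-in session catalog with
    # SparkSessionCatalog; any other catalog name uses SparkCatalog.
    if catalog_name == "spark_catalog":
        impl = "org.apache.iceberg.spark.SparkSessionCatalog"
    else:
        impl = "org.apache.iceberg.spark.SparkCatalog"
    conf = {
        "spark.sql.extensions": (
            "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions"
        ),
        prefix: impl,
        f"{prefix}.type": "hive",  # selects Iceberg's HiveCatalog
    }
    if metastore_uri is not None:
        # Optional: without it, the URI is expected to come from hive-site.xml.
        conf[f"{prefix}.uri"] = metastore_uri
    return conf


def to_spark_submit_args(conf: Dict[str, str]) -> List[str]:
    """Flatten the properties into repeated `--conf key=value` arguments."""
    args: List[str] = []
    for key, value in conf.items():
        args.extend(["--conf", f"{key}={value}"])
    return args


def maintenance_statements(
    table: str, older_than: str, catalog_name: str = "spark_catalog"
) -> List[str]:
    """Build Spark SQL CALL statements for two Iceberg maintenance procedures."""
    return [
        # Drop snapshots older than the given timestamp.
        f"CALL {catalog_name}.system.expire_snapshots("
        f"table => '{table}', older_than => TIMESTAMP '{older_than}')",
        # Compact small data files into larger ones.
        f"CALL {catalog_name}.system.rewrite_data_files(table => '{table}')",
    ]


if __name__ == "__main__":
    # Hypothetical metastore host and table name, for illustration only.
    conf = iceberg_hive_catalog_conf(metastore_uri="thrift://metastore-host:9083")
    print(" ".join(to_spark_submit_args(conf)))
    for statement in maintenance_statements("db.sample", "2024-01-01 00:00:00"):
        print(statement)
```

The generated arguments could be passed to `spark-submit` or `spark-shell`, and the statements run through `spark.sql(...)` in a session that has the Iceberg SQL extensions enabled. Keeping the configuration as plain data makes it easy to review before it is applied to a cluster.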