Cloudera Data Warehouse service architecture

Administrators and IT teams can get a high-level view of the Cloudera Data Warehouse (CDW) service components and how they are integrated within the CDP stack.

The CDW service is composed of Database Catalogs (storage prepared for use with a Virtual Warehouse) and Virtual Warehouses (compute environments that can access a Database Catalog) and they are decoupled by design. Multiple Virtual Warehouses of differing sizes and types can be configured to operate on the same Database Catalog, providing workload diversity and isolation on the same data at the same time.

This service architecture diagram shows the components of the CDW private cloud service, their deployment in your private cloud environment, and how they interact with other services and components of the CDP stack.

Database Catalog

A Database Catalog is a logical collection of table and view metadata, security permissions, and other information. The data context is comprised of table and view definitions, transient user and workload contexts from the Virtual Warehouse, security permissions, and governance artifacts that support functions like auditing. Multiple Virtual Warehouses can query a Database Catalog. An environment can have multiple Database Catalogs.

When you activate an environment from the Data Warehouse, a default Database Catalog is created automatically and named after your environment. The default Database Catalog shares the HMS database with HMS on the base cluster. This enables you to access any of the objects or data sets created in the base clusters from CDW Virtual Warehouses and vice versa. Queries and query history saved in the Hue database are stored in the Database Catalog and not deleted when you delete a Virtual Warehouse.

You can load demo data to use in Hue when you add a non-default Database Catalog to your environment.

Virtual Warehouses

A Virtual Warehouse is an instance of compute resources running in Kubernetes to execute the queries. From a Virtual Warehouse, you access tables and views of your data in a Database Catalog's Data Lake. Virtual Warehouses bind compute and storage by executing authorized queries on tables and views through the Database Catalog. Virtual Warehouses can scale automatically, and ensure performance even with high concurrency. All JDBC/ODBC compliant tools connect to the virtual warehouse to run queries. Virtual Warehouses also expose HS2-compatible endpoints for CLI tools such as Beeline, Impala-Shell, and Impyla.

Data Visualization

In addition to Database Catalogs and Virtual Warehouses that you use to access your data, CDW integrates Data Visualization for building graphic representations of data, dashboards, and visual applications based on CDW data, or other data sources you connect to. You, and authorized users, can explore data across the entire CDP data lifecycle using graphics, such as pie charts and histograms. You can arrange visuals on a dashboard for collaborative analysis.