Cloudera Data Warehouse Public Cloud service overview

In Cloudera Data Warehouse (CDW) Public Cloud service, you can create independent data warehouses and data marts for teams of business analysts without the overhead of bare metal deployments. CDW includes Database Catalogs and Virtual Warehouses that you use to access your data.

  • Database Catalogs:

    A Database Catalog is a logical collection of table and view metadata, security permissions, and other information. Behind each Database Catalog is a Hive metastore (HMS) instance that stores your definitions of the data in cloud storage. An object store in a secure data lake contains all the actual data for your environment. A Database Catalog also includes transient user and workload contexts from the Virtual Warehouse, and governance artifacts that support functions such as auditing. Multiple Virtual Warehouses can query a Database Catalog. An environment can have multiple Database Catalogs.

    When you activate an environment from the Data Warehouse service, a default Database Catalog is created automatically and named after your environment. The default Database Catalog shares the HMS database with HMS in the Data Hub cluster. As a result, you can access any objects or data sets created in the Data Mart or the Data Engineering clusters from CDW Virtual Warehouses, and vice versa.

    Queries and query history saved in the Hue database are stored in the Database Catalog and are not deleted when you delete a Virtual Warehouse.

    You can load demo data to use in Hue when you add a non-default Database Catalog to your environment.

  • Virtual Warehouses:

    A Virtual Warehouse is an instance of compute resources, equivalent to a cluster. From a Virtual Warehouse, you access the tables and views that a Database Catalog defines over your data in the data lake. A Virtual Warehouse binds compute and storage by executing authorized queries on tables and views through the Database Catalog.
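
The relationships described above can be illustrated with a small toy model. This is only a sketch of the concepts, not the CDW API or CLI: the class names, method names, and the example names "sales-env", "vw-hive", and "vw-impala" are all hypothetical. It shows that an environment can hold several Database Catalogs, that several Virtual Warehouses can share one Database Catalog, and that saved queries stay in the Database Catalog when a Virtual Warehouse is deleted.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class DatabaseCatalog:
    """Hypothetical stand-in for a Database Catalog: metadata plus saved queries."""
    name: str
    saved_queries: List[str] = field(default_factory=list)


@dataclass
class VirtualWarehouse:
    """Hypothetical stand-in for a Virtual Warehouse bound to one Database Catalog."""
    name: str
    catalog: DatabaseCatalog

    def save_query(self, sql: str) -> None:
        # Saved queries are kept in the catalog, not in the warehouse itself.
        self.catalog.saved_queries.append(sql)


@dataclass
class Environment:
    """Hypothetical stand-in for an environment activated in the service."""
    name: str
    catalogs: Dict[str, DatabaseCatalog] = field(default_factory=dict)
    warehouses: Dict[str, VirtualWarehouse] = field(default_factory=dict)

    def activate(self) -> DatabaseCatalog:
        # Activation creates a default catalog named after the environment.
        return self.catalogs.setdefault(self.name, DatabaseCatalog(self.name))

    def add_catalog(self, name: str) -> DatabaseCatalog:
        # An environment can have multiple catalogs.
        return self.catalogs.setdefault(name, DatabaseCatalog(name))

    def add_warehouse(self, name: str, catalog_name: str) -> VirtualWarehouse:
        # Several warehouses may point at the same catalog.
        vw = VirtualWarehouse(name, self.catalogs[catalog_name])
        self.warehouses[name] = vw
        return vw

    def delete_warehouse(self, name: str) -> None:
        # Removing compute leaves the catalog and its saved queries intact.
        del self.warehouses[name]


env = Environment("sales-env")
default_catalog = env.activate()
hive_vw = env.add_warehouse("vw-hive", "sales-env")
impala_vw = env.add_warehouse("vw-impala", "sales-env")

hive_vw.save_query("SELECT COUNT(*) FROM orders")
env.delete_warehouse("vw-hive")

# The saved query is still available through the shared catalog.
print(impala_vw.catalog.saved_queries)
```

In this sketch the warehouse holds only a reference to its catalog, which mirrors the separation of compute (Virtual Warehouse) from metadata and saved state (Database Catalog).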