Setting up your first Database Catalog

How you activate your environment determines key capabilities of CDW, such as which data you can access. This section shows how to set up the Database Catalog by activating your environment from the Data Warehouse UI.

The type of Data Lake that the Database Catalog for your Virtual Warehouse uses determines whether you can access data in Data Hubs and other clusters from CDW. A Database Catalog can use different Data Lake types, including Shared Data Experience (SDX) and Cloudera Data Warehouse (CDW) Data Lake types.

If you register an environment and then start (activate) it from Environments, the Database Catalog gives you access from CDW to an SDX Data Lake.

If you activate an environment from the CDW service, the Database Catalog gives you access from CDW to a CDW Data Lake.

When you activate an environment from the CDW service, a default Database Catalog is created automatically and named after your environment. The default Database Catalog shares its Hive Metastore (HMS) database with the HMS in the Data Hub cluster, so you can access any objects or data sets created in the Data Mart or Data Engineering clusters from CDW Virtual Warehouses, and vice versa. Activating an environment from the CDW service also sets up the Kubernetes cluster that provides the computing resources for the Database Catalog. In addition, it enables the CDW service to use the existing Data Lake that was set up for the environment, including all of its data, metadata, and security.

The following procedure shows how to activate your environment from CDW and get the benefits of using a CDW Data Lake.

  1. Assuming you have just registered an environment, navigate from Environments to the Cloudera Data Warehouse Overview page.
  2. Click Activate to activate the environment for CDW.
  3. Click Data Warehouse > Database Catalog > ADD NEW, and create the new Database Catalog. See Adding a new Database Catalog.
    The new Database Catalog shows that it uses a CDW Data Lake.