Exploring a Data Lake

In Cloudera Data Warehouse (CDW) Private Cloud, a default Database Catalog is linked to the Hive MetaStore (HMS) on the CDP Private Cloud Base cluster. As soon as you activate an environment in CDW, you can explore and query databases and tables that are present on your base cluster. If you are new to CDP, then you can also import data using Hue and run queries.

  • You must have the DWAdmin role.
  • Your must have activated an environment in CDW.
  • You must have a default Database Catalog running.
You set up a Virtual Warehouse with the minimum, default configurations for learning how to explore a Data Lake. You do not need to configure auto-scaling and other features to explore a Data Lake. Ensure you delete the Virtual Warehouse after your exploration.
  1. Log in to the Data Warehouse service as DWAdmin.
  2. Click Virtual Warehouses > Add New.
  3. Set up the Virtual Warehouse:
    • Specify a Name for the Virtual Warehouse.
    • In Type, click the SQL engine you prefer: Hive or Impala.
    • Select your Database Catalog and User Group if you have been assigned a user group.
    • In Size, select the number of executors, for example xsmall-2Executors.
    • Accept default values for other settings.
  4. Click Create.
  5. After your Virtual Warehouse starts running, click Hue, and expand Tables to explore available data.
  6. Explore data lake contents by running queries.
    For example, select all data from the airlines table.