Exploring a data lake

In Cloudera Data Warehouse, you explore sample airline database tables in your data lake from a Virtual Warehouse. You learn how to load the airline data, and then you view and query the tables.

Before you begin, confirm that you meet the following prerequisites:
  • You obtained permissions to access a running environment for creating a Database Catalog and Virtual Warehouse.
  • You obtained the DWAdmin role to perform Data Warehouse tasks.
  • You logged into the CDP web interface.
  • You activated the environment from Cloudera Data Warehouse.
  • You set up a Hadoop SQL policy in Ranger to access data in the data lake.
For more information about meeting prerequisites, see Getting started in CDW.
In this task, you set up a minimal Virtual Warehouse for learning how to explore a data lake. You do not need to configure auto-scaling for the Virtual Warehouse executors, and you can skip the other optional configurations. Plan to delete the Virtual Warehouse when you finish exploring.
  1. Navigate to Data Warehouse > Database Catalogs > Add New.
  2. In Create Database Catalog, in Name, specify a Database Catalog name.
  3. In Environments, select the name of the environment that you activated from Cloudera Data Warehouse.
  4. In Select Size, accept the default (small) for this exploration of demo data.
  5. Accept the default value for the Database Catalog Image Version.
  6. Turn on Load Demo Data to explore sample airline data from Hue, and click Create Database Catalog.
  7. Click Virtual Warehouses > Add New.
  8. Set up the Virtual Warehouse:
    • Specify a Name for the Virtual Warehouse.
    • In Type, click the SQL engine you prefer: Hive or Impala.
    • Select your Database Catalog and, if you have been assigned a user group, your User Group.
    • In Size, select the number of executors, for example xsmall-2Executors.
    • Accept default values for other settings.
  9. Click Create.
  10. After your Virtual Warehouse starts running, click Hue, and expand Tables to explore available data.
  11. Explore data lake contents by running queries.
    For example, select all data from the airlines table.
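
The query in the last step can be sketched as follows. This is a minimal illustration, not the product's own tooling: it runs the statement against an in-memory SQLite table that stands in for the demo airlines table, so the column names (code, description) and the two rows are placeholders rather than real demo data. In Hue, you would paste only the SELECT statement into the editor after selecting the demo database.

```python
import sqlite3

# Stand-in for the demo table: an in-memory SQLite database, NOT a
# Cloudera Data Warehouse Virtual Warehouse. The column names and rows
# below are placeholders chosen for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE airlines (code TEXT, description TEXT)")
conn.executemany(
    "INSERT INTO airlines VALUES (?, ?)",
    [
        ("XX", "Placeholder Airline One"),
        ("YY", "Placeholder Airline Two"),
    ],
)

# Statement of the kind you would run in the Hue editor. The LIMIT clause
# is an optional addition that keeps the result small on large tables.
QUERY = "SELECT * FROM airlines LIMIT 10"

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)

conn.close()
```

Adding a LIMIT clause is a design choice for exploration: it lets you inspect the shape of a table without scanning and returning every row.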