Exploring a data lake

In Cloudera Data Warehouse (CDW), you explore sample airline database tables in your data lake from a Virtual Warehouse. You learn how to load the airline data, and then you view and query the tables.

Before you begin, make sure that you meet the following prerequisites:
  • You have permissions to access a running environment for creating a Database Catalog and Virtual Warehouse.
  • You have the DWAdmin role to perform Data Warehouse tasks.
  • You are logged into the CDP web interface.
  • You have activated the environment from Cloudera Data Warehouse.
  • You have set up a Hadoop SQL policy in Ranger to access data in the data lake.
For more information about meeting prerequisites, see Getting started in CDW.
In this task, you set up a minimal Virtual Warehouse for learning how to explore a data lake. You do not need to configure auto-scaling or other optional features to explore a data lake. Plan to delete the Virtual Warehouse when you finish exploring.
  1. Navigate to Data Warehouse > Database Catalogs > New Database Catalog.
  2. In New Database Catalog, in Name, specify a Database Catalog name.
  3. In Environments, select the environment that you activated from Cloudera Data Warehouse.
  4. Accept default values for the image version and data lake type (SDX).
  5. Turn on Load Demo Data to explore sample airline data from Hue, and click Create.
  6. Click Virtual Warehouses > New Virtual Warehouse.
  7. Set up the Virtual Warehouse:
    • Specify a Name for the Virtual Warehouse.
    • In Type, click the SQL engine you prefer: Hive or Impala.
    • Select your Database Catalog and, if you have been assigned a user group, your User Group.
    • In Size, select the number of executors, for example xsmall-2Executors.
    • Accept default values for other settings.
  8. Click Create.
  9. After your Virtual Warehouse starts running, click Hue, and expand Tables to explore available data.
  10. Explore data lake contents by running queries.
    For example, select all data from the airlines table.
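
The exploration in steps 9 and 10 can be sketched as the following queries, which you might run in the Hue editor. This is a minimal sketch under stated assumptions, not the definitive contents of the demo data: only the airlines table name comes from the step above, the statements assume that the current database in Hue is the one that holds the demo tables, and the LIMIT value is arbitrary. The statements use syntax that is common to Hive and Impala.

```sql
-- List the tables in the current database
-- (assumption: this database holds the demo data).
SHOW TABLES;

-- Inspect the columns of the airlines table before querying it.
DESCRIBE airlines;

-- Select all columns from the airlines table.
-- LIMIT 10 is an arbitrary cap that keeps the result small while you explore.
SELECT * FROM airlines LIMIT 10;
```

To return every row, as the example in step 10 describes, remove the LIMIT clause.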