Test driving Iceberg from Hive

You learn how to access the Hive demo data, which you can use to get hands-on experience running Iceberg queries.

A DWAdmin can optionally load demo data in Hue when you create a new Database Catalog.
The Admin enables Load Demo Data when creating the Database Catalog. Users can then query sample airline demo data in Hue.
  • You must meet the prerequisites to query Iceberg tables from a Virtual Warehouse mentioned earlier.
  • You have access to a Hive Virtual Warehouse, having a Database Catalog in which demo data has been loaded.
  • You obtained the required role for querying the Virtual Warehouse: DWUser
  1. In Cloudera Data Warehouse Overview, select a Hive Virtual Warehouse, and click Hue.
  2. In Hue, expand the default database and verify that the airline demo data is available in your Virtual Warehouse.
    You see the following list of demo databases:
  3. Select airline_ontime_iceberg to use the airline_ontime_iceberg database.
  4. Take a look at the tables in the airline_ontime_iceberg database.
    Flights is the fact table. It has 100M rows and three dirmensions, ariline, airports, and planes. This records flights for more than 10 years in the US, and includes the following details:
    • origin
    • destination
    • delay
    • air time
  5. Become familiar with the Iceberg airline queries to set up this database. See the next topic.
  6. Query the demo data from Hive.
    For example, find the flights that departed each year, by IATA code, airport, city, state, and country. Find the average departure delay.
    SELECT f.month, a.iata, a.airport, a.city, a.state, a.country
    FROM flights f,
    airports a
    WHERE f.origin = a.iata
    GROUP BY 
    f.month,
    a.iata,
    a.airport,
    a.city,
    a.state,
    a.country
    HAVING COUNT(*) > 10000
    ORDER BY AVG(f.DepDelay) DESC
    LIMIT 10;                        
    Output appears as follows:
    +----------+---------+------------------------------------+---------------+----------+------------+
    | f.month  | a.iata  |             a.airport              |    a.city     | a.state  | a.country  |
    +----------+---------+------------------------------------+---------------+----------+------------+
    | 12       | ORD     | Chicago O'Hare International       | Chicago       | NULL     | USA        |
    | 6        | EWR     | Newark Intl                        | Newark        | NULL     | USA        |
    | 7        | JFK     | John F Kennedy Intl                | New York      | NULL     | USA        |
    | 6        | IAD     | Washington Dulles International    | Chantilly     | NULL     | USA        |
    | 7        | EWR     | Newark Intl                        | Newark        | NULL     | USA        |
    | 6        | PHL     | Philadelphia Intl                  | Philadelphia  | NULL     | USA        |
    | 1        | ORD     | Chicago O'Hare International       | Chicago       | NULL     | USA        |
    | 6        | ORD     | Chicago O'Hare International       | Chicago       | NULL     | USA        |
    | 7        | ATL     | William B Hartsfield-Atlanta Intl  | Atlanta       | NULL     | USA        |
    | 12       | MDW     | Chicago Midway                     | Chicago       | NULL     | USA        |
    +----------+---------+------------------------------------+---------------+----------+------------+
    10 rows selected (103.812 seconds)
  7. Split the partitioned data into manageable files.
    ALTER TABLE airports SET PARTITION SPEC (iata,BUCKET(15, country));