Running Hive queries

You learn how to query Hive tables from the command line of your cluster. Before attempting to run queries, you must meet a few prerequisites.

  • You have SSH access to the command line of the cluster.
  • As administrator, you set up a Ranger policy for running SQL queries.
  1. In Environment, accept the default environment or select another environment.
  2. Click All Services.
  3. Click Connect to Hive.
  4. Copy the JDBC String URL.
  5. Connect to the gateway node of the cluster using SSH.
  6. On the command line of the cluster, start Beeline.
    For example:
    $ beeline
  7. At the Beeline prompt, enter ! connect, a space, and then paste the JDBC String URL.
    ! connect jdbc:hive2://demodh-manager0.cdpsaasd.eu55-rsdu.cloudera.site/;ssl=true;transportMode=http;httpPath=demodh/cdp-proxy-api/hive
  8. At the username prompt enter your workload user name, and then enter your workload password.
    The Hive prompt appears when the connection succeeds:
    Connected to: Apache Hive (version 3.1.3000.7.2.12.4-1)
    Driver: Hive JDBC (version 3.1.3000.7.2.12.4-1)
    Transaction isolation: TRANSACTION_REPEATABLE_READ
    0: jdbc:hive2://demodh-manager0.cdpsaasd.eu55>
  9. Run Hive queries.
    For example:
    show databases;           
    The output looks something like this:
    +--------------------------------------+
    |            database_name             |
    +--------------------------------------+
    | airlines_demo                        |                          
    | default                              |                             
    | information_schema                   |                           
    | sys                                  |
    +--------------------------------------+
  10. Query the airlines_demo data:
    For example:
    USE airlines_demo;
    
    SELECT f.month, a.iata, a.airport, a.city, a.state, a.country
    FROM flights f,
    airports a
    WHERE f.origin = a.iata
    GROUP BY 
    f.month,
    a.iata,
    a.airport,
    a.city,
    a.state,
    a.country
    HAVING COUNT(*) > 10000
    ORDER BY AVG(f.DepDelay) DESC
    LIMIT 10;
    The output looks something like this:
    +----------+---------+---------------------------+---------------+----------+------------+
    | f.month  | a.iata  |             a.airport     |    a.city     | a.state  | a.country  |
    +----------+---------+---------------------------+---------------+----------+------------+
    | 12       | ORD     | Chicago O'Hare            | Chicago       | NULL     | USA        |
    | 6        | EWR     | Newark Intl               | Newark        | NULL     | USA        |
    | 7        | JFK     | John F Kennedy Intl       | New York      | NULL     | USA        |
    | 6        | IAD     | Washington Dulles         | Chantilly     | NULL     | USA        |
    | 7        | EWR     | Newark Intl               | Newark        | NULL     | USA        |
    | 6        | PHL     | Philadelphia Intl         | Philadelphia  | NULL     | USA        |
    | 1        | ORD     | Chicago O'Hare            | Chicago       | NULL     | USA        |
    | 6        | ORD     | Chicago O'Hare            | Chicago       | NULL     | USA        |
    | 7        | ATL     | William B Hartsfield-Atlanta | Atlanta       | NULL     | USA     |
    | 12       | MDW     | Chicago Midway               | Chicago       | NULL     | USA     |
    +----------+---------+------------------------------+---------------+----------+---------+
    10 rows selected (103.812 seconds)