Running Impala queries
You learn how to query Impala tables from the command line of your cluster. Before attempting to run queries, you must meet a few prerequisites.
- You have SSH access to the command line of the cluster.
- As administrator, you set up a Ranger policy for running SQL queries.
- In Environment, accept the default environment or select another environment.
- Click All Services.
-
Click Connect to Impala.
-
Copy the host name.
- Connect to the gateway node of the cluster using SSH.
-
On the command line of the cluster, type the connection string to connect to
Impala, substituting your JDBC String URL.
For example:
impala-shell -i demodh-manager0.cdpsaasd.eu55-rsdu.cloudera.site:443 --ssl --ldap --http_path=demodh/cdp-proxy-api/impala --protocol=hs2-http
Output looks something like this:Starting Impala Shell using LDAP-based authentication SSL is enabled. Impala server certificates will NOT be verified (set --ca_cert to change) LDAP password for max: …
- At the LDAP password prompt, enter your workload password.
Starting Impala Shell using LDAP-based authentication SSL is enabled. Impala server certificates will NOT be verified (set --ca_cert to change) LDAP password for max: … *********************************************************************************** Welcome to the Impala shell. (Impala Shell v4.0.0-SNAPSHOT (f4b9678) built on Wed Jan 26 04:52:34 UTC 2022) To see a summary of a query's progress that updates in real-time, run 'set LIVE_PROGRESS=1;'. *********************************************************************************** [demodh-manager0.cdpsaasd.eu55-rsdu.cloudera.site:443] default>
-
Run Impala queries.
For example:
show databases;
The output looks something like this:Query: show databases +-------------------------------------+----------------------------------------------+ | name | comment | +-------------------------------------+----------------------------------------------+ | _impala_builtins | System database for Impala builtin functions | | adaptertest | | | airlines_demo | | …
-
Query the airlines_demo data.
For example:
USE airlines_demo; SELECT f.month, a.iata, a.airport, a.city, a.state, a.country FROM flights f, airports a WHERE f.origin = a.iata GROUP BY f.month, a.iata, a.airport, a.city, a.state, a.country HAVING COUNT(*) > 10000 ORDER BY AVG(f.DepDelay) DESC LIMIT 10;
Output looks something like this:Query progress can be monitored at: https://demodh-coordinator0.cdpsaasd.eu55-rsdu.cloudera.site:25000/query_plan?query_id=624836910191fe2e:c195d9dd00000000 +-------+------+-----------------------------------+--------------+-------+---------+ | month | iata | airport | city | state | country | +-------+------+-----------------------------------+--------------+-------+---------+ | 12 | ORD | Chicago O'Hare International | Chicago | NULL | USA | | 6 | EWR | Newark Intl | Newark | NULL | USA | | 7 | JFK | John F Kennedy Intl | New York | NULL | USA | | 6 | IAD | Washington Dulles International | Chantilly | NULL | USA | | 7 | EWR | Newark Intl | Newark | NULL | USA | | 6 | PHL | Philadelphia Intl | Philadelphia | NULL | USA | | 1 | ORD | Chicago O'Hare International | Chicago | NULL | USA | | 6 | ORD | Chicago O'Hare International | Chicago | NULL | USA | | 7 | ATL | William B Hartsfield-Atlanta Intl | Atlanta | NULL | USA | | 12 | MDW | Chicago Midway | Chicago | NULL | USA | +-------+------+-----------------------------------+--------------+-------+---------+ Fetched 10 row(s) in 21.06s