Monitoring Impala Queries

The Impala Queries page displays information about the Impala queries that are running and have run in your cluster. You can filter the queries by time period and by specifying simple filtering expressions.

Contents:

Viewing Queries
Configuring Impala Query Monitoring
Impala Best Practices
Queries List
Filtering Queries

Viewing Queries

Do one of the following:
- Select Clusters > Cluster name > Activities > Impala service name Queries.
- On the Home > Status tab, select Impala service name and click the Queries tab.
The Impala queries run during the last day are listed in the Queries list.

Configuring Impala Query Monitoring

You can configure the visibility of the Impala query results and the size of the storage allocated to Impala query results.

For information on how to configure whether admin and non-admin users can view all queries, only that user's queries, or no queries, see Configuring Query Visibility.

Query information is stored in-memory in a ring buffer. This has two consequences: if you restart Service Monitor, all queries are lost and older queries eventually will be dropped. For information on how to configure the query store, see Configuring Impala Query Data Store Maximum Size.

Impala Best Practices

To open the Impala Best Practices page, click the Best Practices tab on the Impala service page. The page contains charts to help identify whether Impala best practices are being followed. See the individual charts for a descriptions of each best practice and how to determine if it is being followed. Consult the Impala documentation for further detail on each best practice and for additional best practices.

Adjust the time range to see data on queries run at different times. Click on the charts to get more detail on individual queries. Use the filter box at the top right of the Best Practices page to adjust what data is shown on the page. For example, to see just the queries that took more than ten seconds, make the filter query_duration > 10s.

Create a trigger based on any best practice by choosing Create Trigger from the individual chart drop down menu.

Queries List

Queries appear in the list with the most recent at the top. Each query has summary and detail information. A query summary includes the following default attributes: start and end timestamps, statement, duration, rows produced, user, coordinator, database, and query type. For example:

You can add additional attributes to the summary with the Attribute Selector. In each query summary, the query statement is truncated if it is too long to display. To display the entire statement, hover over a query. The query entry will expand to display the entire query string. To collapse the query display, move the mouse cursor. To display information about query attributes and possible values, hover over a field in a query. For example:

A running query displays under the start timestamp. If an error occurred while processing the query, displays under the complete timestamp.

To kill a running query, select Actions > Cancel. Only an administrator can cancel queries and killing a query creates an audit event. When you cancel a query, replaces the label. Once the page is refreshed, the entry is removed from the list.

When a job fails, asdf replaces the label.

To display query details, select Actions > Details.

To display all the queries run by the same user, select Actions > User's Impala queries.

To display all the queries that used the same resource pool, select Actions > Queries in the same YARN pool.

Filtering Queries

You filter queries by selecting a time range and specifying a filter expression in the text box.

You can use the Time Range Selector or a duration link ( ) to set the time range. (See Time Line for details).

Filter Expressions

Filter expressions specify which entries should display when you run the filter. The simplest expression is made up of three components:

Attribute - the query language name of the attribute.
Operator - the type of comparison between the attribute and the attribute value. Cloudera Manager supports the standard comparator operators: =, !=, >, <, >=, <=, and RLIKE, which does regular expression matching as specified in the Java Pattern class documentation. Numeric values can be compared with all operators. String values can be compared with =, !=, and RLIKE. Boolean values can be compare with = and !=.
Value - the value of the attribute. The value depends on the type of the attribute. For a Boolean value, specify either true or false. When specifying a string value, enclose the value in double quotes.

You create compound filter expressions using the AND and OR operators. When more than one operator is used in an expression, AND is evaluated first, then OR. To change the order of evaluation, surround subexpressions with parentheses.

Compound Expressions

To find all the queries issued by the root user that produced over 100 rows, use the expression:

user = "root" AND rowsProduced > 100

To find all the executing queries issued by users Jack or Jill, use the expression:

executing = true AND (user = "Jack" OR user = "Jill")

Choosing and Running a Filter

Do one of the following:
- Select a Suggested or Recently Run Filter
  1. Click the to the right of the Search button to display a list of sample and recently run filters, and select a filter. The filter text displays in the text box.
- Construct a Filter from Attribute Histograms
  1. Optionally, click the Select Attributes link to display a dialog where you can chose which attributes to display in histograms. Check the checkbox next to one or more attributes, and click Close.
  2. Click the Enhance Filter link. Histograms of the selected attributes display with the number of results that match each value of the selected attributes.
  3. Click a histogram bar that represents the range of attribute values to filter on. The color of the histogram bar gets lighter (on the right below)
    
    and a filter with the attribute value set to the range of the histogram bucket is added to the text box. The range includes the lower bound of the bucket and excludes the upper bound of the bucket. For example:
```
(<x>_duration >= 17600.0 AND <x>_duration < 18000.0)
```
    where <x> is query or application.
    If you click the same histogram bar again, the color reverts to the darker blue and the filter is removed from the text box.
    If you click another histogram bar, another filter is OR'd with the existing filter:
```
(<x>_duration >= 17600.0 AND <x>_duration < 18000.0 OR <x>_duration >= 16000.0 AND <x>_duration < 16400.0)
```
- Type a Filter
  1. Start typing or press Spacebar in the text box. As you type, filter attributes matching the letter you type display. If you press Spacebar, standard filter attributes display. These suggestions are part of typeahead, which helps build valid queries. For information about the attribute name and supported values for each field, hover over the field in an existing query.
  2. Select an attribute and press Enter.
  3. Press Spacebar to display a drop-down list of operators.
  4. Select an operator and press Enter.
  5. Specify an attribute value in one of the following ways:
    - For attribute values that support typeahead, press Spacebar to display a drop-down list of values and press Enter.
    - Type a value.
Put the cursor on the text box and press Enter or click Search. The list displays the results that match the specified filter. If the histograms are showing, they are redrawn to show only the values for the selected filter. The filter is added to the Recently Run list.

Example: Drilling into Query Results

Suppose we have a set of results with the following duration distribution :

The 0-20.00s bucket has 7 results, but with the current distribution we cannot discriminate between the results in that bucket.

Selecting the left-most histogram bar adds the following filter to the text box:

<x>_duration >= 0.0 AND <x>_duration < 20000.0

where <x> is query or application. After clicking Search again, the histogram appears as follows:

Selecting the histogram with 5 results again refines the filter to:

<x>_duration >= 16000.0 AND <x>_duration < 18000.0

After clicking Search again, the histogram appears as follows:

Filter Attributes

The available filter attributes, their names as they are displayed in Cloudera Manager, their types, and descriptions, are enumerated below.

Attribute	Display Name	Value Type	Description
`bytes_streamed`	Bytes Streamed	BYTES	The total number of bytes sent between Impala daemons while processing the query.
`coordinator_host_id`	Coordinator	STRING	The host coordinating the query.
`database`	Database	STRING	The database on which the query was run.
`ddl_type`	DDL Type	STRING	The type of DDL query.
`executing`	Executing	BOOLEAN	Whether the query is currently executing.
`file_formats`	File Formats	STRING	The file formats used in the query. A file format is a string of the form: `File Type`/`Compression Type`, where `File Type` can take the values: `TEXT`, `PARQUET`, `SEQUENCE_FILE`, and `RC_FILE`, and `Compression Type` can take the values: `NONE`, `DEFAULT`, `BZIP2`. For further information, see How Impala Works with Hadoop File Formats.
`hbase_bytes_read`	HBase Bytes Read	BYTES	The total number of bytes read from HBase by the query.
`hbase_bytes_read_per_second`	HBase Read Throughput	BYTES_PER_SECOND	The overall HBase read throughput (in B/s) of the query.
`hdfs_bytes_read`	HDFS Bytes Read	BYTES	The total number of bytes (in GiB) read from HDFS by the query.
`hdfs_bytes_read_local`	HDFS Local Bytes Read	BYTES	The total number of local bytes read (in GiB) from HDFS by the query.
`hdfs_bytes_read_local_percentage`	HDFS Local Bytes Read Percentage	NUMBER	The percentage of all bytes read from HDFS by the query that were local.
`hdfs_bytes_read_per_second`	HDFS Read Throughput	BYTES_PER_SECOND	The overall HDFS read throughput (in B/s) of the query.
`hdfs_bytes_read_remote`	HDFS Remote Bytes Read	BYTES	The total number of remote bytes read from HDFS by this query.
`hdfs_bytes_read_remote_percentage`	HDFS Remote Bytes Read Percentage	NUMBER	The percentage of all bytes read from HDFS by this query that were remote.
`hdfs_bytes_read_short_circuit`	HDFS Short Circuit Bytes Read	BYTES	The total number of bytes (in GiB) read from HDFS by the query that used short-circuit reads.
`hdfs_bytes_read_short_circuit_percentage`	HDFS Short Circuit Bytes Read Percentage	NUMBER	The percentage of all bytes (in GiB) read from HDFS by the query that used short-circuit reads.
`hdfs_bytes_skipped`	HDFS Bytes Skipped	BYTES	The total number of bytes that had to be skipped by the query while reading from HDFS. Any number above zero may indicate a problem.
`memory_accrual`	Memory Accrual	BYTE_SECONDS	The total accrued memory usage by the query. This is computed by multiplying the average aggregate memory usage of the query by the query's duration.
`memory_aggregate_peak`	Aggregate Peak Memory Usage	BYTES	The highest amount of memory allocated by this query at a particular time across all nodes.
`memory_per_node_peak`	Per Node Peak Memory Usage	BYTES	The highest amount of memory allocated by any single node that participated in this query. See Node With Peak Memory Usage for the name of the peak node.
`memory_per_node_peak_node`	Node With Peak Memory Usage	STRING	The node with the highest peak memory usage for this query.
`network_address`	Network Address	STRING	The network address that issued this query.
`pool`	Pool	STRING	The name of the YARN pool to which this query was issued. Within YARN, a pool is referred to as a queue.
`pool_wait_time`	Pool Wait Time	MILLISECONDS	The total amount of time the query spent waiting for pool resources to become available.
`query_duration`	Duration	MILLISECONDS	The duration of the query in milliseconds.
`query_id`	Query ID	STRING	The ID of the query.
`query_state`	Query State	STRING	The current state of the query: `CREATED`, `INITIALIZED`, `COMPILED`, `RUNNING`, `FINISHED`, `UNKNOWN`, `EXCEPTION`. If the query has failed or been canceled, `queryState` will be `EXCEPTION`.
`query_status`	Query Status	STRING	The status of the query. If the query failed, `queryStatus` will contain diagnostic info such as `Memory limit exceeded`, `Failed to write row ...`. If canceled, `queryStatus` is `Canceled`. Otherwise, `queryStatus` is `OK`.
`query_type`	Query Type	STRING	The type of the query's SQL statement: `DML`, `DDL`, `QUERY`, `UNKNOWN`.
`rows_produced`	Rows Produced	NUMBER	The number of rows returned by the query.
`service_name`	Service Name	STRING	The name of the Impala service.
`session_id`	Session ID	STRING	The ID of the session that issued this query.
`session_type`	Session Type	STRING	The type of the session that issued this query.
`stats_missing`	Stats Mission	BOOLEAN	Whether the query was flagged with a missing table or column statistics warning during the planning process.
`statement`	Statement	STRING	The query's SQL statement.
`thread_cpu_time`	Threads: CPU Time	MILLISECONDS	The sum of the CPU time used by all threads of the query.
`thread_cpu_time_percentage`	Threads: CPU Time Percentage	NUMBER	The sum of the CPU time used by all threads of the query divided by the total thread time.
`thread_network_wait_time`	Threads: Network Wait Time	MILLISECONDS	The sum of the time spent waiting for the network by all threads of the query.
`thread_network_wait_time_percentage`	Threads: Network Wait Time Percentage	NUMBER	The sum of the time spent waiting for the network by all threads of the query divided by the total thread time.
`thread_storage_wait_time`	Threads: Storage Wait Time	MILLISECONDS	The sum of the time spent waiting for storage by all threads of the query.
`thread_storage_wait_time_percentage`	Threads: Storage Wait Time Percentage	NUMBER	The sum of the time spent waiting for storage by all threads of the query divided by the total thread time.
`thread_total_time`	Threads: Total Time	MILLISECONDS	The sum of thread CPU, storage wait, and network wait times used by all threads of the query.
`user`	User	STRING	The user who issued the query.

Examples

Consider the following filter expressions: user = "root", rowsProduced > 0, fileFormats RLIKE ".TEXT.*", and executing = true. In the examples:

The filter attributes are user, rowsProduced, fileFormats, and executing.
The operators are =, >, and RLIKE.
The filter values are root, 0, .TEXT.*, and true.

Viewing the Distribution of Task Attempts

Query Details