Impala workload management

Learn how to enable Impala query logging in Cloudera Data Warehouse to track queries, analyze performance, and retain execution data for better insights.

Cloudera Data Warehouse lets you enable Impala query logging on an existing Virtual Warehouse or when you create a new Impala Virtual Warehouse. Logging Impala queries gives you greater observability into the workloads running on Impala, which you can use to improve the performance of your Impala Virtual Warehouses.

Query logging extends Impala's query profiling capabilities. Impala archives key data from each query's profile into two dedicated tables in the sys database: the query history table and the live query table. You can query these tables from any Impala client to get a consolidated view of both actively running and previously executed queries.

The query history table, sys.impala_query_log, is particularly useful for in-depth analysis of query performance across a workload. Query profiles are available only to the client that initiated the query. The query history table, by contrast, lets you query completed queries without parsing the text of each query profile, and it provides a consolidated view across all Impala coordinators.
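As an illustration of this kind of analysis, the following minimal sketch builds a statement that lists the slowest recent queries recorded in sys.impala_query_log. The helper name slowest_queries_sql is hypothetical, and the column names (query_id, db_user, start_time_utc, total_time_ms, sql) are assumptions about the table schema; run DESCRIBE sys.impala_query_log in your environment to confirm them before use.

```python
def slowest_queries_sql(limit: int = 10, since_hours: int = 24) -> str:
    """Build a SELECT that ranks recent queries in the history table by duration.

    Hypothetical helper: the column names below are assumptions and should be
    checked against the output of DESCRIBE sys.impala_query_log.
    """
    if limit <= 0 or since_hours <= 0:
        raise ValueError("limit and since_hours must be positive")
    return (
        "SELECT query_id, db_user, start_time_utc, total_time_ms, sql\n"
        "FROM sys.impala_query_log\n"
        # Keep only queries that started within the last `since_hours` hours.
        f"WHERE start_time_utc >= hours_sub(utc_timestamp(), {int(since_hours)})\n"
        "ORDER BY total_time_ms DESC\n"
        f"LIMIT {int(limit)}"
    )


if __name__ == "__main__":
    # Print the statement so it can be pasted into any Impala client.
    print(slowest_queries_sql(limit=5))
```

The generated statement can be submitted through any Impala client, such as impala-shell or a JDBC or ODBC connection.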

Impala query information is stored indefinitely in the sys.impala_query_log table. The sys.impala_query_live table, in contrast, reflects the in-memory state of all Impala coordinators and contains actively running and recently completed queries. A record is removed from the sys.impala_query_live table after the query finishes and its record is persisted to the sys.impala_query_log table, or when the coordinator is restarted. As a result, some records can momentarily appear in both tables.
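If you combine results from both tables on the client side, you may want to account for this brief overlap. The following sketch assumes that each row is available as a dictionary with a query_id key (an assumption about how your client returns rows) and keeps the persisted record whenever a query appears in both result sets. The function name merge_query_views is hypothetical.

```python
from typing import Dict, Iterable, List

Row = Dict[str, object]


def merge_query_views(live_rows: Iterable[Row], log_rows: Iterable[Row]) -> List[Row]:
    """Merge rows fetched from the live and history tables, one row per query.

    Assumes every row carries a "query_id" key. When a query appears in both
    inputs, the row from the history table (log_rows) replaces the live row.
    """
    merged: Dict[str, Row] = {}
    for row in live_rows:
        merged[str(row["query_id"])] = row
    for row in log_rows:
        # Overwrite any live copy with the persisted record.
        merged[str(row["query_id"])] = row
    return list(merged.values())


if __name__ == "__main__":
    live = [{"query_id": "a", "source": "live"}, {"query_id": "b", "source": "live"}]
    log = [{"query_id": "b", "source": "log"}, {"query_id": "c", "source": "log"}]
    for row in merge_query_views(live, log):
        print(row["query_id"], row["source"])
```

Preferring the persisted row is a design choice for this sketch; either copy could be kept, provided the choice is applied consistently.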

Because the sys.impala_query_live table is stored only in memory, recently completed queries that are not yet persisted to the sys.impala_query_log table are lost if the coordinator crashes. If the coordinator is shut down gracefully, however, recently completed queries are written to the sys.impala_query_log table and are not lost.

The <onlyCoordinators> element in Impala’s admission control configuration restricts a request pool to coordinators only, excluding executors. It is mainly used for pools that query the sys.impala_query_live table. However, such a pool can still run any query submitted to it, which can exhaust coordinator resources. Name these pools clearly to avoid routing queries to them unintentionally. For more information, see Apache Impala: onlyCoordinators.
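To show how a session might target such a pool, the following sketch produces the two statements a client would send: one that selects the request pool through the REQUEST_POOL query option and one that reads the live query table. The pool name onlycoords and the helper name live_query_statements are hypothetical, and the selected column names are assumptions about the table schema.

```python
import re
from typing import List


def live_query_statements(pool: str) -> List[str]:
    """Return the statements for reading the live query table through one pool.

    Hypothetical helper: `pool` must name a request pool that your admission
    control configuration already defines as coordinator-only.
    """
    # Reject anything other than a plain pool name before embedding it in SQL.
    if not re.fullmatch(r"[A-Za-z0-9_.]+", pool):
        raise ValueError(f"unexpected characters in pool name: {pool!r}")
    return [
        f"SET REQUEST_POOL={pool}",
        # Column names are assumptions; confirm them with DESCRIBE.
        "SELECT query_id, db_user, query_state, total_time_ms "
        "FROM sys.impala_query_live",
    ]


if __name__ == "__main__":
    for statement in live_query_statements("onlycoords"):
        print(statement + ";")
```

Setting the pool explicitly in the session keeps the routing visible, which helps ensure that only the intended queries reach the coordinator-only pool.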