Impala workload management
Learn how to enable Impala query logging in Cloudera Data Warehouse to track queries, analyze performance, and retain execution data for better insights.
Cloudera Data Warehouse lets you enable Impala query logging on an existing Virtual Warehouse or when you create a new Impala Virtual Warehouse. Logging Impala queries gives you greater observability of the workloads running on Impala, which you can use to improve the performance of your Impala Virtual Warehouses.
This feature extends Impala's query profiling capabilities. You can have Impala archive key data from each query's profile into two dedicated tables in the sys database: the query history table and the live query table. You can query these tables from any Impala client to get a consolidated view of both actively running and previously executed queries.
The query history table, sys.impala_query_log, is particularly useful when you analyze workloads to understand query performance in depth. Unlike query profiles, which are available only to the client that initiated the query, the query history table lets you query information about completed queries without parsing the text of each query profile. The query history table also provides a single view across all Impala coordinators.
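As an illustration of this kind of analysis, the following minimal Python sketch builds a statement that lists the slowest completed queries. The helper name slowest_queries_sql and the column names query_id, db_user, start_time_utc, and total_time_ms are assumptions made for this example, not a confirmed schema; check the actual columns with DESCRIBE sys.impala_query_log before using the statement.

```python
# Column names are assumptions for illustration only; confirm them with
# DESCRIBE sys.impala_query_log in your environment.
ASSUMED_COLUMNS = ("query_id", "db_user", "start_time_utc", "total_time_ms")


def slowest_queries_sql(limit: int = 10, min_total_ms: int = 0) -> str:
    """Build a statement that lists the slowest completed queries.

    Both arguments must be integers, so no user-supplied text is
    interpolated into the statement.
    """
    if not isinstance(limit, int) or limit <= 0:
        raise ValueError("limit must be a positive integer")
    if not isinstance(min_total_ms, int) or min_total_ms < 0:
        raise ValueError("min_total_ms must be a non-negative integer")
    columns = ", ".join(ASSUMED_COLUMNS)
    return (
        f"SELECT {columns} "
        "FROM sys.impala_query_log "
        f"WHERE total_time_ms >= {min_total_ms} "
        "ORDER BY total_time_ms DESC "
        f"LIMIT {limit}"
    )


if __name__ == "__main__":
    # Print the statement so it can be pasted into any Impala client.
    print(slowest_queries_sql(limit=5, min_total_ms=1000))
```

The resulting statement can be run from any Impala client. The helper only composes text and does not connect to Impala.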
Impala query information is stored indefinitely in the sys.impala_query_log table. The sys.impala_query_live table, in contrast, reflects the in-memory state of all Impala coordinators and contains actively running and recently completed queries. A record is removed from the sys.impala_query_live table after the query finishes and is persisted to the sys.impala_query_log table, or when the coordinator is restarted. As a result, some records might momentarily appear in both tables.
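If you combine results from both tables on the client side, you may want to collapse these momentary duplicates. The following is a minimal Python sketch of one way to do that, assuming each row has already been fetched as a dictionary with a query_id key. The function name merge_query_records, the row shape, and the choice to prefer the persisted row are assumptions made for this example.

```python
from typing import Dict, Iterable, List


def merge_query_records(
    live_rows: Iterable[dict], log_rows: Iterable[dict]
) -> List[dict]:
    """Combine rows from both tables, keeping one record per query_id.

    When a query appears in both inputs, the row from the query history
    table replaces the row from the live query table.
    """
    merged: Dict[str, dict] = {}
    for row in live_rows:
        merged[row["query_id"]] = row
    for row in log_rows:
        # The persisted copy overrides the in-memory copy.
        merged[row["query_id"]] = row
    return list(merged.values())


if __name__ == "__main__":
    # Hypothetical rows: query "b" is present in both tables.
    live = [
        {"query_id": "a", "source": "live"},
        {"query_id": "b", "source": "live"},
    ]
    log = [
        {"query_id": "b", "source": "log"},
        {"query_id": "c", "source": "log"},
    ]
    for record in merge_query_records(live, log):
        print(record["query_id"], record["source"])
```

Preferring the persisted row is a design choice for this sketch: the sys.impala_query_log copy is the one that survives a coordinator restart.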
Because the sys.impala_query_live table is held only in memory, recently completed queries that are not yet persisted to the sys.impala_query_log table are lost if the coordinator crashes. If the coordinator is shut down gracefully, the recently completed queries are persisted to the sys.impala_query_log table and are not lost.
The <onlyCoordinators> element in Impala's Admission Control configuration restricts a request pool to coordinators only, excluding executors. Such pools are mainly used for querying the sys.impala_query_live table. However, a coordinator-only pool can still run any query, which can exhaust coordinator resources. Name these pools clearly so that queries are not routed to them unintentionally. For more information, see Apache Impala: onlyCoordinators.