Time and data skews
Skews in Avg Time and Max Time, which are listed in the Exec Summary section of the query profile, are possible due to either data skews or workload skews:
By: Manish Maheshwari, Data Architect and Data Scientist at Cloudera, Inc.
- Data skews can be fixed by running the HDFS balancer periodically and
then running
INVALIDATE METADATA
to rebalance the data. See HDFS Balancers and INVALIDATE METADATA Statement for more information. -
Workload skews can be addressed by ensuring that similar CPU and memory configurations are used on all the Impala daemons. To address CPU usage, ensure that similar hardware is used on nodes running Impala daemons. To address memory configurations use Cloudera Manager. From its home page select Impala Daemon Memory Limit configuration property:
. Then search for the
See Using the Query Profile for Performance Tuning for more information.