Time and data skews

A large gap between Avg Time and Max Time for an operator, both of which are listed in the Exec Summary section of the query profile, points to either a data skew or a workload skew:

By: Manish Maheshwari, Data Architect and Data Scientist at Cloudera, Inc.

  • Data skews can be fixed by running the HDFS balancer periodically to rebalance the data, and then running INVALIDATE METADATA so that Impala reloads its metadata and picks up the new block locations. See HDFS Balancers and INVALIDATE METADATA Statement for more information.
  • Workload skews can be addressed by ensuring that similar CPU and memory configurations are used on all the Impala daemons. To address CPU usage, ensure that similar hardware is used on the nodes running Impala daemons. To address memory configurations, use Cloudera Manager: from its home page, select Impala > Configuration, then search for the Impala Daemon Memory Limit configuration property and check that it is set to a similar value for all the Impala daemons.
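
The data-skew fix in the first bullet is a two-step sequence, so it may help to see it written out. The following is a minimal sketch, not a supported tool: the function names, the threshold value of 10, and the host name impalad-host.example.com are assumptions made for illustration. It only relies on the hdfs balancer -threshold option and the impala-shell -i and -q options, and it defaults to a dry run that prints the commands without executing them.

```python
import shlex
import subprocess


def build_rebalance_commands(threshold_pct=10, impalad="impalad-host.example.com"):
    """Return the two commands, in order, as argument lists.

    threshold_pct and impalad are illustrative defaults (assumptions),
    not values taken from any particular cluster.
    """
    return [
        # Step 1: let the HDFS balancer move blocks between DataNodes.
        ["hdfs", "balancer", "-threshold", str(threshold_pct)],
        # Step 2: have Impala reload its metadata, including block locations.
        ["impala-shell", "-i", impalad, "-q", "INVALIDATE METADATA"],
    ]


def run_rebalance(dry_run=True, **kwargs):
    """Print each command; execute it only when dry_run is False."""
    for cmd in build_rebalance_commands(**kwargs):
        print(shlex.join(cmd))
        if not dry_run:
            # check=True stops the sequence if the balancer fails.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_rebalance(dry_run=True)
```

Running the balancer first matters here: if the metadata is reloaded before the blocks have moved, Impala would still schedule scans against the old block locations.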

See Using the Query Profile for Performance Tuning for more information.
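
To make the Avg Time and Max Time comparison concrete, the following sketch flags operators whose Max Time is much larger than their Avg Time. It assumes the two values have already been copied out of the Exec Summary by hand and converted to milliseconds. The function name, the input layout, the sample numbers, and the ratio threshold of 2.0 are all hypothetical choices for illustration, not values recommended by Impala.

```python
def find_skewed_operators(timings, ratio_threshold=2.0):
    """Return (operator, max/avg ratio) pairs at or above the threshold.

    timings maps an operator label to (avg_time_ms, max_time_ms).
    ratio_threshold is an arbitrary illustrative cutoff (assumption).
    """
    skewed = []
    for operator, (avg_ms, max_ms) in timings.items():
        if avg_ms <= 0:
            # Skip operators with no measurable average time.
            continue
        ratio = max_ms / avg_ms
        if ratio >= ratio_threshold:
            skewed.append((operator, ratio))
    # Largest skew first.
    return sorted(skewed, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Made-up numbers, not taken from a real query profile.
    example = {
        "00:SCAN HDFS": (400.0, 2000.0),
        "01:AGGREGATE": (100.0, 110.0),
    }
    print(find_skewed_operators(example))  # [('00:SCAN HDFS', 5.0)]
```

A ratio close to 1 means every host spent about the same time in that operator. A high ratio means at least one host took much longer than the rest, which is the symptom that the data-skew and workload-skew fixes above are meant to address.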