Query runs slowly

Impala queries can run slowly for a number of reasons, which are explained in this topic.

By: Manish Maheshwari, Data Architect and Data Scientist at Cloudera, Inc.

The most common causes of slow queries are:

Missing metadata

Error:

Metadata load finished. loaded-tables=1/1 load-requests=1 catalog-updates=3: 2.75s (2746369188)

Description and cause:

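If the metadata of a table is not cached in the catalog daemon (catalogd), Impala must load it before the query can be planned. The load time, 2.75 seconds in the message above, is added to the query's run time.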

Solution:

To avoid these situations, make sure that tables are refreshed in ETL pipelines and that you are using the on-demand metadata feature described in On-demand metadata and metadata management.
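As a minimal sketch of the refresh step, an ETL job that adds data files to an existing table could end with a REFRESH statement so that the metadata is already loaded before the first user query arrives. The table name default.web_logs is borrowed from the warning shown later in this topic and is only an illustration; substitute your own table.

```sql
-- Run as the final step of an ETL job that added data files to an existing table.
-- default.web_logs is an illustrative table name; replace it with your own.
REFRESH default.web_logs;
```

REFRESH reloads the table metadata right away. INVALIDATE METADATA, by contrast, discards the cached metadata and leaves the reload to the next query that touches the table, so where REFRESH is sufficient it is usually the better fit for this purpose.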

Missing statistics

Error:

WARNING: The following tables are missing relevant table and/or column statistics. Default.web_logs

Description and cause:

Without statistics, the planner cannot estimate table sizes, so it can choose the wrong join strategy, for example a partitioned (shuffle) hash join where a broadcast join would be cheaper. It can also choose the wrong join order. Both conditions cause queries to run slower than optimal.

Solution:

Run COMPUTE STATS for each table involved in the query and then rerun the query. See COMPUTE STATS Statement for more information.
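For the table named in the warning above, the fix might look like the following sketch. The SHOW statements are optional checks; a value of -1 in their output indicates a statistic that is still missing.

```sql
-- Gather table and column statistics for the table named in the warning.
COMPUTE STATS default.web_logs;

-- Optional checks: the #Rows column should no longer show -1.
SHOW TABLE STATS default.web_logs;

-- Optional checks: the #Distinct Values column should no longer show -1.
SHOW COLUMN STATS default.web_logs;
```

Running EXPLAIN on the original query afterward is one way to confirm that the missing-statistics warning no longer appears in the plan.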