Query runs slowly
Impala queries can run slowly for a number of reasons, which are explained in this topic.
By: Manish Maheshwari, Data Architect and Data Scientist at Cloudera, Inc.
The most common causes of slow queries are:
Missing load
Error:
Metadata load finished. loaded-tables=1/1 load-requests=1 catalog-updates=3: 2.75s (2746369188
Description and cause:
If the catalog daemon (catalogd) does not have a table's metadata cached, Impala must load that metadata before it can plan the query, so the query runs slowly.
Solution:
To avoid this situation, make sure that tables are refreshed as part of your ETL pipelines and that you are using the on-demand metadata feature described in On-demand metadata and metadata management.
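As a rough illustration of how to spot this symptom, the following Python sketch scans query profile text for the "Metadata load finished" line and extracts the reported load time. The regular expression is modeled only on the sample line quoted above, so the exact format in other Impala versions is an assumption. The function names and the one-second threshold are hypothetical choices for illustration, not part of Impala.

```python
import re

# Pattern modeled on the sample timeline line in this topic. Other Impala
# versions may format the line differently (assumption).
METADATA_LOAD_RE = re.compile(
    r"Metadata load finished\. "
    r"loaded-tables=(?P<loaded>\d+)/(?P<requested>\d+) "
    r"load-requests=(?P<requests>\d+) "
    r"catalog-updates=(?P<updates>\d+): "
    r"(?P<value>\d+(?:\.\d+)?)(?P<unit>ms|s)"
)


def metadata_load_seconds(profile_text):
    """Return the reported metadata load time in seconds, or None if absent."""
    match = METADATA_LOAD_RE.search(profile_text)
    if match is None:
        return None
    value = float(match.group("value"))
    # Convert milliseconds to seconds; values in seconds pass through.
    return value / 1000.0 if match.group("unit") == "ms" else value


def has_slow_metadata_load(profile_text, threshold_seconds=1.0):
    """Flag profiles whose metadata load took at least threshold_seconds.

    The default threshold is an arbitrary illustrative value.
    """
    seconds = metadata_load_seconds(profile_text)
    return seconds is not None and seconds >= threshold_seconds
```

A profile that lacks the line returns `None`, which usually means the metadata was already cached when the query was planned.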
Missing statistics
Error:
WARNING: The following tables are missing relevant table and/or column statistics. Default.web_logs
Description and cause:
Without statistics, the planner can choose a suboptimal join strategy, for example a partitioned hash join rather than a broadcast join, and a suboptimal join order. Either condition causes the query to run slower than it otherwise would.
Solution:
Run COMPUTE STATS for each table involved in the query and then rerun the query. See COMPUTE STATS Statement for more information.
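The following Python sketch shows one way to turn the warning into the statements to run. It pulls table names from the text that follows the missing-statistics warning and builds one COMPUTE STATS statement per table. It assumes the names appear either on the same line as the warning, as in the sample above, or on the next line, separated by commas or spaces. The helper names are hypothetical, and the generated statements should be reviewed before you run them in impala-shell or another client.

```python
import re

WARNING_PREFIX = (
    "WARNING: The following tables are missing relevant table "
    "and/or column statistics."
)


def tables_missing_stats(profile_text):
    """Return the table names listed after the missing-statistics warning."""
    tables = []
    lines = profile_text.splitlines()
    for index, line in enumerate(lines):
        position = line.find(WARNING_PREFIX)
        if position == -1:
            continue
        # Names may follow on the same line or on the next one (assumption).
        rest = line[position + len(WARNING_PREFIX):].strip()
        if not rest and index + 1 < len(lines):
            rest = lines[index + 1].strip()
        for name in re.split(r"[,\s]+", rest):
            if name and name not in tables:
                tables.append(name)
    return tables


def compute_stats_statements(tables):
    """Build one COMPUTE STATS statement per table name."""
    return [f"COMPUTE STATS {table};" for table in tables]
```

For the sample warning above, `tables_missing_stats` returns the single name `Default.web_logs`, and `compute_stats_statements` produces the matching `COMPUTE STATS Default.web_logs;` statement.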