Troubleshooting Impala
This topic describes the general troubleshooting procedures to diagnose some of the commonly encountered issues in Impala.
Symptom | Explanation | Recommendation |
---|---|---|
Impala takes a long time to start. | Impala instances with large numbers of tables, partitions, or data files take longer to start because the metadata for these objects is broadcast to all impalad nodes and cached. | Adjust timeout and synchronicity settings. |
Joins fail to complete. |
There may be insufficient memory. During a join, data from the second, third, and so on sets to be joined is loaded into memory. If Impala chooses an inefficient join order or join mechanism, the query could exceed the total memory available. |
Start by gathering statistics with the Consider specifying the If tuning at the SQL level is not sufficient, add more memory to your system or join smaller data sets. |
Queries return incorrect results. |
Impala metadata may be outdated after changes are performed in Hive. |
After inserting data, adding a partition, or other operation
in Hive, refresh the metadata for the table with the
REFRESH statement. |
Attempts to complete Impala tasks such as executing
|
This can be the result of permissions issues. For example, you
could use the Hive shell as the hive user to create a table.
After creating this table, you could attempt to complete some
action, such as an |
Ensure the Impala user has sufficient permissions to the table that the Hive user created. |
Impala fails to start up, with the impalad
logs referring to errors connecting to the
|
A large number of databases, tables, partitions, and so on can
require metadata synchronization, particularly on startup, that
takes longer than the default timeout for the
|
Configure the statestore timeout value and
possibly other settings related to the frequency of
statestore updates and metadata loading.
|