Impala

The Impala service is a distributed, MPP database engine for interactive performance of SQL queries over large data sets. Impala performs best when it can operate on data in memory. Therefore, Impala is often configured with a very large heap size.

Impala daemon processes must be colocated with HDFS data nodes to use HDFS local reads, which also improve performance.

Impala does not provide any built-in load balancing, so a production Impala deployment should be deployed behind a load balancer for performance and high availability. For more information on configuring Impala with a load balancer, see Configuring Load Balancer for Impala.

Before deploying Impala in a production environment, review the Impala performance best practices.