Impala Performance Tuning Guidelines
Overview
Kudu RPC
Setting up dedicated coordinators
Load balancing for coordinators
On-demand metadata and metadata management
Enabling on-demand metadata fetch
Enabling release of stale metadata
Avoiding small files
Automatic metadata management
Coding in Spark for automatic metadata management
Manual metadata management
Admission control
Estimating memory limits
Resource pool design
Table and column statistics
Setting statistics manually
Tuning SQL queries
Recommended SET options for Impala
Recommended configurations
Using Impala with Hue
Using Impala with BI tools
Appropriate file formats
Partitioning granularity recommendations
Addressing hotspotting
Detecting block skews
Minimizing overhead when transmitting results to clients
Join query performance tuning
Query profiles
Execution summary
Query timeline
Common scenarios for debugging queries using query profiles
Memory limit exceeded
Query runs slowly
Admission control
Client fetch wait timer
Wrong join order
Time and data skews