Apache Impala Reference
Performance Considerations
Performance Best Practices
Query Join Performance
Table and Column Statistics
Generating Table and Column Statistics
Runtime Filtering
Distribute Runtime Filter Aggregation
Skip Scheduling Bloom Filter
Min/Max Filtering
Bloom Filtering
Late Materialization of Columns
Partitioning
Partition Pruning for Queries
Understanding Performance using EXPLAIN Plan
Understanding Performance using SUMMARY Report
Understanding Performance using Query Profile
Planner Changes for CPU Usage
Planner Changes to Improve Cardinality Estimation
Caching Codegen Functions
Scalability Considerations
Scaling Limits and Guidelines
Hadoop File Formats Support
Using Text Data Files
Using Parquet Data Files
Using ORC Data Files
Using Avro Data Files
Using RCFile Data Files
Using SequenceFile Data Files
Storage Systems Support
Impala with Azure Data Lake Store (ADLS)
Impala with Amazon S3
Specifying Impala Credentials to Access S3
Configure Impala Daemon to Spill to S3
Ports Used by Impala
Transactions