Apache Impala Reference
Performance Considerations
Performance Best Practices
Query Join Performance
Table and Column Statistics
Generating Table and Column Statistics
Runtime Filtering
Min/Max Filtering
Bloom Filtering
Late Materialization of Columns
Partitioning
Partition Pruning for Queries
HDFS Caching
HDFS Block Skew
Understanding Performance using EXPLAIN Plan
Understanding Performance using SUMMARY Report
Understanding Performance using Query Profile
Planner Changes for CPU Usage
DDL Bucketed Tables
Scalability Considerations
Scaling Limits and Guidelines
Dedicated Coordinator
Hadoop File Formats Support
Using Text Data Files
Using Parquet Data Files
Using ORC Data Files
Using Avro Data Files
Using RCFile Data Files
Using SequenceFile Data Files
Storage Systems Support
Impala with HDFS
Configure Impala Daemon to spill to HDFS
Impala with Kudu
Configuring for Kudu Tables
Impala DDL for Kudu
Partitioning for Kudu Tables
Creating External Table
Impala DML for Kudu Tables
Impala with HBase
Impala with Azure Data Lake Store (ADLS)
Impala with Amazon S3
Specifying Impala Credentials to Access S3
Impala with Ozone
Configure Impala Daemon to spill to Ozone
Ports Used by Impala
Migration Guide
Setting up Data Cache for Remote Reads
Managing Metadata in Impala
On-demand Metadata
Automatic Invalidation of Metadata Cache
Automatic Invalidation/Refresh of Metadata
Transactions