Scaling Limits and Guidelines
To achieve optimal scalability and performance in Impala, stay within the scalability limits described in this topic. For example, you might be able to create a table with 2000 columns, but queries against that table will perform poorly. This topic does not cover functional limitations in Impala.
Unless noted otherwise, the limits listed in this topic were tested and certified.
Limits noted as "generally safe" are not certified but are recommended as generally safe. A safe range is not a hard limit, because unforeseen errors or issues in your particular environment can affect the range.
- Number of Impalad Executors: 500 nodes depending on use case
- Number of Impalad Coordinators: 1 coordinator for at most every 50 executors
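The coordinator guideline above is simple arithmetic, and a minimal sketch may help when sizing a cluster. The function name `min_coordinators` and the constants are illustrative assumptions, not part of Impala; the sketch only encodes the "1 coordinator per 50 executors" ratio and the 500-executor guideline stated above.

```python
import math

# Values taken from the list above; the names are hypothetical.
MAX_EXECUTORS = 500               # guideline, "depending on use case"
EXECUTORS_PER_COORDINATOR = 50    # each coordinator serves at most 50 executors


def min_coordinators(executors: int) -> int:
    """Return the smallest coordinator count that satisfies the 1:50 guideline."""
    if executors < 1:
        raise ValueError("executors must be a positive integer")
    return math.ceil(executors / EXECUTORS_PER_COORDINATOR)


def within_executor_guideline(executors: int) -> bool:
    """Report whether the executor count stays within the 500-node guideline."""
    return 1 <= executors <= MAX_EXECUTORS


if __name__ == "__main__":
    for n in (50, 51, 120, 500):
        print(n, "executors ->", min_coordinators(n), "coordinator(s)")
```

Under this reading, a 120-executor cluster would need at least 3 coordinators. The appropriate number for a real deployment may also depend on query concurrency, which this sketch does not model.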
Data Storage Limits
There are no hard limits for the following, but you will experience gradual performance degradation as you increase these numbers.
- Number of databases
- Number of tables - total, per database
- Number of partitions - total, per table
- Number of files - total, per table, per table per partition
- Number of views - total, per database
- Number of user-defined functions - total, per database
- Number of columns per row group
- Number of row groups per block
- Number of HDFS blocks per file
Schema Design Limits
- Number of columns
  - 300 for Kudu tables
  - 1000 for other types of tables
- Number of roles: 10,000
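As an illustration of the column limits above, the following sketch flags a planned table whose column count exceeds the listed value for its storage type. The helper names and the way the storage type is passed as a string are assumptions made for this example; Impala itself does not expose such a function.

```python
# Column limits copied from the list above; the names are hypothetical.
KUDU_MAX_COLUMNS = 300
OTHER_MAX_COLUMNS = 1000


def column_limit(storage: str) -> int:
    """Return the listed column limit for a storage type such as 'kudu' or 'parquet'."""
    return KUDU_MAX_COLUMNS if storage.strip().lower() == "kudu" else OTHER_MAX_COLUMNS


def exceeds_column_limit(num_columns: int, storage: str) -> bool:
    """Return True when a table design goes beyond the listed limit."""
    if num_columns < 0:
        raise ValueError("num_columns must not be negative")
    return num_columns > column_limit(storage)


if __name__ == "__main__":
    # A 2000-column table may be creatable, but it is well past the listed limit.
    print(exceeds_column_limit(2000, "parquet"))
    print(exceeds_column_limit(300, "kudu"))
```

A check like this could run in a schema review step before a `CREATE TABLE` statement is issued.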
Query Limits - Compile Time
- Maximum number of columns in a query, included in an INSERT, and in an expression: no limit
- Number of tables referenced: no limit
- Number of plan nodes: no limit
- Number of plan fragments: no limit
- Depth of expression tree: 1000 hard limit
- Width of expression tree: 10,000 hard limit
- Codegen: Very deeply nested expressions within queries can exceed internal Impala limits, leading to excessive memory usage. Setting disable_codegen=true may reduce the impact, at the cost of longer query runtime.