Scaling Limits and Guidelines
This topic lists the scalability limitation in Impala. For a given functional feature, it is recommended that you respect these limitations to achieve optimal scalability and performance. For example, while you might be able to create a table with 2000 columns, you will experience performance problems while querying the table. This topic does not cover functional limitations in Impala.
Unless noted otherwise, the limits were tested and certified.
The limits noted as "generally safe" are not certified, but recommended as generally safe. A safe range is not a hard limit as unforeseen errors or troubles in your particular environment can affect the range.
- Number of Impalad Executors
- 80 nodes in CDH 5.14 and lower
- 150 nodes in CDH 5.15 and higher
- Number of Impalad Coordinators: 1 coordinator for at most every 50 executors
See Dedicated Coordinators for details.
- The number of Impala clusters per deployment
- 1 Impala cluster in Impala 3.1 and lower
- Multiple clusters in Impala 3.2 and higher is generally safe.
Data Storage Limits
There are no hard limits for the following, but you will experience gradual performance degradation as you increase these numbers.
- Number of databases
- Number of tables - total, per database
- Number of partitions - total, per table
- Number of files - total, per table, per table per partition
- Number of views - total, per database
- Number of user-defined functions - total, per database
- Number of columns per row group
- Number of row groups per block
- Number of HDFS blocks per file
Schema Design Limits
- Number of columns
- 300 for Kudu tables
See Kudu Usage Limitations for more information.
- 1000 for other types of tables
- 300 for Kudu tables
- Number of roles: 10,000 for Sentry
Query Limits - Compile Time
- Maximum number of columns in a query, included in a SELECT list, INSERT, and in an expression: no limit
- Number of tables referenced: no limit
- Number of plan nodes: no limit
- Number of plan fragments: no limit
- Depth of expression tree: 1000 hard limit
- Width of expression tree: 10,000 hard limit
Query Limits - Runtime Time
- Very deeply nested expressions within queries can exceed internal Impala limits, leading to excessive memory usage. Setting the query option disable_codegen=true may reduce the impact, at a cost of longer query runtime.