Scaling Limits and Guidelines

To get the best scalability and performance from Impala, stay within the limits described in this topic. For example, although you might be able to create a table with 2000 columns, queries against that table will perform poorly. This topic does not cover functional limitations in Impala.

Unless noted otherwise, the limits listed in this topic were tested and certified.

The limits noted as "generally safe" are not certified, but are recommended as generally safe. A safe range is not a hard limit, because unforeseen errors or problems in your particular environment can affect the range.

Deployment Limits

  • Number of Impalad Executors: 150 nodes
  • Number of Impalad Coordinators: 1 coordinator for each group of up to 50 executors
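
The deployment limits above can be turned into a quick sizing check. The following is a minimal sketch in Python, not part of Impala: the function names min_coordinators and check_deployment are hypothetical, and it assumes the coordinator guideline means each coordinator serves at most 50 executors.

```python
import math

# Values taken from the deployment limits listed above.
MAX_EXECUTORS = 150
EXECUTORS_PER_COORDINATOR = 50


def min_coordinators(executors):
    """Smallest coordinator count that keeps each coordinator at <= 50 executors."""
    if executors < 1:
        raise ValueError("executors must be at least 1")
    return math.ceil(executors / EXECUTORS_PER_COORDINATOR)


def check_deployment(executors, coordinators):
    """Return a list of human-readable problems; an empty list means within limits."""
    problems = []
    if executors > MAX_EXECUTORS:
        problems.append(
            f"{executors} executors exceeds the tested limit of {MAX_EXECUTORS}"
        )
    needed = min_coordinators(executors)
    if coordinators < needed:
        problems.append(
            f"{coordinators} coordinators is fewer than the {needed} suggested "
            f"for {executors} executors"
        )
    return problems
```

For instance, check_deployment(150, 3) returns an empty list, while check_deployment(200, 2) reports both the executor count and the coordinator shortfall.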

Data Storage Limits

There are no hard limits for the following, but you will experience gradual performance degradation as you increase these numbers.

  • Number of databases
  • Number of tables - total, per database
  • Number of partitions - total, per table
  • Number of files - total, per table, per table per partition
  • Number of views - total, per database
  • Number of user-defined functions - total, per database
  • Parquet
    • Number of columns per row group
    • Number of row groups per block
    • Number of HDFS blocks per file

Schema Design Limits

  • Number of columns
    • 300 for Kudu tables
    • 1000 for other types of tables
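
As an illustration, a schema review script could flag tables that exceed these column counts before they are created. This is a sketch under stated assumptions: the storage-type labels ("kudu", "parquet", and so on) and the function names are hypothetical, and only the two numbers come from the list above.

```python
# Column limits from the schema design list above.
KUDU_MAX_COLUMNS = 300
OTHER_MAX_COLUMNS = 1000


def column_limit(storage_type):
    """Return the column limit for a table of the given storage type."""
    if storage_type.lower() == "kudu":
        return KUDU_MAX_COLUMNS
    return OTHER_MAX_COLUMNS


def exceeds_column_limit(num_columns, storage_type):
    """True if a table with num_columns columns is over the documented limit."""
    return num_columns > column_limit(storage_type)
```

With these definitions, a 400-column table is over the limit for Kudu but within it for other table types.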

Security Limits

  • Number of roles: 10,000

Query Limits - Compile Time

  • Maximum number of columns in a query (in a SELECT list, in an INSERT, or in an expression): no limit
  • Number of tables referenced: no limit
  • Number of plan nodes: no limit
  • Number of plan fragments: no limit
  • Depth of expression tree: 1000 (hard limit)
  • Width of expression tree: 10,000 (hard limit)
  • Codegen: Very deeply nested expressions within queries can exceed internal Impala limits, leading to excessive memory usage. Setting the query option DISABLE_CODEGEN=true (for example, with a SET statement in impala-shell) may reduce the impact, at the cost of longer query runtime.
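
To make the two expression tree limits concrete, the sketch below measures the depth and width of a toy expression tree in Python. It does not reflect how Impala represents expressions internally. It assumes that an expression is either a leaf value or a tuple of the form (operator, child, child, ...), that a leaf counts as depth 1, and that "width" means the number of children of a single node. The names expr_stats and within_limits are hypothetical.

```python
# Hard limits from the compile-time list above.
MAX_EXPR_DEPTH = 1000
MAX_EXPR_WIDTH = 10_000


def expr_stats(expr):
    """Return (depth, max_children) for a nested-tuple expression.

    Uses an explicit stack instead of recursion so that very deep
    trees do not hit Python's own recursion limit.
    """
    max_depth = 0
    max_children = 0
    stack = [(expr, 1)]
    while stack:
        node, depth = stack.pop()
        max_depth = max(max_depth, depth)
        if isinstance(node, tuple):
            children = node[1:]  # element 0 is the operator name
            max_children = max(max_children, len(children))
            stack.extend((child, depth + 1) for child in children)
    return max_depth, max_children


def within_limits(expr):
    """True if the toy expression stays inside both hard limits."""
    depth, width = expr_stats(expr)
    return depth <= MAX_EXPR_DEPTH and width <= MAX_EXPR_WIDTH
```

A long chain such as a + 1 + 1 + ... builds a deep, left-leaning tree even without parentheses, which is one way a generated query can approach the depth limit.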