Partitioning granularity recommendations
The following recommendations for table partitioning granularity provide the best performance in Impala.
- Choose a partitioning strategy that ensures there is at least 256 MB of data in each partition.
- Over-partitioning results in small files in each partition and causes query planning to take longer than necessary, because Impala has to prune the unnecessary partitions.
- Cloudera recommends that you keep the number of partitions in tables under 30,000.
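One way to check an existing table against these guidelines is to inspect its partitions directly. The following is a minimal sketch, assuming a hypothetical partitioned table named sales_by_day; substitute your own table name.

```sql
-- sales_by_day is a hypothetical table name used only for illustration.

-- Lists one row per partition, including its file count and size,
-- followed by a summary row for the whole table.
SHOW PARTITIONS sales_by_day;

-- Reports the same per-partition figures. The #Rows column shows -1
-- until COMPUTE STATS has been run on the table.
SHOW TABLE STATS sales_by_day;
```

If most partitions report a size well below 256 MB, or the number of rows returned approaches 30,000, a coarser partitioning scheme is probably worth considering.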
Use integer data types for partition key columns:
- Although partition key values are turned into HDFS directory names, you can minimize memory usage by using numeric values rather than strings for common partition key fields such as YEAR, MONTH, and DAY.
- Use the smallest integer data type that holds the appropriate range of values. Typically, this is TINYINT for MONTH and DAY, and SMALLINT for YEAR.
- Use the EXTRACT() function to pull out individual date and time fields from a TIMESTAMP value, and CAST() the return value to the appropriate integer type.
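The integer partition key advice can be sketched as follows. All table and column names (events_by_day, raw_events, event_time, and so on) are hypothetical, and the choice of SMALLINT for the year and TINYINT for the month and day is an assumption about the smallest types that cover those ranges.

```sql
-- All table and column names here are hypothetical.
CREATE TABLE events_by_day (
  event_id BIGINT,
  payload  STRING
)
PARTITIONED BY (event_year SMALLINT, event_month TINYINT, event_day TINYINT)
STORED AS PARQUET;

-- Dynamic partition insert: the partition key values are taken from the
-- last three expressions of the SELECT list, in PARTITIONED BY order.
-- raw_events.event_time is assumed to be a TIMESTAMP column.
INSERT INTO events_by_day PARTITION (event_year, event_month, event_day)
SELECT
  event_id,
  payload,
  CAST(EXTRACT(YEAR  FROM event_time) AS SMALLINT),
  CAST(EXTRACT(MONTH FROM event_time) AS TINYINT),
  CAST(EXTRACT(DAY   FROM event_time) AS TINYINT)
FROM raw_events;
```

The explicit CAST() is needed because EXTRACT() returns a wider integer type than the partition key columns declared here.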