Partitioned tables

Partitioning organizes table data and can improve query performance on large volumes of data. A query that filters on the partition column can read only the relevant slices of the data instead of scanning the entire table.
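The benefit of reading only some slices is easiest to see in a toy model. The following is a minimal sketch, not how Hive or Impala are implemented: it assumes a hypothetical `Partition` class that holds one partition key value and its rows, and a hypothetical `scan` function that skips every partition whose key does not match the filter (partition pruning).

```python
from dataclasses import dataclass, field


@dataclass
class Partition:
    # Hypothetical model: one partition key value plus the rows stored under it.
    key: str
    rows: list = field(default_factory=list)


def scan(partitions, key_filter=None):
    """Return (matching rows, number of partitions read).

    With key_filter set, partitions whose key differs are skipped without
    reading their rows (pruning); with no filter, every partition is read.
    """
    result = []
    partitions_read = 0
    for p in partitions:
        if key_filter is not None and p.key != key_filter:
            continue  # pruned: this partition's rows are never touched
        partitions_read += 1
        result.extend(p.rows)
    return result, partitions_read


table = [
    Partition("2024-01-01", [("a", 1), ("b", 2)]),
    Partition("2024-01-02", [("c", 3)]),
    Partition("2024-01-03", [("d", 4), ("e", 5)]),
]

full_rows, full_read = scan(table)                   # full table scan
day_rows, day_read = scan(table, key_filter="2024-01-02")  # one slice only
print(full_read, day_read, day_rows)
```

Here the filtered query reads one partition out of three; in a real table the saving grows with the number and size of partitions the filter excludes.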

Avoid creating many small partitions, because each partition adds metadata to the metastore and tends to produce small files that slow down queries. You can partition both managed and external tables. To create a partitioned table, use the PARTITIONED BY clause, and then follow the step-by-step instructions to insert data into the partitions. Alternatively, you can put data files, such as CSV (comma-separated values) files, in directories that represent partitions and create an external table based on the CSV data.
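Hive-style partition directories are named `<partition column>=<value>` under the table location, and the partition column's value comes from the directory name rather than from the data file. The sketch below prepares such a layout for CSV data. It is only an illustration under assumptions: the function name `write_partitioned_csv`, the file name `data.csv`, and the sample columns are hypothetical, and the files are written without a header row.

```python
import csv
from collections import defaultdict
from pathlib import Path


def write_partitioned_csv(root, rows, partition_col, columns):
    """Write rows as CSV under root/<partition_col>=<value>/data.csv.

    The partition column is left out of the file contents because its value
    is encoded in the directory name. Returns the list of files written.
    """
    groups = defaultdict(list)
    for row in rows:
        groups[row[partition_col]].append(row)

    data_cols = [c for c in columns if c != partition_col]
    written = []
    for value, group in sorted(groups.items()):
        part_dir = Path(root) / f"{partition_col}={value}"
        part_dir.mkdir(parents=True, exist_ok=True)
        path = part_dir / "data.csv"  # hypothetical file name
        with path.open("w", newline="") as f:
            writer = csv.writer(f)
            for row in group:
                writer.writerow([row[c] for c in data_cols])
        written.append(path)
    return written
```

For example, rows with `country` values `US` and `DE` end up in `country=DE/data.csv` and `country=US/data.csv`, each containing only the non-partition columns. An external table defined over `root` with a matching partition column could then read these directories as partitions.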

Under certain conditions, for example when partition directories are added to the file system outside of Hive or Impala, you must manually repair the partition metadata stored in the metastore so that it stays in sync with the actual partitions. This section explains when and how to do this.
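The core of such a repair is a comparison between the partition directories that exist on disk and the partitions the metastore has registered. In Hive, `MSCK REPAIR TABLE` performs this kind of reconciliation, and Impala offers `ALTER TABLE ... RECOVER PARTITIONS`. The sketch below only models the comparison step and is not a metastore client: the function name `find_unregistered_partitions` is hypothetical, and the `registered` argument is an assumed stand-in for the partition list the metastore would return.

```python
from pathlib import Path


def find_unregistered_partitions(table_root, partition_col, registered):
    """Return partition values present on disk but absent from `registered`.

    Only directories named '<partition_col>=<value>' are treated as
    partitions; other files and directories are ignored.
    """
    prefix = f"{partition_col}="
    on_disk = {
        d.name[len(prefix):]
        for d in Path(table_root).iterdir()
        if d.is_dir() and d.name.startswith(prefix)
    }
    return sorted(on_disk - set(registered))
```

If the metastore knows only about `country=US` but directories `country=DE` and `country=FR` were copied in directly, the function reports `['DE', 'FR']`: the partitions a repair would need to add.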