Managing partition retention time

You can keep the Apache Hive metadata and data that you accumulate for log processing and other activities at a manageable size by setting a retention period for the data.

The table must be configured to automatically synchronize partition metadata with directories or objects on a file system.
When you specify a partition retention period, Hive drops the metadata and corresponding data of any partition that is older than the retention period. The retention period applies to partitions that already exist as well as to partitions created later. You express the retention time using a numeral followed by one of these unit suffixes:
  • ms (milliseconds)
  • s (seconds)
  • m (minutes)
  • d (days)
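
As a sketch of how the numeral and suffix combine, the statements below set a 30-minute and a 90-day retention period. The table names web_logs and audit_events are hypothetical and stand in for any partitioned table that has partition discovery enabled.

```sql
-- Hypothetical tables; substitute your own partitioned tables.
-- 30 minutes: numeral 30 followed by the suffix m.
ALTER TABLE web_logs SET TBLPROPERTIES ('partition.retention.period'='30m');
-- 90 days: numeral 90 followed by the suffix d.
ALTER TABLE audit_events SET TBLPROPERTIES ('partition.retention.period'='90d');
```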

In this task, you configure automatic synchronization of the file system partitions with the metastore and a partition retention period. Assume you already created a partitioned, external table named employees.

  1. If necessary, enable automatic discovery of partitions for the table employees.
    ALTER TABLE employees SET TBLPROPERTIES ('discover.partitions'='true'); 
    By default, this table property is already set to true for external partitioned tables.
  2. Configure a partition retention period of one week.
    ALTER TABLE employees SET TBLPROPERTIES ('partition.retention.period'='7d');
    Hive automatically drops the metadata and the actual data of each partition of employees once the partition is more than a week old.
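
To confirm that both properties took effect, you can inspect the table afterward. This is a minimal sketch that assumes the employees table from the steps above. The final UNSET statement is not part of the task; it is shown only as one likely way to turn retention off again.

```sql
-- Show the current values of the properties set in steps 1 and 2.
SHOW TBLPROPERTIES employees('discover.partitions');
SHOW TBLPROPERTIES employees('partition.retention.period');

-- List the partitions currently registered in the metastore.
SHOW PARTITIONS employees;

-- Assumption: removing the property stops Hive from dropping expired partitions.
ALTER TABLE employees UNSET TBLPROPERTIES ('partition.retention.period');
```

Expect an expired partition to disappear from the SHOW PARTITIONS output only after the metastore's next background synchronization run, not necessarily at the exact moment the retention period elapses.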