Using Apache Hive
Also available as:
PDF

Manage partition retention time

You can keep the size of the Hive metadata and data you accumlate for log processing, and other activities, to a manageable size by setting a retention period for the data.

The table must be configured to automatically synchronize partition metadata with directories or objects on a file system, such as HDFS or S3.
If you specify a partition metadata retention period, Hive drops the metadata and corresponding data in any partition created after the retention period. You express the retention time using a numeral and the following character or characters:
  • ms (milliseconds)
  • s (seconds)
  • m (minutes)
  • d (days)

In this task, you configure automatic synchronization of the file system partitions with the metastore and a partition retention period. Assume you already created a partitioned, external table named employees as described earlier (see link below).

  1. If necessary, enable automatic discovery of partitions for the table employees.
    ALTER TABLE employees SET TBLPROPERTIES ('discover.partitions'='true'); 
    By default, external partitioned tables already set this table property to true.
  2. Configure a partition retention period of one week.
    ALTER TABLE employees SET TBLPROPERTIES ('partition.retention.period'='7d');
    The partition metadata as well as the actual data for employees in Hive is automatically dropped after a week.