Automate partition discovery and repair

Hive can automatically and periodically discover discrepancies between the partition metadata in the Hive metastore and the corresponding directories (or objects, in object storage) on the file system. After discovering discrepancies, Hive synchronizes the metastore with the file system. Automated partition discovery is useful for processing log data and other data in Spark and Hive catalogs.
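The reconciliation described above can be modeled as two set differences. This is a minimal sketch, not Hive's implementation. It assumes partitions are identified by Hive-style `key=value` directory names relative to the table location, and that synchronization both adds partitions for new directories and drops partitions whose directories are gone. The functions `partitions_on_disk` and `plan_sync` and the sample paths are hypothetical.

```python
from typing import Iterable, List, Set, Tuple


def partitions_on_disk(paths: Iterable[str], partition_keys: List[str]) -> Set[str]:
    """Keep relative paths shaped like k1=v1/k2=v2 whose keys match partition_keys in order."""
    found: Set[str] = set()
    for path in paths:
        parts = path.strip("/").split("/")
        if len(parts) != len(partition_keys):
            continue  # wrong depth, e.g. a file nested inside a partition directory
        if all("=" in part and part.split("=", 1)[0] == key
               for part, key in zip(parts, partition_keys)):
            found.add("/".join(parts))
    return found


def plan_sync(metastore: Set[str], on_disk: Set[str]) -> Tuple[Set[str], Set[str]]:
    """Return (to_add, to_drop) so that the metastore ends up matching the file system."""
    to_add = on_disk - metastore   # directories with no partition metadata
    to_drop = metastore - on_disk  # partition metadata with no directory
    return to_add, to_drop


if __name__ == "__main__":
    metastore = {"dt=2024-01-01", "dt=2024-01-02"}
    listing = ["dt=2024-01-02/", "dt=2024-01-03", "_tmp", "dt=2024-01-03/extra"]
    on_disk = partitions_on_disk(listing, ["dt"])
    to_add, to_drop = plan_sync(metastore, on_disk)
    print(sorted(to_add), sorted(to_drop))  # ['dt=2024-01-03'] ['dt=2024-01-01']
```

Applying both sets makes the metastore's partition list equal to the directory listing, which is the end state that synchronization aims for.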

The discover.partitions table property enables and disables synchronization of partitions with the file system. When you create an external partitioned table, this property is enabled (true) by default. For a legacy external table (one created using a version of Hive that does not support this feature), add discover.partitions to the table properties to enable partition discovery.
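How the discover.partitions property gates discovery can be sketched as a per-table check. This is a hypothetical model, not Hive code: the `Table` class, the `create_table` and `tables_to_discover` helpers, and the table name `web_logs` are illustrative. It assumes that a value given explicitly at creation time is left unchanged and that only partitioned tables are considered.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Optional

DISCOVER_PROP = "discover.partitions"


@dataclass
class Table:
    name: str
    external: bool
    partitioned: bool
    props: Dict[str, str] = field(default_factory=dict)


def create_table(name: str, external: bool, partitioned: bool,
                 props: Optional[Dict[str, str]] = None) -> Table:
    """Model of table creation on a Hive version that supports partition discovery."""
    table = Table(name, external, partitioned, dict(props or {}))
    if external and partitioned:
        # Documented default: enabled at creation; an explicit value is kept (assumption).
        table.props.setdefault(DISCOVER_PROP, "true")
    return table


def tables_to_discover(tables: Iterable[Table]) -> List[str]:
    """Names of tables that the periodic task would synchronize in this model."""
    return [t.name for t in tables
            if t.partitioned and t.props.get(DISCOVER_PROP, "false").lower() == "true"]


if __name__ == "__main__":
    # A legacy table was created before the feature existed, so it has no property.
    legacy = Table("exttbl", external=True, partitioned=True)
    new = create_table("web_logs", external=True, partitioned=True)
    print(tables_to_discover([legacy, new]))  # ['web_logs']
    # Models: ALTER TABLE exttbl SET TBLPROPERTIES ('discover.partitions' = 'true');
    legacy.props[DISCOVER_PROP] = "true"
    print(tables_to_discover([legacy, new]))  # ['exttbl', 'web_logs']
```

The legacy table is skipped until the property is added, which is why step 1 of the task below is needed only for tables created before the feature existed.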

By default, partition discovery and synchronization occur every 5 minutes, but you can configure the frequency as shown in this task.

Before you begin, enable compaction. This is a workaround for a known issue: partition discovery does not begin unless compaction is enabled.
  1. If you have an external table that was created using a version of Hive that does not support partition discovery, enable partition discovery for the table:
    ALTER TABLE exttbl SET TBLPROPERTIES ('discover.partitions' = 'true');
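  2. Set synchronization of partitions to occur every 10 minutes (600 seconds) by setting metastore.partition.management.task.frequency. This is a Hive metastore configuration property, not a table property, so set it in the metastore configuration (for example, in metastore-site.xml or hive-site.xml) and restart the metastore:
    <property><name>metastore.partition.management.task.frequency</name><value>600s</value></property>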