Partition refresh and configuration
You can discover partition changes and synchronize Hive metadata automatically. Synchronizing automatically rather than manually can save substantial time, especially when partitioned data, such as logs, changes frequently. You can also configure how long to retain partition data and metadata.
After you create a partitioned table, Hive does not track the corresponding objects or directories that you add to or drop from the file system. The partition metadata in the Hive metastore therefore becomes stale, and you need to synchronize the metastore with the file system. You can do this in one of two ways:
- Manually
You run the MSCK (metastore consistency check) Hive command:
MSCK REPAIR TABLE table_name SYNC PARTITIONS
every time you need to synchronize a partition with the file system.
- Automatically
You set up partition discovery to occur periodically.
When discover.partitions is enabled for a table, Hive performs an automatic refresh as follows:
- Adds corresponding partitions that are in the file system, but not in the metastore, to the metastore.
- Removes partition schema information from the metastore if you removed the corresponding partitions from the file system.
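As a minimal sketch of the automatic option, discovery can be turned on for an existing table by setting the discover.partitions table property. The table name web_logs is hypothetical, and how often the refresh actually runs depends on how the metastore is configured in your deployment.

```sql
-- Hypothetical table name: web_logs.
-- Enable automatic partition discovery for this table.
ALTER TABLE web_logs SET TBLPROPERTIES ('discover.partitions' = 'true');

-- Confirm the property value that is stored for the table.
SHOW TBLPROPERTIES web_logs ('discover.partitions');
```

Setting the property to 'false' in the same way turns discovery off again, after which the table must be synchronized manually with MSCK REPAIR TABLE.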
Partition retention
You can configure how long to keep partition metadata and data, and remove it after the retention period elapses.
Limitations
Generally, partition discovery and retention are not recommended for use on managed tables. The Hive metastore acquires an exclusive lock on a table that has partition discovery enabled, which can slow down other queries.
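Given this limitation, discovery and retention are more naturally applied to external tables. The following is a sketch only: the table name, columns, location, and the 7-day period are hypothetical, and it assumes the partition.retention.period table property is available in your Hive version and that retention relies on discovery being enabled for the same table.

```sql
-- Hypothetical external table; names, columns, and path are illustrative only.
CREATE EXTERNAL TABLE app_events (
  event_time STRING,
  message    STRING
)
PARTITIONED BY (event_date STRING)
STORED AS ORC
LOCATION '/data/app_events'
TBLPROPERTIES (
  'discover.partitions'        = 'true',  -- sync partitions automatically
  'partition.retention.period' = '7d'     -- assumed retention: drop partitions older than 7 days
);
```

Because retention removes data as well as metadata once the period elapses, it is worth choosing the period conservatively and verifying the behavior on a test table first.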