Using Apache Hive
Also available as:
PDF

Repair partitions using MSCK repair

The MSCK REPAIR TABLE command was designed to manually add partitions that are added to or removed from the file system, such as HDFS or S3, but are not present in the metastore.

This task assumes you created a partitioned external table named emp_part that stores partitions outside the warehouse. You remove one of the partition directories on the file system. This action renders the metastore inconsistent with the file system. You repair the discrepancy manually to synchronize the metastore with the file system, HDFS for example.

  1. List the partition directories on HDFS.
    hdfs dfs -ls /user/hive/dataload/employee
    Found 4 items
    drwxr-xr-x   - hive hdfs          0 2019-06-09 00:31 /user/hive/dataload/employee/dept=finance 
    drwxr-xr-x   - hive hdfs          0 2019-06-09 00:31 /user/hive/dataload/employee/dept=sales
    drwxr-xr-x   - hive hdfs          0 2019-06-09 00:31 /user/hive/dataload/employee/dept=service
    -rw-r--r--   3 hdfs hdfs        147 2019-06-08 23:54 /user/hive/dataload/employee/employees.csv
  2. Remove the dept=sales directory.
    hdfs dfs -rm -r /user/hive/dataload/employee/dept=sales;
    19/06/13 00:52:16 INFO fs.TrashPolicyDefault: Moved: 'hdfs://max.com:8020/user/hive/dataload/employee/dept=sales' to trash at:...
    This action also removes the data file 000000_0 in the dept=sales directory.
  3. Launch Beeline, and in the Hive shell, look at the emp_part table partitions.
    SHOW PARTITIONS emp_part;
    The list of partitions is stale; it still includes the dept=sales directory.
    +----------------+
    |   partition    |
    +----------------+
    | dept=finance   |
    | dept=sales     |
    | dept=service   |
    +----------------+
  4. Repair the partition manually.
    MSCK REPAIR TABLE emp_part DROP PARTITIONS;