The MSCK REPAIR TABLE command was designed to manually add partitions that are added
to or removed from the file system, such as HDFS or S3, but are not present in the
metastore.
This task assumes you created a partitioned external table named
emp_part
that stores partitions outside the warehouse. You
remove one of the partition directories on the file system. This action renders the
metastore inconsistent with the file system. You repair the discrepancy manually to
synchronize the metastore with the file system, HDFS for example.
-
List the partition directories on HDFS.
hdfs dfs -ls /user/hive/dataload/employee
Found 4 items
drwxr-xr-x - hive hdfs 0 2019-06-09 00:31 /user/hive/dataload/employee/dept=finance
drwxr-xr-x - hive hdfs 0 2019-06-09 00:31 /user/hive/dataload/employee/dept=sales
drwxr-xr-x - hive hdfs 0 2019-06-09 00:31 /user/hive/dataload/employee/dept=service
-rw-r--r-- 3 hdfs hdfs 147 2019-06-08 23:54 /user/hive/dataload/employee/employees.csv
-
Remove the
dept=sales
directory.
hdfs dfs -rm -r /user/hive/dataload/employee/dept=sales;
19/06/13 00:52:16 INFO fs.TrashPolicyDefault: Moved: 'hdfs://max.com:8020/user/hive/dataload/employee/dept=sales' to trash at:...
This action also removes the data file 000000_0 in the dept=sales
directory.
-
Launch Beeline, and in the Hive shell, look at the
emp_part
table partitions.
SHOW PARTITIONS emp_part;
The list of partitions is stale; it still includes the dept=sales
directory.
+----------------+
| partition |
+----------------+
| dept=finance |
| dept=sales |
| dept=service |
+----------------+
-
Repair the partition manually.
MSCK REPAIR TABLE emp_part DROP PARTITIONS;