Running the Hive Upgrade Check tool

The Hive Strict Metastore Migration (HSMM) uses the public Hive Thrift API to materialize every table and determine whether it needs to be upgraded. That process is very time-consuming. If you are expediting the Hive upgrade and modified the upgrade process to skip materializing every table in the metastore, you need to identify the databases and tables that are subject to the upgrade, and then either run HSMM on them or run the provided scripts.

Use the Hive Upgrade Check community tool to identify tables that have problems affecting migration. Resolve the problems that the tool reveals to clean up the Hive Metastore before migration. If you do not want to use the Hive Upgrade Check tool, you need to perform the tasks described in the following subtopics to migrate Hive data to CDP:
  • Check SERDE Definitions and Availability
  • Handle Missing Table or Partition Locations
  • Manage Table Location Mapping
  • Make Tables SparkSQL Compatible
You modified the Hive Strict Metastore Migration to skip processing Hive tables in your databases and then completed the upgrade process to CDP.
  1. Obtain the Hive Upgrade Check tool.
    Download the Hive Upgrade Check tool from the community-based GitHub location.
  2. Follow the instructions in the GitHub readme to run the tool.
    The Hive Upgrade Check tool (v.2.3.5.6+) creates a YAML file (hsmm_whitelist.yaml) that identifies the databases and tables that require attention.
  3. Resolve the problems that the Hive Upgrade Check tool reports.
    At a minimum, you must run the following processes described in the GitHub readme:
    • Process ID 1: Table / Partition Location Scan - Missing Directories
    • Process ID 3: Hive 3 Upgrade Checks - Managed Non-ACID to ACID Table Migrations
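
To illustrate the idea behind the missing-directories scan (process ID 1), here is a minimal sketch in Python. It is not the Hive Upgrade Check tool and does not use its API. The function name find_missing_locations, the table names, and the use of local file system paths in place of HDFS locations are all assumptions made for illustration.

```python
import os
import tempfile


def find_missing_locations(table_locations):
    """Return the names of tables whose location directory does not exist.

    table_locations is a hypothetical mapping of "database.table" to a
    directory path. Local paths stand in for HDFS locations in this sketch.
    """
    return sorted(
        table
        for table, location in table_locations.items()
        if not os.path.isdir(location)
    )


if __name__ == "__main__":
    # Build a throwaway directory tree so the example is self-contained.
    with tempfile.TemporaryDirectory() as warehouse:
        present = os.path.join(warehouse, "sales")
        os.mkdir(present)
        locations = {
            "db1.sales": present,                             # directory exists
            "db1.returns": os.path.join(warehouse, "gone"),   # never created
        }
        for table in find_missing_locations(locations):
            print(f"Missing location for table: {table}")
```

A table reported by a check like this has metadata in the metastore but no backing directory, which is the kind of problem you resolve before migrating the table.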