Preparing tables for migration

You download the Hive Upgrade Check tool and use it to identify problems in unmigrated tables. These problems can cause upgrade failure. It saves time to fix the problems and avoid failure. The tool provides help for fixing those problems before migrating the tables to CDP.

You use the Hive Upgrade Check community tool to help you identify tables that have problems affecting migration. You resolve problems revealed by the Hive Upgrade Check tool to clean up the Hive Metastore before migration. If you do not want to use the Hive Upgrade Check tool, you need to perform the tasks described in the following subtopics to migrate Hive data to CDP:
  • Check SERDE Definitions and Availability
  • Handle Missing Table or Partition Locations
  • Manage Table Location Mapping
  • Make Tables SparkSQL Compatible
  1. Obtain the Hive Upgrade Check tool.
    Download the Hive SRE Upgrade Check tool from the Cloudera labs github location.
  2. Follow instructions in the github readme to run the tool.
    The Hive Upgrade Check (v.2.3.5.6+) will create a yaml file (hsmm_<name>.yaml) identifying databases and tables that require attention.
  3. Follow instructions in prompts from the Hive Upgrade Check tool to resolve problems with the tables.
    At a minimum, you must run the following processes described in the github readme:
    • process ID 1 Table / Partition Location Scan - Missing Directories
    • process id 3 Hive 3 Upgrade Checks - Managed Non-ACID to ACID Table Migrations