HDP to CDP Private Cloud Base One Stage upgradePDF version

Merge Independent Hive and Spark Catalogs

If you upgraded to HDP 3.1.5 from an earlier version of HDP 3.x and did not convert independent catalogs to a shared catalog, you must do this before migrating tables to CDP.

In HDP 3.0 - 3.1.4, Spark and Hive use independent catalogs for accessing tables created using SparkSQL or Hive tables. A table created from Spark resides in the Spark catalog. A table created from Hive resides in the Hive catalog. Databases fall under the catalog namespace, similar to how tables belong to a database namespace. In HPD 3.1.5, Spark and Hive share a catalog in Hive metastore (HMS) instead of using separate catalogs.

The Apache Hive schematool in HDP 3.1.5 and CDP releases supports the mergeCatalog task. This task performs the following actions:
  • Detects conflicts in DB names across catalogs and in case of conflicts, lists each conflict, and exits.
  • When there are no conflicts, the following changes occur:
    • Adds EXTERNAL=true to the table properties of all managed tables in the source catalog.
    • Adds external.table.purge=true in table properties of all managed tables in the source catalog.
    • Sets tableType=EXTERNAL for all managed tables in the source catalog.
    • Sets CTLG_NAME=<toCatalog> for all databases in the source catalog.
Use the following syntax to merge the databases in the catalog named spark into a catalog named hive, which is the default catalog for HiveServer (HS2).
schematool -dbType <database> -mergeCatalog spark -toCatalog hive [-verbose] [-dryRun] 
                

The default names of the catalogs are spark and hive. The dryRun option rolls back the changes.

To merge catalogs:

  1. On the operating system command line, run a Hive schematool query test, using the dryRun option to roll back changes. For example:
    bin/schematool -mergeCatalog spark -toCatalog hive -verbose  -dbType mysql --dryRun                    
  2. Check the output, and if there are no conflicts, merge catalogs. For example:
    bin/schematool -mergeCatalog spark -toCatalog hive -verbose  -dbType mysql