Merge Independent Hive and Spark Catalogs

If you upgraded to HDP 3.1.5 from an earlier version of HDP 3.x and did not convert independent catalogs to a shared catalog, you must do this before migrating tables to CDP.

In HDP 3.0 - 3.1.4, Spark and Hive use independent catalogs for accessing tables created using SparkSQL or Hive tables. A table created from Spark resides in the Spark catalog. A table created from Hive resides in the Hive catalog. Databases fall under the catalog namespace, similar to how tables belong to a database namespace. In HPD 3.1.5, Spark and Hive share a catalog in Hive metastore (HMS) instead of using separate catalogs.

The Apache Hive schematool in HDP 3.1.5 and CDP releases supports the mergeCatalog task. This task performs the following actions:
  • Detects conflicts in DB names across catalogs and in case of conflicts, lists each conflict, and exits.
  • When there are no conflicts, the following changes occur:
    • Adds EXTERNAL=true to the table properties of all managed tables in the source catalog.
    • Adds external.table.purge=true in table properties of all managed tables in the source catalog.
    • Sets tableType=EXTERNALcodeph> for all managed tables in the source catalog.
    • Sets CTLG_NAME=<toCatalog> for all databases in the source catalog.
Use the following syntax to merge the databases in the catalog named spark into a catalog named hive, which is the default catalog for HiveServer (HS2).
schematool -dbType <database> -mergeCatalog spark -toCatalog hive [-verbose] [-dryRun] 

The default names of the catalogs are spark and hive. The dryRun option rolls back the changes.

To merge catalogs:

  1. On the Beeline command line, start Hive.
  2. At the Hive prompt, run a Hive schematool query test, using the dryRun option to roll back changes. For example:
    bin/schematool -mergeCatalog spark -toCatalog hive -verbose  -dbType derby --dryRun                    
  3. Check the output, and if there are no conflicts, merge catalogs. For example:
    bin/schematool -mergeCatalog spark -toCatalog hive -verbose  -dbType mysql