Merge Independent Hive and Spark Catalogs
If you upgraded to HDP 3.1.5 from an earlier version of HDP 3.x and did not convert independent catalogs to a shared catalog, you must do this before migrating tables to CDP.
In HDP 3.0 - 3.1.4, Spark and Hive use independent catalogs for accessing tables created using SparkSQL or Hive tables. A table created from Spark resides in the Spark catalog. A table created from Hive resides in the Hive catalog. Databases fall under the catalog namespace, similar to how tables belong to a database namespace. In HPD 3.1.5, Spark and Hive share a catalog in Hive metastore (HMS) instead of using separate catalogs.
mergeCatalog
task. This task performs the following actions:- Detects conflicts in DB names across catalogs and in case of conflicts, lists each conflict, and exits.
- When there are no conflicts, the following changes occur:
- Adds
EXTERNAL=true
to the table properties of all managed tables in the source catalog. - Adds
external.table.purge=true
in table properties of all managed tables in the source catalog. - Sets
tableType=EXTERNAL
for all managed tables in the source catalog. - Sets
CTLG_NAME=<toCatalog>
for all databases in the source catalog.
- Adds
schematool -dbType <database> -mergeCatalog spark -toCatalog hive [-verbose] [-dryRun]
The default names of the catalogs are spark and hive. The dryRun option rolls back the changes.
To merge catalogs: