Hive Sidecar Migration from CDH to CDP

If your cluster uses Apache Hive, you will need to migrate configurations, data, and workloads to the destination cluster.

CDP includes Hive 3, a new major version of Hive. Hive 3 introduces changes that you should be aware of as you plan to migrate your Hive data to CDP.

  1. Implement configuration changes on the destination cluster.
    1. Set properties to the After Upgrade values documented in Hive Configuration Property Changes.
    2. Configure the critical Hive settings. See Customizing critical Hive configurations.
    3. Set configuration overrides. See Setting Hive Configuration Overrides.
    4. Apply the required and recommended settings documented in Hive Configuration Requirements and Recommendations.
    5. If you configured high availability for the Hive Metastore in the source cluster, ensure that it is also enabled in the destination cluster. See Configuring HMS for high availability.
  2. Use Replication Manager to migrate your Hive tables and metastore from the source cluster to the destination cluster. Replication Manager is available from the Cloudera Manager Admin Console on the destination cluster. See Hive/Impala replication policy. You can begin replicating data before you are ready to take the new cluster to production, and then use Snapshots to keep the destination cluster in sync with the data on the source cluster. If you select the option to replicate Sentry policies to Ranger, you cannot change the Ranger policies until replication and migration are complete. See Sentry to Ranger replication for Hive replication policies.
  3. Migrate workloads. See Migrating Hive workloads from CDH.
  4. Complete post-migration steps.
    1. Read all the Hive documentation for CDP, including Configuring HMS for high availability.
    2. If you have any Impala workloads, see Impala Sidecar Migration.
    3. Read the Apache Ranger documentation to set up access to Hive tables. See Using Ranger to Provide Authorization in CDP.
    4. Handle syntax changes from Hive 1/2 to Hive 3, as documented in Handling syntax changes.
    5. Understand ACID tables, as documented in Migrating Hive workloads to ACID.
    6. Generate statistics, as documented in Generating statistics.
    7. Convert Hive CLI scripts to Beeline, as documented in Converting Hive CLI scripts to Beeline.
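For the configuration changes in step 1, it can help to check a destination hive-site.xml against the values you intend to set. The following Python sketch is one possible approach, not a Cloudera tool. It assumes you have the XML text of a deployed client configuration, parses the standard Hadoop <configuration><property> layout, and reports properties that differ from a target map. The property names and values in TARGET_VALUES and the host names in the sample are hypothetical placeholders; fill them in from Hive Configuration Property Changes. Counting the URIs in hive.metastore.uris is only a rough heuristic for whether clients see more than one Hive Metastore; confirm high availability as described in Configuring HMS for high availability.

```python
import xml.etree.ElementTree as ET

# HYPOTHETICAL placeholders: replace with the property names and "After Upgrade"
# values listed in Hive Configuration Property Changes.
TARGET_VALUES = {
    "example.property.one": "true",
    "example.property.two": "100",
}


def load_hive_site(xml_text):
    """Parse Hadoop-style <configuration><property> XML into a {name: value} dict."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.findall("property"):
        name = prop.findtext("name")
        if name:
            # A <value> element that is missing or empty is recorded as "".
            props[name.strip()] = (prop.findtext("value") or "").strip()
    return props


def find_mismatches(actual, targets):
    """Map each property not at its target to (actual value or None, target value)."""
    return {
        name: (actual.get(name), want)
        for name, want in targets.items()
        if actual.get(name) != want
    }


def metastore_uri_count(actual):
    """Count the comma-separated thrift URIs listed in hive.metastore.uris."""
    uris = actual.get("hive.metastore.uris", "")
    return len([u for u in uris.split(",") if u.strip()])


if __name__ == "__main__":
    # HYPOTHETICAL sample configuration with two metastore hosts.
    sample = """<configuration>
      <property><name>example.property.one</name><value>true</value></property>
      <property><name>hive.metastore.uris</name>
        <value>thrift://hms1.example.com:9083,thrift://hms2.example.com:9083</value>
      </property>
    </configuration>"""
    props = load_hive_site(sample)
    print(find_mismatches(props, TARGET_VALUES))  # {'example.property.two': (None, '100')}
    print(metastore_uri_count(props))  # 2
```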
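Replication Manager (step 2) performs the copy itself. Before cutover, you might also spot-check that the destination cluster has caught up with the source. The following Python sketch assumes you have already collected, for each cluster, a mapping of "db.table" to row count, for example by running SELECT COUNT(*) queries through Beeline. The collection method and the sample table names are hypothetical. Counts on the source can change while it is still in use, so a mismatch calls for a closer look rather than proving that replication failed.

```python
def compare_inventories(source_counts, dest_counts):
    """Compare {"db.table": row_count} maps taken from the source and destination clusters."""
    # Hive treats database and table names case-insensitively, so normalize to lowercase.
    src = {name.lower(): count for name, count in source_counts.items()}
    dst = {name.lower(): count for name, count in dest_counts.items()}
    return {
        "missing_on_destination": sorted(src.keys() - dst.keys()),
        "only_on_destination": sorted(dst.keys() - src.keys()),
        "row_count_mismatch": sorted(
            name for name in src.keys() & dst.keys() if src[name] != dst[name]
        ),
    }


if __name__ == "__main__":
    # HYPOTHETICAL sample data; gather real counts however suits your environment.
    source = {"sales.orders": 1200, "sales.customers": 300, "hr.staff": 45}
    destination = {"sales.orders": 1200, "SALES.customers": 290}
    print(compare_inventories(source, destination))
    # {'missing_on_destination': ['hr.staff'], 'only_on_destination': [], 'row_count_mismatch': ['sales.customers']}
```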
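For post-migration step 6, one way to generate statistics is with HiveQL ANALYZE TABLE ... COMPUTE STATISTICS statements, plus the FOR COLUMNS variant for column statistics. The following Python sketch builds such statements for a list of tables and produces a script that you could run with beeline -f. A PARTITION clause that lists partition columns without values covers all partitions. The table and partition names are hypothetical, and Generating statistics remains the reference for the approach recommended in CDP.

```python
def analyze_statements(tables, partition_columns=None):
    """Build ANALYZE TABLE statements for basic and column statistics.

    tables: iterable of "db.table" names.
    partition_columns: optional {"db.table": [partition column names]}; when present,
    a PARTITION clause without values is added so that all partitions are analyzed.
    """
    partition_columns = partition_columns or {}
    statements = []
    for name in tables:
        db, table = name.split(".", 1)
        target = f"`{db}`.`{table}`"  # backticks quote the identifiers
        cols = partition_columns.get(name)
        if cols:
            target += " PARTITION (" + ", ".join(cols) + ")"
        statements.append(f"ANALYZE TABLE {target} COMPUTE STATISTICS;")
        statements.append(f"ANALYZE TABLE {target} COMPUTE STATISTICS FOR COLUMNS;")
    return statements


if __name__ == "__main__":
    # HYPOTHETICAL table and partition names.
    script = analyze_statements(["sales.orders", "sales.events"], {"sales.events": ["ds"]})
    print("\n".join(script))
```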
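For post-migration step 7, many Hive CLI invocations map directly onto Beeline options, with one key difference: Beeline connects to HiveServer2 through a JDBC URL given with -u. The following Python sketch rewrites a single hive command line covering a small subset of options: -e, -f, -i, --hiveconf, --hivevar, -d/--define (mapped to --hivevar), and -S/--silent (mapped to --silent=true). It rejects anything else, so those options have to be converted by hand. The JDBC URL is a hypothetical placeholder, and authentication settings such as a Kerberos principal in the URL are not handled. See Converting Hive CLI scripts to Beeline for the full set of differences.

```python
import os
import shlex

# Hive CLI options that take a value, mapped to their Beeline equivalents.
VALUE_OPTIONS = {
    "-e": "-e",
    "-f": "-f",
    "-i": "-i",
    "--hiveconf": "--hiveconf",
    "-hiveconf": "--hiveconf",
    "--hivevar": "--hivevar",
    "-hivevar": "--hivevar",
    "-d": "--hivevar",
    "--define": "--hivevar",
}
# Hive CLI flags without a value, mapped to Beeline flags.
FLAG_OPTIONS = {"-S": "--silent=true", "--silent": "--silent=true"}


def hive_to_beeline(command, jdbc_url):
    """Rewrite one `hive ...` command line as a `beeline -u <jdbc_url> ...` command line."""
    args = shlex.split(command)
    if not args or os.path.basename(args[0]) != "hive":
        raise ValueError("expected a command starting with 'hive'")
    out = ["beeline", "-u", jdbc_url]
    i = 1
    while i < len(args):
        opt = args[i]
        if opt in FLAG_OPTIONS:
            out.append(FLAG_OPTIONS[opt])
            i += 1
        elif opt in VALUE_OPTIONS:
            if i + 1 >= len(args):
                raise ValueError(f"option {opt} is missing its value")
            out += [VALUE_OPTIONS[opt], args[i + 1]]
            i += 2
        else:
            raise ValueError(f"unsupported option {opt!r}; convert it by hand")
    # shlex.join re-quotes arguments (for example, queries containing spaces).
    return shlex.join(out)


if __name__ == "__main__":
    url = "jdbc:hive2://hs2.example.com:10000/default"  # HYPOTHETICAL HiveServer2 URL
    print(hive_to_beeline('hive -S -e "select 1" --hivevar x=1', url))
    # beeline -u jdbc:hive2://hs2.example.com:10000/default --silent=true -e 'select 1' --hivevar x=1
```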