Migrating Hive data from HDP 2.x or HDP 3.x to Cloudera

The recommended way to migrate Hive data from HDP to Cloudera depends on the types of tables you are migrating: external, legacy-managed, or ACID (managed) tables.

In Cloudera, all newly created managed tables are transactional (ACID) tables by default. To minimize the impact on applications during the upgrade and to preserve legacy behavior, all legacy non-ACID tables (both managed and external) are converted to external tables in Cloudera during migration. Legacy ACID tables are migrated to managed ACID tables in Cloudera.
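The conversion rules above can be summarized as a small lookup. This is only an illustrative sketch: the type labels (`external`, `legacy_managed`, `acid`) and the function name `plan_migration` are hypothetical names chosen for this example, not part of any Cloudera tool.

```python
# Illustrative only: maps a source table type to the table type it becomes
# in Cloudera and to the scenario in this guide that covers it.
# The keys and value labels are hypothetical names for this sketch.
MIGRATION_RULES = {
    "external": ("external", "SCHEMA_ONLY + distcp"),
    "legacy_managed": ("external", "SCHEMA_ONLY + distcp"),
    "acid": ("managed_acid", "HYBRID + MIGRATE_ACID"),
}


def plan_migration(source_type: str) -> dict:
    """Return the target table type and migration approach for a source type."""
    try:
        target_type, approach = MIGRATION_RULES[source_type]
    except KeyError:
        raise ValueError(f"unknown source table type: {source_type!r}") from None
    return {"target_type": target_type, "approach": approach}


if __name__ == "__main__":
    for table_type in MIGRATION_RULES:
        print(table_type, "->", plan_migration(table_type))
```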

Scenario 1: Migrating Non-ACID tables (SCHEMA_ONLY + distcp)

All managed and external non-ACID tables are migrated to external tables in Cloudera using the following components:
  • HMS-Mirror, in SCHEMA_ONLY mode, to transfer the metadata
  • DistCp, to copy the table data
In this scenario, HMS-Mirror and distcp are triggered from the source cluster (LEFT).

If direct Hive connectivity from on-prem to Cloudera is not available, you can dump the schema to export it, and then replay it manually on the target cluster.
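As a rough sketch of Scenario 1, the following Python helper assembles the two command lines without running them. The `hms-mirror` long options shown (`--data-strategy`, `--database`) are assumptions based on the tool's commonly documented usage and should be checked against `hms-mirror --help` for your version. The database name and the source and target URIs are hypothetical placeholders.

```python
import shlex
from typing import List


def schema_only_command(database: str) -> List[str]:
    """Build an hms-mirror command that transfers metadata only."""
    # Assumed options: --data-strategy selects the mode, --database names
    # the database to process. Verify both against your hms-mirror version.
    return ["hms-mirror", "--data-strategy", "SCHEMA_ONLY", "--database", database]


def distcp_command(source_path: str, target_path: str) -> List[str]:
    """Build a DistCp command that copies table data to the target."""
    # -update copies only files that are missing or differ at the target.
    return ["hadoop", "distcp", "-update", source_path, target_path]


if __name__ == "__main__":
    # Hypothetical database and locations, for illustration only.
    print(shlex.join(schema_only_command("sales")))
    print(shlex.join(distcp_command(
        "hdfs://source-nn:8020/warehouse/sales.db",
        "s3a://example-bucket/warehouse/sales.db",
    )))
```

Building the commands as argument lists rather than strings keeps quoting out of the picture until the command is printed or passed to a process runner.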

Scenario 2: Migrating ACID tables (HYBRID + MIGRATE_ACID)

All managed ACID source tables (applicable to HDP) are migrated to ACID tables in Cloudera using HMS-Mirror in HYBRID mode with the migrate-acid and intermediate-storage options. This approach migrates both data and metadata using Hive queries and stages the data in an intermediate storage location on S3, which avoids the need to link the target cloud environment with the on-prem source.
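A minimal sketch of how the Scenario 2 invocation might be assembled is shown below. The option names (`--data-strategy`, `--migrate-acid`, `--intermediate-storage`, `--database`) are assumptions derived from the mode and option names mentioned above, so confirm them with `hms-mirror --help` before use. The database name and S3 location are hypothetical.

```python
import shlex
from typing import List


def hybrid_acid_command(database: str, intermediate_storage: str) -> List[str]:
    """Build an hms-mirror command for migrating ACID tables via staging."""
    if not intermediate_storage.startswith("s3a://"):
        # This guide stages data on S3; reject other schemes in this sketch.
        raise ValueError("intermediate storage must be an s3a:// location")
    # Assumed options; verify against your hms-mirror version.
    return [
        "hms-mirror",
        "--data-strategy", "HYBRID",
        "--migrate-acid",
        "--intermediate-storage", intermediate_storage,
        "--database", database,
    ]


if __name__ == "__main__":
    # Hypothetical database and staging location, for illustration only.
    print(shlex.join(hybrid_acid_command("sales", "s3a://example-bucket/staging")))
```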

In this scenario, HMS-Mirror is triggered from the source cluster (LEFT), as shown in the following diagram: