Migrating Hive data from HDP 2.x or HDP 3.x to Cloudera
The recommended way to migrate Hive data from HDP to Cloudera depends on the types of tables you are migrating: external, legacy-managed, or ACID (managed) tables.
Scenario 1: Migrating Non-ACID tables (SCHEMA_ONLY + distcp)
Non-ACID tables are migrated with two tools:
- HMS-Mirror: run in the SCHEMA_ONLY mode to transfer the metadata
- DistCp: to copy the table data
If direct Hive connectivity from on-prem to Cloudera is not available, you can export the schema as a dump and then replay it manually on the target cluster.
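The two steps of this scenario can be sketched as a pair of command lines. The following Python snippet is a minimal illustration that only builds and prints the commands; it does not run them. The HMS-Mirror option names (`--data-strategy`, `--database`) are assumptions based on the tool's command-line help and should be checked against your installed version. The database name and the source and target URIs are hypothetical placeholders.

```python
import shlex


def schema_only_command(database):
    """Build an HMS-Mirror command that transfers metadata only.

    Assumption: the installed hms-mirror accepts --data-strategy and
    --database; verify against `hms-mirror --help` before use.
    """
    return ["hms-mirror", "--data-strategy", "SCHEMA_ONLY", "--database", database]


def distcp_command(source_uri, target_uri):
    """Build a DistCp command that copies table data between file systems."""
    return ["hadoop", "distcp", source_uri, target_uri]


if __name__ == "__main__":
    # Hypothetical database name and storage locations, for illustration only.
    metadata_step = schema_only_command("sales_db")
    data_step = distcp_command(
        "hdfs://onprem-nn:8020/warehouse/external/sales_db.db",
        "s3a://example-bucket/warehouse/external/sales_db.db",
    )
    # Print the commands so they can be reviewed before anyone runs them.
    print(shlex.join(metadata_step))
    print(shlex.join(data_step))
```

Building the commands as argument lists rather than as single strings keeps paths containing spaces intact if the lists are later passed to `subprocess.run`.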
Scenario 2: Migrating ACID tables (HYBRID + MIGRATE_ACID)
All managed ACID source tables (applicable to HDP) are migrated to ACID tables in Cloudera using HMS-Mirror in the HYBRID mode with the migrate-acid and intermediate-storage options. This approach migrates both data and metadata using Hive queries and stages the data in an intermediate storage location on S3, so the target cloud environment does not need to be linked with the on-prem source.
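As a rough sketch of how such an invocation might be assembled, the Python snippet below builds an HMS-Mirror command for the HYBRID mode and rejects an intermediate storage location that is not an S3 URI. The option names (`--data-strategy`, `--migrate-acid`, `--intermediate-storage`, `--database`) are assumptions based on the tool's command-line help, and the `s3a://` scheme check is an assumption about how the S3 location is addressed; confirm both for your environment. The database and bucket names are hypothetical.

```python
import shlex
from urllib.parse import urlparse


def hybrid_acid_command(database, intermediate_storage):
    """Build an HMS-Mirror command for migrating ACID tables via staging.

    Assumption: the installed hms-mirror accepts the long option names
    used below; verify against `hms-mirror --help` before use.
    """
    parsed = urlparse(intermediate_storage)
    # Assumption: the S3 staging location is addressed with the s3a scheme.
    if parsed.scheme != "s3a" or not parsed.netloc:
        raise ValueError(
            f"intermediate storage must be an s3a:// URI, got {intermediate_storage!r}"
        )
    return [
        "hms-mirror",
        "--data-strategy", "HYBRID",
        "--migrate-acid",
        "--intermediate-storage", intermediate_storage,
        "--database", database,
    ]


if __name__ == "__main__":
    # Hypothetical database and bucket, for illustration only.
    command = hybrid_acid_command("sales_db", "s3a://example-bucket/hms-mirror-staging")
    # Print the command for review; this sketch does not execute it.
    print(shlex.join(command))
```

Validating the staging URI up front catches a mistyped or on-prem path before any Hive queries are generated.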