During the Hive migration beside the SQL query, the query related tables and data are
also migrated from a CDH or CDP Private Cloud Base cluster to a Data Hub
cluster.
Before the migration, the source cluster is scanned to
collect the SQL queries, tables and data from Hive or Impala. This migration can be used
in cases when there is a heavy SQL query load and you want to unload the less time
sensitive queries to another cluster. Using the scheduling feature of the underlying
Replication Manager, you can keep the queries in sync between the source and destination
cluster. During the migration process, the SQL queries are not affected on the source
cluster and can remain in running state.
- Ensure that CMA is set up correctly using the steps in Setting up CMA server.
- Ensure that you have met the requirements detailed in Reviewing prerequisites before
migration.
- Ensure that you have a CDH 5, CDH 6 or CDP Private Cloud Base cluster registered
as a source from which you want to migrate your Hive queries. If you do not have
a source cluster yet, complete the steps in Registering source clusters.
- Ensure that you have a Data Engineering Data Hub cluster registered as a
destination cluster to which you want to migrate your Hive queries. If you do
not have a destination cluster yet, complete the steps in Registering destination clusters.
-
Click on the CDH or CDP Private Cloud Base cluster you want to use for the
migration on the Clusters page.
-
Click Start Scanning to open the Scan
Settings.
-
Select Hive table scan, Hive table
check and Hive workflow scan.
-
Provide the Hive query parser input.
You can pre-scan Hive2 SQL queries against Hive3 with the Hive
Workflow scan option. When selecting this Hive Workflow option, you need
to provide the location of your queries as shown in the following
example:
- HDFS paths
- With default namespace:
hdfs:///dir/
,
hdfs:///dir/file
- With specified namespace:
hdfs://namespace1/dir
,
hdfs://namespace1/dir/file
- With namenode address:
hdfs://nameNodeHost:port:/dir
,
hdfs://nameNodeHost:port:/dir/file
- Native file paths
your/local/dir
nodeFQDN:/your/local/dir/sqlFile
-
Click Scan selected.
You will be redirected to the scanning progress, where you can monitor
if the scanning process was successful or encountered any error.
-
Click on Hive SQL to view the collected queries when the
scan is finished.
You can also find the tables that are related to the queries under
Hive tables.
-
Add the Hive queries to Collections.
Collections serve as an organization method to sort and bundle the queries
into groups for the migration. You can create more collections beside the
Default collection based on your requirements. The
Hive tables that belong to the Hive queries are automatically added to the same
collection.
After you are finished with sorting the queries to
collections, you can start the migration process by creating the migration
plan.
-
Click Create Migration or select .
-
Select the source cluster, and click Next.
-
Select the destination cluster, and click
Next.
-
Select the type of migration, and click
Next.
-
Select the collections that you want to migrate, and click
Next.
You can select if the migration should Run Now
or be completed in a Scheduled Run.
Run Now means that the Hive queries in the
selected collections are going to be migrated as soon as the process
starts. When choosing the Scheduled Run, you can
select the start date of the migration, and set a frequency in which the
migration process should proceed. In case your goal is to keep the
queries in sync between the source and destination cluster, select the
Scheduled Run with a frequent time period for
migration.
-
Review the default configurations that are filled out
automatically.
-
Click Next.
An overview of the migration plan is displayed. At this point, you can
go back and change any configuration if the information is not correct.
If the information is correct, click
Create.
-
Click Go to Migrations when the migration plan is
successfully created.
-
Click on the CDH to CDP PC or CDP Private
Cloud Base to CDP PC migration to start the migration.
The steps are displayed that are going to be completed during the migration.
-
Click
to start
migration.
During the Hive SQL migration, a replication policy is created using the
Replication Manager. When the policy is created, click
to start uploading
the SQL migration. At this step, the Hive scripts from the source cluster
are copied to the Hive S3 bucket on the destination. When the Hive SQL
Migration is finished, click
to finalize the replication policies.
When all of the steps are successfully completed, the migration of Hive queries
from CDH or or CDP Private Cloud Base to CDP Public Cloud is finished. You can restart
the queries on the destination Data Engineering Data Hub cluster using Command Line
Interface (CLI) or Hue.