During the Hive migration beside the SQL query, the query related tables and data are
also migrated from a CDH or Cloudera Private Cloud Base cluster to
a Cloudera Data Hub cluster.
Before the migration, the source cluster is scanned to
collect the SQL queries, tables and data from Hive or Impala. This migration can be used
in cases when there is a heavy SQL query load and you want to unload the less time
sensitive queries to another cluster. Using the scheduling feature of the underlying Cloudera Replication Manager, you can keep the queries in sync between the source
and destination cluster. During the migration process, the SQL queries are not affected
on the source cluster and can remain in running state.
-
Click on the CDH or Cloudera Private Cloud Base
cluster you want to use for the migration on the Clusters
page.
-
Click Start Scanning to open the Scan
Settings.
-
Select Hive table scan, Hive table
check and Hive workflow scan.
-
Provide the Hive query parser input.
You can pre-scan Hive2 SQL queries against Hive3 with the Hive
Workflow scan option. When selecting this Hive Workflow option, you need
to provide the location of your queries as shown in the following
example:
- HDFS paths
- With default namespace:
hdfs:///dir/
,
hdfs:///dir/file
- With specified namespace:
hdfs://namespace1/dir
,
hdfs://namespace1/dir/file
- With namenode address:
hdfs://nameNodeHost:port:/dir
,
hdfs://nameNodeHost:port:/dir/file
- Native file paths
your/local/dir
nodeFQDN:/your/local/dir/sqlFile
-
Click Scan selected.
You will be redirected to the scanning progress, where you can monitor
if the scanning process was successful or encountered any error.
-
Click on Hive SQL to view the collected queries when the
scan is finished.
You can also find the tables that are related to the queries under
Hive tables.
-
Add the Hive queries to Collections.
Collections serve as an organization method to sort and bundle the queries
into groups for the migration. You can create more collections beside the
Default collection based on your requirements. The
Hive tables that belong to the Hive queries are automatically added to the same
collection.
After you are finished with sorting the queries to
collections, you can start the migration process by creating the migration
plan.
-
Click Create Migration or select .
-
Select the source cluster, and click Next.
-
Select the destination cluster, and click
Next.
-
Select the type of migration, and click
Next.
-
Select the collections that you want to migrate, and click
Next.
You can select if the migration should Run Now
or be completed in a Scheduled Run.
Run Now means that the Hive queries in the
selected collections are going to be migrated as soon as the process
starts. When choosing the Scheduled Run, you can
select the start date of the migration, and set a frequency in which the
migration process should proceed. In case your goal is to keep the
queries in sync between the source and destination cluster, select the
Scheduled Run with a frequent time period for
migration.
-
Review the default configurations that are filled out
automatically.
-
Click Next.
An overview of the migration plan is displayed. At this point, you can
go back and change any configuration if the information is not correct.
If the information is correct, click
Create.
-
Click Go to Migrations when the migration plan is
successfully created.
-
Click on the Cloudera Public Cloud or Cloudera Private Cloud Base to Cloudera Public Cloud migration to
start the migration.
The steps are displayed that are going to be completed during the migration.
-
Click to start
migration.
During the Hive SQL migration, a replication policy is created using the Cloudera Replication Manager. When the policy is created, click to start uploading
the SQL migration. At this step, the Hive scripts from the source cluster
are copied to the Hive S3 bucket on the destination. When the Hive SQL
Migration is finished, click to finalize the replication policies.
When all of the steps are successfully completed, the migration of Hive queries
from CDH or Cloudera Private Cloud Base to Cloudera
Public Cloud is finished. You can restart the queries on the
destination Cloudera Data Engineering Data Hub cluster using Command Line
Interface (CLI) or Hue.