Scanning the source cluster

You need to scan the CDH source cluster to identify the available datasets and workloads that can be migrated. Scanning also enables you to review and resolve syntax errors that can occur after the migration.

  1. On the Sources page, click the CDH cluster that you want to use for the migration.
  2. Click Scan to open the scanning options for datasets and workloads.
    You can choose from the following scanning options:
    HDFS and Hive Table Scan
    Scans HDFS data and Hive tables on the source cluster. Both scans use the CDH Discovery Tool: the HDFS data scan uses the _hdfs_report_ module, and the Hive Table scan uses the _hive_metastore_ module.
    Hive Table Check
    Scans Hive tables on the source cluster. _Hive Table Check_ embeds the sre and u3 sub-programs of the Hive SRE Tooling. The results are displayed in the SRE column of the Hive datasets.
    Hive Workflow scan
    Scans Hive SQL queries on the source cluster. With the Hive Workflow scan option, you can pre-scan Hive2 SQL queries for compatibility with Hive3. When you select this option, provide the location of your queries in one of the following formats:
    • HDFS paths
      • With default namespace: hdfs:///dir/, hdfs:///dir/file
      • With specified namespace: hdfs://namespace1/dir, hdfs://namespace1/dir/file
      • With namenode address: hdfs://nameNodeHost:port/dir, hdfs://nameNodeHost:port/dir/file
    • Native file paths
      • your/local/dir
      • nodeFQDN:/your/local/dir/sqlFile
    Full Hive data scan
    Scans Hive tables and queries on the source cluster.
    Full data scan
    Scans HDFS data, Hive tables, and SQL queries on the source cluster.
  3. Click Command History to track the scanning status.
  4. Click the Datasets tab to review the scanned data for each service.
    When reviewing Hive SQL, you can check and fix SQL query errors before migrating the workflows to Public Cloud. The migration succeeds whether or not you fix these statement errors. However, queries that still contain errors cannot run on the new cluster because of compatibility issues between Hive2 and Hive3.
    You can review the list of errors and open the SQL editor using the corresponding icons.
    After fixing the statement errors in the SQL editor window, save the changes. The edited queries are replicated and saved in the S3 bucket of the target cluster; the original files are not overwritten.
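The query-location formats accepted by the Hive Workflow scan can be illustrated with a small classifier. This is a hypothetical Python sketch, not part of the CDH Discovery Tool or the migration service: the function name classify_location, the QueryLocation type, its kind labels, and the example host names and port 8020 are all illustrative assumptions. It also assumes standard Hadoop URI syntax (hdfs://host:port/path) for explicit NameNode addresses.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import urlparse


@dataclass
class QueryLocation:
    kind: str            # hypothetical label: hdfs-default, hdfs-namespace, hdfs-namenode, native, native-node
    host: Optional[str]  # namespace, NameNode host, or node FQDN (None when absent)
    port: Optional[int]  # NameNode port, set only for explicit NameNode addresses
    path: str            # directory or file that holds the Hive SQL queries


def classify_location(location: str) -> QueryLocation:
    """Classify a query location string (illustrative heuristic, not the tool's parser)."""
    if location.startswith("hdfs://"):
        parsed = urlparse(location)
        if not parsed.netloc:
            # hdfs:///dir: no authority, so the cluster's default namespace applies
            return QueryLocation("hdfs-default", None, None, parsed.path)
        if parsed.port is not None:
            # hdfs://nameNodeHost:port/dir: explicit NameNode address
            # (urllib lowercases parsed.hostname)
            return QueryLocation("hdfs-namenode", parsed.hostname, parsed.port, parsed.path)
        # hdfs://namespace1/dir: named namespace without a port
        return QueryLocation("hdfs-namespace", parsed.netloc, None, parsed.path)
    if "://" in location:
        # Only hdfs:// URIs and native paths are covered by the listed formats
        raise ValueError(f"unsupported scheme in {location!r}")
    node, sep, rest = location.partition(":")
    if sep and rest.startswith("/"):
        # nodeFQDN:/your/local/dir/sqlFile: native path on a specific node
        return QueryLocation("native-node", node, None, rest)
    # your/local/dir: native path with no node prefix
    return QueryLocation("native", None, None, location)


if __name__ == "__main__":
    examples = [
        "hdfs:///dir/file",
        "hdfs://namespace1/dir/file",
        "hdfs://nn1.example.com:8020/dir/file",
        "your/local/dir",
        "node1.example.com:/your/local/dir/sqlFile",
    ]
    for example in examples:
        print(f"{example} -> {classify_location(example)}")
```

Keying the NameNode case on the presence of a port is a simplification: a namespace is referenced by name alone, while a direct NameNode address normally carries its RPC port.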
The datasets and workflows on the CDH source cluster are scanned for Hive and HDFS.
Add labels to the datasets and workflows to have more control over what is migrated from the source cluster to the target cluster.