Using Navigator Optimizer Migration Scripts

First, upload a workload from an enterprise data warehouse (EDW) platform, such as Oracle or Teradata. Then use the Navigator Optimizer EDW workload migration page to analyze the workload for information about migrating it to a Hadoop platform, such as Impala or Hive. See Getting Started: Evaluating and Starting an EDW Offload Project for details. After uploading and reviewing the workload, navigate to the Effort Estimation & Scripts pane and download the DDL Scripts, Sqoop Script, and Revised Queries files:


  1. Copy the following files that you downloaded from the migration page to an edge node in your Hadoop cluster:
    • ddl.sql (DDL Scripts)
    • sqoop_generator.py (Sqoop Script)
    • rewritten_queries.csv (Revised Queries)
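
    For example, you might use scp to copy the three files from your local machine to a working directory on the edge node; the user name, host name, and destination path below are placeholders for your environment:

    $ # Copy the downloaded migration files to the edge node (names and paths are examples)
    $ scp ddl.sql sqoop_generator.py rewritten_queries.csv user@edge-node.example.com:/home/user/migration/
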
  2. Log in to the edge node and switch to a user who can access Hive or Impala on your cluster. Install the Cloudera JDBC drivers needed for your database type. If you are using Teradata or Netezza, install the appropriate Cloudera Connector. For more information, see Cloudera Enterprise Connector Documentation.
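
    For example, a typical login sequence might look like the following; the host name and user names are placeholders, and which user you switch to depends on how Hive and Impala access is managed on your cluster:

    $ # Log in to the edge node, then switch to a user with Hive or Impala access
    $ ssh user@edge-node.example.com
    $ sudo su - etl_user
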
  3. Launch either the Impala shell or the Beeline CLI for Hive, and create the databases that are required for the workload. If you are unsure, open the ddl.sql file that you downloaded, and view the CREATE TABLE statements to confirm what databases are needed.
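
    For example, you can check which databases the DDL file references and then create them before running the full script; my_server.com and big_database are the same placeholder values used in the next step:

    $ # Review the CREATE TABLE statements to see which databases they reference
    $ grep -i "CREATE TABLE" ddl.sql
    $ # Create each required database in Impala before running the full DDL script
    $ impala-shell -i my_server.com -q "CREATE DATABASE IF NOT EXISTS big_database"
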
  4. In the Impala shell or Beeline, run the ddl.sql script. This script creates the tables that your workload accesses. For example, if you are migrating workloads to Impala, use the following command to launch the Impala shell and run the DDL script:

    $ impala-shell -i <hostname> -d <database> -f <path_to_ddl_script_file>
            

    If your hostname is my_server.com, your database name is big_database, and your DDL script file is located in your current directory, enter the following command:

    $ impala-shell -i my_server.com -d big_database -f ./ddl.sql
            
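    If you are migrating workloads to Hive instead, a comparable Beeline invocation might look like the following; the JDBC URL, port, and user name are assumptions that depend on your HiveServer2 configuration and authentication setup:

    $ # Run the DDL script against Hive through Beeline (connection details will vary)
    $ beeline -u "jdbc:hive2://my_server.com:10000/big_database" -n username -f ./ddl.sql
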
  5. After the schema is created with the DDL script, prepare to run the sqoop_generator.py Python script, which generates a shell script containing Sqoop commands that import your data into the target platform. To prepare to run it:

    1. Make sure that you have Python 2.7 installed on your edge node.
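
      For example, you can confirm the installed version with the following command; on some systems the 2.7 interpreter may instead be available as python2.7:

      $ python --version
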
    2. Log out of the Impala shell or the Beeline CLI, and navigate to the directory where you have downloaded the sqoop_generator.py script.
    3. If you are using a Unix platform, run the following command in a terminal window to make the script executable:
      $ chmod +x sqoop_generator.py
                  
    4. Make sure that the Python interpreter is in your $PATH environment variable. A quick way to test is to run the following command:

      $ which python
                  

      If this command returns a path to the Python interpreter, such as /usr/bin/python, your system can locate the interpreter and run the Python script.

    5. Make sure that you are logged in as a user who can write to the directory where you run the Python script. This is important because the Python script generates a shell script containing Sqoop commands and must write that file to the directory.
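
      For example, you can confirm write access with a quick test; the file name below is arbitrary:

      $ # Create and remove a throwaway file to confirm the directory is writable
      $ touch write_test && rm write_test
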
  6. Run the Python script:

    $ python <dir_containing_script>/sqoop_generator.py
            

    Where <dir_containing_script> is the directory where the sqoop_generator.py script is located. For example, if the script is located in your current directory, enter the following command:

    $ python ./sqoop_generator.py
            

    Respond to the prompts that the Python script displays. It asks you to confirm or provide the database vendor, database driver, database URL, database username, and database password. The script then validates the tables in the database and asks you to confirm them. After the script runs through the database tables, it generates a shell script named sqoop-import-script.sh in the directory where the Python script was run. This shell script contains Sqoop commands that you can run to import your data into the target platform with data type mapping.
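
    Before moving on, you can confirm that the shell script was generated in the current directory:

    $ # Confirm the generated script exists before reviewing and running it
    $ ls -l sqoop-import-script.sh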

  7. Run the sqoop-import-script.sh shell script to create the necessary text files on the edge node of your cluster. If you want to verify that the script will create the correct files, open it and review its contents; you can edit it manually before running it. When you are satisfied with the script, navigate to the directory where it is located and run it with the following command in a terminal window:

    $ bash ./sqoop-import-script.sh
            
  8. After running the sqoop-import-script.sh script, log in to the Impala shell or the Beeline CLI and run the DDL script (ddl.sql) again to convert the text files created by the Sqoop script to Parquet-formatted files.
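
    For example, using the same placeholder host and database names as in the earlier step:

    $ impala-shell -i my_server.com -d big_database -f ./ddl.sql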

After you complete these steps and update the SQL to run in either Impala or Hive, your Hadoop cluster is ready to receive the migrated workload.