Setting Up Sqoop

Cloudera Runtime includes the Sqoop Client for bulk importing and exporting data from diverse data sources to Hive. You learn how to install the RDBMS connector and Sqoop Client in CDP.

  1. In Cloudera Manager, in Clusters, select Add Service from the options menu.
  2. Select the Sqoop Client and click Continue.
  3. Choose a JDBC database driver, depending on the data source of the source or destination for a Sqoop import or export, respectively.
  4. Install the JDBC database driver in /var/lib/sqoop on the Sqoop node.
    Do not install the/opt/cloudera/parcels/CDH because upgrades modify this directory.
    • MySQL: Download the MySQL driver https://dev.mysql.com/downloads/connector/j/ to /var/lib/sqoop, and then run tar -xvzf mysql-connector-java-<version>.tar.gz.
    • Oracle: Download the driver from https://www.oracle.com/technetwork/database/enterprise-edition/jdbc-112010-090769.html and put it in /var/lib/sqoop.
    • PostgreSQL: Download the driver from http://jdbc.postgresql.org/download.html and put it in /var/lib/sqoop.
  5. In Cloudera Manager, click Actions > Deploy Client Configuration.
After setting up the Sqoop client, you can enter Sqoop commands using the following connection string, depending on your data source.
  • MySQL Syntax:

    jdbc:mysql://<HOST>:<PORT>/<DATABASE_NAME>

    Example:

    jdbc:mysql://my_mysql_server_hostname:3306/my_database_name
  • Oracle Syntax:

    jdbc:oracle:thin:@<HOST>:<PORT>:<DATABASE_NAME>

    Example:

    jdbc:oracle:thin:@my_oracle_server_hostname:1521:my_database_name
  • PostgreSQL Syntax:

    jdbc:postgresql://<HOST>:<PORT>/<DATABASE_NAME>

    Example:

    jdbc:postgresql://my_postgres_server_hostname:5432/my_database_name
  • Netezza Syntax:

    jdbc:netezza://<HOST>:<PORT>/<DATABASE_NAME>
    Example:
    jdbc:netezza://my_netezza_server_hostname:5480/my_database_name
  • Teradata Syntax:

    jdbc:teradata://<HOST>/DBS_PORT=1025/DATABASE=<DATABASE_NAME>
    Example:
    jdbc:teradata://my_teradata_server_hostname/DBS_PORT=1025/DATABASE=my_database_name