Configuring Sqoop 2

This section explains how to configure the Sqoop 2 server.

Configuring which Hadoop Version to Use

The Sqoop 2 client does not interact directly with Hadoop MapReduce, and so it does not require any MapReduce configuration.

The Sqoop 2 server can work with either MRv1 or YARN. It cannot work with both simultaneously.You set the MapReduce version the Sqoop 2 server works with by means of the alternatives command (or update-alternatives, depending on your operating system):
  • To use YARN:
    alternatives --set sqoop2-tomcat-conf /etc/sqoop2/tomcat-conf.dist
  • To use MRv1:
    alternatives --set sqoop2-tomcat-conf /etc/sqoop2/tomcat-conf.mr1

Configuring Sqoop 2 to Use PostgreSQL instead of Apache Derby

Deciding which Database to Use

Sqoop 2 has a built-in Derby database, but Cloudera recommends that you use a PostgreSQL database instead, for the following reasons:
  • Derby runs in embedded mode and it is not possible to monitor its health.
  • Though it might be possible, Cloudera currently has no live backup strategy for the embedded Derby database.
  • Under load, Cloudera has observed locks and rollbacks with the embedded Derby database that do not happen with server-based databases.
See CDH and Cloudera Manager Supported Databases for tested database versions.

Install PostgreSQL 8.4.x or 9.0.x

See External PostgreSQL Database.

Create the Sqoop User and Sqoop Database

For example, using the PostgreSQL psql command-line tool:

$ psql -U postgres
Password for user postgres: *****

postgres=# CREATE ROLE sqoop LOGIN ENCRYPTED PASSWORD 'sqoop'
 NOSUPERUSER INHERIT CREATEDB NOCREATEROLE;
CREATE ROLE

postgres=# CREATE DATABASE "sqoop" WITH OWNER = sqoop
 ENCODING = 'UTF8'
 TABLESPACE = pg_default
 LC_COLLATE = 'en_US.UTF8'
 LC_CTYPE = 'en_US.UTF8'
 CONNECTION LIMIT = -1;
CREATE DATABASE

postgres=# \q

Stop the Sqoop 2 Server

$ sudo /sbin/service sqoop2-server stop

Configure Sqoop 2 to use PostgreSQL

Edit the sqoop.properties file (normally /etc/sqoop2/conf) as follows:

org.apache.sqoop.repository.jdbc.handler=org.apache.sqoop.repository.postgresql.PostgresqlRepositoryHandler
org.apache.sqoop.repository.jdbc.transaction.isolation=isolation level
org.apache.sqoop.repository.jdbc.maximum.connections=max connections
org.apache.sqoop.repository.jdbc.url=jdbc URL
org.apache.sqoop.repository.jdbc.driver=org.postgresql.Driver
org.apache.sqoop.repository.jdbc.user=username
org.apache.sqoop.repository.jdbc.password=password
org.apache.sqoop.repository.jdbc.properties.property=value

Restart the Sqoop 2 Server

$ sudo /sbin/service sqoop2-server start

Installing the JDBC Drivers

Sqoop 2 does not ship with third party JDBC drivers. You must download them separately and save them to the /var/lib/sqoop2/ directory on the server. The following sections show how to install the most common JDBC drivers. Once you have installed the JDBC drivers, restart the Sqoop 2 server so that the drivers are loaded.

Installing the MySQL JDBC Driver

Download the MySQL JDBC driver here. You must sign up for an account if you do not already have one, then log in before you can download the driver. Copy it to the /var/lib/sqoop2/ directory. For example:

$ sudo cp mysql-connector-java-version/mysql-connector-java-version-bin.jar /var/lib/sqoop2/
At the time of publication, version was 5.1.31, but the version might change by the time you read this.

Installing the Oracle JDBC Driver

You can download the JDBC Driver from the Oracle website, for example here. You must accept the license agreement before you can download the driver. Download the ojdbc6.jar file and copy it to /var/lib/sqoop2/ directory:

$ sudo cp ojdbc6.jar /var/lib/sqoop2/

Installing the Microsoft SQL Server JDBC Driver

Download the Microsoft SQL Server JDBC driver here and copy it to the /var/lib/sqoop2/ directory. For example:

$ curl -L 'http://download.microsoft.com/download/0/2/A/02AAE597-3865-456C-AE7F-613F99F850A8/sqljdbc_4.0.2206.100_enu.tar.gz' | tar xz
$ sudo cp sqljdbc_4.0/enu/sqljdbc4.jar /var/lib/sqoop2/

Installing the PostgreSQL JDBC Driver

Download the PostgreSQL JDBC driver here and copy it to the /var/lib/sqoop2/ directory. For example:

$ curl -L 'http://jdbc.postgresql.org/download/postgresql-9.2-1002.jdbc4.jar' -o postgresql-9.2-1002.jdbc4.jar
$ sudo cp postgresql-9.2-1002.jdbc4.jar /var/lib/sqoop2/

Syntax for Configuring JDBC Connection Strings

These are the JDBC connection strings for supported databases.

MySql Connection String

Syntax:

jdbc:mysql://<HOST>:<PORT>/<DATABASE_NAME>

Example:

jdbc:mysql://my_mysql_server_hostname:3306/my_database_name

Oracle Connection String

Syntax:

jdbc:oracle:thin:@<HOST>:<PORT>:<DATABASE_NAME>

Example:

jdbc:oracle:thin:@my_oracle_server_hostname:1521:my_database_name

PostgreSQL Connection String

Syntax:

jdbc:postgresql://<HOST>:<PORT>/<DATABASE_NAME>

Example:

jdbc:postgresql://my_postgres_server_hostname:5432/my_database_name

Netezza Connection String

Syntax:

jdbc:netezza://<HOST>:<PORT>/<DATABASE_NAME>
Example:
jdbc:netezza://my_netezza_server_hostname:5480/my_database_name

Teradata Connection String

Syntax:

jdbc:teradata://<HOST>/DBS_PORT=1025/DATABASE=<DATABASE_NAME>
Example:
jdbc:teradata://my_teradata_server_hostname/DBS_PORT=1025/DATABASE=my_database_name