Managing the Sqoop 1 Client
To add Sqoop 1 to your cluster, add the Sqoop 1 Client service and a Sqoop 1 gateway and deploy the client configuration:
Adding the Sqoop 1 Client
Minimum Required Role: Full Administrator
- On the tab, click to the right of the cluster name and select Add a Service. A list of service types display. You can add one type of service at a time.
- Select the Sqoop 1 Client service and click Continue.
- Select the services on which the new service should depend. All services must depend on the same ZooKeeper service. Click Continue.
- Customize the assignment of role instances to hosts. The wizard evaluates the hardware configurations of the hosts to determine the best hosts for each role. The wizard assigns all
worker roles to the same set of hosts to which the HDFS DataNode role is assigned. You can reassign role instances.
Click a field below a role to display a dialog box containing a list of hosts. If you click a field containing multiple hosts, you can also select All Hosts to assign the role to all hosts, or Custom to display the hosts dialog box.
The following shortcuts for specifying hostname patterns are supported:- Range of hostnames (without the domain portion)
Range Definition Matching Hosts 10.1.1.[1-4] 10.1.1.1, 10.1.1.2, 10.1.1.3, 10.1.1.4 host[1-3].company.com host1.company.com, host2.company.com, host3.company.com host[07-10].company.com host07.company.com, host08.company.com, host09.company.com, host10.company.com - IP addresses
- Rack name
Click the View By Host button for an overview of the role assignment by hostname ranges.
- Range of hostnames (without the domain portion)
- Click Continue. The client configuration deployment command runs.
- Click Continue and click Finish.
Installing the JDBC Drivers for Sqoop 1
Sqoop 1 does not ship with third party JDBC drivers. You must download them separately and save them to the /var/lib/sqoop/ directory on the server. Ensure that you do not save JARs in the CDH parcel directory /opt/cloudera/parcels/CDH, because this directory is overwritten when you upgrade CDH.
Before you begin:
mkdir -p /var/lib/sqoop chown sqoop:sqoop /var/lib/sqoop chmod 755 /var/lib/sqoop
This sets permissions to drwxr-xr-x.
For JDBC drivers for Hive, Impala, Teradata, or Netezza, see the Connectors documentation.
CDP does not support the Sqoop exports using the Hadoop jar command (the Java API). The connector documentation from Teradata includes instructions that include the use of this API. CDP users have reportedly mistaken the unsupported API commands, such as -forcestage, for the supported Sqoop commands, such as –-staging-force. Cloudera supports the use of Sqoop only with commands documented in Using the Cloudera Connector Powered by Teradata. Cloudera does not support using Sqoop with Hadoop jar commands, such as those described in the Teradata Connector for Hadoop Tutorial.
Installing the MySQL JDBC Driver
Download the MySQL JDBC driver from http://www.mysql.com/downloads/connector/j/5.1.html. You will need to sign up for an account if you do not already have one, and log in, before you can download it. Then copy it to the /var/lib/sqoop/ directory. For example:
sudo cp mysql-connector-java-version/mysql-connector-java-version-bin.jar /var/lib/sqoop/
Installing the Oracle JDBC Driver
You can download the JDBC Driver from the Oracle website, for example http://www.oracle.com/technetwork/database/enterprise-edition/jdbc-112010-090769.html. You must accept the license agreement before you can download the driver. Download the ojdbc6.jar file and copy it to the /var/lib/sqoop/ directory:
sudo cp ojdbc6.jar /var/lib/sqoop/
Installing the Microsoft SQL Server JDBC Driver
Download the Microsoft SQL Server JDBC driver from http://www.microsoft.com/en-us/download/details.aspx?displaylang=en&id=11774 and copy it to the /var/lib/sqoop/ directory. For example:
curl -L 'http://download.microsoft.com/download/0/2/A/02AAE597-3865-456C-AE7F-613F99F850A8/sqljdbc_4.0.2206.100_enu.tar.gz' | tar xz sudo cp sqljdbc_4.0/enu/sqljdbc4.jar /var/lib/sqoop/
Installing the PostgreSQL JDBC Driver
Download the PostgreSQL JDBC driver from http://jdbc.postgresql.org/download.html and copy it to the /var/lib/sqoop/ directory. For example:
curl -L 'http://jdbc.postgresql.org/download/postgresql-9.2-1002.jdbc4.jar' -o postgresql-9.2-1002.jdbc4.jar sudo cp postgresql-9.2-1002.jdbc4.jar /var/lib/sqoop/
Syntax for Configuring JDBC Connection Strings
These are the JDBC connection strings for supported databases.
MySql Connection String
Syntax:
jdbc:mysql://<HOST>:<PORT>/<DATABASE_NAME>
Example:
jdbc:mysql://my_mysql_server_hostname:3306/my_database_name
Oracle Connection String
Syntax:
jdbc:oracle:thin:@<HOST>:<PORT>:<DATABASE_NAME>
Example:
jdbc:oracle:thin:@my_oracle_server_hostname:1521:my_database_name
PostgreSQL Connection String
Syntax:
jdbc:postgresql://<HOST>:<PORT>/<DATABASE_NAME>
Example:
jdbc:postgresql://my_postgres_server_hostname:5432/my_database_name
Netezza Connection String
Syntax:
jdbc:netezza://<HOST>:<PORT>/<DATABASE_NAME>Example:
jdbc:netezza://my_netezza_server_hostname:5480/my_database_name
Teradata Connection String
Syntax:
jdbc:teradata://<HOST>/DBS_PORT=1025/DATABASE=<DATABASE_NAME>
jdbc:teradata://my_teradata_server_hostname/DBS_PORT=1025/DATABASE=my_database_name
Setting HADOOP_MAPRED_HOME for Sqoop 1
- For each user who will be submitting MapReduce jobs using MapReduce v2 (YARN), or running Pig, Hive, or Sqoop 1 in a YARN installation, make sure that the HADOOP_MAPRED_HOME environment variable is set correctly, as follows:
export HADOOP_MAPRED_HOME=/usr/lib/hadoop-mapreduce