Apache Sqoop Known Issues
MySQL JDBC driver shipped with CentOS 6 systems does not work with Sqoop
CentOS 6 systems currently ship with version 5.1.17 of the MySQL JDBC driver. This version does not work correctly with Sqoop.
Bug: None
Affected Versions: MySQL JDBC 5.1.17, 5.1.4, 5.3.0
Cloudera Bug: CDH-23180
Workaround: Install version 5.1.31 of the JDBC driver as detailed in Installing the JDBC Drivers for Sqoop 1 (Sqoop 1) or Configuring Sqoop 2 (Sqoop 2).
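For a Sqoop 1 package installation, the fix amounts to putting the newer Connector/J JAR on Sqoop's classpath. A minimal sketch, assuming the 5.1.31 tarball has already been downloaded from dev.mysql.com and that Sqoop picks up drivers from the default /var/lib/sqoop directory (paths are illustrative):

# Unpack the Connector/J 5.1.31 archive downloaded from dev.mysql.com.
tar xzf mysql-connector-java-5.1.31.tar.gz

# Copy the driver JAR into Sqoop 1's lib directory and make it world-readable.
sudo cp mysql-connector-java-5.1.31/mysql-connector-java-5.1.31-bin.jar /var/lib/sqoop/
sudo chmod 644 /var/lib/sqoop/mysql-connector-java-5.1.31-bin.jar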
MS SQL Server "integratedSecurity" option unavailable in Sqoop
The integratedSecurity option is not available in the Sqoop CLI.
Bug: None
Workaround: None
Sqoop 1
Sqoop does not support Hive metastore HA for Parquet format
If the Hive metastore (HMS) is configured for high availability (HA), the value of the hive.metastore.uris property in /etc/hive/conf.cloudera.HIVE-1/hive-site.xml contains a list of URIs instead of one URI. For example:
<value>thrift://hive-1.example.com:9083,thrift://hive-2.example.com:9083,thrift://hive-3.example.com:9083</value>
Sqoop uses Kite for Parquet file handling, and Kite does not support HMS HA. Using Kite, Sqoop takes the first URI from the hive.metastore.uris list and uses it to connect to the HMS. If that address is unavailable, the Sqoop command fails.
This issue occurs when you combine the following arguments to the sqoop-import command: --hive-import and --as-parquetfile.
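For example, an import of the following shape fails whenever the first metastore URI in the list is unreachable (the connection string, credentials, and table name are placeholders):

sqoop import --connect jdbc:mysql://db.example.com:3306/sales --username user -P \
  --table orders -m 1 \
  --hive-import --as-parquetfile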
Bug: CDH-57895
Affected Versions: CDH 5.12.x and below.
Fixed in Versions: CDH 5.13 and higher.
Cloudera Bug: CDH-57895
Workaround: None.
Sqoop 1 metastore job tool ignores overrides
When executing a saved job, the Sqoop 1 metastore ignores options passed on the command line after the -- separator:
sqoop job --exec <job-name> -- <options_to_override>
For example, create and run a saved eval job:

sqoop job --create my_eval_query -- eval --connect jdbc:mysql://fqdn.example.com:3306/db --username user --password password --query "select * from t1"

sqoop job --exec my_eval_query
c1    c2            c3
1     2017-02-17    row data 1
2     2017-02-17    row data 2
3     2017-02-17    row data 3

Running the same job with an overriding --query has no effect; every row is still returned:

sqoop job --exec my_eval_query -- --query "select * from t1 where c1 = 2"
c1    c2            c3
1     2017-02-17    row data 1
2     2017-02-17    row data 2    <--- Only this row should be returned
3     2017-02-17    row data 3
Affected Versions: CDH 5.8.0, 5.8.1, 5.8.3, 5.8.4; CDH 5.9.0, 5.9.1; CDH 5.10.0
Bug: SQOOP-2896
Cloudera Bug: CDH-51101
Fixed in Versions: CDH 5.8.5, CDH 5.9.2, CDH 5.9.3, CDH 5.10.1, CDH 5.10.2, CDH 5.11.0 (and higher) releases.
Sqoop 1 (Hive import + --as-parquetfile) limitation with KMS/KTS Encryption at Rest
The following Hive import into an HDFS encryption zone fails because Kite stages the Parquet output under /tmp, and HDFS cannot move files from outside an encryption zone into it:

sqoop import --connect jdbc:db2://djaxludb1001:61035/DDBAT003 --username=dh810202 --P \
  --target-dir /data/hive_scratch/ASDISBURSEMENT --delete-target-dir -m1 \
  --query "select disbursementnumber,disbursementdate,xmldata FROM DB2dba.ASDISBURSEMENT where DISBURSEMENTNUMBER = 2011113210000115311 AND \$CONDITIONS" \
  -hive-import --hive-database adminserver -hive-table asdisbursement_dave \
  --map-column-java XMLDATA=String --as-parquetfile

16/12/05 12:23:46 INFO mapreduce.Job:  map 100% reduce 0%
16/12/05 12:23:46 INFO mapreduce.Job: Job job_1480530522947_0096 failed with state FAILED due to: Job commit failed: org.kitesdk.data.DatasetIOException: Could not move contents of hdfs://AJAX01-ns/tmp/adminserver/.temp/job_1480530522947_0096/mr/job_1480530522947_0096 to hdfs://AJAX01-ns/data/RetiredApps/INS/AdminServer/asdisbursement_dave
<SNIP>
Caused by: org.apache.hadoop.ipc.RemoteException(java.io.IOException): /tmp/adminserver/.temp/job_1480530522947_0096/mr/job_1480530522947_0096/5ddcac42-5d69-4e46-88c2-17bbedac4858.parquet can't be moved into an encryption zone.
Cloudera Bug: CDH-40826 and SQOOP-2943
Workaround: Use an alternate data file type, for example text or Avro.
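For instance, rerunning the import above with text output in place of Parquet avoids the Kite staging path entirely (a sketch based on the failing command above; all other arguments carried over unchanged):

sqoop import --connect jdbc:db2://djaxludb1001:61035/DDBAT003 --username=dh810202 --P \
  --target-dir /data/hive_scratch/ASDISBURSEMENT --delete-target-dir -m1 \
  --query "select disbursementnumber,disbursementdate,xmldata FROM DB2dba.ASDISBURSEMENT where DISBURSEMENTNUMBER = 2011113210000115311 AND \$CONDITIONS" \
  -hive-import --hive-database adminserver -hive-table asdisbursement_dave \
  --map-column-java XMLDATA=String --as-textfile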
Sqoop import as Parquet files may result in out-of-memory errors
Out-of-memory (OOM) errors can occur in the following cases:
- With many very large rows (multiple megabytes per row) before the initial-page-run check (ColumnWriter)
- When rows vary significantly in size, so that the next-page-size check is based on small rows and is set very high, followed by many large rows
Bug: PARQUET-99
Cloudera Bug: CDH-50896
Workaround: None, other than restructuring the data.
Hive, Pig, and Sqoop 1 fail in MRv1 tarball due to incorrect HADOOP_MAPRED_HOME setting
This issue affects tarball installations only.
Bug: None
Cloudera Bug: CDH-6640. Fixed for packages only: CDH-6648.
Workaround: For MRv1 only, manually edit /etc/default/hadoop as follows (a sketch of the result follows this list):
- Change the path for the Hadoop MapReduce home setting in /etc/default/hadoop from:
export HADOOP_MAPRED_HOME=/usr/lib/hadoop-mapreduce
to:
export HADOOP_MAPRED_HOME=/usr/lib/hadoop-0.20-mapreduce
- Remove /usr/lib/hadoop-mapreduce from the HADOOP_CLASSPATH.
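After both edits, the relevant lines of /etc/default/hadoop should look roughly like this (a sketch; <other-entries> stands for whatever else your deployment keeps on the classpath):

# MRv1 tarball installations use the 0.20 MapReduce implementation.
export HADOOP_MAPRED_HOME=/usr/lib/hadoop-0.20-mapreduce
# /usr/lib/hadoop-mapreduce must no longer appear here.
export HADOOP_CLASSPATH=<other-entries>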
Sqoop 2
Sqoop 2 client cannot be used with a different version of the Sqoop 2 server
The Sqoop 2 client and server must be running the same CDH version.
Cloudera Bug: CDH-11901
Workaround: Make sure all Sqoop 2 components are running the same version of CDH.
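The Sqoop 2 shell can print both build versions so you can confirm they match; a quick check, assuming a server at sqoop2.example.com (the hostname is a placeholder):

sqoop:000> set server --host sqoop2.example.com --port 12000 --webapp sqoop
sqoop:000> show version --all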
Sqoop 2 upgrade may fail if any job's source and destination links point to the same connector
For example, the links for the job shown in the following output both point to generic-jdbc-connector:
sqoop:000> show job --all
1 job(s) to show:
Job with id 1 and name job1 (Enabled: true, Created by null at 5/13/15 3:05 PM, Updated by null at 5/13/15 6:04 PM)
Throttling resources
  Extractors:
  Loaders:
From link: 1
  From database configuration
    Schema name: schema1
    Table name: tab1
    Table SQL statement:
    Table column names: col1
    Partition column name:
    Null value allowed for the partition column: false
    Boundary query:
  Incremental read
    Check column:
    Last value:
To link: 2
  To database configuration
    Schema name: schema2
    Table name: tab2
    Table SQL statement:
    Table column names: col2
    Stage table name:
    Should clear stage table:

sqoop:000> show link --all
2 link(s) to show:
link with id 1 and name try1 (Enabled: true, Created by null at 5/13/15 2:59 PM, Updated by null at 5/13/15 5:47 PM)
Using Connector generic-jdbc-connector with id 2
  Link configuration
    JDBC Driver Class: com.mysql.jdbc.Driver
    JDBC Connection String: jdbc:mysql://mysql.server/database
    Username: nvaidya
    Password:
    JDBC Connection Properties:
link with id 2 and name try2 (Enabled: true, Created by null at 5/13/15 3:01 PM, Updated by null at 5/13/15 5:47 PM)
Using Connector generic-jdbc-connector with id 2
  Link configuration
    JDBC Driver Class: com.mysql.jdbc.Driver
    JDBC Connection String: jdbc:mysql://mysql.server/database
    Username: nvaidya
    Password:
    JDBC Connection Properties:
Cloudera Bug: CDH-27779
Workaround: Before upgrading, make sure no jobs have source and destination links that point to the same connector.
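One conservative way to audit this, sketched with the shell commands shown above: list every job and link, note each job's From and To link IDs, and if both links of a job resolve to the same connector, delete that job before the upgrade and recreate it afterward (the job ID below is a placeholder):

sqoop:000> show job --all
sqoop:000> show link --all
sqoop:000> delete job --jid 1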