Known Issues for Apache Sqoop

Learn about the known issues in Sqoop, the impact or changes to the functionality, and the workaround.

CDPD-54770: Unable to read Sqoop metastore created by an older HSQLDB version: If you have upgraded to CDP PvC Base 7.1.8 Cumulative hotfix 4 or later, you might be unable to read a Sqoop metastore that was created with an older version of HyperSQL Database (HSQLDB). Cloudera upgraded the HSQLDB dependency from 1.8.0.10 to 2.7.1, which makes Sqoop jobs stored in the older HSQLDB format incompatible. Workaround: After upgrading to CDP PvC Base 7.1.8 Cumulative hotfix 4, you must upgrade the Sqoop metastore and convert the database files to a format that HSQLDB 2.7.1 can read. For more information, see Troubleshooting Apache Sqoop issues.

CDPD-44431: Using direct mode causes problems: Using direct mode has several drawbacks:

Imports can cause an intermittent and overlapping input split.

Imports can generate duplicate data.

Many problems, such as intermittent failures, can occur.

Additional configuration is required.

Workaround: Stop using direct mode. Do not use the --direct option in Sqoop import or export commands.

Sqoop direct mode is disabled by default. If you still want to use it, enable it either by setting the sqoop.enable.deprecated.direct property globally in Cloudera Manager for Sqoop or by specifying -Dsqoop.enable.deprecated.direct=true on the command line.
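As a rough illustration of the workaround, the following Python sketch removes the --direct flag from a Sqoop command that is held as a list of arguments, for example in a job wrapper script. The helper name strip_direct, the connection string, and the table name are hypothetical and are not part of Sqoop.

```python
def strip_direct(argv):
    """Return a copy of a Sqoop argument list without the --direct flag.

    Hypothetical helper for illustration only; not part of Sqoop.
    Only the exact token "--direct" is removed, so other options that
    merely start with the same text are left untouched.
    """
    return [arg for arg in argv if arg != "--direct"]


# Example: an import command that still requests direct mode.
# The JDBC URL and table name are placeholders.
command = [
    "sqoop", "import",
    "--connect", "jdbc:mysql://db.example.com/sales",
    "--table", "orders",
    "--direct",
]

print(strip_direct(command))
```

The function returns a new list rather than modifying its input, so the original command remains available for logging or comparison.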

CDPD-3089: Avro, S3, and HCat do not work together properly: Importing an Avro file into S3 with HCat fails with the error "Delegation Token not available".

Parquet columns inadvertently renamed: Column names that start with a number are renamed when you use the --as-parquetfile option to import data. Workaround: Prepend column names in Parquet tables with one or more letters or underscore characters.
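The renaming rule in this workaround can be sketched in Python as follows. This is a minimal illustration, assuming the column names are available as plain strings; the function name parquet_safe_name, the default underscore prefix, and the sample column names are hypothetical. How the new names are applied to the source schema is outside the scope of the sketch.

```python
def parquet_safe_name(column, prefix="_"):
    """Prepend a prefix to a column name that starts with a digit.

    Hypothetical helper for illustration only. The prefix must begin
    with a letter or an underscore; otherwise the result would still
    start with a character that triggers the renaming issue.
    """
    if not prefix or not (prefix[0].isalpha() or prefix[0] == "_"):
        raise ValueError("prefix must start with a letter or an underscore")
    if column[:1].isdigit():
        return prefix + column
    return column


# Build an old-name to new-name mapping for a set of sample columns.
columns = ["2023_sales", "region", "1st_quarter"]
mapping = {name: parquet_safe_name(name) for name in columns}

for old, new in mapping.items():
    print(old, "->", new)
```

Columns that already start with a letter or an underscore are returned unchanged, so the mapping can be applied to a whole table without special cases.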

PARQUET-99: Importing Parquet files might cause out-of-memory (OOM) errors: Importing rows of multiple megabytes each before the initial-page-run check (ColumnWriter) can cause OOM errors. OOM errors can also occur when row sizes vary significantly: if the next-page-size check is calculated from small rows and is therefore set very high, a subsequent run of many large rows can exhaust memory before the check is reached.