Known Issues for Apache Sqoop
Learn about the known issues in Sqoop, their impact on functionality, and the available workarounds.
- Using direct mode causes problems
- Using direct mode has the following drawbacks:
- Imports can intermittently cause overlapping input splits.
- Imports can generate duplicate data.
- Many problems, such as intermittent failures, can occur.
- Additional configuration is required.
- Stop using direct mode. Do not use the --direct option in Sqoop import or export commands.
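As an illustration of the workaround above, the following minimal sketch removes the bare `--direct` flag from a Sqoop argument list before the command is run. The helper name `strip_direct_flag`, the JDBC URL, the table name, and the target directory are all hypothetical and are not taken from this document; only the `--direct` option itself comes from the workaround.

```python
def strip_direct_flag(args):
    """Return a copy of a Sqoop argument list without the bare --direct flag."""
    # Only the exact token "--direct" is dropped; every other argument is kept.
    return [arg for arg in args if arg != "--direct"]


# Hypothetical import command that was originally written to use direct mode.
command = [
    "sqoop", "import",
    "--connect", "jdbc:mysql://db.example.com/sales",  # hypothetical JDBC URL
    "--table", "orders",                                # hypothetical table
    "--direct",
    "--target-dir", "/user/etl/orders",                 # hypothetical path
]

safe_command = strip_direct_flag(command)
print(" ".join(safe_command))
```

A filter like this could be applied to scripted or scheduled jobs that build Sqoop commands programmatically; commands typed by hand only need the option deleted.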
- CDPD-3089: Avro, S3, and HCat do not work together properly
- Importing an Avro file into S3 with HCat fails with the error "Delegation Token not available."
- Parquet columns inadvertently renamed
- Column names that start with a number are renamed when you use the --as-parquetfile option to import data.
- Prepend column names in Parquet tables with one or more letters or underscore characters.
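The renaming rule in the workaround above can be sketched as a small helper that prefixes any column name starting with a digit. The function name `parquet_safe_name`, the default underscore prefix, and the sample column names are assumptions made for illustration. Where the new names are applied (for example, as column aliases in the source query or in the source schema) depends on your environment and is not covered here.

```python
import re


def parquet_safe_name(column, prefix="_"):
    """Prefix a column name that starts with a digit; leave other names as-is."""
    # The prefix itself must begin with a letter or an underscore.
    if not re.match(r"[A-Za-z_]", prefix):
        raise ValueError("prefix must start with a letter or underscore")
    return prefix + column if column[:1].isdigit() else column


# Hypothetical source column names.
columns = ["2019_sales", "region", "3rd_quarter"]
renamed = {name: parquet_safe_name(name) for name in columns}

for old, new in renamed.items():
    print(f"{old} -> {new}")
```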
- Importing Parquet files might cause out-of-memory (OOM) errors
- Importing rows of multiple megabytes each before the initial-page-run check (ColumnWriter) can cause OOM. OOM can also occur when row sizes vary significantly: if the next-page-size check is based on small rows and is therefore set very high, the many large rows that follow can exhaust memory before the check runs.