Known Issues for Apache Sqoop

This topic describes known issues and workarounds for using Sqoop in this release of Cloudera Runtime.

Invalid method name: 'get_index_names'
Problem: During a Sqoop export operation, Sqoop invokes the method 'get_index_names', which is not available, and HCatalog returns an exception.
Cloudera internal JIRA: CDPD-3085
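
For reference, a minimal sketch of the kind of export that hits this issue, assuming Sqoop's HCatalog integration is in use; the connection string, credentials, and table names below are hypothetical:

    # Hypothetical identifiers throughout. The failure occurs when Sqoop's
    # HCatalog integration queries the Hive metastore during the export.
    sqoop export \
      --connect jdbc:mysql://db.example.com/sales \
      --username sqoop_user \
      --password-file /user/sqoop_user/db.password \
      --hcatalog-database default \
      --hcatalog-table orders \
      --table orders
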
Corrupt data is displayed in Beeline
Problem: In Beeline, selecting data from a Hive table that was imported by Sqoop can display corrupt output. The data stored in the table itself is correct; only the displayed output is wrong.
Cloudera internal JIRA: CDPD-3467
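
Because only the rendered output is affected, one rough way to confirm that the stored data is intact is to read the table's backing files directly instead of through Beeline. A sketch assuming a text-format table; the JDBC URL, table name, and warehouse path are hypothetical:

    # Compare Beeline's rendering against the raw bytes on HDFS.
    beeline -u jdbc:hive2://hs2.example.com:10000 \
        -e "SELECT * FROM default.imported_table LIMIT 10;"

    # For a text-format table, the files Sqoop wrote can be read as-is.
    hdfs dfs -cat /warehouse/tablespace/managed/hive/imported_table/* | head
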
Avro, S3, and HCat do not work together properly
Problem: Importing an Avro file into S3 through HCatalog fails with a 'Delegation Token not available' error.
Cloudera internal JIRA: CDPD-3089
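
The failing combination is an import that writes Avro data into an HCatalog table backed by S3, along the lines of the sketch below. All identifiers and the bucket path are hypothetical; the storage stanza assumes the table is created by Sqoop itself and that the cluster permits a custom table location:

    # Hypothetical names throughout. The import targets an Avro-backed
    # HCatalog table whose location is on S3; this is the combination
    # that fails with "Delegation Token not available".
    sqoop import \
      --connect jdbc:mysql://db.example.com/sales \
      --username sqoop_user \
      --table orders \
      --hcatalog-database default \
      --hcatalog-table orders_avro \
      --create-hcatalog-table \
      --hcatalog-storage-stanza "STORED AS AVRO LOCATION 's3a://my-bucket/warehouse/orders_avro'"
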
Parquet columns inadvertently renamed
Problem: Column names that start with a number are renamed when you use the --as-parquetfile option to import data.
Workaround: Prepend the affected column names with one or more letters or underscore characters, for example by aliasing the columns during import (see the sketch below).
Apache JIRA: None
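
One way to apply the workaround without altering the source table is a free-form query import that aliases the offending columns; the connection string, table, and column names in this sketch are hypothetical:

    # Alias numeric-leading columns (here "2019_total") with an underscore
    # prefix so the Parquet schema receives valid names. $CONDITIONS is
    # required by Sqoop in free-form query imports.
    sqoop import \
      --connect jdbc:mysql://db.example.com/sales \
      --username sqoop_user \
      --query 'SELECT id, `2019_total` AS _2019_total FROM revenue WHERE $CONDITIONS' \
      --split-by id \
      --target-dir /user/sqoop_user/revenue_parquet \
      --as-parquetfile
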
Importing Parquet files might cause out-of-memory (OOM) errors
Problem: An import can run out of memory if rows contain multiple megabytes of data, because ColumnWriter performs its initial page-run check only after that data has already been buffered. An OOM can also occur when row sizes vary widely: if the next-page-size check is calibrated on small rows, the threshold is set very high, and a subsequent run of many large rows exhausts memory before the check fires.
Apache JIRA: PARQUET-99
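
Whether it helps depends on the data, but one possible mitigation is to shrink the Parquet writer's buffering thresholds so the size checks trigger sooner. The sketch below assumes Sqoop's Parquet writer honors the standard parquet-hadoop properties parquet.page.size and parquet.block.size; the values and identifiers are illustrative only:

    # Generic Hadoop properties (-D) must come immediately after the tool
    # name. Smaller page/block sizes mean less data is buffered per column
    # before size checks run; this reduces, but may not eliminate, the OOM.
    sqoop import \
      -D parquet.page.size=65536 \
      -D parquet.block.size=16777216 \
      --connect jdbc:mysql://db.example.com/sales \
      --username sqoop_user \
      --table wide_rows \
      --as-parquetfile \
      --target-dir /user/sqoop_user/wide_rows_parquet
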