Fixed Issues in Apache Sqoop

Review the list of Sqoop issues that are resolved in Cloudera Runtime 7.2.18.

CDPD-44397: Implement ORC support in Sqoop-Connector-Teradata component
A new version of Cloudera Connector Powered by Teradata is released, which includes ORC support in the Sqoop-Connector-Teradata component. You can use Teradata Manager to import data from the Teradata server to Hive in ORC format.
CDPD-47175: Sqoop Hive import with ORC file fails with ClassCastException
The Sqoop import process for ORC files has been updated. When an unsupported type conversion is attempted, Sqoop now reports a descriptive error message that explains the issue.

Sqoop can now import the following data types:

  • Byte, Short, Int, Long, Float, Double from the same RDBMS types
  • BigDecimal to Long, Double, String
  • Date, Timestamp to String, Date, Timestamp
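The supported conversions listed above can be pictured as a lookup table. The following is a minimal sketch and not Sqoop's implementation: the names SUPPORTED_CONVERSIONS and check_conversion are hypothetical, the error text is invented for illustration, and the table assumes that Date and Timestamp can each be converted to String, Date, or Timestamp.

```python
# Hypothetical lookup table mirroring the list above; not Sqoop's internal structure.
SUPPORTED_CONVERSIONS = {
    # Numeric types are imported from the same RDBMS type.
    **{t: {t} for t in ("Byte", "Short", "Int", "Long", "Float", "Double")},
    "BigDecimal": {"Long", "Double", "String"},
    # Assumption: each of Date and Timestamp may target any of the three types.
    "Date": {"String", "Date", "Timestamp"},
    "Timestamp": {"String", "Date", "Timestamp"},
}


def check_conversion(column, source_type, target_type):
    """Raise a descriptive error if the conversion is not in the table."""
    allowed = SUPPORTED_CONVERSIONS.get(source_type, set())
    if target_type not in allowed:
        # Illustrative message only; the real Sqoop message may differ.
        raise ValueError(
            f"Column '{column}': cannot convert {source_type} to {target_type}; "
            f"supported targets: {sorted(allowed) or 'none'}"
        )


check_conversion("amount", "BigDecimal", "Long")  # supported, returns silently
try:
    check_conversion("amount", "BigDecimal", "Timestamp")
except ValueError as err:
    print(err)
```

Validating the conversion before any data is written is what allows a clear message naming the column and the types involved, rather than a low-level cast failure.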
CDPD-56523: Sqoop does not take --hive-compute-stats option into account for hs2-url Hive imports
Sqoop now honors the --hive-compute-stats option for Hive imports when the hs2-url parameter is used.
CDPD-58538: Oozie should upload and use the config files from sqoop-conf/managers.d when available
Previously, Oozie did not honor Sqoop's managers.d configurations or the extra connector JARs in the lib folder. Both are now automatically available in Oozie's Sqoop action. You can use connectors such as the Sqoop Teradata connector without manually updating the configuration or copying JARs to the workflow's lib folder.
CDPD-59557: Secure options to provide the Hive password for Sqoop Hive imports
This fix introduces secure options for providing the Hive password during Sqoop Hive imports, so that you no longer need to pass the password as plaintext on the command line.
CDPD-59710: Fix time stamp conversion issue when exporting Parquet
During the export operation, Sqoop now uses the writer's time zone metadata from the Parquet file when that metadata is available.
CDPD-61547: Sqoop should not close 'System.out' and 'System.err'
In certain cases, the Sqoop process closed the 'System.out' and 'System.err' streams, which made it impossible to write to them afterwards when Sqoop was invoked manually in a custom JVM. This issue is now resolved.
CDPD-63723: Sqoop should determine files as Parquet by PAR1 in header
Sqoop now checks the first 4 bytes of a file (PAR1) instead of the first 3 bytes to determine whether the file is a Parquet file.
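The header check can be illustrated with a short sketch. This is not Sqoop's code: the function names has_parquet_magic and is_parquet_file are hypothetical, and the example only shows why comparing all 4 bytes is stricter than comparing 3.

```python
PARQUET_MAGIC = b"PAR1"  # 4-byte marker at the start of a Parquet file


def has_parquet_magic(header: bytes) -> bool:
    """Return True if the leading bytes begin with the full 4-byte marker."""
    return header[:4] == PARQUET_MAGIC


def is_parquet_file(path: str) -> bool:
    """Read only the first 4 bytes of the file and compare them with PAR1."""
    with open(path, "rb") as f:
        return has_parquet_magic(f.read(4))


# A 3-byte comparison would accept any file whose content starts with "PAR".
print(has_parquet_magic(b"PAR1\x15\x04"))   # True
print(has_parquet_magic(b"PARTNER,ID\n"))  # False
```

A text file that begins with a word such as "PARTNER" passes a 3-byte comparison but fails the 4-byte one, which is the kind of false positive the stricter check avoids.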
CDPD-63915: Sqoop Teradata export fails if the source table is empty
Fixed the issue where the Sqoop Teradata export failed if the source table was empty.

Apache patch information