New Features in Cloudera Connector Powered by Teradata
The following new features are included in Cloudera Connector Powered by Teradata.
CDP compatible version of Cloudera Connector Powered by Teradata Version 1.8.5.1c7 and the TDCH library to version 1.8.5.1
Cloudera Connector Powered by Teradata includes ORC support in the Sqoop-Connector-Teradata component. In this release, you can use Teradata Manager to import data from the Teradata server to Hive in ORC format.
- CDP Private Cloud Base 7.1.9 and later
Extended Cloudera Data Platform (CDP) runtime compatibility also includes:
- CDP Public Cloud 7.2.9 and later
- CDP Private Cloud Base 7.1.7 and later
Upgraded TDCH version and Teradata driver
The connector has been upgraded to use TDCH version 1.8.5.1 and Teradata driver version 20.00.00.10. This update improves performance and compatibility, and includes bug fixes in the connector.
Supported commands for ORC imports
- Example 1: Import without providing the Teradata driver or the TeradataManager connection manager

/opt/cloudera/parcels/CDH/bin/sqoop import \
  --connect ... \
  --username ... \
  --password ... \
  --table employees \
  --warehouse-dir "..." \
  --hive-import \
  --delete-target-dir \
  --hive-overwrite \
  --as-orcfile \
  --external-table-dir "..." \
  --hs2-url "..." \
  -m 1
- Example 2: Import without providing the Teradata driver, but providing the TeradataManager connection manager

/opt/cloudera/parcels/CDH/bin/sqoop import \
  --connect ... \
  --username ... \
  --password ... \
  --table employees \
  --warehouse-dir "..." \
  --hive-import \
  --delete-target-dir \
  --hive-overwrite \
  --as-orcfile \
  --external-table-dir "..." \
  --hs2-url "..." \
  --connection-manager com.cloudera.connector.teradata.TeradataManager \
  -m 1
Supported combinations of --driver and --connection-manager parameters
The compatibility matrix for the --driver and --connection-manager parameters is unchanged from the Sqoop-Teradata-Connector 1.8.3.2c7p1 release.
Other changes introduced through the TDCH upgrade to 1.8.5.1
The following features have been added since TDCH version 1.8.3.2:
- Features included in the 1.8.5.1 release
- TDCH-2005: Update Teradata JDBC driver to 20.00.00.10
- TDCH-2004: Certify TDCH 1.8.x on CDP 7.1.7 SP2
- TDCH-1994: Add support for using both -targetpaths and -targettable for Hive import jobs
- TDCH-2020: Discontinue using internal undocumented interfaces in TDJDBC
- Features included in the 1.8.4.1 release
- TDCH-1989: Certify TDCH on CDP 7.1.8
- TDCH-1976: Fix Black Duck Security Issues
- TDCH-1993: Add support for custom staging directory instead of the default /user/<username>/<temp_directory> location
- TDCH-1962: Handling HASHAMP range of SQL query when AMP goes down in DBS
- TDCH-1998: Update Teradata JDBC driver to 17.20.00.12
- TDCH-1997: Include OSS License file (.pdf) in the rpm installation
CDP compatible version of Cloudera Connector Powered by Teradata Version 1.8.3.2c7p1
The changes introduced in this release do not include a new version of the TDCH library, a new Teradata driver, or any additional changes required from Sqoop; therefore, CDP compatibility is the same as in Cloudera Connector Powered by Teradata version 1.8.3.2c7.
- CDP Public Cloud 7.2.9 and later
- CDP Private Cloud Base 7.1.7 and later
Parquet support on CDP versions for HDFS or Hive imports
Changes made in TDCH between versions 1.8.3.1 and 1.8.3.2 to address a Hive API breakage made the Sqoop Teradata Connector 1.8.3.2c7 release incompatible with TDCH 1.8.3.2, resulting in a bug in the Hive import process for the Parquet file format.
This release resolves that incompatibility so that Teradata Manager can successfully run Hive imports in Parquet file format, as intended in the Sqoop Teradata Connector 1.8.3.2c7 release.
Additional step required for importing Teradata to Hive in Parquet format
The --hs2-url argument must be provided explicitly as a Sqoop argument to support a Hive JDBC connection to HiveServer2 (HS2) through TDCH.
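For example, a Hive import in Parquet format that passes the HiveServer2 JDBC URL explicitly might look like the following sketch (the host names, HDFS namespace, and table name are placeholders):

sqoop import --connect "jdbc:teradata://td-host/database" --connection-manager com.cloudera.connector.teradata.TeradataManager --table table1 --target-dir hdfs://ns1/tmp/table1 --hive-import --as-parquetfile --hs2-url "jdbc:hive2://hs2-host:10000/default"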
Configuring user/password based authentication
From this release onwards, you can configure user/password-based authentication (such as LDAP) when connecting to Hive using Teradata Manager. To do so, provide the required credentials either in the --hs2-url argument or explicitly using the --hs2-user and --hive-password Sqoop arguments.
Supported commands for Parquet imports
You can use the Parquet feature under the following conditions:
- You can import from Teradata to Hive in Parquet format using one of the following commands:
sqoop import --connect "jdbc:teradata://host/database" --connection-manager org.apache.sqoop.manager.GenericJdbcManager --driver com.teradata.jdbc.TeraDriver --table table1 --target-dir hdfs://ns1/tmp/table1 --hive-import --as-parquetfile
sqoop import --connect "jdbc:teradata://host/database" --connection-manager com.cloudera.connector.teradata.TeradataManager --table table1 --target-dir hdfs://nsq/tmp/table1 --hive-import --as-parquetfile --hs2-url "jdbc:hive2://…"
sqoop import --connect "jdbc:teradata://host/database" --connection-manager com.cloudera.connector.teradata.TeradataManager --table table1 --target-dir hdfs://nsq/tmp/table1 --hive-import --as-parquetfile --hs2-url "jdbc:hive2://…;user=foo;password=bar"
sqoop import --connect "jdbc:teradata://host/database" --connection-manager com.cloudera.connector.teradata.TeradataManager --table table1 --target-dir hdfs://nsq/tmp/table1 --hive-import --as-parquetfile --hs2-url "jdbc:hive2://…" --hs2-user foo --hive-password bar
- You can import from Teradata to HDFS in Parquet format using only the Generic JDBC connection manager, with the following options:
sqoop import --connect “jdbc:teradata://host/database” --connection-manager org.apache.sqoop.manager.GenericJdbcManager --driver com.teradata.jdbc.TeraDriver --table table1 --target-dir hdfs://ns1/tmp/table1 --as-parquetfile
Any version of the Sqoop Teradata connector supports this command.
CDP compatible version of Cloudera Connector Powered by Teradata Version 1.8.3.2c7 and the TDCH library to version 1.8.3.2
Cloudera Connector Powered by Teradata implements Parquet support in the Sqoop-Connector-Teradata component. In this release, you can use TeradataManager to import Parquet files.
- CDP Private Cloud Base 7.1.8 and later compatibility
If you install this connector version on CDP Private Cloud Base 7.1.8, you can import data from the Teradata server to Hive in Parquet format using Teradata Manager.
- CDP Public Cloud 7.2.13 and later compatibility
If you install this connector version on CDP Public Cloud 7.2.13, you can import data from the Teradata server to Hive in Parquet format using Teradata Manager.
- Extended Cloudera Data Platform (CDP) compatibility
- CDP Public Cloud 7.2.9 and later
- CDP Private Cloud Base 7.1.7 and later
Parquet support on CDP versions for HDFS or Hive imports
You can use the Parquet feature to import data from the Teradata server to HDFS or Hive in Parquet format using GenericJdbcManager with the Teradata JDBC driver under the following conditions:
- Sqoop Teradata Connector 1.8.1c7 (earlier connector) or 1.8.3.2c7 (latest connector) is installed on one of the following CDP versions:
- CDP Public Cloud 7.2.9 - 7.2.12 (earlier CDP Public Cloud version)
- CDP Private Cloud Base 7.1.7 (earlier CDP Private Cloud Base version)
- Sqoop Teradata Connector 1.8.1c7 (earlier connector) is installed on one of the following CDP versions:
- CDP Public Cloud 7.2.13 (latest Public Cloud version) and later
- CDP Private Cloud Base 7.1.8 (latest Private Cloud Base version) and later
As shown above, the latest connector is backward compatible for use on earlier CDP versions, and the earlier connector is forward compatible for use on later CDP versions.
You must use the supported combinations of --driver and --connection-manager parameters shown below in "Supported combinations of --driver and --connection-manager parameters".
Supported commands for Parquet imports
- You can import from Teradata to Hive in Parquet format using one of the following commands:
sqoop import --connect "jdbc:teradata://host/database" --connection-manager org.apache.sqoop.manager.GenericJdbcManager --driver com.teradata.jdbc.TeraDriver --table table1 --target-dir hdfs://ns1/tmp/table1 --hive-import --as-parquetfile
sqoop import --connect "jdbc:teradata://host/database" --connection-manager com.cloudera.connector.teradata.TeradataManager --table table1 --target-dir hdfs://nsq/tmp/table1 --hive-import --as-parquetfile
- You can import from Teradata to HDFS in Parquet format using only the following options:
sqoop import --connect "jdbc:teradata://host/database" --connection-manager org.apache.sqoop.manager.GenericJdbcManager --driver com.teradata.jdbc.TeraDriver --table table1 --target-dir hdfs://ns1/tmp/table1 --as-parquetfile
Any version of the Sqoop Teradata connector supports this command.
Supported combinations of --driver and --connection-manager parameters
The following table describes supported combinations when importing Parquet data from Teradata to HDFS:

| --driver | --connection-manager |
|---|---|
| - | - |
| com.teradata.jdbc.TeraDriver | - |
| com.teradata.jdbc.TeraDriver | org.apache.sqoop.manager.GenericJdbcManager |
The following table describes supported combinations when importing Parquet data from Teradata to Hive:

| --driver | --connection-manager |
|---|---|
| - | - |
| com.teradata.jdbc.TeraDriver | - |
| com.teradata.jdbc.TeraDriver | org.apache.sqoop.manager.GenericJdbcManager |
| - | com.cloudera.connector.teradata.TeradataManager |
Other features
The following features have been added to the 1.8.3.2 version:
- TDCH-1972: Certify TDCH on CDP 7.1.7 SP1 and add support for Hive JDBC with HiveServer2
The following features are included in the 1.8.3.1 version:
- TDCH-1919: TDCH support for Kerberos enabled Advanced SQL Engine (TDBMS)
- TDCH-1921: Add more debug statements for "split.by.hash"
- TDCH-1922: Add more debug statements for "split.by.value"
- TDCH-1923: Add more debug statements for "split.by.partition"
- TDCH-1924: Add more debug statements for "split.by.amp"
- TDCH-1925: Add more debug statements for "batch.insert"
- TDCH-1950: Certify TDCH with TDJDBC 17.10
The following features are included in the 1.8.2 release:
- TDCH-1571: Add Timestamp support for Parquet in TDCH
- TDCH-1858: Certify TDCH with Advanced SQL Engine (TDBMS) 17.10
- TDCH-1892: Add more debug statements for fastload and fastexport methods for better debugging
- TDCH-1897: Display the error at the exact CLI option instead of a generic message
- Supported Teradata Database versions
- Teradata Database 16.00
- Teradata Database 16.10
- Teradata Database 16.20
- Teradata Database 17.00
- Teradata Database 17.05
- Teradata Database 17.10
- Supported Hadoop versions
- Hadoop 3.1.1
- Supported Hive versions
- Hive 3.1.1
- Hive 3.1.3
- Certified Hadoop distributions
- Cloudera Data Platform (CDP) Private Cloud Base (CDP Datacenter) 7.1.7
- Supported Teradata Wallet versions
- Teradata Wallet 16.20. Because multiple TD Wallet versions can be installed on the same system, TD Wallet 16.20 must be installed to use the TD Wallet functionality.
CDP compatible version of Cloudera Connector Powered by Teradata Version 1.8.1c7 and the TDCH library to version 1.8.1
- Extended Cloudera Data Platform (CDP) compatibility
- CDP Public Cloud 7.2.9 and later
- CDP Private Cloud Base 7.1.7 and later
CDP compatible version of Cloudera Connector Powered by Teradata Version 1.8c7 and the TDCH library to version 1.8.0
- Extended Cloudera Data Platform (CDP) compatibility
- CDP Public Cloud 7.2.0 - 7.2.8
- CDP Private Cloud Base 7.1.0 - 7.1.6
- Support for sqoop import options --incremental lastmodified and --last-value
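A minimal sketch of an incremental import using these options (the connection details, check column, and last value are placeholders; --check-column is the standard Sqoop argument that names the column examined for changes):

sqoop import --connect "jdbc:teradata://td-host/database" --username ... --password ... --table table1 --target-dir hdfs://ns1/tmp/table1 --incremental lastmodified --check-column last_updated --last-value "2021-01-01 00:00:00"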
CDH 6 compatible version of Cloudera Connector Powered by Teradata 1.7.1c6 Available
Cloudera Connector Powered by Teradata 1.7.1c6 is compatible with CDH 6. It does not contain new features or changes.
CDH 6 compatible version of Cloudera Connector Powered by Teradata 1.7c6 Available
Cloudera Connector Powered by Teradata 1.7c6 is compatible with CDH 6. It does not contain new features or changes.
New Features in Cloudera Connector Powered by Teradata Version 1.7c5
Cloudera Connector Powered by Teradata now supports Teradata 16.x. This release upgrades the JDBC driver to version 16.10.00.05 and the TDCH library to version 1.5.4.
Cloudera Connector Powered by Teradata now supports importing tables without a split-by column specified when the number of mappers is set to 1. The following input methods are supported:
- split.by.partition
- split.by.hash
- split.by.value
- split.by.amp
- internal.fastexport
Note that query import still supports only the split.by.partition input method.
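For example, a table can be imported without specifying --split-by when a single mapper is used; this is a minimal sketch with placeholder connection details:

sqoop import --connect "jdbc:teradata://td-host/database" --username ... --password ... --table table1 --target-dir hdfs://ns1/tmp/table1 -m 1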
The following new options configure the coordinator process used by the internal.fastexport input method:
- --fastexport-socket-hostname: Configures the host of the coordinator process. It sets the tdch.input.teradata.fastexport.coordinator.socket.host Java property exposed by the underlying Teradata Connector for Hadoop (TDCH) library.
- --fastexport-socket-port: Configures the port of the coordinator process. It sets the tdch.input.teradata.fastexport.coordinator.socket.port Java property exposed by the underlying Teradata Connector for Hadoop (TDCH) library.
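As an illustrative sketch, these options can be combined with the internal.fastexport input method (this assumes the connector's --input-method option for selecting the input method; host and port values are placeholders):

sqoop import --connect "jdbc:teradata://td-host/database" --username ... --password ... --table table1 --target-dir hdfs://ns1/tmp/table1 --input-method internal.fastexport --fastexport-socket-hostname edge-host.example.com --fastexport-socket-port 9999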
For more information on these properties, see the Teradata Connector for Hadoop tutorial provided by Teradata.
New Features in Cloudera Connector Powered by Teradata Version 1.6.1c5
- Adds support for SLES 12.
New Features in Cloudera Connector Powered by Teradata Version 1.6c5
- Upgrades the JDBC driver to version 15.10.00.22 and the TDCH library to version 1.5.0. These libraries contain several bug fixes and improvements.
- Adds the --schema argument, used to override the <td-instance> value in the connection string of the Sqoop command. For example, if the connection string in the Sqoop command is jdbc:teradata://<td-host>/DATABASE=database1 but you specify --schema database2, your data is imported from database2 and not database1. If the connection string does not contain the DATABASE parameter (for example, jdbc:teradata://<td-host>/CHARSET=UTF8), you can also use the --schema database argument to have Sqoop behave as if you had specified the jdbc:teradata://<td-host>/DATABASE=databasename,CHARSET=UTF8 connection string.
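For example, the following sketch imports table1 from database2 even though the connection string names database1 (host, table, and target directory are placeholders):

sqoop import --connect "jdbc:teradata://td-host/DATABASE=database1" --username ... --password ... --table table1 --schema database2 --target-dir hdfs://ns1/tmp/table1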
New Features in Cloudera Connector Powered by Teradata Version 1.5c5
- Fixed compatibility issue with CDH 5.5.0 and higher.
New Features in Cloudera Connector Powered by Teradata Version 1.4c5
- Added support for JDK 8.
- Added --error-database option.
- Added ability to specify format of date, time, and timestamp types when importing into CSV.
- Import method split.by.amp now supports views.
- Upgraded Teradata connector for Hadoop to version 1.3.4.
New Features and Changes in Cloudera Connector Powered by Teradata 1.3c5
- Upgraded Teradata Connector for Hadoop to version 1.3.3.
- Parcel distribution now contains Teradata JDBC driver; manual download no longer required.
- Added support for query import into Avro file format.
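A minimal sketch of a query import into the Avro file format (the query, target directory, and connection details are placeholders; the standard Sqoop $CONDITIONS token and --as-avrodatafile option are assumed):

sqoop import --connect "jdbc:teradata://td-host/database" --username ... --password ... --query "SELECT id, name FROM table1 WHERE \$CONDITIONS" --target-dir hdfs://ns1/tmp/query1 --as-avrodatafile -m 1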
Changes:
- Export method multiple.fastload has been removed.
New Features in Cloudera Connector Powered by Teradata Version 1.2c5
- Upgraded Teradata Connector for Hadoop to version 1.2.1.
- Added support for Avro.
- Added support for incremental import.
- Added support for the --where argument.
- Added support for Hive import.
- Added support for importing all tables using import-all-tables.
- Added support for Query Bands.
- Added new import method split.by.amp (supported only on Teradata 14.10 and higher).
New Features in Cloudera Connector Powered by Teradata Version 1.0.0
- Support for secondary indexes.
- Especially fast performance in most cases.