Data migration to Apache Hive
To migrate data from an RDBMS, such as MySQL, to Hive, you should consider Apache Sqoop in CDP with the Teradata connector. Apache Sqoop client CLI-based tool bulk transfers of data between relational databases and HDFS or cloud object stores including Amazon S3 and Microsoft ADLS.
The Sqoop client is available in CDP Private Cloud Base and in CDP Public Cloud.
In CDP Private Cloud Base, HDFS is typically the source of legacy system data that needs to undergo an extract, transform, and load (ETL) process. In CDP Public Cloud, S3 is typically the source of legacy data. You can also import data in delimited text (default) or SequenceFile format, and then convert data to ORC format recommended for Hive. Generally, for querying the data in Hive, ORC is the preferred format because of the performance enhancements ORC provides.
Using Sqoop with the Teradata Connector
CDP does not support the Sqoop exports using the Hadoop jar
command (the Java
API). The connector documentation from Teradata includes instructions that include the use of
this API. CDP users have reportedly mistaken the unsupported API commands, such as
-forcestage
, for the supported Sqoop commands, such as
–-staging-force
. Cloudera supports the use of Sqoop only with commands documented in
Using the Cloudera Connector Powered by Teradata.
Cloudera does not support using Sqoop with Hadoop jar
commands, such as those
described in the Teradata Connector for Hadoop Tutorial.