Data migration to Apache Hive
To migrate data from an RDBMS, such as MySQL, to Hive, you should consider Apache Sqoop in Cloudera with the Teradata connector. The Apache Sqoop client is a CLI-based tool that transfers data in bulk between relational databases and HDFS or cloud object stores, including Amazon S3 and Microsoft ADLS.
Legacy system data that needs to undergo an extract, transform, and load (ETL) process typically resides on the file system or an object store. You can import the data in delimited text (the default) or SequenceFile format, and then convert it to the ORC format recommended for Hive. Generally, ORC is the preferred format for querying data in Hive because of the performance enhancements it provides.
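For example, a minimal import and conversion might look like the following sketch. The connection string, credentials, table names, and directories are placeholders for illustration only; adjust them for your environment and schema.

# Import a MySQL table into HDFS as delimited text (the Sqoop default)
sqoop import \
  --connect jdbc:mysql://mysql.example.com/sales \
  --username sqoop_user \
  --password-file /user/sqoop/.password \
  --table orders \
  --target-dir /user/hive/staging/orders \
  --fields-terminated-by ','

# Expose the imported files to Hive, then convert them to ORC
# (the HiveServer2 URL and column list are placeholders)
beeline -u "jdbc:hive2://hs2.example.com:10000/default" -e "
  CREATE EXTERNAL TABLE orders_text (id INT, amount DECIMAL(10,2), created STRING)
  ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
  LOCATION '/user/hive/staging/orders';
  CREATE TABLE orders STORED AS ORC AS SELECT * FROM orders_text;
"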
Teradata Connector for Sqoop
Cloudera does not support Sqoop exports that use the Hadoop jar command (the Java API). The connector documentation from Teradata includes instructions for using this API. Cloudera users have reportedly mistaken the unsupported API commands, such as -forcestage, for the supported Sqoop commands, such as --staging-force. Cloudera supports the use of Sqoop only with the commands documented in Using the Cloudera Connector Powered by Teradata. Cloudera does not support using Sqoop with Hadoop jar commands, such as those described in the Teradata Connector for Hadoop Tutorial.
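The distinction shows up at the command line: the supported route is the sqoop client with the Teradata JDBC connection string, not a hadoop jar invocation of the connector's Java API. The following is a rough sketch only; the host, database, credentials, and table names are placeholders, and you should confirm connector-specific option syntax in Using the Cloudera Connector Powered by Teradata.

# Supported: the sqoop command-line client with the Teradata JDBC URL.
# Connector-specific options such as --staging-force are described in
# "Using the Cloudera Connector Powered by Teradata"; confirm their exact
# placement there.
sqoop import \
  --connect jdbc:teradata://td.example.com/DATABASE=sales \
  --username td_user \
  --password-file /user/sqoop/.td_password \
  --table ORDERS \
  --target-dir /user/hive/staging/orders

# Not supported: invoking the connector's Java API directly, for example with
# a "hadoop jar" command as described in the Teradata Connector for Hadoop
# Tutorial.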
Apache Sqoop documentation on the Cloudera web site
To access the latest Sqoop documentation, go to Sqoop Documentation 1.4.7.7.1.6.0.