Data migration to Apache Hive
Sqoop is a tool for bulk transfer of data between diverse data sources and HDFS or Hive, in both the import and export directions. The following diagram shows the process for moving data into Hive:
HDFS is typically the source of legacy system data that needs to undergo an extract,
transform, and load (ETL) process. You can import data in delimited text (the default) or
SequenceFile format, and then convert it to the ORC format recommended for Hive. ORC is
generally the preferred format for querying data in Hive because of the performance
enhancements it provides. The following diagram shows an example of a common parallel and
distributed conversion of data to ORC for querying in Hive:
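The two steps described above (a Sqoop import as delimited text, followed by conversion to ORC) can be sketched as a small Python helper that only assembles the command line and the HiveQL statements. This is an illustrative sketch, not a prescribed procedure: the helper names, JDBC URL, table names, column list, and HDFS path are all hypothetical, and the code builds strings without running Sqoop or Hive.

```python
import shlex


def build_sqoop_import(jdbc_url, table, target_dir, num_mappers=4):
    """Build the argument list for a Sqoop import into HDFS as delimited text.

    All argument values are placeholders supplied by the caller.
    """
    return [
        "sqoop", "import",
        "--connect", jdbc_url,
        "--table", table,
        "--target-dir", target_dir,
        "--as-textfile",                    # delimited text is the default format
        "--fields-terminated-by", ",",
        "--num-mappers", str(num_mappers),  # number of parallel map tasks
    ]


def build_orc_conversion(staging_table, orc_table, columns, location):
    """Build HiveQL that exposes the imported text files and copies them to ORC.

    columns is a list of (name, hive_type) pairs; the schema is hypothetical.
    """
    cols = ", ".join(f"{name} {hive_type}" for name, hive_type in columns)
    return [
        # Staging table over the delimited files written by the import.
        f"CREATE EXTERNAL TABLE {staging_table} ({cols}) "
        f"ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' "
        f"STORED AS TEXTFILE LOCATION '{location}'",
        # Rewrite the staged rows into an ORC-backed table for querying.
        f"CREATE TABLE {orc_table} STORED AS ORC "
        f"AS SELECT * FROM {staging_table}",
    ]


if __name__ == "__main__":
    # Hypothetical source database and HDFS path.
    command = build_sqoop_import(
        "jdbc:mysql://db.example.com/sales", "orders", "/user/etl/orders"
    )
    print(shlex.join(command))

    statements = build_orc_conversion(
        "orders_staging",
        "orders_orc",
        [("id", "INT"), ("amount", "DOUBLE")],
        "/user/etl/orders",
    )
    for statement in statements:
        print(statement + ";")
```

Keeping the text-format staging table separate from the ORC table means the import can be rerun or inspected without touching the table that queries use. Whether a staging step is needed at all depends on the environment, since Sqoop can also load directly into Hive.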