Migrating data

Data migration to Apache Hive

Sqoop is a tool for bulk-importing data from diverse data sources into HDFS and Hive, and for exporting data back out of them. The following diagram shows the process for moving data into Hive:



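A minimal sketch of the Sqoop import step, written in Python so that it only builds the command line and does not run it. The JDBC URL, table name, HDFS directory, and the helper name `build_sqoop_import` are hypothetical placeholders, and authentication flags such as `--username` and `-P` are omitted. The `--as-textfile` flag selects delimited text (Sqoop's default) and `--as-sequencefile` selects SequenceFile.

```python
import shlex


def build_sqoop_import(jdbc_url, table, target_dir, file_format="text", num_mappers=4):
    """Return the argv for a Sqoop import of one table into an HDFS directory.

    Hypothetical helper. file_format is "text" (Sqoop's default, delimited
    text) or "sequence" (SequenceFile).
    """
    format_flags = {
        "text": "--as-textfile",
        "sequence": "--as-sequencefile",
    }
    if file_format not in format_flags:
        raise ValueError(f"unsupported format: {file_format!r}")
    return [
        "sqoop", "import",
        "--connect", jdbc_url,            # JDBC connection string of the source
        "--table", table,                 # source table to copy
        "--target-dir", target_dir,       # HDFS directory that receives the files
        "--num-mappers", str(num_mappers),  # number of parallel map tasks
        format_flags[file_format],
    ]


if __name__ == "__main__":
    argv = build_sqoop_import(
        "jdbc:mysql://db.example.com/sales",  # hypothetical source database
        "orders",                             # hypothetical table
        "/user/etl/staging/orders",           # hypothetical HDFS staging path
    )
    # Print a shell-ready command. On a host with Sqoop installed,
    # subprocess.run(argv, check=True) would execute the import.
    print(shlex.join(argv))
```

Returning an argument list rather than a single command string keeps quoting safe if the command is later passed to `subprocess.run`.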
HDFS is typically the source of legacy system data that needs to undergo an extract, transform, and load (ETL) process. You can also import data with Sqoop in delimited text format (the default) or SequenceFile format, and then convert it to ORC, the format recommended for Hive. For querying data in Hive, ORC is generally the preferred format because its columnar layout, built-in compression, and lightweight indexes improve query performance. The following diagram shows an example of a common parallel and distributed conversion of data to ORC for querying in Hive:
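As a hedged illustration of the ORC conversion described above, the hypothetical Python helper below generates two HiveQL statements: an external table over the delimited text files that Sqoop wrote, and a CREATE TABLE ... AS SELECT (CTAS) that rewrites those rows as ORC. Hive executes the SELECT as a distributed job, which is one way to perform the conversion in parallel. The table names, columns, and HDFS path are assumptions; the comma delimiter matches Sqoop's default for text imports.

```python
def orc_conversion_statements(staging_table, orc_table, columns, hdfs_dir, delimiter=","):
    """Return HiveQL that exposes delimited text as a table, then copies it to ORC.

    Hypothetical helper. columns is a list of (name, hive_type) pairs.
    """
    cols = ",\n  ".join(f"{name} {hive_type}" for name, hive_type in columns)
    # External table: Hive reads the existing text files in place.
    create_staging = (
        f"CREATE EXTERNAL TABLE {staging_table} (\n  {cols}\n)\n"
        f"ROW FORMAT DELIMITED FIELDS TERMINATED BY '{delimiter}'\n"
        f"STORED AS TEXTFILE\n"
        f"LOCATION '{hdfs_dir}'"
    )
    # CTAS: Hive runs the SELECT as a distributed job and writes ORC files.
    create_orc = (
        f"CREATE TABLE {orc_table}\n"
        f"STORED AS ORC\n"
        f"AS SELECT * FROM {staging_table}"
    )
    return [create_staging, create_orc]


if __name__ == "__main__":
    statements = orc_conversion_statements(
        "orders_staging",                  # hypothetical staging table
        "orders_orc",                      # hypothetical ORC table
        [("id", "INT"), ("amount", "DOUBLE")],
        "/user/etl/staging/orders",        # hypothetical HDFS path
    )
    # Print a script that could be submitted to Hive (for example via Beeline).
    print(";\n\n".join(statements) + ";")
```

Keeping the staging table external means dropping it after the conversion leaves the original text files in HDFS untouched.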