HDP Data Services

Moving Data into Hive

There are several ways to move data into Hive. Which method you use depends on the source format of the data and on the target data format that you need. ORC is generally the preferred target format because of the performance benefits it provides, such as columnar storage, compression, and predicate pushdown.

The following methods are most commonly used:

Table 1.5. Most Common Methods to Move Data into Hive

Source of Data: ETL for legacy systems
Target Data Format in Hive: ORC file format
Method Description:
  1. Move the data into HDFS.
  2. Create an external table over the HDFS files so that Hive can read the data.
  3. Use Hive to convert the data to the ORC file format.

Source of Data: Operational SQL database
Target Data Format in Hive: ORC file format
Method Description:
  1. Use Sqoop to import the data from the SQL database into Hive.
  2. Use Hive to convert the data to the ORC file format.

Source of Data: Streaming source that is "append only"
Target Data Format in Hive: ORC file format
Method Description:
  1. Write directly to the ORC file format by using the Hive Streaming feature.
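
The Hive side of these methods can be sketched in HiveQL. The following is a minimal illustration, not a definitive procedure: the table names, column names, delimiter, and HDFS path are all hypothetical, and the data is assumed to be comma-delimited text already copied into HDFS. The first two statements correspond to the external-table and conversion steps for legacy ETL data. The same CREATE TABLE ... AS SELECT pattern can be applied to a table that Sqoop has imported. The last statement shows the kind of target table that Hive Streaming typically expects, assuming a Hive release in which streaming requires a bucketed, transactional ORC table.

```sql
-- Expose files that already sit in HDFS as an external table.
-- Hypothetical names and path; adjust the schema and delimiter to your data.
CREATE EXTERNAL TABLE IF NOT EXISTS legacy_orders_staging (
  order_id   INT,
  customer   STRING,
  amount     DOUBLE,
  order_date STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION '/data/staging/legacy_orders';

-- Convert the staged data to ORC by writing it into a new ORC-backed table.
CREATE TABLE legacy_orders_orc
STORED AS ORC
AS
SELECT order_id, customer, amount, order_date
FROM legacy_orders_staging;

-- Assumed target table for Hive Streaming: bucketed, ORC, transactional.
-- Hypothetical table; the bucket count is illustrative only.
CREATE TABLE alerts (
  id  INT,
  msg STRING
)
CLUSTERED BY (id) INTO 5 BUCKETS
STORED AS ORC
TBLPROPERTIES ('transactional' = 'true');
```

Because the staging table is external, dropping it later removes only the table definition and leaves the underlying HDFS files in place, which is usually what you want for source data that other tools also read.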