Chapter 6. Sample ETL Use Case
You can use Sqoop and Hive actions in a workflow to perform a common ETL flow: extract data from a relational database using Sqoop, transform the data in a Hive table, and load the data into a data warehouse using Sqoop.
The following is an example of a simple Sqoop>Hive>Sqoop ETL workflow created in Workflow Manager. In this example, we extract customer data from a MySQL database, select specific data to include in a Hive table, and then load that data into a data warehouse.
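The three stages of this flow correspond to one Sqoop import, one Hive statement, and one Sqoop export. The Python sketch below only assembles those commands as strings so you can follow the data from stage to stage. It does not run anything. Every host, database, table, and column name, the password-file path, the `WHERE` filter, and the `/apps/hive/warehouse` location are hypothetical placeholders and do not come from this example.

```python
from typing import List

# Every name below (hosts, databases, tables, columns, paths) is hypothetical.
SOURCE_DB = "jdbc:mysql://mysql.example.com:3306/sales"
WAREHOUSE_DB = "jdbc:mysql://dw.example.com:3306/warehouse"
PASSWORD_FILE = "/user/etl/.mysql.password"  # HDFS file holding the DB password


def extract_args() -> List[str]:
    """Sqoop import: copy the MySQL customers table into a Hive table."""
    return [
        "import",
        "--connect", SOURCE_DB,
        "--username", "etl",
        "--password-file", PASSWORD_FILE,
        "--table", "customers",
        "--hive-import",
        "--hive-table", "customers_raw",
        "--num-mappers", "1",  # one mapper, so no --split-by column is required
    ]


def transform_sql() -> str:
    """HiveQL: keep only the columns and rows the warehouse needs."""
    return (
        "CREATE TABLE customers_clean "
        "ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' "
        "AS SELECT id, name, email FROM customers_raw "
        "WHERE country = 'US'"
    )


def load_args() -> List[str]:
    """Sqoop export: push the Hive table's files into the warehouse table."""
    return [
        "export",
        "--connect", WAREHOUSE_DB,
        "--username", "etl",
        "--password-file", PASSWORD_FILE,
        "--table", "customers",
        # Assumed Hive warehouse location; check hive.metastore.warehouse.dir.
        "--export-dir", "/apps/hive/warehouse/customers_clean",
        # Must match the delimiter the Hive table was written with.
        "--input-fields-terminated-by", ",",
    ]


if __name__ == "__main__":
    print("sqoop " + " ".join(extract_args()))
    print(transform_sql())
    print("sqoop " + " ".join(load_args()))
```

Importing with `--hive-import` places the rows directly in a Hive table, so the Hive step can query them without defining a separate external table. Whether your Workflow Manager actions pass arguments in exactly this form is an assumption you should verify.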
Prerequisites
To successfully execute the example Sqoop>Hive>Sqoop ETL workflow defined below, the following prerequisites must be met.
Apache Hive and Apache Sqoop have been successfully installed and configured.
You have successfully completed the tasks in "Configuring WorkFlow Manager View" in the Ambari Views guide.
All NodeManager hosts must be able to communicate with the MySQL server.
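One way to spot-check the last prerequisite is to open a TCP connection to the MySQL server from each node manager host. This minimal Python sketch assumes MySQL listens on its default port, 3306. A successful connection shows only that the network path is open. It does not prove that the database credentials are valid.

```python
import socket


def can_reach(host: str, port: int = 3306, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        # create_connection resolves the name and tries each returned address.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return False
```

For example, running `can_reach("mysql.example.com")` on every node manager host (the hostname is a hypothetical placeholder) should return `True` on each one before you submit the workflow.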
Workflow Tasks
The sample workflow consists of the following:
Create an HDFS Directory for Each New User
Create a Proxy User
Copy JAR Files
Create and Submit the Workflow
  Create the Sqoop Action to Extract Data
  Create the Hive Action to Transform Data
  Create the Sqoop Action to Load Data
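The three actions form a strict chain: the Hive action reads what the first Sqoop action extracted, and the second Sqoop action exports what Hive produced. The following minimal Python model illustrates why that order matters. It is a hypothetical illustration, not Workflow Manager's execution engine. It runs steps in sequence and stops at the first failure, so later steps never run on missing input.

```python
from typing import Callable, List, Optional, Tuple

# A step is a display name plus a callable that returns True on success.
Step = Tuple[str, Callable[[], bool]]


def run_in_order(steps: List[Step]) -> Optional[str]:
    """Run steps in sequence; return the name of the first failed step, or None."""
    for name, action in steps:
        if not action():
            return name  # stop here: later steps depend on this step's output
    return None


if __name__ == "__main__":
    # Hypothetical stand-ins for the three actions; a real run invokes Sqoop and Hive.
    workflow = [
        ("Sqoop extract", lambda: True),
        ("Hive transform", lambda: True),
        ("Sqoop load", lambda: True),
    ]
    failed = run_in_order(workflow)
    print("all actions succeeded" if failed is None else f"stopped at: {failed}")
```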