Create the Hive Action to Transform the Data
Hive stores user metadata in HDFS. By default, the location of this metadata in HDFS is /user/$USERNAME, where $USERNAME is the name of the user who is accessing Hive. If this directory does not exist with read/write/execute permissions for the user running the workflow job, the job fails.
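If the directory is missing, you can create it before running the workflow. The following is a minimal sketch using the HDFS CLI; the user name wfuser and the hdfs superuser account are placeholders, so substitute the names used in your environment:

# Run as the HDFS superuser to create the workflow user's home directory
sudo -u hdfs hdfs dfs -mkdir -p /user/wfuser
# Give the workflow user ownership with read/write/execute permissions
sudo -u hdfs hdfs dfs -chown wfuser:wfuser /user/wfuser
sudo -u hdfs hdfs dfs -chmod 755 /user/wfuser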
Steps
1. In the workflow graph, click the connector that follows the Sqoop node, then click the + icon.
2. Click the Hive icon to add an action node to the workflow. This Hive action will be used to transform the data you extracted using Sqoop.
3. Click the Hive node and rename it using a descriptive name. For example, name the node hive-mktg-ids.
Tip: Spaces are not allowed in action node names.
4. Click the Hive node again and then click the Action Settings gear icon.
5. In the Action Settings dialog box, click the Query option. You can use either a script or a query; this example uses a query.
6. In the Query field, type the query you want to use.
For example:
INSERT OVERWRITE DIRECTORY '/usr/output/marketing/customer_id' SELECT * FROM marketing WHERE Id_Field > 100;
This query extracts all IDs greater than 100 from the Id_Field field of the marketing table and places the data in /usr/output/marketing/customer_id.
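After the workflow runs, you can confirm that the query wrote its output where expected. A quick check with the HDFS CLI, assuming the example output path above (the data file names Hive generates vary):

# List the files Hive wrote to the output directory
hdfs dfs -ls /usr/output/marketing/customer_id
# Show the first few rows of the extracted data
hdfs dfs -cat /usr/output/marketing/customer_id/* | head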
7. Enter the JDBC URL.
For example:
jdbc:hive2://wfmgr-2.openstacklocal:10000
You can find this entry in Ambari at Hive > Summary, on the HiveServer2 JDBC URL line.
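Before saving the action, you can verify the URL by opening a Beeline session against it and running the query manually. A sketch assuming the example URL above and an unsecured cluster; add credentials or a Kerberos principal if your cluster requires them:

# Connect to HiveServer2 and run the query once by hand
beeline -u "jdbc:hive2://wfmgr-2.openstacklocal:10000" \
  -e "SELECT COUNT(*) FROM marketing WHERE Id_Field > 100;"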
8. Enter the password for the MySQL database.
9. Use the default settings for the remaining fields and options.
10. Click Save and close the dialog box.
More Information
Apache Hive Configuration Properties