Workflow Management

Create the Hive Action to Transform the Data

Hive stores user metadata in HDFS. By default, the location of this metadata in HDFS is /user/$USERNAME, where $USERNAME is the name of the user who is accessing Hive. If this directory does not exist, or if it lacks read/write/execute permissions for the user running the workflow job, the job fails.
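If the directory is missing, you can create it ahead of time. The following is a minimal sketch, run as the HDFS superuser; the username wfmuser is a placeholder for the user who runs the workflow job, and the group hdfs is an assumption that varies by cluster:

    # Create the workflow user's home directory in HDFS.
    # "wfmuser" is a placeholder username; "hdfs" is an assumed group.
    hdfs dfs -mkdir -p /user/wfmuser
    # Give that user ownership and read/write/execute access.
    hdfs dfs -chown wfmuser:hdfs /user/wfmuser
    hdfs dfs -chmod 755 /user/wfmuser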

Steps

  1. In the workflow graph, click the connector that follows the Sqoop node, then click the + icon.

  2. Click the Hive icon to add an action node to the workflow.

    This Hive action will be used to transform the data you extracted using Sqoop.

  3. Click the Hive node and rename it using a descriptive name.

    For example, name the node hive-mktg-ids.

  4. Click the Hive node again and then click the Action Settings gear icon.

  5. In the Action Settings dialog box, click the Query option.

    You can either use a script or a query, but for this example we use Query.

  6. In the Query field, type the query you want to use.

    For example:

    INSERT OVERWRITE DIRECTORY '/usr/output/marketing/customer_id' SELECT * FROM marketing WHERE Id_Field > 100;
    

    This query selects every row in the marketing table whose Id_Field value is greater than 100 and writes the results to the HDFS directory /usr/output/marketing/customer_id.

  7. Enter the JDBC URL.

    For example: jdbc:hive2://wfmgr-2.openstacklocal:10000

    You can find this value in Ambari, under Hive > Summary, on the HiveServer2 JDBC URL line. To confirm that the URL and credentials work, see the Beeline check that follows these steps.

  8. Enter the password for the MySQL database.

  9. Use the default settings for the remaining fields and options.

  10. Click Save and close the dialog box. The sketch after these steps shows roughly how the saved action appears in the underlying workflow XML.
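Before you run the workflow, you can check the JDBC URL and credentials from step 7 with Beeline (see Apache Beeline Hive Commands, below). This is a minimal sketch that assumes the example hostname from step 7; wfmuser is a placeholder username, and you may be prompted for the password depending on how the cluster is secured:

    # Connect to HiveServer2 with the JDBC URL from step 7 and run a
    # trivial query; "wfmuser" is a placeholder username.
    beeline -u "jdbc:hive2://wfmgr-2.openstacklocal:10000" -n wfmuser -e "SHOW TABLES;"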
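When you save the action, Workflow Manager adds it to the Oozie workflow definition it generates. As a rough sketch only, built from the example values in these steps (the exact schema version and generated markup can differ), the saved node corresponds to something like:

    <!-- Hive2 action sketch; ${jobTracker} and ${nameNode} are standard
         workflow parameters, and the password is masked here. -->
    <action name="hive-mktg-ids">
        <hive2 xmlns="uri:oozie:hive2-action:0.2">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <jdbc-url>jdbc:hive2://wfmgr-2.openstacklocal:10000</jdbc-url>
            <password>*****</password>
            <query>
                INSERT OVERWRITE DIRECTORY '/usr/output/marketing/customer_id'
                SELECT * FROM marketing WHERE Id_Field > 100;
            </query>
        </hive2>
        <ok to="end"/>
        <error to="kill"/>
    </action>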

More Information

Apache Hive Configuration Properties

Apache Hive Action

Apache Hive Operators and Functions

Apache Hive Query Language

Apache Hive2 Action

Apache Beeline Hive Commands