Using Hive Warehouse Connector with Oozie Spark Action
You can use the Hive Warehouse Connector (HWC) with the Oozie Spark action by updating the job.properties file or the action-level configuration.
Steps
- Create a new ShareLib using a different name, such as hwc.
- Place the HWC JARs in the new ShareLib. For details, see the Appendix - Creating a new ‘hwc’ ShareLib section below.
- Execute a ShareLib update.
- When executing a Spark action that uses HWC, include the following property in the job.properties file:
oozie.action.sharelib.for.spark=spark,hwc
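As a rough illustration of the last step, the following sketch writes a minimal job.properties file that enables the hwc ShareLib and then checks that the property is present. The NameNode address and the application path are hypothetical placeholders, not values from this guide; oozie.use.system.libpath=true is included because Oozie only loads ShareLibs when the system libpath is enabled. A real file also needs whatever other properties your workflow references.

```shell
# Sketch only: generate a minimal job.properties for a Spark action that uses HWC.
# The NameNode host/port and the application path are placeholders (assumptions).
# The quoted 'EOF' keeps ${nameNode} and ${user.name} literal so Oozie resolves them.
cat > job.properties <<'EOF'
nameNode=hdfs://namenode.example.com:8020
oozie.use.system.libpath=true
oozie.wf.application.path=${nameNode}/user/${user.name}/apps/hwc-spark
oozie.action.sharelib.for.spark=spark,hwc
EOF

# Confirm that the ShareLib property is present before submitting the workflow.
grep -Fx 'oozie.action.sharelib.for.spark=spark,hwc' job.properties
```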
You can also set the ShareLib property at the action level. This is useful when a workflow contains one action that should use HWC and another that should not: specify the property only in the configuration of the action that needs HWC.
Example
<spark xmlns="uri:oozie:spark-action:1.0">
...
<configuration>
<property>
<name>oozie.action.sharelib.for.spark</name>
<value>spark,hwc</value>
</property>
</configuration>
...
</spark>
Appendix - Creating a new ‘hwc’ ShareLib
The oozie admin commands must be run as the oozie user.
- Run kinit as the oozie user.
- Check the currently available ShareLibs:
oozie admin -shareliblist -oozie {url}
- Create the hwc folder on HDFS:
hdfs dfs -mkdir /user/oozie/share/lib/lib_{latestTimestamp}/hwc
- Add the following JAR files to it from the /opt/cloudera/parcels/CDH/jars directory:
- hive-warehouse-connector-assembly-1.0.0.***VERSION NUMBER***-XXX.jar
- hive-jdbc-3.1.3000.***VERSION NUMBER***-XXX.jar
- hive-jdbc-handler-3.1.3000.***VERSION NUMBER***-XXX.jar
- hive-service-3.1.3000.***VERSION NUMBER***-XXX.jar
- spark-sql-kafka-0-10_2.11-***VERSION NUMBER***-XXX.jar
- Update the ShareLib:
oozie admin -sharelibupdate -oozie {url}
- List the ShareLibs again to check that hwc is present:
oozie admin -shareliblist -oozie {url}
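The appendix does not show a command for the JAR-copy step, so here is one possible sketch using hdfs dfs -put. The names JAR_DIR, SHARELIB_DIR, DRY_RUN, and copy_jars are hypothetical and introduced only for this example. Glob patterns stand in for the version-specific file names, and {latestTimestamp} is left as a placeholder that you must replace with the value reported by the shareliblist command. By default the script only prints the commands it would run.

```shell
#!/usr/bin/env bash
# Sketch only: copy the HWC-related JARs into the new hwc ShareLib folder.
# All variable and function names here are assumptions made for illustration.
set -euo pipefail

JAR_DIR="${JAR_DIR:-/opt/cloudera/parcels/CDH/jars}"
# Placeholder path: replace {latestTimestamp} with the real ShareLib timestamp.
DEFAULT_SHARELIB_DIR='/user/oozie/share/lib/lib_{latestTimestamp}/hwc'
SHARELIB_DIR="${SHARELIB_DIR:-$DEFAULT_SHARELIB_DIR}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = print the commands only, 0 = run them

# Patterns for the five JARs listed in the appendix; [0-9] keeps
# hive-jdbc-* from also matching hive-jdbc-handler-*.
JAR_PATTERNS=(
  'hive-warehouse-connector-assembly-*.jar'
  'hive-jdbc-[0-9]*.jar'
  'hive-jdbc-handler-[0-9]*.jar'
  'hive-service-[0-9]*.jar'
  'spark-sql-kafka-0-10_2.11-*.jar'
)

copy_jars() {
  local pattern
  for pattern in "${JAR_PATTERNS[@]}"; do
    if [ "$DRY_RUN" = "1" ]; then
      echo "hdfs dfs -put ${JAR_DIR}/${pattern} ${SHARELIB_DIR}/"
    else
      # ${pattern} is unquoted on purpose so the shell expands the glob.
      hdfs dfs -put "${JAR_DIR}"/${pattern} "${SHARELIB_DIR}/"
    fi
  done
}

copy_jars
```

Run it once with the default DRY_RUN=1 to review the commands, then rerun with DRY_RUN=0 as the oozie user before executing the ShareLib update.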