Using Hive Warehouse Connector with Oozie Spark Action
Hive and Spark use different Thrift versions that are not compatible with each other.
If the Hive Warehouse Connector (HWC) JAR is on Oozie's Spark classpath, Hive classes
conflict: the same class can be loaded from Oozie's default Spark classpath with its
original signature and from the HWC JAR with a signature changed by the shading process.
Upgrading Thrift in Hive is complicated, and the incompatibility may not be resolved in the
near future. Therefore, the Thrift packages are shaded inside the HWC JAR so that Hive
Warehouse Connector works with Spark and with Oozie's Spark action.
Because the HWC JAR is a fat JAR that also contains Hive classes, this shading process
changes the signature of some of those Hive classes.
Oozie's Spark action also has Hive libraries on its classpath (added as part of the Cloudera
stack) because the Spark action can run simple Hive commands on its own, without HWC. You
can also run Hive actions with Hive Warehouse Connector through Oozie's Spark action.
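To see which classes are affected, it can help to list the class files that appear in more
than one JAR on the classpath. The following Python sketch is illustrative only and is not
part of HWC or Oozie: the function name find_duplicate_classes is hypothetical, and you
supply the JAR paths yourself. It compares entry names only, so it shows where a class is
defined twice but cannot tell whether the two copies have different signatures.

```python
import sys
import zipfile
from collections import defaultdict


def find_duplicate_classes(jar_paths):
    """Return {class entry name: [JARs containing it]} for entries found in 2+ JARs.

    Hypothetical helper for illustration; it only compares entry names.
    """
    owners = defaultdict(list)
    for jar_path in jar_paths:
        # A JAR is a ZIP archive, so the standard zipfile module can read it.
        with zipfile.ZipFile(jar_path) as jar:
            # set() guards against an archive that lists the same entry twice.
            for entry in sorted(set(jar.namelist())):
                if entry.endswith(".class"):
                    owners[entry].append(jar_path)
    return {name: jars for name, jars in owners.items() if len(jars) > 1}


if __name__ == "__main__":
    # Usage (paths are placeholders): python find_dups.py first.jar second.jar
    for name, jars in sorted(find_duplicate_classes(sys.argv[1:]).items()):
        print(name, "->", ", ".join(jars))
```

Running it against local copies of the HWC JAR and the Hive JARs from the Spark ShareLib
should list the Hive classes that both sources provide.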
You can resolve this issue using one of the following options:
If you are only using HWC with Oozie's Spark action and not executing simple Hive commands,
you can place the HWC JAR in Oozie's Spark ShareLib. You can then remove all other Hive
JARs from Oozie's Spark ShareLib.
or
If you are executing both simple Hive commands and using HWC through Oozie's Spark
action, placing the HWC JAR in Oozie's Spark ShareLib is not recommended. Instead, use
another mechanism that Oozie offers, such as placing the JAR next to the workflow.xml, or
placing it on HDFS and referencing it in the workflow.xml with a <file> tag. In this
case, you should exclude the other Hive JARs from the classpath when running an
Oozie-Spark-HWC action. You can achieve this by adding the following to the job.properties
file for your workflow: