Using Hive Warehouse Connector with Oozie Spark Action
Hive and Spark use different, mutually incompatible Thrift versions. To work around this, the HWC JAR is built as a fat JAR that bundles Hive classes, and a shading process changes the signature of some of those Hive classes inside the JAR. As a result, if the HWC JAR is on Oozie's Spark classpath, conflicting Hive classes appear: the same class can be loaded from Oozie's default Spark classpath with its original signature and from the HWC JAR with the shaded signature.
Oozie's Spark action also has Hive libraries on its classpath (added as part of the Cloudera stack) so that you can run simple Hive commands with Oozie's Spark action on its own, without HWC. You can also run Hive actions through Oozie's Spark action with the Hive Warehouse Connector.
You can resolve this issue using one of the following options:
If you use HWC with Oozie's Spark action exclusively and do not execute simple Hive commands, place the HWC JAR in Oozie's Spark ShareLib and remove all other Hive JARs from the ShareLib.
If you execute both simple Hive commands and use HWC through Oozie's Spark action, placing the HWC JAR in Oozie's Spark ShareLib is not recommended. Instead, use one of the other options Oozie offers, such as placing the JAR next to the workflow.xml, or placing it on HDFS and referencing it in the workflow.xml with a <file> tag. In this case, exclude the other Hive JARs from the classpath when running an Oozie-Spark-HWC action by adding the following to the job.properties file for your workflow:
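Oozie supports excluding JARs from an action's ShareLib through the per-action exclude property. The exact regular expression depends on the JAR names present in your ShareLib; the pattern below, which drops every hive-* JAR except the Hive Warehouse Connector itself, is a sketch that you should verify against the contents of your own ShareLib directory:

```properties
# Exclude the Hive JARs from the Spark ShareLib for this action, keeping
# only the hive-warehouse-connector JAR. The regular expression is
# illustrative; adjust it to match the JAR names in your ShareLib.
oozie.action.sharelib.for.spark.exclude=^.*\/hive\-(?!warehouse-connector).*\.jar$
```

If you ship the HWC JAR yourself rather than placing it in the ShareLib, reference it from the workflow.xml, for example with a <file> element pointing at its location on HDFS (the path shown is hypothetical): <file>hdfs:///apps/hwc/hive-warehouse-connector-assembly.jar</file>.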