Using Hive Warehouse Connector with Oozie Spark Action

Hive and Spark use different Thrift versions that are not compatible with each other. If the Hive Warehouse Connector (HWC) JAR is on the classpath of Oozie's Spark action, Hive classes conflict: the same class can be loaded from Oozie's default Spark classpath with its original signature and from the HWC JAR with a signature changed by the shading process.

Upgrading Thrift in Hive is complicated and may not happen in the near future. Therefore, the Thrift packages are shaded inside the HWC JAR so that Hive Warehouse Connector works with Spark and with Oozie's Spark action.

Because the HWC JAR is a fat JAR that also contains Hive classes, this shading process changes the signatures of some Hive classes inside it. Oozie's Spark action also has Hive libraries on its classpath (added as part of the Cloudera stack), because you can execute simple Hive commands with Oozie's Spark action on its own, without HWC. You can also execute Hive operations with Hive Warehouse Connector through Oozie's Spark action.

You can resolve this issue using one of the following options:

  • If you are only using HWC with Oozie's Spark action and not executing simple Hive commands, place the HWC JAR in Oozie's Spark ShareLib and remove all other Hive JARs from that ShareLib.

    or

  • If you are both executing simple Hive commands and using HWC through Oozie's Spark action, placing the HWC JAR in Oozie's Spark ShareLib is not recommended. Instead, use another mechanism that Oozie offers, such as placing the JAR next to the workflow.xml, or placing it on HDFS and referencing it in the workflow.xml with a <file> tag. In this case, exclude the other Hive JARs from the classpath when running an Oozie-Spark-HWC action by adding the following line to the job.properties file for your workflow:
    oozie.action.sharelib.for.spark.exclude=^.*\/hive\-(?!warehouse-connector).*\.jar$
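
To see which JARs the exclude pattern above would filter out, the following minimal sketch applies the same regular expression to a few sample paths. The paths and version numbers are hypothetical and only illustrate the matching. Oozie evaluates the pattern with Java's regex engine, not Python's; the sketch assumes the two behave the same for the constructs used here (anchors, escaped literals, and a negative lookahead).

```python
import re

# Pattern copied verbatim from the exclude setting above.
EXCLUDE = re.compile(r"^.*\/hive\-(?!warehouse-connector).*\.jar$")

# Hypothetical ShareLib paths, for illustration only.
paths = [
    "/user/oozie/share/lib/spark/hive-exec-3.1.0.jar",
    "/user/oozie/share/lib/spark/hive-metastore-3.1.0.jar",
    "/user/oozie/share/lib/spark/hive-warehouse-connector-assembly-1.0.0.jar",
    "/user/oozie/share/lib/spark/spark-hive_2.11-2.4.0.jar",
]


def is_excluded(path: str) -> bool:
    """Return True if the exclude pattern matches the given JAR path."""
    return EXCLUDE.search(path) is not None


excluded = [p for p in paths if is_excluded(p)]
kept = [p for p in paths if not is_excluded(p)]

for p in paths:
    status = "excluded" if is_excluded(p) else "kept"
    print(f"{status:9} {p}")
```

The negative lookahead `(?!warehouse-connector)` is what keeps the HWC JAR on the classpath: a file name starting with `hive-` matches only when it is not immediately followed by `warehouse-connector`. File names that do not start with `hive-`, such as the sample `spark-hive_...` JAR, are not matched by the pattern.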