HWC limitations
You need to be aware of HWC limitations, including unsupported operations and connections and restrictions on Kerberos properties. These limitations apply in addition to those of Direct Reader mode, JDBC mode, and the HWC and DataFrames APIs.
General HWC limitations are:
- HWC supports reading tables in any format, but currently supports writing tables in ORC format only (see the sketch after this list).
- The Spark thrift server and Livy thrift server are not supported.
- Hive Union types are not supported.
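For example, here is a minimal Scala sketch of a read and an ORC-only write through HWC, run in spark-shell with the HWC JAR on the classpath; the table names default.parquet_table and default.sensor_orc are hypothetical placeholders:
import com.hortonworks.hwc.HiveWarehouseSession

// Build an HWC session from the spark-shell SparkSession.
val hive = HiveWarehouseSession.session(spark).build()

// Reads work against tables in any format.
val df = hive.executeQuery("SELECT * FROM default.parquet_table")

// Writes must target an ORC-backed table.
df.write
  .format(HiveWarehouseSession.HIVE_WAREHOUSE_CONNECTOR)
  .option("table", "default.sensor_orc")
  .save()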
Workaround for using the Hive Warehouse Connector with Oozie Spark action
Hive and Spark use different, incompatible Thrift versions. Upgrading Thrift in Hive is complicated, and the incompatibility may not be resolved in the near future. Therefore, the Thrift packages are shaded inside the HWC JAR so that the Hive Warehouse Connector works with Spark and the Oozie Spark action. See the workaround in the Cloudera Oozie documentation.
Secured cluster configurations
Set the following configurations in a secured cluster:
--conf "spark.security.credentials.hiveserver2.enabled=true"
--conf "spark.sql.hive.hiveserver2.jdbc.url.principal=hive/_HOST@ROOT.HWX.SITE"
The JDBC URL must not contain the principal; instead, pass the principal separately in the spark.sql.hive.hiveserver2.jdbc.url.principal property, as shown above.
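As an illustration, a spark-shell launch on a Kerberized cluster might combine these options as follows; the JAR path, HiveServer2 host, and Kerberos realm are placeholders. Note that the JDBC URL itself carries no principal element:
# Hypothetical launch command; adjust the JAR path, host, and realm for your cluster.
spark-shell --jars /path/to/hive-warehouse-connector-assembly.jar \
  --conf "spark.security.credentials.hiveserver2.enabled=true" \
  --conf "spark.sql.hive.hiveserver2.jdbc.url=jdbc:hive2://hs2-host.example.com:10000" \
  --conf "spark.sql.hive.hiveserver2.jdbc.url.principal=hive/_HOST@EXAMPLE.COM"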