HWC changes from HDP to Cloudera

You need to understand the Hive Warehouse Connector (HWC) changes from HDP to Cloudera. Extensive HWC documentation can prepare you to update your HWC code to run on Cloudera. In Cloudera, methods and the configuration of the HWC connections differ from HDP.

Deprecated methods

The following methods have been deprecated in Cloudera Runtime 7.1.7:
  • hive.execute()
  • hive.executeQuery()

The HWC interface is simplified in Cloudera resulting in the convergence of the execute/ executeQuery methods to the sql method. The execute/ executeQuery methods are deprecated and will be removed from Cloudera in a future release. Historical calls to execute/ executeQuery used the JDBC connection and were limited to 1000 records. The 1000 record limitation does not apply to the sqlmethod, although using JDBC cluster mode is recommended only for production for workloads having a data size of 1GB or less. Larger workloads are not recommended for JDBC reads in production due to slow performance.

Although the old methods are still supported in Cloudera for backward compatibility, refactoring your code to use the sql method for all configurations (JDBC client, Direct Reader V1 or V2, and Secure Access modes) is highly recommended.

Recommended method refactoring

The following table shows the recommended method refactoring:
API From HDP To Cloudera HDP Example Cloudera Example
HWC sql API execute and executeQuery methods sql method hive.execute("select * from default.hwctest").show(1, false) hive.sql("select * from default.hwctest").show(1, false)
Spark sql API sql and spark.read.table methods No change

sql("select * from managedTable").show

scala> spark.read.table("managedTable").show

sql("select * from managedTable").show

scala> spark.read.table("managedTable").show

DataFrames API spark.read.format method No change val df = spark.read.format("HiveAcid").options(Map("table" -> "default.acidtbl")).load() val df = spark.read.format("HiveAcid").options(Map("table" -> "default.acidtbl")).load()

Deprecated and changed configurations

HWC read configuration is simplified in Cloudera. You use a common configuration for Spark Direct Reader, JDBC Cluster, or Secure Access mode.

The following HWC configurations that you might have used in HDP 3.1.5 cannot be used in Cloudera:
  • --conf spark.hadoop.hive.llap.daemon.service.hosts
  • --conf spark.hadoop.hive.zookeeper.quorum

Recommended configuration refactoring

Refactor configuration code to remove unsupported configurations. Use the following common configuration property: spark.datasource.hive.warehouse.read.mode.

You can transparently read data from Spark with HWC in different modes using just spark.sql("<query>").

Secured cluster configurations

In secured cluster configurations, you must set configurations as described in the HWC documentation for Cloudera:
  • --conf "spark.security.credentials.hiveserver2.enabled=true"
  • --conf "spark.sql.hive.hiveserver2.jdbc.url.principal=hive/_HOST@ROOT.HWX.SITE"

    The jdbc url must not contain the jdbc url principal and must be passed as shown here.

Deprecated features

The following features have been deprecated and will be removed from Cloudera in a future release:
  • Catalog browsing
  • JDBC client mode configuration