HWC changes from HDP to Cloudera
You need to understand the Hive Warehouse Connector (HWC) changes from HDP to Cloudera so that you can update your HWC code to run on Cloudera. In Cloudera, HWC methods and connection configurations differ from those in HDP.
Deprecated methods
- hive.execute()
- hive.executeQuery()
The HWC interface is simplified in Cloudera: the execute and executeQuery methods converge into the sql method. The execute and executeQuery methods are deprecated and will be removed from Cloudera in a future release. Historically, calls to execute and executeQuery used the JDBC connection and were limited to 1000 records. The 1000-record limit does not apply to the sql method; however, JDBC cluster mode is recommended in production only for workloads with a data size of 1 GB or less. Larger workloads are not recommended for JDBC reads in production due to slow performance.
Although the old methods are still supported in Cloudera for backward compatibility, it is highly recommended that you refactor your code to use the sql method for all configurations (JDBC client, Direct Reader V1 or V2, and Secure Access modes).
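The refactoring can be sketched as follows in Scala. This is a sketch only: it assumes the HWC jar is on the classpath, that connection properties were set at launch, and that a table named default.hwctest exists. The HiveWarehouseSession builder shown is the standard HWC entry point; verify the import path against your HWC version.

```scala
// Build an HWC session from an existing SparkSession (spark).
import com.hortonworks.hwc.HiveWarehouseSession

val hive = HiveWarehouseSession.session(spark).build()

// Deprecated HDP-style call: runs over JDBC and is limited to 1000 records.
// hive.executeQuery("select * from default.hwctest").show(1, false)

// Cloudera-style call: the sql method works in all read modes and has no
// 1000-record limit.
hive.sql("select * from default.hwctest").show(1, false)
```

Because sql is a drop-in replacement for execute and executeQuery in these read scenarios, the refactoring is usually a mechanical rename.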
Recommended method refactoring
API | From HDP | To Cloudera | HDP Example | Cloudera Example |
---|---|---|---|---|
HWC sql API | execute and executeQuery methods | sql method | hive.execute("select * from default.hwctest").show(1, false) | hive.sql("select * from default.hwctest").show(1, false) |
Spark sql API | sql and spark.read.table methods | No change | | |
DataFrames API | spark.read.format method | No change | val df = spark.read.format("HiveAcid").options(Map("table" -> "default.acidtbl")).load() | val df = spark.read.format("HiveAcid").options(Map("table" -> "default.acidtbl")).load() |
Deprecated and changed configurations
HWC read configuration is simplified in Cloudera: you use a common configuration for Spark Direct Reader, JDBC Cluster, and Secure Access modes. The following configuration properties are deprecated:
--conf spark.hadoop.hive.llap.daemon.service.hosts
--conf spark.hadoop.hive.zookeeper.quorum
Recommended configuration refactoring
Refactor configuration code to remove unsupported configurations. Use the following common configuration property: spark.datasource.hive.warehouse.read.mode. You can transparently read data from Spark with HWC in different modes using just spark.sql("<query>").
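For example, a spark-shell launch might set the common property as in the following sketch. The jar path is a placeholder for your deployment, and DIRECT_READER_V2 is one of the supported mode values (JDBC_CLUSTER and SECURE_ACCESS are others); verify the exact values and paths against your HWC version.

```shell
spark-shell \
  --jars /path/to/hive-warehouse-connector-assembly.jar \
  --conf spark.datasource.hive.warehouse.read.mode=DIRECT_READER_V2
```

After launch, a plain spark.sql("select * from default.hwctest").show() reads through whichever mode the property selects, with no mode-specific code changes.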
Secured cluster configurations
--conf "spark.security.credentials.hiveserver2.enabled=true"
--conf "spark.sql.hive.hiveserver2.jdbc.url.principal=hive/_HOST@ROOT.HWX.SITE"
The JDBC URL must not contain the principal; instead, pass the principal in the separate property as shown above.
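Putting these properties together, a launch on a secured (Kerberized) cluster might look like the following sketch. The HiveServer2 host, port, and realm are placeholders for illustration; note that the JDBC URL itself carries no principal.

```shell
spark-shell \
  --conf "spark.sql.hive.hiveserver2.jdbc.url=jdbc:hive2://hs2-host.example.com:10000" \
  --conf "spark.security.credentials.hiveserver2.enabled=true" \
  --conf "spark.sql.hive.hiveserver2.jdbc.url.principal=hive/_HOST@ROOT.HWX.SITE"
```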
Deprecated features
- Catalog browsing
- JDBC client mode configuration