HWC changes from HDP to CDP
You need to understand the Hive Warehouse Connector (HWC) changes from HDP to CDP. The extensive HWC documentation can prepare you to update your HWC code to run on CDP. In CDP, both the methods and the configuration of HWC connections differ from those in HDP.
Deprecated methods
- hive.execute()
- hive.executeQuery()
The HWC interface is simplified in CDP, converging the execute and executeQuery methods into the single sql method. The execute and executeQuery methods are deprecated and will be removed from CDP in a future release. Historically, calls to execute and executeQuery used the JDBC connection and were limited to 1000 records. The 1000-record limitation does not apply to the sql method; however, JDBC cluster mode is recommended in production only for workloads with a data size of 1 GB or less. Larger workloads are not recommended for JDBC reads in production due to slow performance.
Although the old methods are still supported in CDP for backward compatibility, refactoring your code to use the sql method for all configurations (JDBC client, Direct Reader V1 or V2, and Secure Access modes) is highly recommended, as shown in the sketch below.
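The following is a minimal refactoring sketch in Scala, run from a spark-shell session with the HWC jar on the classpath. The session-building call follows the standard HWC pattern, and the table name default.hwctest is carried over from the examples in this document:

```scala
import com.hortonworks.hwc.HiveWarehouseSession

// Build an HWC session from the active SparkSession (standard HWC pattern).
val hive = HiveWarehouseSession.session(spark).build()

// Deprecated HDP-style call: JDBC-backed and capped at 1000 records.
// hive.executeQuery("select * from default.hwctest").show(1, false)

// CDP-style call: the sql method works across JDBC client, Direct Reader,
// and Secure Access modes, with no 1000-record limit.
hive.sql("select * from default.hwctest").show(1, false)
```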
Recommended method refactoring
API | From HDP | To CDP | HDP Example | CDP Example |
---|---|---|---|---|
HWC sql API | execute and executeQuery methods | sql method | hive.execute("select * from default.hwctest").show(1, false) | hive.sql("select * from default.hwctest").show(1, false) |
Spark sql API | sql and spark.read.table methods | No change | | |
DataFrames API | spark.read.format method | No change | val df = spark.read.format("HiveAcid").options(Map("table" -> "default.acidtbl")).load() | val df = spark.read.format("HiveAcid").options(Map("table" -> "default.acidtbl")).load() |
Deprecated and changed configurations
HWC read configuration is simplified in CDP: you use a common configuration for Spark Direct Reader, JDBC Cluster, or Secure Access mode. The following HDP configuration properties are deprecated:
--conf spark.hadoop.hive.llap.daemon.service.hosts
--conf spark.hadoop.hive.zookeeper.quorum
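For contrast, a deprecated HDP-style launch passed these properties explicitly. A hypothetical example follows; the @llap0, ZooKeeper, and jar path values are placeholders, not values taken from this document:

```
spark-shell --jars /path/to/hive-warehouse-connector-assembly.jar \
  --conf spark.hadoop.hive.llap.daemon.service.hosts=@llap0 \
  --conf spark.hadoop.hive.zookeeper.quorum=zk1:2181,zk2:2181,zk3:2181
```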
Recommended configuration refactoring
Refactor configuration code to remove unsupported configurations, and use the following common configuration property instead: spark.datasource.hive.warehouse.read.mode. You can then transparently read data from Spark with HWC in different modes using just spark.sql("<query>").
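A sketch of the refactored approach, assuming Direct Reader V2 mode (the DIRECT_READER_V2 value, jar path, and table name are illustrative placeholders; check your cluster's values):

```
spark-shell --jars /path/to/hive-warehouse-connector-assembly.jar \
  --conf spark.datasource.hive.warehouse.read.mode=DIRECT_READER_V2
```

Then, in the same session, a plain Spark SQL query reads through HWC with no HWC-specific calls:

```scala
// The common read.mode property selects the mode; the query itself is unchanged.
spark.sql("select * from default.hwctest").show(1, false)
```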
Secured cluster configurations
--conf "spark.security.credentials.hiveserver2.enabled=true"
--conf "spark.sql.hive.hi
veserver2.jdbc.url.principal=hive/_HOST@ROOT.HWX.SITE"
The jdbc url must not contain the jdbc url principal and must be passed as shown here.
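Putting the secured-cluster settings together, a launch might look like the following sketch. The JDBC URL, ZooKeeper hosts, and jar path are assumptions; note that the URL itself carries no principal, which is passed through the separate property:

```
spark-shell --jars /path/to/hive-warehouse-connector-assembly.jar \
  --conf spark.sql.hive.hiveserver2.jdbc.url="jdbc:hive2://zk1:2181,zk2:2181,zk3:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2" \
  --conf "spark.security.credentials.hiveserver2.enabled=true" \
  --conf "spark.sql.hive.hiveserver2.jdbc.url.principal=hive/_HOST@ROOT.HWX.SITE"
```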
Deprecated features
- Catalog browsing
- JDBC client mode configuration