Reading and writing Hive tables in R
The Hive Warehouse Connector (HWC) supports reads and writes to Apache Hive managed ACID tables in R. Cloudera provides an R package SparklyrHWC that includes all HWC methods, such as execute and executeQuery, and a spark_write_table method to write to managed tables. The native sparklyr spark_write_table method supports writes to external tables only.
Support
HWC should work with sparklyr 1.0.4. Later versions should also work, provided sparklyr does not change its interfaces. Cloudera does not support sparklyr itself, but does support issues that arise from using HWC from sparklyr.
Downloading SparklyrHWC
You can download the SparklyrHWC R package that includes HWC methods from your CDP Cluster. Go to /opt/cloudera/parcels/CDH/lib/hive_warehouse_connector/. Copy SparklyrHWC to your download location.
- JDBC mode
  - Suitable for writing production workloads.
  - Suitable for reading production workloads having a data size of 1 GB or less.
  - Use this mode for reading if latency is not an issue.
- Spark-ACID mode
  - Suitable for reading production workloads.
  - Does not support writes.
Reading and writing managed tables
You can read Hive managed tables using either JDBC or Spark-ACID mode. The mode you configure affects the background process. You use the same R code regardless of the mode, with one exception: you do not need to call commitTxn(hs) when using JDBC mode.
To write to Hive managed tables, you must connect to HWC in JDBC mode.
Reading and writing external tables
You can read and write Hive external tables in R using the sparklyr package. HWC is not required.
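As a minimal sketch of this native path, the following reads one external table and appends its rows to another using only sparklyr. The table names ext_sales and ext_sales_copy and the yarn master are hypothetical; substitute values from your own cluster.

```r
library(sparklyr)

# Connect to Spark. "yarn" is an assumption about the cluster manager.
sc <- spark_connect(master = "yarn", config = spark_config())

# Read an external Hive table as a Spark DataFrame reference.
# "ext_sales" is a hypothetical table name.
ext_df <- spark_read_table(sc, "ext_sales")

# Append the rows to an existing external table (hypothetical name).
spark_write_table(ext_df, "ext_sales_copy", mode = "append")

spark_disconnect(sc)
```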
In the following procedure, you configure Spark-ACID execution mode to read tables on a production cluster. You use the native sparklyr spark_read_table and spark_load_table methods to read Hive managed tables in R.
Reading a Hive managed table example
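A read in Spark-ACID mode might look like the sketch below. It is illustrative rather than authoritative. The placeholder paths, the table name emp_hwc, the session-building call build(HiveWarehouseBuilder.session(sc)), and the configuration property shown are assumptions to verify against your HWC version. Any further properties your cluster requires for Spark-ACID mode are omitted.

```r
library(sparklyr)
# Load SparklyrHWC from the location you copied it to (placeholder path).
library(SparklyrHWC, lib.loc = "<path to SparklyrHWC>")

config <- spark_config()
# Assumption: the HWC jar is handed to sparklyr through this option.
config$sparklyr.jars.default <- "<path to HWC jar>"
# Assumption: this property value selects the Spark-ACID (direct reader) mode.
config$spark.datasource.hive.warehouse.read.mode <- "DIRECT_READER_V2"

sc <- spark_connect(master = "yarn", config = config)

# Assumption: the HWC session is built this way, mirroring the Scala builder.
hs <- build(HiveWarehouseBuilder.session(sc))

# Read the managed table with native sparklyr ("emp_hwc" is hypothetical).
emp <- sparklyr::spark_read_table(sc, "emp_hwc")

# Convert the Spark DataFrame to a local R data frame.
emp_local <- sparklyr::sdf_collect(emp)

# Spark-ACID mode only: commit the read transaction when finished.
commitTxn(hs)
```

In JDBC mode the same read code applies, but the final commitTxn(hs) call is not needed.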
Writing a Hive managed table in R
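Because writes to managed tables require JDBC mode, a write could be sketched as follows. The three-argument call to SparklyrHWC::spark_write_table (table name, data frame, save mode) is an assumption about the package's signature, as are the session-building call, the placeholder paths and URL, and the table name emp_hwc. Other properties a real cluster needs, such as credentials or a staging directory, are omitted.

```r
library(sparklyr)
# Load SparklyrHWC from the location you copied it to (placeholder path).
library(SparklyrHWC, lib.loc = "<path to SparklyrHWC>")

config <- spark_config()
# Assumption: the HWC jar is handed to sparklyr through this option.
config$sparklyr.jars.default <- "<path to HWC jar>"
# Assumption: HWC reaches HiveServer2 through this JDBC URL property.
config$spark.sql.hive.hiveserver2.jdbc.url <- "<jdbc-url>"

sc <- spark_connect(master = "yarn", config = config)

# Assumption: the HWC session is built this way, mirroring the Scala builder.
hs <- build(HiveWarehouseBuilder.session(sc))

# Copy a small local data frame to Spark so there is something to write.
new_rows <- sparklyr::sdf_copy_to(
  sc,
  data.frame(id = 1:3, name = c("a", "b", "c")),
  name = "new_rows",
  overwrite = TRUE
)

# Assumed argument order: target table, Spark DataFrame, save mode.
# "emp_hwc" is a hypothetical managed table.
SparklyrHWC::spark_write_table("emp_hwc", new_rows, "append")
```

The namespace prefix matters here: an unqualified spark_write_table may resolve to the native sparklyr method, which supports writes to external tables only.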
Supported HWC APIs in R
- execute() and executeQuery() (recommended) APIs to run read SQL statements
- executeUpdate() API to run write SQL statements
- API call for Hive CTAS operations (create table as select ...)
- Other HWC APIs, such as dropTable, dropDatabase, showTable
Any HWC API in Scala can be used in R. These APIs behave similarly in R and in Scala.
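The calls listed above might be used as in the following sketch. It assumes that each R wrapper takes the HWC session object as its first argument, as commitTxn(hs) does, followed by the arguments of the Scala method. That argument order, the helper name run_hwc_examples, and the table names are hypothetical, so check them against the SparklyrHWC package on your cluster.

```r
# Load SparklyrHWC from the location you copied it to (placeholder path).
library(SparklyrHWC, lib.loc = "<path to SparklyrHWC>")

# Hypothetical helper; `hs` is an HWC session that has already been built.
run_hwc_examples <- function(hs) {
  # Read SQL: executeQuery is the recommended read API.
  emp <- SparklyrHWC::executeQuery(hs, "select * from emp_hwc")

  # Write SQL, here a CTAS statement (hypothetical table names).
  SparklyrHWC::executeUpdate(
    hs, "create table emp_copy as select * from emp_hwc"
  )

  # Assumed to mirror Scala's dropTable(table, ifExists, purge).
  SparklyrHWC::dropTable(hs, "emp_copy", TRUE, FALSE)

  # Return the query result to the caller.
  emp
}
```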