Apache Spark access to Apache Hive

From Apache Spark, you access ACID and external tables in Apache Hive 3 using the Hive Warehouse Connector (HWC).

The Hive Warehouse Connector is a Spark library, built on top of Apache Arrow, for reading and writing Hive ACID and external tables.

In CDP Public Cloud, the Hive Warehouse Connector is designed to use the low-latency analytical processing (LLAP) cache and is optimized for fast data transmission through LLAP. The connector orchestrates a distributed read from the LLAP daemons, and the read from cache occurs after security rules and ACID transformations have been applied. CDP Public Cloud uses LLAP to read ACID tables, and other Hive-managed tables, from Spark. You do not need LLAP to write to ACID or other managed tables from Spark, and you do not need LLAP to access external tables from Spark. To write data, the Hive Warehouse Connector library internally uses the Hive Streaming API and Hive LOAD DATA commands.
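
As a minimal sketch of the read and write paths described above, the following PySpark function reads a managed table through the connector and appends the result to another table. It assumes that the connector jar and its pyspark_llap Python package are attached to the Spark job and that the Spark session is already configured to reach HiveServer. The function name copy_table_with_hwc and its table arguments are hypothetical and used only for illustration.

```python
def copy_table_with_hwc(spark, source_table, target_table):
    """Read a Hive table through the connector and append it to another table.

    Hypothetical helper for illustration; `spark` is an existing SparkSession.
    """
    # Imported lazily: pyspark_llap is only importable when the connector's
    # Python package has been shipped with the Spark job.
    from pyspark_llap import HiveWarehouseSession

    # Build a connector session on top of the existing Spark session.
    hive = HiveWarehouseSession.session(spark).build()

    # Read path: the query is executed by Hive, so security rules and ACID
    # transformations are applied before rows reach Spark.
    df = hive.executeQuery("SELECT * FROM {}".format(source_table))

    # Write path: the connector data source writes the DataFrame to the
    # target table. The format string is the connector's data source class.
    (
        df.write.format("com.hortonworks.spark.sql.hive.llap.HiveWarehouseConnector")
        .option("table", target_table)
        .mode("append")
        .save()
    )
    return df
```

With a configured session, a call such as copy_table_with_hwc(spark, "db.source_table", "db.target_table") would copy rows between two placeholder tables. Whether the read is served by LLAP daemons or by another path depends on how the deployment is configured, not on this code.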

In CDP Private Cloud Base, the Hive Warehouse Connector uses JDBC to transmit data.
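
The JDBC-based read path might be configured along the following lines. This is a sketch rather than a definitive configuration: the property names are ones commonly used with the connector, but they differ between releases, so treat them as assumptions and verify them against the documentation for your version. The helpers hwc_jdbc_conf and as_spark_submit_args, and their arguments, are hypothetical.

```python
def hwc_jdbc_conf(jdbc_url, staging_dir, jdbc_mode="cluster"):
    """Return Spark properties for a connector session that reads over JDBC.

    Property names are assumptions to check against your release.
    """
    if jdbc_mode not in ("client", "cluster"):
        raise ValueError("jdbc_mode must be 'client' or 'cluster'")
    return {
        # HiveServer endpoint that the connector talks to.
        "spark.sql.hive.hiveserver2.jdbc.url": jdbc_url,
        # Staging directory used by the connector for batch writes.
        "spark.datasource.hive.warehouse.load.staging.dir": staging_dir,
        # Assumed switches: read over JDBC rather than from LLAP daemons.
        "spark.datasource.hive.warehouse.read.via.llap": "false",
        "spark.datasource.hive.warehouse.read.jdbc.mode": jdbc_mode,
    }


def as_spark_submit_args(conf):
    """Render the properties as --conf arguments for spark-submit."""
    args = []
    for key, value in sorted(conf.items()):
        args.extend(["--conf", "{}={}".format(key, value)])
    return args
```

The resulting list can be appended to a spark-submit or pyspark command line, or the dictionary can be applied to a SparkSession builder one property at a time.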