Integrating Apache Hive with Apache Spark and BI

Introduction to HWC

The Hive Warehouse Connector (HWC) securely accesses Hive managed tables from Apache Spark. To query Apache Hive managed tables from Spark, you must use HWC.

To read Hive external tables from Spark, you do not need HWC, because Spark reads external tables natively. If you configure HWC to work with managed tables, you can use the same configuration to work with external tables.
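
As a rough illustration of the split described above, the following PySpark sketch routes a read through plain Spark SQL for an external table and through an HWC session for a managed table. It assumes the HWC Python bindings (the `pyspark_llap` module) were supplied to Spark at launch. The helper names `open_hwc_session` and `read_table` are hypothetical and not part of HWC, and the exact API may differ between HWC releases.

```python
def open_hwc_session(spark):
    """Build an HWC session from an existing SparkSession.

    Assumption: the HWC jar and Python package were passed to Spark at
    launch, which is what makes the pyspark_llap module importable.
    """
    from pyspark_llap import HiveWarehouseSession  # imported lazily on purpose
    return HiveWarehouseSession.session(spark).build()


def read_table(spark, table, managed, hive=None):
    """Return a DataFrame for `table` (hypothetical helper).

    External tables are read by native Spark; managed tables go through HWC.
    """
    query = "SELECT * FROM {}".format(table)
    if not managed:
        return spark.sql(query)         # native Spark read, no HWC needed
    if hive is None:
        hive = open_hwc_session(spark)  # managed tables require an HWC session
    return hive.executeQuery(query)
```

Building the session once and passing it in through the `hive` argument avoids creating a new session for every read.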

Supported applications and operations

The Hive Warehouse Connector supports the following applications:

Spark shell
PySpark
The spark-submit script
Zeppelin with the Livy interpreter

The following list describes a few of the operations supported by the Hive Warehouse Connector:

Describing a table
Creating a table in ORC using .createTable() or in any format using .executeUpdate()
Writing to a table in ORC format
Selecting Hive data and retrieving a DataFrame
Writing a DataFrame to a Hive-managed ORC table in batch
Executing a Hive update statement
Reading table data, transforming it in Spark, and writing it to a new Hive table
Writing a DataFrame or Spark stream to Hive using HiveStreaming
Partitioning data when writing a DataFrame
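
Several of the operations listed above can be combined in one short PySpark sketch: describing a table, selecting Hive data into a DataFrame, transforming it in Spark, and writing the result to a Hive table in batch with an optional partition specification. The function name `copy_filtered` and all table and column names are hypothetical. The data source string and the `table` and `partition` write options reflect the HWC API as commonly documented, but treat them as assumptions and confirm them for your release.

```python
# Fully qualified data source name used for HWC batch writes.
# Assumption: confirm this string against your HWC release.
HWC_FORMAT = "com.hortonworks.spark.sql.hive.llap.HiveWarehouseConnector"


def copy_filtered(hive, source, target, condition, partition=None):
    """Read `source` through HWC, filter it in Spark, and append to `target`.

    `hive` is an already-built HiveWarehouseSession. Table names, the filter
    condition, and the partition spec are placeholders from the caller.
    """
    hive.describeTable(source).show()  # describe the source table

    # Select Hive data and retrieve a DataFrame.
    df = hive.executeQuery("SELECT * FROM {}".format(source))

    # Ordinary Spark transformation on the retrieved DataFrame.
    transformed = df.filter(condition)

    # Batch write to a Hive managed table through the HWC data source.
    writer = (transformed.write
              .format(HWC_FORMAT)
              .mode("append")
              .option("table", target))
    if partition is not None:
        writer = writer.option("partition", partition)  # partition on write
    writer.save()
    return transformed
```

A call might look like `copy_filtered(hive, "web_sales", "web_sales_large", "ws_quantity > 10", partition="ws_sold_year")`. Depending on the HWC release, the target table may need to exist before the write, in which case it can be created first with `.createTable()` or `.executeUpdate()`.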
