Introduction to HWC
Hive Warehouse Connector (HWC) securely accesses Hive managed tables from Spark. You must use HWC to query Apache Hive managed (transactional) tables from Apache Spark.
To read Hive external tables from Spark, you do not need HWC; Spark reads external tables directly through its native data sources. If you configure HWC to work with managed tables, you can use the same configuration to work with external tables.
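The distinction can be sketched in a few lines of Scala. This is a minimal sketch, not a complete program: it assumes the HWC assembly JAR is on the Spark classpath, the HWC configuration properties (such as the HiveServer2 JDBC URL) are already set, and the database and table names are hypothetical.

```scala
// Sketch only: assumes the HWC jar and configuration (e.g.
// spark.sql.hive.hiveserver2.jdbc.url) are in place on a CDP cluster.
// Table names below are hypothetical.
import com.hortonworks.hwc.HiveWarehouseSession

val hive = HiveWarehouseSession.session(spark).build()

// Managed (transactional) table: must be read through HWC
val managedDf = hive.executeQuery("SELECT * FROM sales.managed_orders")

// External table: plain Spark SQL reads it natively, no HWC required
val externalDf = spark.sql("SELECT * FROM sales.external_orders")
```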
Supported applications and operations
- Spark 2 (2.4.8 in the current CDP releases). Spark 3 is not supported, even if deployed from the Cloudera parcel.
- Spark shell
- The spark-submit script
- Zeppelin with the Livy interpreter
- Describing a table
- Creating a table in ORC using .createTable() or in any format using .executeUpdate()
- Writing to a pre-existing or new table in Parquet, ORC, Avro, or Textfile format
- Selecting Hive data and retrieving a DataFrame
- Writing a DataFrame to a Hive-managed ORC table in batch
- Executing a Hive update statement
- Reading table data, transforming it in Spark, and writing it to a new Hive table
- Writing a DataFrame or Spark stream to Hive using HiveStreaming
- Partitioning data when writing a DataFrame
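Several of the operations above can be sketched against the HWC session API. This is a hedged example, not a definitive implementation: it assumes an HWC session built as shown earlier, and the database, table, and column names are hypothetical. A session like this is typically started from spark-shell or spark-submit with the HWC assembly JAR passed via `--jars` (the JAR path varies by CDP release).

```scala
// Sketch only: assumes a configured CDP cluster and HWC on the classpath.
// Tables and columns are hypothetical.
import com.hortonworks.hwc.HiveWarehouseSession
import com.hortonworks.hwc.HiveWarehouseSession.HIVE_WAREHOUSE_CONNECTOR

val hive = HiveWarehouseSession.session(spark).build()

// Describe a table
hive.describeTable("managed_orders").show()

// Create an ORC table using the createTable() builder
hive.createTable("new_orders")
  .ifNotExists()
  .column("id", "bigint")
  .column("amount", "double")
  .create()

// Execute a Hive update statement
hive.executeUpdate("UPDATE managed_orders SET amount = 0 WHERE id = 1")

// Select Hive data into a DataFrame, transform it in Spark,
// and write the result back to a Hive-managed table in batch
val df = hive.executeQuery("SELECT id, amount FROM managed_orders")
df.filter(df("amount") > 100)
  .write
  .format(HIVE_WAREHOUSE_CONNECTOR)
  .option("table", "big_orders")
  .save()
```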