Managing streaming with Hive Warehouse Connector
Understand how Hive Warehouse Connector uses the Hive Metastore (HMS) for transaction management and writes ORC files directly to Hive table locations without relying on HiveServer2.
Hive Warehouse Connector does not rely on HiveServer2 for streaming. Instead, it interacts with HMS for transaction management and writes ORC bucket files directly to the table's location.
An example of using the DATAFRAME_TO_STREAM data source to write a non-streaming DataFrame through Hive Streaming:

myDF.write.format(DATAFRAME_TO_STREAM)
  .option("metastoreUri", "thrift://jkovacs-1.jkovacs.root.hwx.site:9083")
  .option("metastoreKrbPrincipal", "hive/_HOST@AD.HALXG.CLOUDERA.COM")
  .option("database", "default")
  .option("table", "hwctest")
  .save()
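
The streaming counterpart is a structured streaming query that writes through the same Hive Streaming path. The following is a minimal sketch, not taken verbatim from the HWC documentation: the streaming DataFrame streamingDF, the import path for the STREAM_TO_STREAM constant (which can differ between HWC versions), and the checkpoint directory are assumptions; the connection options and target table mirror the batch example above.

// Minimal sketch of a structured streaming write to the same table.
// Assumes streamingDF is an existing streaming DataFrame and that the
// STREAM_TO_STREAM constant is available from the HWC HiveWarehouseSession API.
import com.hortonworks.hwc.HiveWarehouseSession._

streamingDF.writeStream.format(STREAM_TO_STREAM)
  .option("metastoreUri", "thrift://jkovacs-1.jkovacs.root.hwx.site:9083")
  .option("metastoreKrbPrincipal", "hive/_HOST@AD.HALXG.CLOUDERA.COM")
  .option("database", "default")
  .option("table", "hwctest")
  .option("checkpointLocation", "/tmp/hwc-checkpoint") // standard Structured Streaming option
  .start()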
Important Notes:
- Always pre-create the Hive table before writing to it, as shown in the sketch after this list.
- Ensure that the Spark session user has appropriate permissions for the table's file system location.
- Verify that the Hive Metastore URI is correctly configured in the options.
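
As a companion to the first note, the following is a minimal sketch of pre-creating the target table through the HWC session. It assumes a transactional ORC table and placeholder column names (id, value); the DDL itself runs through HiveServer2, while only the streaming data path described above bypasses it.

// Hypothetical pre-creation of the target table used in the examples above.
// The column names are placeholders; adjust the schema to match your data.
import com.hortonworks.hwc.HiveWarehouseSession

val hive = HiveWarehouseSession.session(spark).build()
hive.executeUpdate(
  """CREATE TABLE IF NOT EXISTS default.hwctest (id INT, value STRING)
    |STORED AS ORC
    |TBLPROPERTIES ('transactional' = 'true')""".stripMargin)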