Direct Reader limitations
You must understand the limitations of Direct Reader mode and the functionality it does not support.
Limitations
- You cannot write data using HWC Direct Reader.
- Transaction semantics of Spark RDDs are not ensured when using Spark Direct Reader to read ACID tables.
- Supports only single-table transaction consistency. The direct reader does not guarantee that multiple tables referenced in a query read the same snapshot of data (see the first sketch after this list).
- This mode does not require a HiveServer2 (HS2) connection, so the audit event is generated by the Hive Metastore (HMS). The HMS audit captures only the type of access (for example, SELECT), does not capture column-level details, and does not log the actual SQL query.
- Does not auto-commit transactions submitted by DataFrame or RDD APIs. Explicitly close transactions to release locks. Some operations, such as df.cache, df.persist, and df.rdd, open transactions but do not close them. This is expected, and such transactions are closed automatically at the end of the Spark application. To close them immediately, run the following in Spark (see the second sketch after this list):
com.qubole.spark.hiveacid.transaction.HiveAcidTxnManagerObject.commitTxn(spark)
- Does not support Ranger authorization.
You must configure read access to the HDFS (or other) location of managed tables. You must have Read and Execute permissions on the Hive warehouse location (hive.metastore.warehouse.dir); the third sketch after this list shows one way to verify this.
- Blocks compaction on open read transactions.
- Does not support storage handlers.
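For illustration, a minimal sketch of the single-table consistency limitation; the table and column names are hypothetical, and spark is the active SparkSession (as in spark-shell). Each table scan opens its own read transaction, so the two reads are not pinned to one warehouse snapshot:

```scala
// Hypothetical tables: each scan below opens its own read transaction,
// so the two reads can observe different snapshots of the warehouse.
val orders = spark.sql("SELECT * FROM sales.orders")
val items  = spark.sql("SELECT * FROM sales.order_items")

// If a writer commits between the two scans, this join can combine
// rows from two different points in time.
orders.join(items, "order_id").show()
```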
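The next sketch shows the commitTxn call from the list above in context; the table name is hypothetical, and spark is again the active SparkSession:

```scala
import com.qubole.spark.hiveacid.transaction.HiveAcidTxnManagerObject

val df = spark.sql("SELECT * FROM sales.orders")
df.cache()   // opens a read transaction that is not closed automatically
df.count()   // materializes the cache; the transaction stays open

// Close the open transaction now, releasing the locks it holds,
// instead of waiting for the application to end.
HiveAcidTxnManagerObject.commitTxn(spark)
```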
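Finally, one way to check the required Read and Execute permissions from a Spark session is to probe the warehouse path through the Hadoop FileSystem API. This is an illustrative sketch, assuming hive-site.xml is on the classpath so that hive.metastore.warehouse.dir resolves:

```scala
import org.apache.hadoop.fs.{FileSystem, Path}
import org.apache.hadoop.fs.permission.FsAction

val conf = spark.sparkContext.hadoopConfiguration
// Assumes hive.metastore.warehouse.dir is set in the loaded configuration.
val warehouse = new Path(conf.get("hive.metastore.warehouse.dir"))

// access() throws AccessControlException if the current user lacks
// Read and Execute permission on the warehouse location.
val fs = FileSystem.get(warehouse.toUri, conf)
fs.access(warehouse, FsAction.READ_EXECUTE)
```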
Unsupported functionality
Spark Direct Reader does not support the following functionality:
- Writes
- Streaming inserts
- CTAS (CREATE TABLE AS SELECT) statements