Direct Reader limitations

You must understand the limitations of Direct Reader mode and what functionality is not supported.

Limitations

  • You cannot write data using HWC Direct Reader.
  • Transaction semantics of Spark RDDs are not ensured when you use Spark Direct Reader to read ACID tables.
  • Supports only single-table transaction consistency; the Direct Reader does not guarantee that multiple tables referenced in a query read the same snapshot of data.
  • Because this mode does not require a HiveServer (HS2) connection, the audit event is generated by the Hive Metastore (HMS) and captures only the type of access (for example, SELECT); it does not capture column-level details. The audit event also does not log the actual SQL query.
  • Does not auto-commit transactions submitted by DataFrame or RDD APIs. Explicitly close transactions to release locks.
    Some operations, such as df.cache, df.persist, and df.rdd, open transactions but do not close them. This is expected, and the transactions are closed automatically when the Spark application ends. To close such transactions immediately and release their locks, run the following in Spark (a fuller usage sketch appears after this list):
    com.qubole.spark.hiveacid.transaction.HiveAcidTxnManagerObject.commitTxn(spark)
  • Does not support Ranger authorization.

    You must configure read access to the HDFS (or other filesystem) location of managed tables, and you must have Read and Execute permissions on the Hive warehouse location (hive.metastore.warehouse.dir). A permission-check sketch appears after this list.

  • Blocks compaction on open read transactions.
  • Does not support storage handlers.
  • The way Spark handles null and empty strings can cause a discrepancy between metadata and actual data when you write data read through Spark Direct Reader to a CSV file (see the CSV sketch after this list).
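
For example, the following is a minimal sketch of closing an open read transaction explicitly. It assumes an active SparkSession named spark with the spark-acid (Direct Reader) extensions on the classpath, and a hypothetical ACID table sample_db.sample_acid_tbl:

    import com.qubole.spark.hiveacid.transaction.HiveAcidTxnManagerObject

    // Hypothetical ACID table; substitute a table from your cluster.
    val df = spark.sql("SELECT * FROM sample_db.sample_acid_tbl")

    // df.cache opens a read transaction that is not closed automatically
    // until the Spark application ends.
    df.cache()
    df.count()

    // Explicitly commit the open transaction to release its locks.
    HiveAcidTxnManagerObject.commitTxn(spark)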
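
To verify access to the warehouse location, you can inspect its permission bits with the Hadoop FileSystem API. This sketch also assumes an active SparkSession named spark; the path shown is only an example, so substitute the actual value of hive.metastore.warehouse.dir from your configuration:

    import org.apache.hadoop.fs.Path

    // Example path; read the real value from hive.metastore.warehouse.dir.
    val warehouseDir = new Path("/warehouse/tablespace/managed/hive")
    val fs = warehouseDir.getFileSystem(spark.sparkContext.hadoopConfiguration)

    // Print the permission bits to confirm that the Spark user has Read
    // and Execute access on the warehouse location.
    println(fs.getFileStatus(warehouseDir).getPermission)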
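
The following sketch illustrates the null versus empty string discrepancy when writing to CSV. It uses a hypothetical DataFrame in place of data read through Direct Reader and assumes a spark-shell session so that spark.implicits is available; setting distinct nullValue and emptyValue markers keeps the two cases apart on disk:

    import spark.implicits._

    // Hypothetical stand-in for data read through Direct Reader: one row
    // holds a null, the other an empty string.
    val df = Seq((1, null.asInstanceOf[String]), (2, "")).toDF("id", "value")

    // By default the CSV writer serializes null as an empty field, making
    // the two rows indistinguishable on disk. Distinct markers avoid this.
    df.write
      .option("nullValue", "\\N")
      .option("emptyValue", "\"\"")
      .csv("/tmp/direct_reader_csv_example")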

Unsupported functionality

Spark Direct Reader does not support the following functionality:
  • Writes
  • Streaming inserts
  • CTAS statements