HWC and DataFrame API limitations

These limitations are in addition to Direct Reader mode, JDBC mode, Secure access mode, and HWC and DataFrames API limitations.

  • Table stats (basic stats and column stats) are not generated when you write a DataFrame to Hive.
  • When the HWC API save mode is overwrite, writes are limited.

    You cannot read from and overwrite the same table. If your query accesses only one table and you try to overwrite that table using an HWC API write method, a deadlock state might occur. Do not attempt this operation.

    Example: Operation Not Supported

    scala> val df = hive.executeQuery("select * from t1")
    scala> df.write.format("com.hortonworks.spark.sql.hive.llap.HiveWarehouseConnector"). \
    mode("overwrite").option("table", "t1").save     
  • Automatic table creation during HWC write operation is not supported in Spark 3.

    If you are using Spark 3 and performing a DataFrame write using HWC with the SaveMode.Append or SaveMode.Overwrite modes, the write operation fails if the table you are writing to does not exist. These modes can save a DataFrame only if the table you are writing to, already exists.

    For example, the following statement tries to append to a table, "ex2" that does not exist:
    df.write.format(HIVE_WAREHOUSE_CONNECTOR).mode("append").option("table", "ex2").save()
    The statement results in the following exception:
    22/05/26 12:01:12 ERROR llap.HiveWarehouseConnector: Failed to connect to HS2
    org.apache.hive.service.cli.HiveSQLException: Error while compiling statement: FAILED: SemanticException [Error 10001]: Line 1:29 Table not found 'ex2'
    	at org.apache.hive.jdbc.Utils.verifySuccess(Utils.java:373)
    	at org.apache.hive.jdbc.Utils.verifySuccessWithInfo(Utils.java:359)
    	at org.apache.hive.jdbc.HiveStatement.runAsyncOnServer(HiveStatement.java:334)
    	at org.apache.hive.jdbc.HiveStatement.execute(HiveStatement.java:275)
          …
          …
    

    Alternatively, you can use the SaveMode.ErrorIfExists or SaveMode.Ignore modes to save DataFrames to a table.