HiveWarehouseSession API operations

As a Spark developer, you execute queries to Hive using the HiveWarehouseSession API, which supports Scala, Java, and Python. In your Spark application code, you create an instance of HiveWarehouseSession. Results are returned to Spark as a DataFrame.

Import statements and variables

The following string constants are defined by the API:

  • HIVE_WAREHOUSE_CONNECTOR
  • DATAFRAME_TO_STREAM
  • STREAM_TO_STREAM

For more information, see the GitHub project for the Hive Warehouse Connector.
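For example, HIVE_WAREHOUSE_CONNECTOR is typically passed as the format string of a DataFrame write so that the write goes through the connector, and DATAFRAME_TO_STREAM and STREAM_TO_STREAM play the same role for streaming writes. The following Scala sketch is illustrative only: it assumes spark is an active SparkSession and that the hypothetical tables web_sales and web_sales_copy exist in Hive.

  import com.hortonworks.hwc.HiveWarehouseSession
  import com.hortonworks.hwc.HiveWarehouseSession._  // brings HIVE_WAREHOUSE_CONNECTOR into scope

  val hive = HiveWarehouseSession.session(spark).build()

  // Read an existing Hive table and write it back through the connector.
  hive.table("web_sales")
    .write
    .format(HIVE_WAREHOUSE_CONNECTOR)
    .option("table", "web_sales_copy")  // hypothetical target table
    .save()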

Assuming spark references an existing SparkSession, use the following code for the imports and to build a HiveWarehouseSession (a brief usage sketch follows these examples):

  • Scala
    import com.hortonworks.hwc.HiveWarehouseSession
    import com.hortonworks.hwc.HiveWarehouseSession._
    val hive = HiveWarehouseSession.session(spark).build()
  • Java
    import com.hortonworks.hwc.HiveWarehouseSession;
    import static com.hortonworks.hwc.HiveWarehouseSession.*;
    HiveWarehouseSession hive = HiveWarehouseSession.session(spark).build();
  • Python
    from pyspark_llap import HiveWarehouseSession
    hive = HiveWarehouseSession.session(spark).build()
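Once built, the hive handle is the entry point for subsequent catalog and query operations. A minimal Scala sketch, assuming the session built above and an illustrative database name tpcds:

  // hive is the HiveWarehouseSession built above
  hive.setDatabase("tpcds")  // set the current database for this session
  hive.showTables().show()   // list tables in the current database; returned as a DataFrame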

Executing queries

HWC supports three methods for executing queries (a combined sketch follows this list):
  • .sql()
    • Executes queries in any HWC mode.
    • Consistent with the Spark sql interface.
    • Masks the internal implementation based on cluster type.
  • .execute()
    • Required for executing queries if spark.datasource.hive.warehouse.read.jdbc.mode = client (default = cluster).
    • Uses a driver-side JDBC connection.
    • Provided for backward compatibility: the method always reads in JDBC client mode, irrespective of the JDBC client or cluster mode configuration.
    • Recommended for catalog queries.
  • .executeQuery()
    • Executes queries, except catalog queries, in LLAP mode (spark.datasource.hive.warehouse.read.via.llap=true).
    • If LLAP is not enabled in the cluster, .executeQuery() does not work.
    • Provided for backward compatibility.
    • Not recommended for catalog queries; use .execute() for those.
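
The following Scala sketch shows the three methods side by side. It assumes the hive session built earlier and a hypothetical tpcds.web_sales table, and the caveats above still apply (for example, .executeQuery() fails when LLAP is not enabled).

  // .sql(): works in any HWC mode and mirrors the Spark sql interface.
  hive.sql("SELECT * FROM tpcds.web_sales LIMIT 10").show()

  // .execute(): driver-side JDBC connection; recommended for catalog queries.
  hive.execute("DESCRIBE tpcds.web_sales").show()

  // .executeQuery(): runs through LLAP; requires spark.datasource.hive.warehouse.read.via.llap=true.
  hive.executeQuery("SELECT COUNT(*) FROM tpcds.web_sales").show()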