Integrating Apache Hive with Kafka, Spark, and BI

HiveWarehouseSession API operations

As a Spark developer, you execute queries against Hive using the JDBC-style HiveWarehouseSession API, which supports Scala, Java, and Python. In your Spark application code, you create an instance of HiveWarehouseSession; query results are returned to Spark as a DataFrame (see the example after the import statements below).

Import statements and variables

The following string constants are defined by the API:

  • HIVE_WAREHOUSE_CONNECTOR
  • DATAFRAME_TO_STREAM
  • STREAM_TO_STREAM

For more information, see the GitHub project for the Hive Warehouse Connector.
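
These constants typically name the data source to use when reading or writing through the connector. The following is a minimal Scala sketch, assuming the imports shown in the next section; the DataFrame df and the table name sales are hypothetical placeholders:

    // Minimal sketch (hypothetical df and table): write a Spark DataFrame
    // to a Hive table through the Hive Warehouse Connector data source.
    df.write
      .format(HIVE_WAREHOUSE_CONNECTOR)
      .option("table", "sales")
      .save()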

Assuming spark refers to an existing SparkSession, use the following imports and create the session:

  • Scala
    import com.hortonworks.hwc.HiveWarehouseSession
    import com.hortonworks.hwc.HiveWarehouseSession._
    val hive = HiveWarehouseSession.session(spark).build()
  • Java
    import com.hortonworks.hwc.HiveWarehouseSession;
    import static com.hortonworks.hwc.HiveWarehouseSession.*;
    HiveWarehouseSession hive = HiveWarehouseSession.session(spark).build();
  • Python
    from pyspark_llap import HiveWarehouseSession
    hive = HiveWarehouseSession.session(spark).build()
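
After building the session, you can run a Hive query and work with the result as a Spark DataFrame. The following is a minimal Scala sketch; the table name web_sales is a hypothetical placeholder:

    // Minimal sketch (hypothetical table web_sales): run a Hive query
    // through the session and receive the result as a Spark DataFrame.
    val df = hive.executeQuery("SELECT * FROM web_sales LIMIT 10")
    df.show()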