As an Apache Spark developer, you learn the code constructs for executing Apache Hive queries using the HiveWarehouseSession API. In Spark source code, you see how to create an instance of HiveWarehouseSession.
Import statements and variables
The following string constants are defined by the API:
spark is running in an existing
use this code for imports:
import com.hortonworks.hwc.HiveWarehouseSession import com.hortonworks.hwc.HiveWarehouseSession._ val hive = HiveWarehouseSession.session(spark).build()
import com.hortonworks.hwc.HiveWarehouseSession; import static com.hortonworks.hwc.HiveWarehouseSession.*; HiveWarehouseSession hive = HiveWarehouseSession.session(spark).build();
from pyspark_llap import HiveWarehouseSession hive = HiveWarehouseSession.session(spark).build()
Executes queries in any HWC mode.
Consistent with the Spark sql interface.
- Masks the internal implementation based on cluster type.
- Required for executing queries if spark.datasource.hive.warehouse.read.jdbc.mode = client (default = cluster).
- Uses a driver side JDBC connection.
- Provided for backward compatibility where the method defaults to reading in JDBC client mode irrespective of the value of JDBC client or cluster mode configuration.
- Recommended for catalog queries.
- Executes queries, except catalog queries, in LLAP mode (spark.datasource.hive.warehouse.read.via.llap= true)
- If LLAP is not enabled in the cluster, .executeQuery() does not work. CDP Data Center does not support LLAP.
- Provided for backward compatibility.
Results are returned as a DataFrame to Spark.