This two-step procedure shows how to configure Apache Spark to connect to the
Apache Hive metastore. An example shows how to configure Spark Direct Reader mode while
launching the Spark shell.
This procedure assumes you are not using Auto Translate and do not require serialization.
Set Kerberos configurations for HWC, or, for an unsecured cluster, set
spark.security.credentials.hiveserver2.enabled=false.
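For example, on an unsecured cluster you might pass the property when launching the
shell, as a minimal sketch (the HWC assembly jar path and <version> placeholder follow
the example shown later in this procedure):
spark-shell --jars /opt/cloudera/parcels/CDH/jars/hive-warehouse-connector-assembly-<version>.jar \
--conf "spark.security.credentials.hiveserver2.enabled=false"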
1.
In Cloudera Manager, go to Hosts > Roles. If Hive Metastore
appears in the list of roles, copy the host name or IP address.
You use the host name or IP address in the next step to set the host value.
2.
Launch the Spark shell, including the configuration that sets the
spark.hadoop.hive.metastore.uris
property to
thrift://<host>:<port>.
For example:
spark-shell --jars /opt/cloudera/parcels/CDH/jars/hive-warehouse-connector-assembly-<version>.jar \
--conf "spark.hadoop.hive.metastore.uris=thrift://172.27.74.137:9083"
... <other conf strings>
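After the shell starts, a quick query can confirm that Spark reaches the metastore. This
is a minimal check, assuming the shell launched with the configuration above; the
databases listed depend on your cluster:
scala> spark.sql("show databases").show()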
If you use the HWC API, configure
spark.sql.hive.hwc.execution.mode=spark.
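For instance, you might launch the shell with the execution mode set and then build a
HiveWarehouseSession. This is a sketch, not verbatim from this procedure: the host,
port, <db>.<table> name, and jar <version> are placeholders, and the session-building
calls follow the commonly documented HWC API, so verify them against your HWC version:
spark-shell --jars /opt/cloudera/parcels/CDH/jars/hive-warehouse-connector-assembly-<version>.jar \
--conf "spark.hadoop.hive.metastore.uris=thrift://<host>:9083" \
--conf "spark.sql.hive.hwc.execution.mode=spark"
scala> import com.hortonworks.hwc.HiveWarehouseSession
scala> val hive = HiveWarehouseSession.session(spark).build()  // build a HWC session from the active SparkSession
scala> hive.executeQuery("SELECT * FROM <db>.<table>").show()  // <db>.<table> is a placeholder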