This two-step procedure shows how to configure Apache Spark to connect to the
Apache Hive metastore. An example shows how to configure Spark Direct Reader mode while
launching the Spark shell.
This procedure assumes you are not using Auto Translate and do not require serialization.
Set Kerberos configurations for HWC, or, for an unsecured cluster, set
spark.security.credentials.hiveserver2.enabled=false.
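For example, on an unsecured cluster you might pass the property when launching the
shell, as a minimal sketch (the HWC assembly jar path and <version> placeholder follow
the example shown later in this procedure):
spark-shell --jars /opt/cloudera/parcels/CDH/jars/hive-warehouse-connector-assembly-<version>.jar \
--conf "spark.security.credentials.hiveserver2.enabled=false"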
1.
In Cloudera Manager, go to Hosts > Roles. If Hive Metastore
appears in the list of roles, copy the host name or IP address.
You use the host name or IP address in the next step to set the host value.
2.
Launch the Spark shell, including the configuration that sets the
spark.hadoop.hive.metastore.uris
property to
thrift://<host>:<port>.
For example:
spark-shell --jars /opt/cloudera/parcels/CDH/jars/hive-warehouse-connector-assembly-<version>.jar \
--conf "spark.hadoop.hive.metastore.uris=thrift://172.27.74.137:9083"
... <other conf strings>
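After the shell starts, a quick query can confirm that Spark reaches the metastore. This
is a minimal check, assuming the shell launched with the configuration above; the
databases listed depend on your cluster:
scala> spark.sql("show databases").show()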
If you use the HWC API, configure
spark.sql.hive.hwc.execution.mode=spark.
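For instance, you might launch the shell with the execution mode set and then build a
HiveWarehouseSession. This is a sketch, not verbatim from this procedure: the host,
port, <db>.<table> name, and jar <version> are placeholders, and the session-building
calls follow the commonly documented HWC API, so verify them against your HWC version:
spark-shell --jars /opt/cloudera/parcels/CDH/jars/hive-warehouse-connector-assembly-<version>.jar \
--conf "spark.hadoop.hive.metastore.uris=thrift://<host>:9083" \
--conf "spark.sql.hive.hwc.execution.mode=spark"
scala> import com.hortonworks.hwc.HiveWarehouseSession
scala> val hive = HiveWarehouseSession.session(spark).build()  // build a HWC session from the active SparkSession
scala> hive.executeQuery("SELECT * FROM <db>.<table>").show()  // <db>.<table> is a placeholder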