In a few steps, you configure Apache Spark to connect to the Apache Hive metastore.
An example shows how to configure Direct Reader reads while launching the Spark
shell.
This procedure assumes you require serialization and sets
spark.kryo.registrator=com.qubole.spark.hiveacid.util.HiveAcidKyroRegistrator
.
For secure clusters, additional configurations will be needed by spark.
.
-
In Cloudera Manager, in Hosts > Roles, if Hive Metastore
appears in the list of roles, copy the host name or IP address.
You use the host name or IP address in the next step to set the host value.
-
Launch the Spark shell and include the Direct Reader configurations.
For
example:
spark-shell --jars ./hive-warehouse-connector-assembly-<version>.jar \
--master yarn \
--conf spark.sql.extensions="com.hortonworks.spark.sql.rule.Extensions" \
--conf spark.kryo.registrator=com.qubole.spark.hiveacid.util.HiveAcidKyroRegistrator \
--conf spark.sql.hive.hiveserver2.jdbc.url="jdbc:hive2://<domain name>:<port>/default;principal=hive/_HOST@ROOT.HWX.SITE;retries=5;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2"
--conf spark.datasource.hive.warehouse.read.mode=DIRECT_READER_V1
-
Read data in table
customer
.
View data in table
customer
.
scala>
spark.sql("select c_customer_sk, c_customer_id, c_last_name, c_birth_country from customer where c_birth_year=1983 limit 2 ").show()
21/02/08 11:03:31 INFO rule.HWCSwitchRule: using DIRECT_READER_V1 extension for reading
+-------------+----------------+-----------+---------------+
|c_customer_sk| c_customer_id|c_last_name|c_birth_country|
+-------------+----------------+-----------+---------------+
| 55634|AAAAAAAACFJNAAAA| Campbell| THAILAND|
| 74213|AAAAAAAAFOBCBAAA| Hudgins| KYRGYZSTAN|
+-------------+----------------+-----------+---------------+