Using Direct Reader mode

In a few steps, you configure Apache Spark to connect to the Apache Hive metastore. An example shows how to configure Direct Reader mode when you launch the Spark shell.

This procedure assumes that you require Kryo serialization, and it sets spark.kryo.registrator=com.qubole.spark.hiveacid.util.HiveAcidKyroRegistrator.
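If you launch the shell often, one way to keep the serialization setting together with the other Direct Reader options is a Bash array of `--conf` arguments. This is a minimal sketch under stated assumptions. The array name `DIRECT_READER_OPTS` is hypothetical. The `spark.serializer=org.apache.spark.serializer.KryoSerializer` entry is also an assumption: Spark uses `spark.kryo.registrator` only when Kryo is the active serializer, so check whether your cluster already sets it.

```shell
#!/usr/bin/env bash
# Hypothetical helper: collect the Direct Reader --conf options in one array.
DIRECT_READER_OPTS=(
  --conf "spark.sql.extensions=com.hortonworks.spark.sql.rule.Extensions"
  # Assumption: Kryo must be the active serializer for the registrator to be used.
  --conf "spark.serializer=org.apache.spark.serializer.KryoSerializer"
  --conf "spark.kryo.registrator=com.qubole.spark.hiveacid.util.HiveAcidKyroRegistrator"
)

# Print each argument on its own line so you can check what will be passed.
printf '%s\n' "${DIRECT_READER_OPTS[@]}"
```

You can then pass the array when you launch the shell, for example `spark-shell --jars ./hive-warehouse-connector-assembly-<version>.jar --master yarn "${DIRECT_READER_OPTS[@]}"`, and add the JDBC URL option from step 2.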

Secure clusters require additional Spark configuration.

  1. In Cloudera Manager, go to Hosts > Roles. If Hive Metastore appears in the list of roles, copy its host name or IP address.
    You use this host name or IP address in the next step to set the host value.
  2. Launch the Spark shell and include the Direct Reader configurations.
    For example:
    spark-shell --jars ./hive-warehouse-connector-assembly-<version>.jar \
    --master yarn \
    --conf spark.sql.extensions="com.hortonworks.spark.sql.rule.Extensions" \
    --conf spark.kryo.registrator=com.qubole.spark.hiveacid.util.HiveAcidKyroRegistrator \
    --conf spark.sql.hive.hiveserver2.jdbc.url="jdbc:hive2://<domain name>:<port>/default;principal=hive/_HOST@ROOT.HWX.SITE;retries=5;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2"
  3. Read data from the customer table.
    For example, query the table and display the results:
    val hive = com.hortonworks.hwc.HiveWarehouseSession.session(spark).build()
    hive.sql("select c_customer_sk, c_customer_id, c_last_name, c_birth_country from customer where c_birth_year=1983 limit 2").show()
    21/02/08 11:03:31 INFO rule.HWCSwitchRule: using DIRECT_READER_V2 extension for reading
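    +-------------+----------------+-----------+---------------+
    |c_customer_sk|   c_customer_id|c_last_name|c_birth_country|
    +-------------+----------------+-----------+---------------+
    |        55634|AAAAAAAACFJNAAAA|   Campbell|       THAILAND|
    |        74213|AAAAAAAAFOBCBAAA|    Hudgins|     KYRGYZSTAN|
    +-------------+----------------+-----------+---------------+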
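The JDBC URL in step 2 follows a fixed pattern. It contains the ZooKeeper host and port, the default database, the Kerberos principal, and the ZooKeeper service discovery options. The sketch below assembles that URL from two inputs so that the `spark.sql.hive.hiveserver2.jdbc.url` value is built the same way each time. The function name `hwc_jdbc_url` and the sample host, port, and realm are hypothetical placeholders. Substitute your own `<domain name>:<port>` and Kerberos realm.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build the HiveServer2 JDBC URL used in step 2.
#   $1 = ZooKeeper host:port (the <domain name>:<port> placeholder)
#   $2 = Kerberos realm, for example ROOT.HWX.SITE
hwc_jdbc_url() {
  local zk="$1" realm="$2"
  printf 'jdbc:hive2://%s/default;principal=hive/_HOST@%s;retries=5;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2' \
    "$zk" "$realm"
}

# Example with placeholder values; prints the --conf option for spark-shell.
url="$(hwc_jdbc_url "zk-host.example.com:2181" "ROOT.HWX.SITE")"
printf '%s\n' "--conf spark.sql.hive.hiveserver2.jdbc.url=\"${url}\""
```

Keeping only the host and realm as inputs leaves the discovery options (`serviceDiscoveryMode=zooKeeper`, `zooKeeperNamespace=hiveserver2`) exactly as they appear in the launch command above.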