Integrating Apache Hive with Apache Spark and BI

Configuring JDBC execution mode

You configure Apache Spark to connect to HiveServer (HS2) in two steps. An example shows how to configure this mode while launching the Spark shell.

Before you begin:
  • Accept the default and recommended spark.datasource.hive.warehouse.read.jdbc.mode=cluster for the location of query execution.
  • Accept the default spark.datasource.hive.warehouse.load.staging.dir for the temporary staging location required by HWC.
  • Check that spark.hadoop.hive.zookeeper.quorum is configured.
  • Set the Kerberos configurations for HWC or, on an unsecured cluster, set spark.security.credentials.hiveserver2.enabled=false.
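The prerequisites above can be expressed as a quick self-check. The following Python sketch is a hypothetical helper, not part of HWC or Spark. It takes Spark properties as a plain dictionary (how you load them, for example from spark-defaults.conf, is left open) and reports which of the listed conditions are not met. It does not validate the Kerberos settings of a secured cluster or the staging directory, whose default you accept.

```python
# Hypothetical helper, not part of HWC or Spark: reports which of the
# JDBC-mode prerequisites a dict of Spark properties does not satisfy.
# Kerberos settings and the staging directory are deliberately not checked.


def check_jdbc_mode_prereqs(conf: dict[str, str], kerberized: bool) -> list[str]:
    problems = []

    # Query execution location: "cluster" is the default and recommended
    # value, so leaving the property unset passes the check.
    mode = conf.get("spark.datasource.hive.warehouse.read.jdbc.mode", "cluster")
    if mode != "cluster":
        problems.append(
            f"spark.datasource.hive.warehouse.read.jdbc.mode is {mode!r}; "
            "the recommended value is 'cluster'"
        )

    # The ZooKeeper quorum must be configured (any non-blank value passes).
    if not conf.get("spark.hadoop.hive.zookeeper.quorum", "").strip():
        problems.append("spark.hadoop.hive.zookeeper.quorum is not configured")

    # On an unsecured cluster, this property must be explicitly set to false.
    if not kerberized:
        flag = conf.get("spark.security.credentials.hiveserver2.enabled", "")
        if flag.strip().lower() != "false":
            problems.append(
                "set spark.security.credentials.hiveserver2.enabled=false "
                "on an unsecured cluster"
            )

    return problems


if __name__ == "__main__":
    # Placeholder quorum value, for illustration only.
    example = {"spark.hadoop.hive.zookeeper.quorum": "zk-host:2181"}
    for problem in check_jdbc_mode_prereqs(example, kerberized=False):
        print(problem)
```

An empty result means every checked prerequisite is satisfied; a missing read.jdbc.mode is treated as the default, cluster.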
  1. Find the HiveServer (HS2) JDBC URL in /etc/hive/conf.cloudera.HIVE_ON_TEZ-1/beeline-site.xml
    In this sample file, the value of beeline.hs2.jdbc.url.HIVE_ON_TEZ-1 is the HS2 JDBC URL.
    ...
    <configuration>
      <property>
        <name>beeline.hs2.jdbc.url.default</name>
        <value>HIVE_ON_TEZ-1</value>
      </property>
      <property>
        <name>beeline.hs2.jdbc.url.HIVE_ON_TEZ-1</name>
        <value>jdbc:hive2://nightly7x-unsecure-1.nightly7x-unsecure.root.hwx.site:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2;retries=5</value>
      </property>
    </configuration>
  2. Set the Spark property spark.sql.hive.hiveserver2.jdbc.url to the HS2 JDBC URL.
    For example, in /opt/cloudera/parcels/CDH-7.2.1-1.cdh7.2.1.p0.4847773/etc/spark/conf.dist/spark-defaults.conf, add the JDBC URL:
    ...
    spark.sql.hive.hiveserver2.jdbc.url jdbc:hive2://nightly7x-unsecure-1.nightly7x-unsecure.root.hwx.site:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2;retries=5
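The two steps can be sketched in Python under a few assumptions. The helper names are hypothetical. The resolution rule, where beeline.hs2.jdbc.url.default names the entry that holds the URL, is inferred from the sample file. The input is the sample beeline-site.xml with its URL on one line. The sketch prints the spark-defaults.conf line and an equivalent spark-shell invocation that passes the property with --conf. Any other options your deployment needs are not shown.

```python
import shlex
import xml.etree.ElementTree as ET

# Sample beeline-site.xml content from step 1 (URL joined onto one line).
SAMPLE_BEELINE_SITE = """<configuration>
  <property>
    <name>beeline.hs2.jdbc.url.default</name>
    <value>HIVE_ON_TEZ-1</value>
  </property>
  <property>
    <name>beeline.hs2.jdbc.url.HIVE_ON_TEZ-1</name>
    <value>jdbc:hive2://nightly7x-unsecure-1.nightly7x-unsecure.root.hwx.site:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2;retries=5</value>
  </property>
</configuration>"""

SPARK_URL_PROPERTY = "spark.sql.hive.hiveserver2.jdbc.url"


def read_properties(xml_text: str) -> dict[str, str]:
    """Collect every <property> name/value pair into a dict."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.iter("property"):
        name = prop.findtext("name", default="").strip()
        value = prop.findtext("value", default="").strip()
        if name:
            props[name] = value
    return props


def resolve_hs2_jdbc_url(xml_text: str) -> str:
    """Assumed rule: the default entry names the property holding the URL."""
    props = read_properties(xml_text)
    default_name = props["beeline.hs2.jdbc.url.default"]
    return props[f"beeline.hs2.jdbc.url.{default_name}"]


def spark_defaults_line(url: str) -> str:
    """One spark-defaults.conf line: key, whitespace, value."""
    return f"{SPARK_URL_PROPERTY} {url}"


def spark_shell_args(url: str) -> list[str]:
    """The same property passed on the command line with --conf."""
    return ["spark-shell", "--conf", f"{SPARK_URL_PROPERTY}={url}"]


if __name__ == "__main__":
    url = resolve_hs2_jdbc_url(SAMPLE_BEELINE_SITE)
    print(spark_defaults_line(url))
    # shlex.join quotes the argument, because the URL contains ';'.
    print(shlex.join(spark_shell_args(url)))
```

On a cluster node, you could pass the contents of /etc/hive/conf.cloudera.HIVE_ON_TEZ-1/beeline-site.xml instead of the sample string. Quoting matters on the command line: unquoted, the semicolons in the URL would end the shell command early.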