Configuring HBase-Spark connector using Cloudera Manager when HBase is on a remote cluster

Learn how to configure the HBase-Spark connector when HBase is residing on a remote cluster.

If you are using the HBase-Spark3 connector, ensure that you are using the 7.1.7 SP1 Spark3 parcel or a later version.

  1. Go to the Spark or Spark3 service.
  2. Click the Configuration tab.
  3. Select Scope > Gateway.
  4. Select Category > Advanced.
  5. Locate the appropriate property, or search for it by typing its name in the Search box:
    • Spark2: Spark Client Advanced Configuration Snippet (Safety Valve) for spark-conf/spark-defaults.conf
    • Spark3: Spark 3 Client Advanced Configuration Snippet (Safety Valve) for spark3-conf/spark-defaults.conf
  6. Add the following properties to ensure that all required Phoenix and HBase platform dependencies are available on the classpath for the Spark executors and drivers (a quick verification sketch follows this procedure):
    • to Spark Client Advanced Configuration Snippet (Safety Valve) for spark-conf/spark-defaults.conf:
      spark.security.credentials.hbase.enabled=true
      spark.hadoop.hbase.zookeeper.quorum=remote.zookeeper.example.com
      spark.hadoop.hbase.rpc.protection=privacy
      spark.hadoop.hbase.regionserver.kerberos.principal=hbase/_HOST@REMOTE.EXAMPLE.COM
      spark.hadoop.hbase.security.authentication=kerberos
      spark.yarn.access.hadoopFileSystems=hdfs://remote.hdfs.namenode.example.com:8020
    • to Spark 3 Client Advanced Configuration Snippet (Safety Valve) for spark3-conf/spark-defaults.conf:
      spark.security.credentials.hbase.enabled=true
      spark.hadoop.hbase.zookeeper.quorum=remote.zookeeper.example.com
      spark.hadoop.hbase.rpc.protection=privacy
      spark.hadoop.hbase.regionserver.kerberos.principal=hbase/_HOST@REMOTE.EXAMPLE.COM
      spark.hadoop.hbase.security.authentication=kerberos
      spark.kerberos.access.hadoopFileSystems=hdfs://remote.hdfs.namenode.example.com:8020
  7. Enter a Reason for change, and then click Save Changes to commit the changes.
  8. Restart the role and service when Cloudera Manager prompts you to restart.
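
    After Cloudera Manager redeploys the client configuration, you can sanity-check from a Spark shell that the remote-HBase properties were picked up. The following is a minimal Scala sketch rather than part of the procedure; the keys are the properties set above, and spark is the session that spark-shell or spark3-shell creates for you:

      // Run inside spark-shell or spark3-shell after the client configuration is redeployed.
      // spark.conf.get throws NoSuchElementException when a key is absent, so each line
      // also acts as an assertion that the property reached the driver.
      println(spark.conf.get("spark.hadoop.hbase.zookeeper.quorum"))         // remote.zookeeper.example.com
      println(spark.conf.get("spark.hadoop.hbase.security.authentication"))  // kerberos
      println(spark.conf.get("spark.security.credentials.hbase.enabled"))    // true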

    Perform the following steps to configure the HBase RegionServer:

    Edit the HBase RegionServer configuration so that it can run the Spark filter, which is used when a Spark SQL query contains a WHERE clause (a verification sketch follows these steps).
    1. In Cloudera Manager, select the HBase service.
    2. Click the Configuration tab.
    3. Search for regionserver environment.
    4. Find the RegionServer Environment Advanced Configuration Snippet (Safety Valve).
    5. Click the plus icon to add the following property:
      • For Spark 2:

        Key: HBASE_CLASSPATH

        Value:
        /opt/cloudera/parcels/CDH/lib/hbase_connectors/lib/hbase-spark-***VERSION NUMBER***-198.jar:/opt/cloudera/parcels/CDH/lib/hbase_connectors/lib/hbase-spark-protocol-shaded-***VERSION NUMBER***-198.jar:/opt/cloudera/parcels/CDH/jars/scala-library-2.11.12.jar
      • For Spark 3:

        Key: HBASE_CLASSPATH

        Value:
        /opt/cloudera/parcels/SPARK3/lib/spark3/hbase_connectors/lib/hbase-spark3-protocol-shaded-***VERSION NUMBER***.jar:/opt/cloudera/parcels/SPARK3/lib/spark3/hbase_connectors/lib/hbase-spark3-***VERSION NUMBER***.jar:/opt/cloudera/parcels/SPARK3/lib/spark3/jars/scala-library-2.12.15.jar
    6. Ensure that the listed jars have the correct version number in their name.
    7. Click Save Changes.
    8. Restart the RegionServer.
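
    To confirm that the RegionServer-side filter is in place, run a Spark SQL query with a WHERE clause against an HBase-backed DataFrame. The following is a minimal Scala sketch; the table name default:person, the column family cf, and the column mapping are illustrative assumptions and must match a table that actually exists on the remote cluster:

      // The WHERE clause below is pushed down to the Spark filter that the
      // HBASE_CLASSPATH steps above installed on the RegionServer.
      val df = spark.read
        .format("org.apache.hadoop.hbase.spark")
        .option("hbase.table", "default:person")
        .option("hbase.columns.mapping", "id STRING :key, name STRING cf:name")
        .option("hbase.spark.use.hbasecontext", false)
        .load()

      df.where("name = 'alice'").show()  // filter is evaluated on the RegionServer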
Build a Spark or Spark3 application using the dependencies that you provide when you run your application. If you followed the previous instructions, Cloudera Manager automatically configures the connector for Spark. If you have not, add the required jars and configuration properties on the command line, as in the following examples (a usage sketch for the started shell follows them):
  • For a Spark2 application, consider the following example:
    spark-shell --conf spark.jars=hdfs:///path/hbase_jars_spark2/hbase-spark-protocol-shaded.jar,hdfs:///path/hbase_jars_spark2/hbase-spark.jar,hdfs:///path/hbase_jars_spark2/scala-library.jar,hdfs:///path/hbase_jars_common/hbase-shaded-mapreduce-***VERSION NUMBER***.jar,hdfs:///path/hbase_jars_common/opentelemetry-api-***VERSION NUMBER***.jar,hdfs:///path/hbase_jars_common/opentelemetry-context-***VERSION NUMBER***.jar \
      --conf spark.security.credentials.hbase.enabled=true \
      --conf spark.hadoop.hbase.zookeeper.quorum=remote.zookeeper.example.com \
      --conf spark.hadoop.hbase.regionserver.kerberos.principal=hbase/_HOST@REMOTE.EXAMPLE.COM \
      --conf spark.hadoop.hbase.security.authentication=kerberos \
      --conf spark.yarn.access.hadoopFileSystems=hdfs://remote.hdfs.namenode.example.com:8020
  • For a Spark3 application, consider the following example:
    spark3-shell --conf spark.jars=hdfs:///path/hbase_jars_spark3/hbase-spark3-protocol-shaded.jar,hdfs:///path/hbase_jars_spark3/hbase-spark3.jar,hdfs:///path/hbase_jars_common/hbase-shaded-mapreduce-***VERSION NUMBER***.jar,hdfs:///path/hbase_jars_common/opentelemetry-api-***VERSION NUMBER***.jar,hdfs:///path/hbase_jars_common/opentelemetry-context-***VERSION NUMBER***.jar \
      --conf spark.security.credentials.hbase.enabled=true \
      --conf spark.hadoop.hbase.zookeeper.quorum=remote.zookeeper.example.com \
      --conf spark.hadoop.hbase.regionserver.kerberos.principal=hbase/_HOST@REMOTE.EXAMPLE.COM \
      --conf spark.hadoop.hbase.security.authentication=kerberos \
      --conf spark.kerberos.access.hadoopFileSystems=hdfs://remote.hdfs.namenode.example.com:8020 \
      --conf spark.hadoop.hbase.rpc.protection=privacy
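
Once the shell starts with these jars, the connector is available as the org.apache.hadoop.hbase.spark data source. The following is a minimal Scala write sketch; the table default:person, the column family cf, and the column mapping are illustrative assumptions, and the target table must already exist in the remote HBase cluster:

  // Write two rows to the remote cluster from a shell launched as above.
  case class Person(id: String, name: String)

  val people = spark.createDataFrame(Seq(Person("p1", "alice"), Person("p2", "bob")))

  people.write
    .format("org.apache.hadoop.hbase.spark")
    .option("hbase.table", "default:person")
    .option("hbase.columns.mapping", "id STRING :key, name STRING cf:name")
    .option("hbase.spark.use.hbasecontext", false)
    .save()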