Learn how to configure the HBase-Spark connector when both the HBase and Spark are on
the same cluster.
- If you are using the HBase-Spark3 connector, ensure that the software
version is 7.1.7 SP1 Spark3 parcel or above.
- Ensure that every Spark node has the HBase Master, Region Server, or Gateway
role assigned to it. If no HBase role is assigned to a Spark node, add the
HBase Gateway role to it, which ensures that the HBase configuration files
are available on the Spark node. For more information, see Managing Roles.
-
Go to the Spark or Spark3 service.
-
Click the Configuration tab.
-
Ensure that the HBase service is selected in Spark
Service as a dependency.
-
Select .
-
Select .
-
Locate the
Spark Client Advanced Configuration Snippet (Safety Valve)
for spark-conf/spark-defaults.conf
property or search for it by
typing its name in the Search box.
-
Add the following properties to ensure that all required Phoenix and HBase
platform dependencies are available on the classpath for the Spark executors and
drivers:
- Spark2:
spark.executor.extraClassPath=/opt/cloudera/parcels/CDH/lib/hbase_connectors/lib/hbase-spark.jar:/opt/cloudera/parcels/CDH/lib/hbase_connectors/lib/hbase-spark-protocol-shaded.jar:/opt/cloudera/parcels/CDH/lib/hbase_connectors/lib/scala-library.jar
spark.driver.extraClassPath=/opt/cloudera/parcels/CDH/lib/hbase_connectors/lib/hbase-spark.jar:/opt/cloudera/parcels/CDH/lib/hbase_connectors/lib/hbase-spark-protocol-shaded.jar:/opt/cloudera/parcels/CDH/lib/hbase_connectors/lib/scala-library.jar
- Spark3:
spark.executor.extraClassPath=/opt/cloudera/parcels/SPARK3/lib/spark3/hbase_connectors/lib/hbase-spark3-***VERSION NUMBER***.jar:/opt/cloudera/parcels/SPARK3/lib/spark3/hbase_connectors/lib/hbase-spark3-protocol-shaded-***VERSION NUMBER***.jar
spark.driver.extraClassPath=/opt/cloudera/parcels/SPARK3/lib/spark3/hbase_connectors/lib/hbase-spark3-***VERSION NUMBER***.jar:/opt/cloudera/parcels/SPARK3/lib/spark3/hbase_connectors/lib/hbase-spark3-protocol-shaded-***VERSION NUMBER***.jar
-
Enter a Reason for change, and then click Save Changes
to commit the changes.
-
Restart the role and service when Cloudera Manager prompts you to
restart.
Perform the following steps while using HBase
RegionServer:
Edit the HBase RegionServer configuration for running Spark Filter. Spark
Filter is used when Spark SQL Where clauses are in use.
- In Cloudera Manager, select the HBase
service.
- Click the Configuration tab.
- Search for
regionserver environment
.
- Find the RegionServer Environment Advanced Configuration
Snippet (Safety Valve).
- Click the plus icon to add the following
property:
Key:
HBASE_CLASSPATH
Value:
/opt/cloudera/parcels/CDH/lib/hbase_connectors/lib/hbase-spark-***VERSION NUMBER***-198.jar:/opt/cloudera/parcels/CDH/lib/hbase_connectors/lib/hbase-spark-protocol-shaded-***VERSION NUMBER***-198.jar:/opt/cloudera/parcels/CDH/jars/scala-library-2.11.12.jar
- Ensure that the listed jars have the correct version number in their
name.
- Click Save Changes.
- Restart the Region Server.
- Build a Spark or Spark3 application using the dependencies that you provide when
you run your application. If you follow the previous instructions, Cloudera
Manager automatically configures the connector. If you have not, add the
necessary parameters to the command line when running the
spark-submit
or spark3-submit
command.--conf spark.executor.extraClassPath=/opt/cloudera/parcels/CDH/lib/hbase-spark-protocol-shaded.jar:/opt/cloudera/parcels/CDH/lib/hbase_connectors/lib/scala-library.jar --conf spark.driver.extraClassPath=/opt/cloudera/parcels/CDH/lib/hbase-spark-protocol-shaded.jar:/opt/cloudera/parcels/CDH/lib/hbase_connectors/lib/scala-library.jar
- Consider the following example while using a Spark3
application.
spark3-shell --conf spark.executor.extraClassPath=/opt/cloudera/parcels/SPARK3/lib/spark3/hbase_connectors/lib/hbase-spark3-***VERSION NUMBER***.jar:/opt/cloudera/parcels/SPARK3/lib/spark3/hbase_connectors/lib/hbase-spark3-protocol-shaded-***VERSION NUMBER***.jar:/etc/hbase/conf/ --conf spark.driver.extraClassPath=/opt/cloudera/parcels/SPARK3/lib/spark3/hbase_connectors/lib/hbase-spark3-***VERSION NUMBER***.jar:/opt/cloudera/parcels/SPARK3/lib/spark3/hbase_connectors/lib/hbase-spark3-protocol-shaded-***VERSION NUMBER***.jar:/etc/hbase/conf/