Configure HBase-Spark connector using Cloudera Manager

The HBase-Spark Connector bridges the gap between the simple HBase key-value store and complex relational SQL queries, enabling you to perform complex data analytics on top of HBase using Spark.

An HBase DataFrame is a standard Spark DataFrame, and can interact with any other data source, such as Hive, ORC, Parquet, or JSON.
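For example, once the connector JARs are on the classpath, an HBase table can be loaded through the connector's data source and queried like any other DataFrame. The following Scala sketch is illustrative only: the table name person, the column family cf, and the column mapping are hypothetical, and it assumes the connector's org.apache.hadoop.hbase.spark data source with its hbase.columns.mapping and hbase.table options.

    import org.apache.spark.sql.SparkSession

    object HBaseDataFrameExample {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("hbase-dataframe-example")
          .getOrCreate()

        // Load the (hypothetical) HBase table "person" as a DataFrame.
        // The mapping ties DataFrame columns to the row key and to
        // qualifiers in the column family "cf"; adjust it to your schema.
        val df = spark.read
          .format("org.apache.hadoop.hbase.spark")
          .option("hbase.columns.mapping",
            "id STRING :key, name STRING cf:name, age INT cf:age")
          .option("hbase.table", "person")
          .load()

        // The result is a standard Spark DataFrame, so it can be queried
        // with SQL or joined against Hive, Parquet, or JSON sources.
        df.createOrReplaceTempView("person")
        spark.sql("SELECT name, age FROM person WHERE age > 18").show()

        spark.stop()
      }
    }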

  1. Go to the Spark service.
  2. Click the Configuration tab.
  3. Ensure that the HBase service is selected in Spark Service as a dependency.
  4. Select Scope > Gateway.
  5. Select Category > Advanced.
  6. Locate the Spark Client Advanced Configuration Snippet (Safety Valve) for spark-conf/spark-defaults.conf property or search for it by typing its name in the Search box.
  7. Add the following properties to ensure that all required Phoenix and HBase platform dependencies are available on the classpath for the Spark executors and drivers:

    If Spark and HBase are running on the same cluster, skip to step 8; a hedged sketch of what these properties can look like in that case follows this step's example.

    If you are using the HBase-Spark connector to connect to an HBase instance outside of the cluster, run the hbase mapredcp command on the remote cluster, and then:
    • Copy all JAR files listed in the output to the local cluster, and add the JAR files to both *extraClassPath properties.
    • Copy the directory containing the hbase-site.xml file from the remote cluster, and add it to both *extraClassPath properties.
    spark.executor.extraClassPath=/copied/hbase-conf/:/copied/hbase_connectors/lib/hbase-spark.jar:/copied/hbase_connectors/lib/hbase-spark-protocol-shaded.jar:/copied/hbase_connectors/lib/scala-library.jar:/copied/hbase-shaded-mapreduce-...
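    For the case where Spark and HBase run on the same cluster, the properties typically point at the connector JARs shipped in the CDH parcel. The following is a hedged sketch only, reusing the parcel paths from the spark-submit example at the end of this page; verify the exact JAR names and versions against your installation:

    spark.executor.extraClassPath=/opt/cloudera/parcels/CDH/lib/hbase-spark-protocol-shaded.jar:/opt/cloudera/parcels/CDH/jars/scala-library-2.11.12.jar
    spark.driver.extraClassPath=/opt/cloudera/parcels/CDH/lib/hbase-spark-protocol-shaded.jar:/opt/cloudera/parcels/CDH/jars/scala-library-2.11.12.jar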
  8. Enter a Reason for change, and then click Save Changes to commit the changes.
  9. Restart the role and service when Cloudera Manager prompts you to restart.
  • Enable your IDE by adding the following dependency to your build:
    <dependency>
      <groupId>org.apache.hbase.connectors.spark</groupId>
      <artifactId>hbase-spark</artifactId>
      <version>[***VERSION EXAMPLE:***]</version>
    </dependency>
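    If your project builds with sbt rather than Maven, the equivalent coordinates can be declared as a library dependency. This assumes the same group and artifact as the Maven snippet above; replace the version placeholder with the version shipped by your distribution:

    libraryDependencies += "org.apache.hbase.connectors.spark" % "hbase-spark" % "[***VERSION EXAMPLE:***]"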
  • Build a Spark application using the dependencies that you provide when you run your application. If you followed the previous instructions, Cloudera Manager automatically configures the connector; if you did not, add the necessary parameters to the command line when running the spark-submit command, for example:
    --conf spark.executor.extraClassPath=/opt/cloudera/parcels/CDH/lib/hbase-spark-protocol-shaded.jar:/opt/cloudera/parcels/CDH/jars/scala-library-2.11.12.jar \
    --conf spark.driver.extraClassPath=/opt/cloudera/parcels/CDH/lib/hbase-spark-protocol-shaded.jar:/opt/cloudera/parcels/CDH/jars/scala-library-2.11.12.jar
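    Putting the pieces together, a complete invocation might look like the following. This is a hedged sketch: the application JAR and main class are hypothetical placeholders, and the classpath entries repeat the parcel paths shown above.

    spark-submit \
      --class HBaseDataFrameExample \
      --conf spark.executor.extraClassPath=/opt/cloudera/parcels/CDH/lib/hbase-spark-protocol-shaded.jar:/opt/cloudera/parcels/CDH/jars/scala-library-2.11.12.jar \
      --conf spark.driver.extraClassPath=/opt/cloudera/parcels/CDH/lib/hbase-spark-protocol-shaded.jar:/opt/cloudera/parcels/CDH/jars/scala-library-2.11.12.jar \
      my-hbase-spark-app.jar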