Configure HBase-Spark connector using Cloudera Manager
The HBase-Spark connector bridges the gap between the simple HBase key-value store and complex relational SQL queries. It lets you run complex data analytics on top of HBase using Spark.
An HBase DataFrame is a standard Spark DataFrame, so it can interact with other data sources such as Hive, ORC, Parquet, and JSON.
- Enable your IDE by adding the following dependency to your build:
  <dependency>
    <groupId>org.apache.hbase</groupId>
    <artifactId>hbase-spark</artifactId>
    <version>[***VERSION EXAMPLE: 6.0.0.7.2.10.0-297***]</version>
    <scope>provided</scope>
  </dependency>
- Build a Spark application using the dependencies that you provide when you run
  your application. If you followed the previous instructions, Cloudera Manager
  automatically configures the connector. If you did not, add the necessary
  parameters to the command line when you run the spark-submit command:
  --conf spark.executor.extraClassPath=/opt/cloudera/parcels/CDH/lib/hbase-spark-protocol-shaded.jar:/opt/cloudera/parcels/CDH/jars/scala-library-2.11.12.jar
  --conf spark.driver.extraClassPath=/opt/cloudera/parcels/CDH/lib/hbase-spark-protocol-shaded.jar:/opt/cloudera/parcels/CDH/jars/scala-library-2.11.12.jar
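The manual configuration can be sketched as a small shell script that builds the extra classpath once and reuses it for both the driver and the executors. This is only an illustration: it assumes both JARs sit under the /opt/cloudera/parcels/CDH parcel directory, and the application class (com.example.HBaseSparkApp) and JAR name (hbase-spark-app.jar) are hypothetical placeholders rather than anything shipped with the connector. The script prints the command instead of running it, so you can review it first.

```shell
#!/bin/sh
# Sketch: assemble a spark-submit command with the HBase-Spark connector
# JARs on the driver and executor classpaths.
set -eu

# Assumed parcel location; adjust if your parcels are installed elsewhere.
PARCEL_ROOT="/opt/cloudera/parcels/CDH"

# Colon-separated classpath shared by driver and executors.
EXTRA_CP="${PARCEL_ROOT}/lib/hbase-spark-protocol-shaded.jar:${PARCEL_ROOT}/jars/scala-library-2.11.12.jar"

# Hypothetical application details; replace with your own.
APP_CLASS="com.example.HBaseSparkApp"
APP_JAR="hbase-spark-app.jar"

# Build the full command line as a single string (no argument contains spaces).
SUBMIT_CMD="spark-submit --class ${APP_CLASS} --conf spark.executor.extraClassPath=${EXTRA_CP} --conf spark.driver.extraClassPath=${EXTRA_CP} ${APP_JAR}"

# Print the command for review; copy it to a cluster node to run it.
echo "${SUBMIT_CMD}"
```

Keeping the classpath in one variable avoids the two --conf values drifting apart, which is easy to miss when the same long path is typed twice.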