The HBase-Spark Connector bridges the gap between the simple HBase key-value store
and complex relational SQL queries, enabling users to perform complex data analytics on
top of HBase using Spark.
An HBase DataFrame is a standard Spark DataFrame and can interact with
other data sources such as Hive, ORC, Parquet, and JSON.
-
Edit the HBase RegionServer configuration so that it can run Spark Filter.
Spark Filter is used when a Spark SQL query includes a WHERE clause.
-
In Cloudera Manager, select the HBase service.
-
Click the Configuration tab.
-
Search for regionserver environment.
-
Find the RegionServer Environment Advanced Configuration Snippet (Safety Valve).
-
Click the plus icon to add the following property:
Key: HBASE_CLASSPATH
Value: /opt/cloudera/parcels/CDH/lib/hbase_connectors/lib/[***HBASE-SPARK JAR NAME***].jar:/opt/cloudera/parcels/CDH/lib/hbase_connectors/lib/[***HBASE-SPARK PROTOCOL SHADED JAR NAME***].jar:/opt/cloudera/parcels/CDH/jars/scala-library-2.11.12.jar
-
Ensure that the listed jars have the correct version number in their names.
-
Click Save Changes.
-
Restart the RegionServer.
-
Invoke the Spark shell with the additional jars using the following command:
spark-shell --jars /opt/cloudera/parcels/CDH/lib/hbase_connectors/lib/[***HBASE-SPARK JAR NAME***].jar,/opt/cloudera/parcels/CDH/lib/hbase_connectors/lib/[***HBASE-SPARK PROTOCOL SHADED JAR NAME***].jar --files /etc/hbase/conf/hbase-site.xml --conf spark.driver.extraClassPath=/etc/hbase/conf
Ensure that the listed jars have the correct version number in their names.
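Because both the HBASE_CLASSPATH value and the --jars argument depend on version-specific file names, it can help to verify the paths before restarting the RegionServer or launching the shell. The following is a minimal shell sketch and is not part of the connector. build_jar_list is a hypothetical helper that prefixes each jar name with a directory, fails if a file is missing, and joins the resulting paths with a separator (":" for HBASE_CLASSPATH, "," for --jars). The variable names in the commented usage are also assumptions.

```shell
# build_jar_list DIR SEP JAR...
# Hypothetical helper: prints DIR/JAR for every JAR, joined with SEP.
# Returns non-zero and reports the path on stderr if any file is missing.
build_jar_list() {
  local dir sep out jar path
  dir="$1"
  sep="$2"
  shift 2
  out=""
  for jar in "$@"; do
    path="$dir/$jar"
    if [ ! -f "$path" ]; then
      echo "missing jar: $path" >&2
      return 1
    fi
    # Add the separator only between entries, never before the first one.
    out="${out:+$out$sep}$path"
  done
  printf '%s\n' "$out"
}

# Example usage (commented out; set the two variables to the real,
# versioned file names found in the connector lib directory):
#   LIB=/opt/cloudera/parcels/CDH/lib/hbase_connectors/lib
#   JARS="$(build_jar_list "$LIB" , "$HBASE_SPARK_JAR" "$HBASE_SPARK_SHADED_JAR")" || exit 1
#   spark-shell --jars "$JARS" --files /etc/hbase/conf/hbase-site.xml \
#     --conf spark.driver.extraClassPath=/etc/hbase/conf
```

Calling the helper with ":" as the separator produces a string in the same form as the HBASE_CLASSPATH value shown earlier, so one check can cover both the RegionServer setting and the spark-shell command.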