Configuring HBase-Spark connector using Cloudera Manager when HBase is on a remote cluster
Learn how to configure the HBase-Spark connector when HBase is residing on a remote cluster.
If you are using the HBase-Spark3 connector, ensure that your Spark3 parcel version is 7.1.7 SP1 or above.
Build a Spark or Spark3 application using the dependencies that you provide when you run your application. If you followed the previous instructions, Cloudera Manager automatically configures the connector for Spark. If you have not, pass the required JARs and configuration properties manually, as shown in the following examples:
- Consider the following example while using a Spark2 application:

  spark-shell \
    --conf spark.jars=hdfs:///path/hbase_jars_spark2/hbase-spark-protocol-shaded.jar,hdfs:///path/hbase_jars_spark2/hbase-spark.jar,hdfs:///path/hbase_jars_spark2/scala-library.jar,hdfs:///path/hbase_jars_common/hbase-shaded-mapreduce-***VERSION NUMBER***.jar,hdfs:///path/hbase_jars_common/opentelemetry-api-***VERSION NUMBER***.jar,hdfs:///path/hbase_jars_common/opentelemetry-context-***VERSION NUMBER***.jar \
    --conf spark.security.credentials.hbase.enabled=true \
    --conf spark.hadoop.hbase.zookeeper.quorum=remote.zookeeper.example.com \
    --conf spark.hadoop.hbase.regionserver.kerberos.principal=hbase/_HOST@REMOTE.EXAMPLE.COM \
    --conf spark.hadoop.hbase.security.authentication=kerberos \
    --conf spark.yarn.access.hadoopFileSystems=hdfs://remote.hdfs.namenode.example.com:8020
- Consider the following example while using a Spark3 application:

  spark3-shell \
    --conf spark.jars=hdfs:///path/hbase_jars_spark3/hbase-spark3-protocol-shaded.jar,hdfs:///path/hbase_jars_spark3/hbase-spark3.jar,hdfs:///path/hbase_jars_common/hbase-shaded-mapreduce-***VERSION NUMBER***.jar,hdfs:///path/hbase_jars_common/opentelemetry-api-***VERSION NUMBER***.jar,hdfs:///path/hbase_jars_common/opentelemetry-context-***VERSION NUMBER***.jar \
    --conf spark.security.credentials.hbase.enabled=true \
    --conf spark.hadoop.hbase.zookeeper.quorum=remote.zookeeper.example.com \
    --conf spark.hadoop.hbase.regionserver.kerberos.principal=hbase/_HOST@REMOTE.EXAMPLE.COM \
    --conf spark.hadoop.hbase.security.authentication=kerberos \
    --conf spark.kerberos.access.hadoopFileSystems=hdfs://remote.hdfs.namenode.example.com:8020 \
    --conf spark.hadoop.hbase.rpc.protection=privacy
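After the shell starts with the configuration above, you can verify connectivity to the remote HBase cluster with a short read through the connector's DataSource API. A minimal sketch; the table name `my_table` and the column mapping are hypothetical placeholders, so substitute your own table and column family:

```scala
// Run inside the spark-shell or spark3-shell started above; the remote
// ZooKeeper quorum and Kerberos settings are already supplied via --conf.
// "my_table" and the "cf:name" mapping are placeholders for illustration.
val df = spark.read
  .format("org.apache.hadoop.hbase.spark")
  .option("hbase.table", "my_table")
  .option("hbase.columns.mapping",
    "id STRING :key, name STRING cf:name")
  .load()

// A successful show() confirms the connector can reach the remote cluster.
df.show(5)
```

If the read hangs or fails with a Kerberos error, re-check the `spark.hadoop.hbase.regionserver.kerberos.principal` and `spark.hadoop.hbase.rpc.protection` values against the remote cluster's configuration.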