Configure Phoenix-Spark connector using Cloudera Manager

  1. Follow Step 1 through Step 7(b) in Configure HBase-Spark connector using Cloudera Manager.
  2. Add the following properties to ensure that all required Phoenix and HBase platform dependencies are available on the classpath for the Spark drivers and executors.
    1. Upload the Phoenix-Spark3 connector file:
      hdfs dfs -put /opt/cloudera/parcels/CDH/lib/phoenix_connectors/phoenix5-spark3-shaded.jar /path/hbase_jars_spark3
    2. Add the Phoenix-Spark connector files that you uploaded in the previous step to the spark.jars parameter:
      • Spark3:
        spark.jars=hdfs:///path/hbase_jars_common/hbase-site.xml.jar,hdfs:///path/hbase_jars_spark3/hbase-spark3.jar,hdfs:///path/hbase_jars_spark3/hbase-spark3-protocol-shaded.jar,/path/hbase_jars_common(other common files)...,hdfs:///path/hbase_jars_spark3/phoenix5-spark3-shaded.jar
  3. Enter a Reason for change, and then click Save Changes to commit the changes.
  4. Restart the role and service when Cloudera Manager prompts you to restart.
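
The upload and spark.jars wiring above can be sketched in one pass. This is a minimal, hedged example: the /path/... directories and JAR list are the placeholder values from this page, not real defaults, so substitute your own locations and versions.

```shell
# Sketch: upload the connector, then build the comma-separated
# spark.jars value from the uploaded paths.
SRC=/opt/cloudera/parcels/CDH/lib/phoenix_connectors/phoenix5-spark3-shaded.jar
DEST=/path/hbase_jars_spark3

# Upload step (requires an HDFS client on this host; shown for context):
# hdfs dfs -put "$SRC" "$DEST"

# HDFS paths to join into spark.jars (placeholders from this page).
JARS=(
  hdfs:///path/hbase_jars_common/hbase-site.xml.jar
  hdfs:///path/hbase_jars_spark3/hbase-spark3.jar
  hdfs:///path/hbase_jars_spark3/hbase-spark3-protocol-shaded.jar
  hdfs:///path/hbase_jars_spark3/phoenix5-spark3-shaded.jar
)
# Join the array with commas, the separator spark.jars expects.
SPARK_JARS=$(IFS=,; echo "${JARS[*]}")
echo "spark.jars=${SPARK_JARS}"
```

The joined value is what you paste into the spark.jars safety valve; there must be no spaces between entries.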

Build a Spark application using the dependencies that you provide when you run your application. If you followed the previous steps, Cloudera Manager automatically configures the connector for Spark. If you have not, provide the dependencies on the command line when you launch your application, as in the following spark3-shell example:
spark3-shell --conf spark.executor.extraClassPath=/opt/cloudera/parcels/CDH/lib/phoenix_connectors/phoenix5-spark3-shaded-6.x.y.VERSION.jar \
  --conf spark.jars=hdfs:///path/hbase_jars_common/hbase-site.xml.jar,hdfs:///path/hbase_jars_spark3/hbase-spark3-protocol-shaded.jar,hdfs:///path/hbase_jars_spark3/hbase-spark3.jar,hdfs:///path/hbase_jars_common/hbase-shaded-mapreduce-***VERSION NUMBER***.jar,hdfs:///path/hbase_jars_common/opentelemetry-api-***VERSION NUMBER***.jar,hdfs:///path/hbase_jars_common/opentelemetry-context-***VERSION NUMBER***.jar,hdfs:///path/hbase_jars_spark3/phoenix5-spark3-shaded.jar
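
Rather than typing the exact version string into the extraClassPath value, the shaded connector JAR under the parcel directory can be resolved with a small helper. The `latest_jar` function below is hypothetical (not part of Cloudera Manager or the parcel), and lexical sorting is only an approximation of version ordering:

```shell
# Hypothetical helper: return the lexically newest JAR matching a
# prefix in a directory, so the version string is not hard-coded.
latest_jar() {
  local dir=$1 prefix=$2
  ls "${dir}/${prefix}"*.jar 2>/dev/null | sort | tail -n 1
}

# Example usage (parcel path as used on this page):
# PHOENIX_JAR=$(latest_jar /opt/cloudera/parcels/CDH/lib/phoenix_connectors phoenix5-spark3-shaded)
# spark3-shell --conf spark.executor.extraClassPath="$PHOENIX_JAR" ...
```

This keeps launch scripts working across minor parcel upgrades without editing the hard-coded JAR name.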

If you run the spark3-submit command in YARN cluster mode, update the example as follows.

spark3-submit --conf spark.driver.extraClassPath=/opt/cloudera/parcels/CDH/lib/phoenix_connectors/phoenix5-spark3-shaded-6.x.y.VERSION.jar \
  --conf spark.executor.extraClassPath=/opt/cloudera/parcels/CDH/lib/phoenix_connectors/phoenix5-spark3-shaded-6.x.y.VERSION.jar \
  --conf spark.jars=hdfs:///path/hbase_jars_common/hbase-site.xml.jar,hdfs:///path/hbase_jars_spark3/hbase-spark3-protocol-shaded.jar,hdfs:///path/hbase_jars_spark3/hbase-spark3.jar,hdfs:///path/hbase_jars_common/hbase-shaded-mapreduce-***VERSION NUMBER***.jar,hdfs:///path/hbase_jars_common/opentelemetry-api-***VERSION NUMBER***.jar,hdfs:///path/hbase_jars_common/opentelemetry-context-***VERSION NUMBER***.jar,hdfs:///path/hbase_jars_spark3/phoenix5-spark3-shaded.jar
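
Putting the cluster-mode pieces together, the following sketch assembles (but does not run) a spark3-submit command from variables. The `--master yarn --deploy-mode cluster` flags are the standard Spark switches for YARN cluster mode; the JAR paths and versions are this page's placeholders, and the shortened spark.jars list here is for illustration only:

```shell
# Sketch: assemble the spark3-submit invocation from variables.
# The command is printed, not executed, so each piece is easy to check.
PHOENIX_JAR=/opt/cloudera/parcels/CDH/lib/phoenix_connectors/phoenix5-spark3-shaded-6.x.y.VERSION.jar
SPARK_JARS=hdfs:///path/hbase_jars_common/hbase-site.xml.jar,hdfs:///path/hbase_jars_spark3/phoenix5-spark3-shaded.jar

CMD="spark3-submit --master yarn --deploy-mode cluster"
CMD="$CMD --conf spark.driver.extraClassPath=$PHOENIX_JAR"
CMD="$CMD --conf spark.executor.extraClassPath=$PHOENIX_JAR"
CMD="$CMD --conf spark.jars=$SPARK_JARS"
echo "$CMD"
```

Echoing the command first is a cheap way to catch malformed paths (for example, a missing --conf or an extra slash) before submitting to the cluster.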