Configure Phoenix-Spark connector using Cloudera Manager

  1. Follow Step 1 to Step 7(b) as mentioned in Configure HBase-Spark connector using Cloudera Manager.
  2. Add the required properties to ensure that all required Phoenix and HBase platform dependencies are available on the classpath for the Spark executors and drivers.
    1. Upload the Phoenix-Spark3 connector file:
      hdfs dfs -put /opt/cloudera/parcels/cdh/lib/phoenix_connectors/phoenix5-spark3-shaded.jar /path/hbase_jars_spark3
    2. Upload the Phoenix-Spark2 connector file:
      hdfs dfs -put /opt/cloudera/parcels/cdh/lib/phoenix_connectors/phoenix5-spark-shaded.jar /path/hbase_jars_spark2
    3. Add all the Phoenix-Spark connector related files that you have just uploaded in the previous steps to the spark.jars parameter:
      • Spark2:
        spark.jars=hdfs:///path/hbase_jars_common/hbase-site.xml.jar,hdfs:///path/hbase_jars_spark2/hbase-spark-protocol-shaded.jar,hdfs:///path/hbase_jars_spark2/hbase-spark.jar,hdfs:///path/hbase_jars_spark2/scala-library.jar,/path/hbase_jars_common(other common files)...,hdfs:////path/hbase_jars_spark2/phoenix5-spark-shaded.jar
      • Spark3:
        spark.jars=hdfs:///path/hbase_jars_common/hbase-site.xml.jar,hdfs:///path/hbase_jars_spark3/hbase-spark3.jar,hdfs:///path/hbase_jars_spark3/hbase-spark3-protocol-shaded.jar,/path/hbase_jars_common(other common files)...,hdfs:////path/hbase_jars_spark3/phoenix5-spark3-shaded.jar
  3. Enter a Reason for change, and then click Save Changes to commit the changes.
  4. Restart the role and service when Cloudera Manager prompts you to restart.

Build a Spark or Spark3 application using the dependencies that you provide when you run your application. If you follow the previous instructions, Cloudera Manager automatically configures the connector for Spark. If you have not:

  • Consider the following example while using a Spark2 application:
    spark-shell --conf spark.jars=hdfs:///path/hbase_jars_common/hbase-site.xml.jar,hdfs:///path/hbase_jars_spark2/hbase-spark-protocol-shaded.jar,hdfs:///path/hbase_jars_spark2/hbase-spark.jar,hdfs:///path/hbase_jars_spark2/scala-library.jar,hdfs:///path/hbase_jars_common/hbase-shaded-mapreduce-***VERSION NUMBER***.jar,hdfs:///path/hbase_jars_common/opentelemetry-api-***VERSION NUMBER***.jar,hdfs:///path/hbase_jars_common/opentelemetry-context-***VERSION NUMBER***.jar,hdfs:////path/hbase_jars_spark2/phoenix5-spark-shaded.jar
  • Consider the following example while using a Spark3 application:
    spark3-shell --conf spark.jars=hdfs:///path/hbase_jars_common/hbase-site.xml.jar,hdfs:///path/hbase_jars_spark3/hbase-spark3-protocol-shaded.jar,hdfs:///path/hbase_jars_spark3/hbase-spark3.jar,hdfs:///path/hbase_jars_common/hbase-shaded-mapreduce-***VERSION NUMBER***.jar,hdfs:///path/hbase_jars_common/opentelemetry-api-***VERSION NUMBER***.jar,hdfs:///path/hbase_jars_common/opentelemetry-context-***VERSION NUMBER***.jar,hdfs:////path/hbase_jars_spark3/phoenix5-spark3-shaded.jar