Configuring Phoenix-Spark connector using Cloudera Manager when Phoenix is on a remote cluster

Learn how to configure the Phoenix-Spark connector when Phoenix is residing on a remote cluster.

If you are using the Phoenix-Spark3 connector, ensure that the software version is 7.1.7 SP1 Spark3 parcel or above.

  1. Perform the steps mentioned in Configuring HBase-Spark connector when HBase is on remote cluster.
  2. Add the required properties to ensure that all required Phoenix and HBase platform dependencies are available on the classpath for the Spark executors and drivers.
    • Upload the Spark2 connector:
      hdfs dfs -put /opt/cloudera/parcels/CDH/lib/phoenix_connectors/phoenix5-spark-shaded.jar /path/hbase_jars_spark2
    • Upload the Spark3 connector:
      hdfs dfs -put /opt/cloudera/parcels/SPARK3/lib/spark3/phoenix_connectors/phoenix5-spark3-shaded.jar /path/hbase_jars_spark3
    • Add the above files to the end of the existing list of files spark.jars parameter:
      • Spark2:
        spark.jars=<Jars already added for the Hbase-Spark connector>,hdfs:///path/hbase_jars_spark2/phoenix5-spark-shaded.jar
      • Spark3:
        spark.jars=<Jars already added for the Hbase-Spark connector>,hdfs:///path/hbase_jars_spark3/phoenix5-spark3-shaded.jar
  3. Enter a Reason for change, and then click Save Changes to commit the changes.
  4. Restart the role and service when Cloudera Manager prompts you to restart.
Build a Spark or Spark3 application using the dependencies that you provide when you run your application. If you follow the previous instructions, Cloudera Manager automatically configures the connector for Spark. If you have not:
  • Consider the following example while using a Spark2 application:
    spark-shell --conf spark.jars=hdfs:///path/hbase_jars_common/hbase-site.xml.jar,hdfs:///path/hbase_jars_spark2/hbase-spark-protocol-shaded.jar,hdfs:///path/hbase_jars_spark2/hbase-spark.jar,hdfs:///path/hbase_jars_spark2/scala-library.jar,hdfs:///path/hbase_jars_common/hbase-shaded-mapreduce-***VERSION NUMBER***.jar,hdfs:///path/hbase_jars_common/opentelemetry-api-***VERSION NUMBER***.jar,hdfs:///path/hbase_jars_common/opentelemetry-context-***VERSION NUMBER***.jar,hdfs:///path/hbase_jars_spark2/phoenix5-spark-shaded.jar
  • Consider the following example while using a Spark3 application:
    spark3-shell --conf spark.jars=hdfs:///path/hbase_jars_common/hbase-site.xml.jar,hdfs:///path/hbase_jars_spark3/hbase-spark3-protocol-shaded.jar,hdfs:///path/hbase_jars_spark3/hbase-spark3.jar,hdfs:///path/hbase_jars_common/hbase-shaded-mapreduce-***VERSION NUMBER***.jar,hdfs:///path/hbase_jars_common/opentelemetry-api-***VERSION NUMBER***.jar,hdfs:///path/hbase_jars_common/opentelemetry-context-***VERSION NUMBER***.jar,hdfs:///path/hbase_jars_spark3/phoenix5-spark3-shaded.jar