Configure Phoenix-Spark connector using Cloudera Manager

  1. Go to the Spark service.
  2. Click the Configuration tab.
  3. Select Scope > Gateway.
  4. Select Category > Advanced.
  5. Locate the Spark Client Advanced Configuration Snippet (Safety Valve) for spark-conf/spark-defaults.conf property or search for it by typing its name in the Search box.
  6. Add the following properties to ensure that all required Phoenix and HBase platform dependencies are available on the classpath for the Spark executors and drivers:

    Phoenix client JARs:

    spark.executor.extraClassPath=/opt/cloudera/parcels/CDH/lib/phoenix_connectors/phoenix5-spark-shaded.jar
    spark.driver.extraClassPath=/opt/cloudera/parcels/CDH/lib/phoenix_connectors/phoenix5-spark-shaded.jar

    If Spark and HBase are running on the same cluster, skip to step 7.

    If you are using the Phoenix-Spark connector to connect to an HBase instance outside of the cluster, run the hbase mapredcp command on the remote cluster, and then do the following:
    • Copy all JAR files listed in the output to the local cluster, and add the JAR files to both *extraClassPath properties.
    • Copy the directory containing hbase-site.xml from the remote cluster, and add it to both *extraClassPath properties.
    For example (classpath entries are separated by colons):
    spark.executor.extraClassPath=/copied/hbase-site.xml:/copied/phoenix5-spark-shaded.jar:/copied/hbase-shaded-mapreduce-2.1.6.3.1.5.0-152.jar:<rest of hbase mapredcp jars>
    The name of the Phoenix-Spark connector JAR depends on the platform version. For CDH6, HDP3, CDP 7.1.5 and earlier, and CDP 7.2.1 to CDP 7.2.8, you must use phoenix-client.jar. For CDP 7.1.6, 7.1.7, 7.2.9, 7.2.10, 7.2.11, and higher versions, you must use phoenix5-spark-shaded.jar.
  7. Enter a Reason for change, and then click Save Changes to commit the changes.
  8. Restart the role and service when Cloudera Manager prompts you to restart.
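The remote-cluster part of step 6 can be scripted. The following is a minimal sketch, not part of Cloudera Manager: it assumes that hbase mapredcp prints a single colon-separated list of JAR paths, that the JARs and the directory containing hbase-site.xml were copied under a hypothetical /copied directory, and that classpath entries are joined with the separator you pass in (':' is the usual Java classpath separator on Linux). The function names and sample paths are illustrative only.

```python
import posixpath


def local_jar_paths(mapredcp_output, local_dir):
    """Map each JAR path printed by `hbase mapredcp` to its copy under local_dir."""
    # Assumption: the command prints one line of colon-separated JAR paths.
    remote_paths = [p for p in mapredcp_output.strip().split(":") if p]
    return [posixpath.join(local_dir, posixpath.basename(p)) for p in remote_paths]


def extra_classpath_lines(conf_dir, connector_jar, hbase_jars, sep):
    """Build the two *extraClassPath property lines for the safety valve."""
    # conf_dir is the copied directory that contains hbase-site.xml.
    value = sep.join([conf_dir, connector_jar] + list(hbase_jars))
    return [
        "spark.executor.extraClassPath=" + value,
        "spark.driver.extraClassPath=" + value,
    ]


if __name__ == "__main__":
    # Hypothetical output captured from `hbase mapredcp` on the remote cluster.
    output = "/opt/hbase/lib/a.jar:/opt/hbase/lib/b.jar\n"
    jars = local_jar_paths(output, "/copied")
    lines = extra_classpath_lines(
        "/copied/hbase-conf",                  # hypothetical copied config directory
        "/copied/phoenix5-spark-shaded.jar",   # connector JAR copied alongside
        jars,
        ":",
    )
    for line in lines:
        print(line)
```

The printed lines can be pasted into the safety valve from step 5. Keeping the separator as a parameter avoids hard-coding an assumption about the platform.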
  • Before you can use the Phoenix-Spark connector in your Spark applications, you must add the repository at https://repository.cloudera.com/artifactory/public/org/apache/phoenix/phoenix-spark/ to your Maven settings and declare the following dependency:
    <dependency>
       <groupId>org.apache.phoenix</groupId>
       <artifactId>phoenix5-spark-shaded</artifactId>
       <version>[***VERSION EXAMPLE: 6.0.0.7.2.10.0-297***]</version>
       <scope>provided</scope>
    </dependency>
  • Enable your IDE by adding the following dependency to your build:
    <dependency>
        <groupId>org.apache.phoenix</groupId>
        <artifactId>phoenix5-spark-shaded</artifactId>
        <version>[***VERSION EXAMPLE: 6.0.0.7.2.10.0-297***]</version>
        <scope>provided</scope>
    </dependency>
  • Build a Spark application using the Phoenix-Spark connector with the dependencies that are present in the connector.
  • Build a Spark application using the dependencies that you provide when you run your application. Use the --jars /opt/cloudera/parcels/CDH/lib/phoenix_connectors/phoenix5-spark-[***VERSION***]-shaded.jar parameter when running the spark-submit command.
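As a rough illustration of an application that relies on the connector being supplied at run time, the sketch below reads a Phoenix table from PySpark. It is not taken from this guide: it assumes the connector registers the "phoenix" data source format with "table" and "zkUrl" options, as described in the Apache Phoenix connector documentation. The table name, the ZooKeeper quorum, and the script name read_phoenix.py are hypothetical.

```python
import sys


def phoenix_read_options(table, zk_url):
    """Return the options passed to the Phoenix data source (assumed option names)."""
    if not table or not zk_url:
        raise ValueError("table and zk_url are required")
    return {"table": table, "zkUrl": zk_url}


def read_phoenix_table(spark, table, zk_url):
    """Load a Phoenix table as a Spark DataFrame."""
    # Assumption: the shaded connector JAR is on the driver and executor
    # classpath, through the *extraClassPath properties or the --jars option.
    return (
        spark.read.format("phoenix")
        .options(**phoenix_read_options(table, zk_url))
        .load()
    )


def main(table, zk_url):
    # Imported here so the helpers above can be used without PySpark installed.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("phoenix-read-sketch").getOrCreate()
    try:
        read_phoenix_table(spark, table, zk_url).show(5)
    finally:
        spark.stop()


if __name__ == "__main__":
    if len(sys.argv) != 3:
        print("usage: read_phoenix.py <table> <zk_url>", file=sys.stderr)
    else:
        main(sys.argv[1], sys.argv[2])
```

A run might look like spark-submit --jars /opt/cloudera/parcels/CDH/lib/phoenix_connectors/phoenix5-spark-[***VERSION***]-shaded.jar read_phoenix.py TABLE1 zk-host:2181, where TABLE1 and zk-host:2181 stand in for your own table and ZooKeeper quorum.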