Configure the Phoenix-Spark connector using Cloudera Manager

  1. Go to the Spark service.
  2. Click the Configuration tab.
  3. Select Scope > Gateway.
  4. Select Category > Advanced.
  5. Locate the Spark Client Advanced Configuration Snippet (Safety Valve) for spark-conf/spark-defaults.conf property or search for it by typing its name in the Search box.
  6. Add the following properties to ensure that all required Phoenix and HBase platform dependencies are available on the classpath for the Spark executors and drivers:

    Phoenix-Spark JARs:

    spark.executor.extraClassPath=phoenix5-spark-[***VERSION***].jar
    spark.driver.extraClassPath=phoenix5-spark-[***VERSION***].jar
  7. Enter a Reason for change, and then click Save Changes to commit the changes.
  8. Restart the role and service when Cloudera Manager prompts you to restart.
  • Before you can use the Phoenix-Spark connector in your Spark applications, you must configure your Maven settings with a repository that points to https://repository.cloudera.com/artifactory/public/org/apache/phoenix/phoenix5-spark/ and add the following dependency:
    <dependency>
       <groupId>org.apache.phoenix</groupId>
       <artifactId>phoenix5-spark</artifactId>
       <version>5.1.0-cdh7</version>
       <scope>provided</scope>
    </dependency>
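    The repository itself can be declared in Maven's settings.xml or in the <repositories> section of your pom.xml. The following is a sketch; the repository id "cloudera" is an arbitrary label, not a required value:

    <!-- In settings.xml (inside a profile) or in the <repositories>
         section of pom.xml. The id is an arbitrary label. -->
    <repositories>
        <repository>
            <id>cloudera</id>
            <url>https://repository.cloudera.com/artifactory/public/</url>
        </repository>
    </repositories>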
  • To enable your IDE, add the following dependency to your build:
    <dependency>
        <groupId>org.apache.phoenix</groupId>
        <artifactId>phoenix5-spark</artifactId>
        <version>${phoenix.version}</version>
        <scope>provided</scope>
    </dependency>
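    The ${phoenix.version} placeholder resolves against a Maven property defined in your pom.xml. A minimal sketch, assuming the 5.1.0-cdh7 version shown earlier:

    <!-- pom.xml: defines the property referenced by ${phoenix.version} -->
    <properties>
        <phoenix.version>5.1.0-cdh7</phoenix.version>
    </properties>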
  • Build a Spark application using the Phoenix-Spark connector with the dependencies that are present in the connector.
  • Build a Spark application using dependencies that you provide at runtime. When you run your application, pass the --jars /opt/cloudera/parcels/CDH/lib/phoenix_connectors/phoenix5-spark-[***VERSION***]-shaded.jar parameter to the spark-submit command.
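  A complete spark-submit invocation for this case might look like the following sketch; the application class and JAR names are hypothetical placeholders, and [***VERSION***] must be replaced with the connector version shipped in your parcel:

    # Hypothetical class and application JAR; substitute your own values
    # and the actual connector version for [***VERSION***].
    spark-submit \
      --class com.example.PhoenixSparkApp \
      --jars /opt/cloudera/parcels/CDH/lib/phoenix_connectors/phoenix5-spark-[***VERSION***]-shaded.jar \
      phoenix-spark-app.jar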