Configure Phoenix-Spark connector using Cloudera Manager

When using the Phoenix-Spark connector, you need the Spark connector JAR file. You can find this JAR file in the following location: /opt/cloudera/parcels/CDH/lib/phoenix_connectors

  1. Go to the Spark service.
  2. Click the Configuration tab.
  3. If you are using the HBase service on the same cluster, ensure that the HBase service is set as a dependency of the Spark service.

    Locate the HBase Service property and select the checkbox next to it.

  4. Select Scope > Gateway.
  5. Select Category > Advanced.
  6. Locate the Spark Client Advanced Configuration Snippet (Safety Valve) for spark-conf/spark-defaults.conf property or search for it by typing its name in the Search box.
  7. Add the following properties to ensure that all required Phoenix and HBase platform dependencies are available on the classpath for the Spark executors and drivers:

    Phoenix client JARs:

    spark.executor.extraClassPath=/opt/cloudera/parcels/CDH/lib/phoenix_connectors/phoenix5-spark-shaded.jar
    spark.driver.extraClassPath=/opt/cloudera/parcels/CDH/lib/phoenix_connectors/phoenix5-spark-shaded.jar

    If you are using the Phoenix-Spark connector to connect to an HBase instance outside of the cluster, you must include the following additional JAR files in your Spark classpath:

    /opt/cloudera/parcels/CDH-[***VERSION***]/jars/hbase-shaded-mapreduce-[***VERSION***].jar

    Replace [***VERSION***] with the version number of the Cloudera Runtime. You can find the version number using the following command:

    find /opt/cloudera -iname hbase-shaded-mapreduce*.jar

    Copy the hbase-site.xml file from the HBase cluster and add it to your Spark classpath. Take the HBase and Phoenix connector JAR files from the cluster where your HBase and Phoenix services are running.

  8. Enter a Reason for change, and then click Save Changes to commit the changes.
  9. Restart the role and service when Cloudera Manager prompts you to restart.
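The find command in step 7 prints the full path of the JAR file; the Runtime version is the substring between the artifact name and the .jar extension. As a minimal sketch of extracting it programmatically (the filename used below is illustrative, not a real Runtime version):

```python
import re

# Hypothetical helper: pull the Cloudera Runtime version out of a JAR filename
# as returned by the `find` command in step 7.
def runtime_version(jar_path: str) -> str:
    match = re.search(r"hbase-shaded-mapreduce-(.+)\.jar$", jar_path)
    if not match:
        raise ValueError("not an hbase-shaded-mapreduce JAR path")
    return match.group(1)

# Illustrative filename only; your cluster will report its own version.
print(runtime_version("hbase-shaded-mapreduce-2.2.3.7.1.7.0-551.jar"))
```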
  • Before you can use the Phoenix-Spark connector in your Spark applications, you must configure your Maven settings to use a repository that points to https://repository.cloudera.com/artifactory/public/org/apache/phoenix/phoenix-spark/ and use the following dependency:
    <dependency>
       <groupId>org.apache.phoenix</groupId>
       <artifactId>phoenix5-spark</artifactId>
       <version>[***VERSION EXAMPLE: 6.0.0.7.2.10.0-297***]</version>
       <scope>provided</scope>
    </dependency>
  • Enable your IDE by adding the following dependency to your build:
    <dependency>
        <groupId>org.apache.phoenix</groupId>
        <artifactId>phoenix5-spark</artifactId>
        <version>[***VERSION EXAMPLE: 6.0.0.7.2.10.0-297***]</version>
        <scope>provided</scope>
    </dependency>
  • Build a Spark application using the Phoenix-Spark connector and the dependencies that are bundled in the connector JAR.
  • Build a Spark application using the dependencies that you provide when you run your application. Use the --jars /opt/cloudera/parcels/CDH/lib/phoenix_connectors/phoenix5-spark-[***VERSION***]-shaded.jar parameter when running the spark-submit command.
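Once the classpath is configured as described above, a Spark application can read a Phoenix table through the connector. The following is a minimal PySpark sketch, not a definitive implementation: the table name and ZooKeeper URL are illustrative placeholders, and it assumes the shaded connector JAR and hbase-site.xml are on the driver and executor classpath as configured in the steps above.

```python
from pyspark.sql import SparkSession

# Sketch only: assumes the Phoenix-Spark connector is already on the
# Spark classpath (see the safety-valve properties configured earlier).
spark = SparkSession.builder.appName("phoenix-read-example").getOrCreate()

# "TABLE1" and "zookeeper-host:2181" are illustrative placeholders;
# substitute your Phoenix table name and ZooKeeper quorum.
df = (spark.read
      .format("phoenix")
      .option("table", "TABLE1")
      .option("zkUrl", "zookeeper-host:2181")
      .load())

df.show()
```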