Using Apache Phoenix-Spark connector

You can use the Apache Phoenix-Spark connector on your secure clusters to perform READ and WRITE operations. The Phoenix-Spark connector allows Spark to load Phoenix tables as Resilient Distributed Datasets (RDDs) or DataFrames and lets you save them back to Phoenix.

Connect to a secure cluster

You can connect to a secure cluster using the Phoenix JDBC connector. Use the following URL syntax:

jdbc:phoenix:[***ZOOKEEPER HOSTNAMES***]:[***ZOOKEEPER PORT***]:[***ROOT znode***]:[***PRINCIPAL NAME***]:[***KEYTAB FILE LOCATION***]
For example:
jdbc:phoenix:h1.cdh.local,h2.cdh.local,h3.cdh.local:2181:/hbase-secure:user1@cdh.LOCAL:/Users/user1/keytabs/myuser.headless.keytab

You need the principal and keytab parameters only if you have not run kinit before starting the job and want Phoenix to log you in automatically.
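The URL above is assembled from five colon-separated parts, with the last two optional. The following sketch shows how the pieces fit together; the host names, znode, principal, and keytab path are the example placeholders from above, and the helper function is purely illustrative, not part of Phoenix.

```python
# Illustrative helper: assemble a secure Phoenix JDBC URL from its parts.
# The ZooKeeper hosts, root znode, principal, and keytab path below are
# example placeholders, not values from a real cluster.

def phoenix_jdbc_url(zk_hosts, zk_port, root_znode, principal=None, keytab=None):
    """Build a Phoenix JDBC URL. Pass principal and keytab only if you have
    not already run kinit and want Phoenix to log you in automatically."""
    url = "jdbc:phoenix:{}:{}:{}".format(",".join(zk_hosts), zk_port, root_znode)
    if principal and keytab:
        url += ":{}:{}".format(principal, keytab)
    return url

print(phoenix_jdbc_url(
    ["h1.cdh.local", "h2.cdh.local", "h3.cdh.local"],
    2181,
    "/hbase-secure",
    "user1@cdh.LOCAL",
    "/Users/user1/keytabs/myuser.headless.keytab",
))
# prints the example URL shown above
```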

Considerations for setting up Spark

  • Before you can use the Phoenix-Spark connector in your Spark programs, you must configure your Maven settings to include the repository at https://repository.cloudera.com/artifactory/public/org/apache/phoenix/phoenix-spark/ and add the following dependency:
    <dependency>
       <groupId>org.apache.phoenix</groupId>
       <artifactId>phoenix-spark</artifactId>
       <version>5.1.0-cdh7</version>
       <scope>provided</scope>
    </dependency>
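Once the connector is on the Spark classpath, a Phoenix table can be loaded as a DataFrame and saved back. The sketch below is a minimal PySpark example under stated assumptions: it must run against a cluster with the connector deployed, and the table name TABLE1 and the "table"/"zkUrl" option values are illustrative placeholders for your own table and ZooKeeper quorum.

```python
# Hedged sketch: read a Phoenix table as a Spark DataFrame and write it back.
# Requires a Spark installation with the Phoenix-Spark connector on the
# classpath; TABLE1 and the ZooKeeper quorum are example placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("PhoenixSparkExample").getOrCreate()

# Load a Phoenix table as a DataFrame.
df = (spark.read
      .format("phoenix")
      .option("table", "TABLE1")
      .option("zkUrl", "h1.cdh.local,h2.cdh.local,h3.cdh.local:2181")
      .load())

# ... transform df as needed ...

# Save the DataFrame back to Phoenix.
(df.write
   .format("phoenix")
   .option("table", "TABLE1")
   .option("zkUrl", "h1.cdh.local,h2.cdh.local,h3.cdh.local:2181")
   .mode("append")
   .save())
```

On a secure cluster, the Kerberos principal and keytab are carried in the JDBC URL as shown in the connection section above.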