Understanding Apache Phoenix-Spark connector
You can use the Apache Phoenix-Spark connector on your secure clusters to perform READ and WRITE operations. The Phoenix-Spark connector allows Spark to load Phoenix tables as Resilient Distributed Datasets (RDDs) or DataFrames and lets you save them back to Phoenix.
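As a rough illustration of the load and save paths described above, the sketch below builds the option map that a DataFrame read or write through the connector typically takes. The option keys `table` and `zkUrl` and the format name `org.apache.phoenix.spark` are the ones commonly used with phoenix-spark and should be checked against the connector version you deploy. The class `PhoenixSparkOptions`, the table name, and the ZooKeeper address are hypothetical. The Spark calls appear only in comments so that the snippet compiles without Spark on the classpath.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of Phoenix or Spark: it only collects the
// options that are handed to Spark's DataFrameReader / DataFrameWriter.
final class PhoenixSparkOptions {
    private PhoenixSparkOptions() {}

    /** Returns the options for one Phoenix table reachable through the given ZooKeeper quorum. */
    static Map<String, String> forTable(String table, String zkUrl) {
        Map<String, String> opts = new HashMap<>();
        opts.put("table", table);   // Phoenix table to load or save (assumed option key)
        opts.put("zkUrl", zkUrl);   // ZooKeeper quorum of the HBase cluster (assumed option key)
        return opts;
    }

    public static void main(String[] args) {
        // "TABLE1" and the ZooKeeper address are placeholder values.
        Map<String, String> opts = forTable("TABLE1", "h1.cdh.local:2181");
        System.out.println(opts);

        // With a SparkSession named spark available, the map would be used roughly as:
        //   Dataset<Row> df = spark.read().format("org.apache.phoenix.spark").options(opts).load();
        //   df.write().format("org.apache.phoenix.spark").mode(SaveMode.Overwrite).options(opts).save();
    }
}
```

Keeping the options in one place means the read and the write use the same table and cluster settings.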
Connect to a secure cluster
You can connect to a secure cluster using the Phoenix JDBC connector. Enter the connection URL in the shell using the following syntax:
jdbc:phoenix:<ZK hostnames>:<ZK port>:<root znode>:<principal name>:<keytab file location>
For example:
jdbc:phoenix:h1.cdh.local,h2.cdh.local,h3.cdh.local:2181:/hbase-secure:user1@cdh.LOCAL:/Users/user1/keytabs/myuser.headless.keytab
You need the principal and keytab parameters only if you have not run the kinit command before starting the job and want Phoenix to log you in automatically.
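The URL syntax above can be assembled programmatically. The following is a minimal sketch that only concatenates the string: `PhoenixJdbcUrl` and its `build` methods are hypothetical names, not part of Phoenix. The shorter form without the principal and keytab is intended for sessions that have already run kinit.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Phoenix: it only assembles the URL string.
final class PhoenixJdbcUrl {
    private PhoenixJdbcUrl() {}

    /** Builds a URL without principal and keytab, for sessions already authenticated with kinit. */
    static String build(List<String> zkHosts, int zkPort, String rootZnode) {
        return "jdbc:phoenix:" + String.join(",", zkHosts) + ":" + zkPort + ":" + rootZnode;
    }

    /** Builds a URL that also carries the principal and keytab so that Phoenix can log in itself. */
    static String build(List<String> zkHosts, int zkPort, String rootZnode,
                        String principal, String keytabPath) {
        return build(zkHosts, zkPort, rootZnode) + ":" + principal + ":" + keytabPath;
    }

    public static void main(String[] args) {
        List<String> hosts = Arrays.asList("h1.cdh.local", "h2.cdh.local", "h3.cdh.local");
        // Values taken from the example URL in this section.
        System.out.println(build(hosts, 2181, "/hbase-secure"));
        System.out.println(build(hosts, 2181, "/hbase-secure",
                "user1@cdh.LOCAL", "/Users/user1/keytabs/myuser.headless.keytab"));
    }
}
```

The resulting string can then be passed to the shell or to `java.sql.DriverManager.getConnection`, provided the Phoenix JDBC driver is on the classpath.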
Considerations for setting up Spark
- Before you can use the Phoenix-Spark connector in your Spark programs, you must configure your Maven settings with a repository entry that points to the password-protected repository at https://repository.cloudera.com/artifactory/public/org/apache/phoenix/phoenix-spark/ and add the following dependency:
<dependency>
  <groupId>org.apache.phoenix</groupId>
  <artifactId>phoenix-spark</artifactId>
  <version>5.0.0-cdh7</version>
  <scope>provided</scope>
</dependency>
You can access the Maven repository using your Enterprise Support Subscription credentials.