Understanding Apache Phoenix-Spark connector
You can use the Apache Phoenix-Spark connector on your secured clusters to perform READ and WRITE operations. The connector allows Spark to load Phoenix tables as Resilient Distributed Datasets (RDDs) or DataFrames, and lets you save them back to Phoenix.
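As an illustration, the following sketch reads a Phoenix table into a DataFrame and writes it back. The table name TABLE1 and the ZooKeeper URL are placeholder assumptions for your environment:

```scala
import org.apache.spark.sql.{SaveMode, SparkSession}

object PhoenixSparkExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("phoenix-spark-example")
      .getOrCreate()

    // Load a Phoenix table as a DataFrame. "TABLE1" and the
    // ZooKeeper quorum below are placeholders for your cluster.
    val df = spark.read
      .format("org.apache.phoenix.spark")
      .option("table", "TABLE1")
      .option("zkUrl", "zk1.example.com:2181")
      .load()

    df.show()

    // Save the DataFrame back to Phoenix (performed as an UPSERT).
    df.write
      .format("org.apache.phoenix.spark")
      .mode(SaveMode.Overwrite)
      .option("table", "TABLE1")
      .option("zkUrl", "zk1.example.com:2181")
      .save()
  }
}
```

This requires the phoenix-spark dependency described later on this page to be on the Spark classpath.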
Connect to a secure cluster
You can connect to a secured cluster using the Phoenix JDBC connector. Enter the following syntax in the shell:

jdbc:phoenix:<ZK hostnames>:<ZK port>:<root znode>:<principal name>:<keytab file location>

For example:

jdbc:phoenix:h1.cdh.local,h2.cdh.local,h3.cdh.local:2181:/hbase-secure:user1@cdh.LOCAL:/Users/user1/keytabs/myuser.headless.keytab
You need the principal and keytab parameters only if you have not run kinit before starting the job and want Phoenix to log you in automatically.
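As a minimal sketch, the same secure URL can be used from plain JDBC code. The hostnames, root znode, principal, and keytab path below are the illustrative values from the example above; substitute your own:

```scala
import java.sql.DriverManager

// Hostnames, znode, principal, and keytab path are placeholders
// taken from the example URL above.
val url = "jdbc:phoenix:h1.cdh.local,h2.cdh.local,h3.cdh.local:2181:" +
  "/hbase-secure:user1@cdh.LOCAL:/Users/user1/keytabs/myuser.headless.keytab"

val conn = DriverManager.getConnection(url)
try {
  // Run a simple query to verify the secure connection works.
  val rs = conn.createStatement().executeQuery("SELECT 1")
  while (rs.next()) println(rs.getInt(1))
} finally {
  conn.close()
}
```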
Considerations for setting up Spark
- Before you can use the Phoenix-Spark connector in your Spark programs, you must configure your Maven settings to include a repository that points to the password-protected repository at https://archive.cloudera.com/p/cdh7/188.8.131.52/maven-repository/ and use the following dependency:
<dependency>
  <groupId>org.apache.phoenix</groupId>
  <artifactId>phoenix-spark</artifactId>
  <version>5.0.0-cdh7</version>
  <scope>provided</scope>
</dependency>
You can access the Maven repository using your Enterprise Support Subscription credentials.
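For reference, a sketch of the corresponding ~/.m2/settings.xml entries follows. The repository id "cloudera" and the credential placeholders are assumptions for illustration; the URL is the repository given above:

```xml
<!-- ~/.m2/settings.xml (sketch): the "cloudera" id and the
     credential placeholders are assumptions for illustration -->
<settings>
  <servers>
    <server>
      <id>cloudera</id>
      <username>YOUR_SUPPORT_USERNAME</username>
      <password>YOUR_SUPPORT_PASSWORD</password>
    </server>
  </servers>
  <profiles>
    <profile>
      <id>cloudera-repo</id>
      <repositories>
        <repository>
          <id>cloudera</id>
          <url>https://archive.cloudera.com/p/cdh7/188.8.131.52/maven-repository/</url>
        </repository>
      </repositories>
    </profile>
  </profiles>
  <activeProfiles>
    <activeProfile>cloudera-repo</activeProfile>
  </activeProfiles>
</settings>
```

The server id must match the repository id so that Maven sends your credentials when resolving artifacts from that repository.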