Using Apache Phoenix-Spark connector
You can use the Apache Phoenix-Spark connector on your secure clusters to perform READ and WRITE operations. The Phoenix-Spark connector allows Spark to load Phoenix tables as Resilient Distributed Datasets (RDDs) or DataFrames and lets you save them back to Phoenix.
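As a sketch of what loading and saving looks like through the DataFrame API (the table names INPUT_TABLE and OUTPUT_TABLE and the quorum address localhost:2181 are hypothetical placeholders; substitute your own values, and note that the output table must already exist in Phoenix):

```scala
import org.apache.spark.sql.{SaveMode, SparkSession}

val spark = SparkSession.builder()
  .appName("phoenix-spark-demo")
  .getOrCreate()

// Load a Phoenix table as a DataFrame.
val df = spark.read
  .format("org.apache.phoenix.spark")
  .option("table", "INPUT_TABLE")
  .option("zkUrl", "localhost:2181")
  .load()

// Save the rows back to another Phoenix table as upserts.
df.write
  .format("org.apache.phoenix.spark")
  .mode(SaveMode.Overwrite)
  .option("table", "OUTPUT_TABLE")
  .option("zkUrl", "localhost:2181")
  .save()
```

The same data can also be exposed as an RDD, but the DataFrame path shown here is the more common route because it lets Spark push column pruning and predicates down to Phoenix.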
Connect to a secure cluster
You can connect to a secure cluster using the Phoenix JDBC connector. Use the following URL syntax in the shell:
jdbc:phoenix:[***ZOOKEEPER HOSTNAMES***]:[***ZOOKEEPER PORT***]:[***ROOT znode***]:[***PRINCIPAL NAME***]:[***KEYTAB FILE LOCATION***]
For example:
jdbc:phoenix:h1.cdh.local,h2.cdh.local,h3.cdh.local:2181:/hbase-secure:user1@cdh.LOCAL:/Users/user1/keytabs/myuser.headless.keytab
You need the principal and keytab parameters only if you have not run kinit before starting the job and want Phoenix to log you in automatically.
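A small sketch of how the URL parts above fit together (the helper name and all values are illustrative, taken from the example URL, not from any real cluster):

```scala
// Assemble a secure Phoenix JDBC URL from its parts:
// ZooKeeper hostnames, ZooKeeper port, root znode, principal, keytab.
object PhoenixUrl {
  def secureUrl(zkHosts: Seq[String],
                zkPort: Int,
                rootZnode: String,
                principal: String,
                keytab: String): String =
    s"jdbc:phoenix:${zkHosts.mkString(",")}:$zkPort:$rootZnode:$principal:$keytab"

  def main(args: Array[String]): Unit = {
    val url = secureUrl(
      Seq("h1.cdh.local", "h2.cdh.local", "h3.cdh.local"),
      2181,
      "/hbase-secure",
      "user1@cdh.LOCAL",
      "/Users/user1/keytabs/myuser.headless.keytab")
    println(url)
  }
}
```

The last two fields are optional; when you omit them, Phoenix relies on the Kerberos ticket already obtained with kinit.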
Considerations for setting up Spark
- Before you can use the Phoenix-Spark connector in your Spark programs, you must configure your Maven settings to include the repository at https://repository.cloudera.com/artifactory/public/org/apache/phoenix/phoenix-spark/ and add the following dependency:
<dependency>
  <groupId>org.apache.phoenix</groupId>
  <artifactId>phoenix-spark</artifactId>
  <version>5.1.0-cdh7</version>
  <scope>provided</scope>
</dependency>