Configure Phoenix-Spark connector using Cloudera Manager
Go to the Spark service.
Click the Configuration tab.
Select Scope > Gateway.
Select Category > Advanced.
Locate the Spark Client Advanced Configuration Snippet (Safety Valve)
for spark-conf/spark-defaults.conf property, or search for it by
typing its name in the Search box.
Add the following properties to ensure that all required Phoenix and HBase platform
dependencies are available on the classpath for the Spark executors and
drivers:
If your Spark and HBase services are running on the same cluster, skip to
step 7.
If you are using the Phoenix-Spark connector to connect to an HBase
instance outside of the cluster, run the hbase mapredcp
command on the remote cluster.
Copy all JAR files listed in the output to the local cluster, and
add the JAR files to both *extraClasspath properties.
Copy the directory containing hbase-site.xml from the remote
cluster, and add it to the *extraClasspath properties.
spark.executor.extraClassPath=/copied/hbase-site.xml:/copied/phoenix5-spark-shaded.jar:/copied/hbase-shaded-mapreduce-2.1.6.3.1.5.0-152.jar:<rest of hbase mapredcp jars>
spark.driver.extraClassPath=/copied/hbase-site.xml:/copied/phoenix5-spark-shaded.jar:/copied/hbase-shaded-mapreduce-2.1.6.3.1.5.0-152.jar:<rest of hbase mapredcp jars>
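As a sketch of how the copied hbase mapredcp output maps into these properties, the following shell fragment builds the executor classpath line. The /copied paths and the sample JAR names are placeholders for illustration, not real paths on your cluster; on a real remote cluster you would capture the list with MAPREDCP="$(hbase mapredcp)".

```shell
# Sample colon-separated classpath standing in for real `hbase mapredcp` output.
MAPREDCP="/copied/hbase-shaded-mapreduce-2.1.6.jar:/copied/hbase-protocol-shaded-2.1.6.jar"

# Compose the executor classpath property for spark-defaults.conf:
# the copied HBase configuration, the connector JAR, then the mapredcp JARs.
echo "spark.executor.extraClassPath=/copied/hbase-site.xml:/copied/phoenix5-spark-shaded.jar:${MAPREDCP}"
```

The same value is repeated for spark.driver.extraClassPath, since both the driver and the executors need the Phoenix and HBase dependencies.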
The Phoenix-Spark connector JAR can have different names depending on
your platform version. For CDH 6, HDP 3, CDP 7.1.5 and earlier, and CDP
7.2.1 through CDP 7.2.8, you must use phoenix-client.jar. For CDP 7.1.6,
7.1.7, 7.2.9, 7.2.10, 7.2.11, and higher versions, you must use
phoenix5-spark-shaded.jar.
Enter a Reason for change, and then click Save Changes
to commit the changes.
Restart the role and service when Cloudera Manager prompts you to restart.
Build a Spark application using the Phoenix-Spark connector with the
dependencies that are bundled in the connector.
Build a Spark application using the dependencies that you provide when you run
your application. Use the --jars /opt/cloudera/parcels/CDH/lib/phoenix_connectors/phoenix5-spark-shaded-[***VERSION***].jar
parameter when running the spark-submit command.
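For example, such a spark-submit invocation might look like the following sketch. The application JAR name and main class are placeholders, and [***VERSION***] stands for the connector version installed on your cluster:

```shell
spark-submit \
  --class com.example.PhoenixSparkApp \
  --jars /opt/cloudera/parcels/CDH/lib/phoenix_connectors/phoenix5-spark-shaded-[***VERSION***].jar \
  phoenix-spark-app.jar
```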