Running Phoenix and HBase Spark Applications using CDE
Cloudera Data Engineering (CDE) supports running Spark applications in the CDP public cloud. You can use CDE to run Phoenix and HBase Spark applications against COD.
- Set your CDP workload password. For more information, see Setting the workload password.
- Synchronize users from the User Management Service in the CDP Control Plane into the environment in which your COD database is running.
- Ensure that the CDE service is enabled and a virtual cluster is created in the Data Engineering Experience. For more information, see Creating virtual clusters and Enabling a Cloudera Data Engineering service.
-
Set HBase and Phoenix versions in your Maven project.
-
Use the describe-client-connectivity command to get the
HBase and Phoenix version information. The following code snippet shows
how to fetch the database connectivity information and parse the required
HBase and Phoenix information to build your application.
echo "HBase version"
cdp opdb describe-client-connectivity --database-name my-database --environment-name my-env | jq ".connectors[] | select(.name == \"hbase\") | .version"
echo "Phoenix Connector Version"
cdp opdb describe-client-connectivity --database-name my-database --environment-name my-env | jq ".connectors[] | select(.name == \"phoenix-thick-jdbc\") | .version"
HBase Version
2.4.6.7.2.14.0-133
Phoenix Spark Version
"6.0.0.7.2.14.0-133"
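If you are scripting this step without jq, the same values can be extracted with Python's standard library. This is a minimal sketch; the sample payload below only assumes the `connectors` JSON layout implied by the jq filters above, and in practice you would feed in the actual output of `cdp opdb describe-client-connectivity`.

```python
import json

# Sample payload shaped like the jq filters above assume; in practice,
# use the real output of `cdp opdb describe-client-connectivity`.
payload = """
{
  "connectors": [
    {"name": "hbase", "version": "2.4.6.7.2.14.0-133"},
    {"name": "phoenix-thick-jdbc", "version": "6.0.0.7.2.14.0-133"}
  ]
}
"""

def connector_version(doc: dict, name: str) -> str:
    """Mirror of `.connectors[] | select(.name == NAME) | .version`."""
    return next(c["version"] for c in doc["connectors"] if c["name"] == name)

doc = json.loads(payload)
print("HBase version:", connector_version(doc, "hbase"))
print("Phoenix connector version:", connector_version(doc, "phoenix-thick-jdbc"))
```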
-
Update the HBase and Phoenix connector versions in your Maven project or
configuration.
<properties>
  ...
  <phoenix.connector.version>6.0.0.7.2.14.0-133</phoenix.connector.version>
  <hbase.version>2.4.6.7.2.14.0-133</hbase.version>
  ...
</properties>
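These properties are typically referenced from the connector dependencies in the same POM. The following is a sketch only; the artifact IDs are inferred from the shaded JAR names used later in this procedure, and the group IDs are assumptions you should verify against your Maven repository.

```xml
<dependencies>
  ...
  <!-- Artifact IDs inferred from the shaded JAR names used later in this
       procedure; group IDs are assumptions, verify against your repository. -->
  <dependency>
    <groupId>org.apache.hbase</groupId>
    <artifactId>hbase-shaded-mapreduce</artifactId>
    <version>${hbase.version}</version>
  </dependency>
  <dependency>
    <groupId>org.apache.phoenix</groupId>
    <artifactId>phoenix5-spark-shaded</artifactId>
    <version>${phoenix.connector.version}</version>
  </dependency>
  ...
</dependencies>
```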
-
Download hbase-site.xml and
hbase-omid-client-config.yml configuration files.
-
Use the describe-client-connectivity command to
determine the client configuration URL.
cdp opdb describe-client-connectivity --database-name spark-connector --environment-name cod-7213 | jq ".connectors[] | select(.name == \"hbase\") | .configuration.clientConfigurationDetails[] | select(.name == \"HBASE\") | .url"
"https://cod--XXXXXX-gateway0..xcu2-8y8x.dev.cldr.work/clouderamanager/api/v41/clusters/XXXXX/services/hbase/clientConfig"
-
Use the URL gathered from the previous command and run the
curl command to download the HBase
configurations.
curl -f -o "hbase-config.zip" -u "<csso_user>" "https://cod--XXXXXX-gateway0.cod-7213....xcu2-8y8x.dev.cldr.work/clouderamanager/api/v41/clusters/cod--XXXX/services/hbase/clientConfig"
-
Unzip hbase-config.zip and copy the
hbase-site.xml and
hbase-omid-client-config.yml to
src/main/resources path in the Maven
project.
unzip hbase-config.zip
cp hbase-conf/hbase-site.xml <path to src/main/resources>
cp hbase-conf/hbase-omid-client-config.yml <path to src/main/resources>
-
Build the project.
$ mvn package
-
Create a CDE job.
- Configure CDE CLI to point to the virtual cluster. For more information, see Downloading the Cloudera Data Engineering command line interface.
-
Create a resource using the following command.
cde resource create --name phoenix-spark-app-resource
-
Upload the required JARs that were downloaded while building the
project.
cde resource upload --name phoenix-spark-app-resource --local-path ./target/connector-libs/hbase-shaded-mapreduce-2.4.6.7.2.14.0-133.jar --resource-path hbase-shaded-mapreduce-2.4.6.7.2.14.0-133.jar
cde resource upload --name phoenix-spark-app-resource --local-path ./target/connector-libs/opentelemetry-api-0.12.0.jar --resource-path opentelemetry-api-0.12.0.jar
cde resource upload --name phoenix-spark-app-resource --local-path ./target/connector-libs/opentelemetry-context-0.12.0.jar --resource-path opentelemetry-context-0.12.0.jar
cde resource upload --name phoenix-spark-app-resource --local-path ./target/connector-libs/phoenix5-spark-shaded-6.0.0.7.2.14.0-133.jar --resource-path phoenix5-spark-shaded-6.0.0.7.2.14.0-133.jar
-
Upload the Spark application JAR that you built earlier.
cde resource upload --name phoenix-spark-app-resource --local-path ./target/phoenix-spark-transactions-0.1.0.jar --resource-path phoenix-spark-transactions-0.1.0.jar
-
Replace the HBase, Phoenix, and Phoenix Spark connector versions in
spark-job.json, as shown in the following sample,
and then create a CDE job using the import command.
{
  "mounts": [
    {
      "resourceName": "phoenix-spark-app-resource"
    }
  ],
  "name": "phoenix-spark-app",
  "spark": {
    "className": "com.cloudera.cod.examples.spark.SparkApp",
    "args": [
      "{{ phoenix_jdbc_url }}"
    ],
    "driverCores": 1,
    "driverMemory": "1g",
    "executorCores": 1,
    "executorMemory": "1g",
    "file": "phoenix-spark-transactions-0.1.0.jar",
    "pyFiles": [],
    "files": [
      "hbase-shaded-mapreduce-2.4.6.7.2.14.0-133.jar",
      "opentelemetry-api-0.12.0.jar",
      "opentelemetry-context-0.12.0.jar",
      "phoenix5-spark-shaded-6.0.0.7.2.14.0-133.jar"
    ],
    "numExecutors": 4
  }
}
cde job import --file spark-job.json
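Because hand-editing version strings into the JSON is error-prone, the job definition can also be generated. The following is a minimal sketch, assuming the version strings obtained from describe-client-connectivity earlier in this procedure; it writes a spark-job.json suitable for `cde job import`.

```python
import json

# Version strings obtained from describe-client-connectivity (earlier step).
hbase_version = "2.4.6.7.2.14.0-133"
phoenix_connector_version = "6.0.0.7.2.14.0-133"

job = {
    "mounts": [{"resourceName": "phoenix-spark-app-resource"}],
    "name": "phoenix-spark-app",
    "spark": {
        "className": "com.cloudera.cod.examples.spark.SparkApp",
        "args": ["{{ phoenix_jdbc_url }}"],
        "driverCores": 1,
        "driverMemory": "1g",
        "executorCores": 1,
        "executorMemory": "1g",
        "file": "phoenix-spark-transactions-0.1.0.jar",
        "pyFiles": [],
        "files": [
            f"hbase-shaded-mapreduce-{hbase_version}.jar",
            "opentelemetry-api-0.12.0.jar",
            "opentelemetry-context-0.12.0.jar",
            f"phoenix5-spark-shaded-{phoenix_connector_version}.jar",
        ],
        "numExecutors": 4,
    },
}

# Write the definition consumed by `cde job import --file spark-job.json`.
with open("spark-job.json", "w") as f:
    json.dump(job, f, indent=2)
```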
-
Run the job.
-
Use the describe-client-connectivity command to
determine the base JDBC URL to pass.
cdp opdb describe-client-connectivity --database-name my-database --environment-name my-env | jq ".connectors[] | select(.name == \"phoenix-thick-jdbc\") | .configuration.jdbcUrl"
-
Run the job by passing the JDBC URL obtained from the previous command
as an argument to the job.
cde job run --name phoenix-spark-app --variable phoenix_jdbc_url=<phoenix_jdbc_url>