Running Phoenix and HBase Spark Applications using Cloudera Data Engineering
Cloudera Data Engineering supports running Spark applications on Cloudera on cloud. You can use Cloudera Data Engineering to run Phoenix and HBase Spark applications against Cloudera Operational Database.
- Set your Cloudera workload password. For more information, see Setting the workload password.
- Synchronize users from the User Management Service in the Cloudera Control Plane into the environment in which your Cloudera Operational Database is running.
- Ensure that the Cloudera Data Engineering service is enabled and a virtual cluster is created in the Data Engineering Experience. For more information, see Creating virtual clusters and Enabling a Cloudera Data Engineering service.
- Set HBase and Phoenix versions in your Maven project.
- Use the describe-client-connectivity command to get the HBase and Phoenix version information. The following code snippet shows how to fetch the database connectivity information and parse out the HBase and Phoenix versions needed to build your application.

  echo "HBase version"
  cdp opdb describe-client-connectivity --database-name my-database --environment-name my-env | jq ".connectors[] | select(.name == \"hbase\") | .version"
  echo "Phoenix Connector Version"
  cdp opdb describe-client-connectivity --database-name my-database --environment-name my-env | jq ".connectors[] | select(.name == \"phoenix-thick-jdbc\") | .version"

  Sample output:

  HBase Version
  2.4.6.7.2.14.0-133
  Phoenix Spark Version
  "6.0.0.7.2.14.0-133"
- Update the HBase and Phoenix connector versions in your Maven project or configuration.

  <properties>
    ...
    <phoenix.connector.version>6.0.0.7.2.14.0-133</phoenix.connector.version>
    <hbase.version>2.4.6.7.2.14.0-133</hbase.version>
    ...
  </properties>
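  The full POM is not shown in this procedure. As a minimal sketch, assuming the project depends on the shaded HBase MapReduce artifact and the shaded Phoenix Spark connector (the jars uploaded as resources later in this procedure), the version properties might be wired up as follows. The maven-dependency-plugin execution copies the connector jars, including their transitive OpenTelemetry jars, to target/connector-libs, the directory the upload commands below read from; the group IDs and plugin configuration are assumptions for illustration.

    <dependencies>
      <dependency>
        <groupId>org.apache.hbase</groupId>
        <artifactId>hbase-shaded-mapreduce</artifactId>
        <version>${hbase.version}</version>
      </dependency>
      <dependency>
        <groupId>org.apache.phoenix</groupId>
        <artifactId>phoenix5-spark-shaded</artifactId>
        <version>${phoenix.connector.version}</version>
      </dependency>
    </dependencies>
    <build>
      <plugins>
        <!-- Copy the connector jars (and their OpenTelemetry dependencies) to
             target/connector-libs for upload as Cloudera Data Engineering resources -->
        <plugin>
          <groupId>org.apache.maven.plugins</groupId>
          <artifactId>maven-dependency-plugin</artifactId>
          <executions>
            <execution>
              <id>copy-connector-libs</id>
              <phase>package</phase>
              <goals>
                <goal>copy-dependencies</goal>
              </goals>
              <configuration>
                <outputDirectory>${project.build.directory}/connector-libs</outputDirectory>
                <includeArtifactIds>hbase-shaded-mapreduce,phoenix5-spark-shaded,opentelemetry-api,opentelemetry-context</includeArtifactIds>
              </configuration>
            </execution>
          </executions>
        </plugin>
      </plugins>
    </build>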
- Download the hbase-site.xml and hbase-omid-client-config.yml configuration files.
- Use the describe-client-connectivity command to determine the client configuration URL.

  cdp opdb describe-client-connectivity --database-name spark-connector --environment-name cod-7213 | jq ".connectors[] | select(.name == \"hbase\") | .configuration.clientConfigurationDetails[] | select(.name == \"HBASE\") | .url "

  "https://cod--XXXXXX-gateway0..xcu2-8y8x.dev.cldr.work/clouderamanager/api/v41/clusters/XXXXX/services/hbase/clientConfig"
- Use the URL obtained from the previous command and run the curl command to download the HBase configurations. When curl prompts for a password, enter your workload password.

  curl -f -o "hbase-config.zip" -u "<csso_user>" "https://cod--XXXXXX-gateway0.cod-7213....xcu2-8y8x.dev.cldr.work/clouderamanager/api/v41/clusters/cod--XXXX/services/hbase/clientConfig"
- Unzip hbase-config.zip and copy hbase-site.xml and hbase-omid-client-config.yml to the src/main/resources path in the Maven project.

  unzip hbase-config.zip
  cp hbase-conf/hbase-site.xml <path to src/main/resources>
  cp hbase-conf/hbase-omid-client-config.yml <path to src/main/resources>
- Build the project.

  $ mvn package
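  If you use a dependency-copy step like the sketch above, a successful build leaves both the application jar and the connector jars under target/; a quick sanity check before uploading (paths assume that setup):

    ls target/phoenix-spark-transactions-0.1.0.jar
    ls target/connector-libs/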
- Create a Cloudera Data Engineering job.
- Configure Cloudera Data Engineering CLI to point to the virtual cluster. For more information, see Downloading the Cloudera Data Engineering command line interface.
- Create a resource using the following command.

  cde resource create --name phoenix-spark-app-resource
- Upload the required jars, which were downloaded while building the project.

  cde resource upload --name phoenix-spark-app-resource --local-path ./target/connector-libs/hbase-shaded-mapreduce-2.4.6.7.2.14.0-133.jar --resource-path hbase-shaded-mapreduce-2.4.6.7.2.14.0-133.jar
  cde resource upload --name phoenix-spark-app-resource --local-path ./target/connector-libs/opentelemetry-api-0.12.0.jar --resource-path opentelemetry-api-0.12.0.jar
  cde resource upload --name phoenix-spark-app-resource --local-path ./target/connector-libs/opentelemetry-context-0.12.0.jar --resource-path opentelemetry-context-0.12.0.jar
  cde resource upload --name phoenix-spark-app-resource --local-path ./target/connector-libs/phoenix5-spark-shaded-6.0.0.7.2.14.0-133.jar --resource-path phoenix5-spark-shaded-6.0.0.7.2.14.0-133.jar
- Upload the Spark application jar that you built earlier.

  cde resource upload --name phoenix-spark-app-resource --local-path ./target/phoenix-spark-transactions-0.1.0.jar --resource-path phoenix-spark-transactions-0.1.0.jar
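  Optionally, before creating the job, you can confirm that all five files reached the resource; cde resource describe lists a resource's contents:

    cde resource describe --name phoenix-spark-app-resource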
- Replace the HBase, Phoenix, and Phoenix Spark connector versions in spark-job.json, as shown in the following sample, and create a Cloudera Data Engineering job using the following JSON and import command.

  {
    "mounts": [
      {
        "resourceName": "phoenix-spark-app-resource"
      }
    ],
    "name": "phoenix-spark-app",
    "spark": {
      "className": "com.cloudera.cod.examples.spark.SparkApp",
      "args": [
        "{{ phoenix_jdbc_url }}"
      ],
      "driverCores": 1,
      "driverMemory": "1g",
      "executorCores": 1,
      "executorMemory": "1g",
      "file": "phoenix-spark-transactions-0.1.0.jar",
      "pyFiles": [],
      "files": [
        "hbase-shaded-mapreduce-2.4.6.7.2.14.0-133.jar",
        "opentelemetry-api-0.12.0.jar",
        "opentelemetry-context-0.12.0.jar",
        "phoenix5-spark-shaded-6.0.0.7.2.14.0-133.jar"
      ],
      "numExecutors": 4
    }
  }

  cde job import --file spark-job.json
- Run the project.
- Use the describe-client-connectivity command to determine the base JDBC URL to pass.

  cdp opdb describe-client-connectivity --database-name my-database --environment-name my-env | jq ".connectors[] | select(.name == \"phoenix-thick-jdbc\") | .configuration.jdbcUrl"
- Run the job by passing the JDBC URL obtained from the previous command as an argument to the job.

  cde job run --name phoenix-spark-app --variable phoenix_jdbc_url=<phoenix_jdbc_url>
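  cde job run prints the ID of the new run. You can then follow the run with the cde run subcommands (the run ID below is illustrative):

    cde run describe --id 1
    cde run logs --id 1 --type "driver/stdout"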