Using Custom Spark Runtime Docker Images Through API/CLI
Learn how to run jobs using a custom Spark runtime image, with examples.
Steps
- Create a custom Docker image.
  Build custom-spark-dex-runtime images based on the dex-spark-runtime image of your Cloudera Data Engineering version. The following dex-spark-runtime images can be used:
  - Spark 3 Cloudera security hardened images
    <registry-host>/cloudera/dex/dex-spark-runtime-<spark version>-<cdh version>:<Cloudera Data Engineering version>
    Example
    This example shows a Dockerfile for DEX 1.24.0-b711, Spark 3.3.2, and Cloudera Runtime version 7.1.9.1015.
      FROM docker.repository.cloudera.com/cloudera/dex/dex-spark-runtime-3.3.2-7.1.9.1015:1.24.0-b711
      USER root
      RUN apk add --no-cache git
      RUN pip3 install virtualenv-api
      USER ${DEX_UID}
  - Spark 3 Red Hat insecure and deprecated images
    <registry-host>/cloudera/dex/dex-spark-runtime-<spark version>-<cdh version>-compat:<CDE version>
    Example
    This example shows a Dockerfile for DEX 1.24.0-b711, Spark 3.3.2, and Cloudera Runtime version 7.1.9.1015.
      FROM docker.repository.cloudera.com/cloudera/dex/dex-spark-runtime-3.3.2-7.1.9.1015-compat:1.24.0-b711
      USER root
      RUN yum install -y git && yum clean all && rm -rf /var/cache/yum
      RUN pip2 install virtualenv-api
      RUN pip3 install virtualenv-api
      USER ${DEX_UID}
  - Spark 2 Red Hat insecure and deprecated images
    <registry-host>/cloudera/dex/dex-spark-runtime-<spark version>-<cdh version>:<CDE version>
    Example
    This example shows a Dockerfile for DEX 1.24.0-b711, Spark 2.4.8, and Cloudera Runtime version 7.1.9.1015.
      FROM docker.repository.cloudera.com/cloudera/dex/dex-spark-runtime-2.4.8-7.1.9.1015:1.24.0-b711
      USER root
      RUN yum install -y git && yum clean all && rm -rf /var/cache/yum
      RUN pip2 install virtualenv-api
      RUN pip3 install virtualenv-api
      USER ${DEX_UID}
- Build the Docker image, tagging it with the custom registry to be used, and push it to the custom registry.
  Example:
    mac@local:$ docker build --network=host -t docker.my-company.registry.com/custom-dex/dex-spark-runtime-2.4.8-7.2.14.0:1.15.0-b117-custom . -f Dockerfile
    mac@local:$ docker push docker.my-company.registry.com/custom-dex/dex-spark-runtime-2.4.8-7.2.14.0:1.15.0-b117-custom
  In this example, the custom registry is docker.my-company.registry.com and the registry namespace is custom-dex.
- Create a custom runtime image resource.
  Register the custom-spark-dex-runtime Docker image as a resource of type custom-runtime-image.
  - Create a resource for registries that do not require authentication. If you are using a public Docker registry, or if the Docker registry is in the same environment as the Cloudera Data Engineering service (for example, the same AWS account or Azure subscription), you do not need to create credentials.
      mac@local:$ cde resource create --name custom-image-resource --image docker.my-company.registry.com/custom-dex/dex-spark-runtime-2.4.8-7.2.14.0:1.15.0-b117-custom --image-engine spark2 --type custom-runtime-image
      curl -X POST -k 'https://<dex-vc-host>/dex/api/v1/resources' \
        -H "Authorization: Bearer ${CDE_TOKEN}" \
        -H 'accept: application/json' \
        -H 'Content-Type: application/json' \
        --data '{
          "customRuntimeImage": {
            "engine": "spark2",
            "image": "docker.my-company.registry.com/custom-dex/dex-spark-runtime-2.4.8-7.2.14.0:1.15.0-b117-custom"
          },
          "name": "custom-image-resource",
          "type": "custom-runtime-image"
        }'
    Once done, skip to step 4 to submit the job.
  - Create a resource that requires credentials to access the registry. Use the CLI command or the API request to create the credentials. These credentials are stored as a secret.
      mac@local:$ ./cde credential create --name docker-creds --type docker-basic --docker-server docker-sandbox.infra.cloudera.com --docker-username my-username
      curl -X POST -k 'https://<dex-vc-host>/dex/api/v1/credentials' \
        -H "Authorization: Bearer ${CDE_TOKEN}" \
        -H 'accept: application/json' \
        -H 'Content-Type: application/json' \
        --data '{
          "dockerBasic": {
            "password": "password123",
            "server": "docker-sandbox.infra.cloudera.com",
            "username": "my-username"
          },
          "name": "docker-creds",
          "type": "docker-basic"
        }'
  - Register the custom-spark-dex-runtime Docker image as a resource of type custom-runtime-image by specifying the name of the credential created earlier.
      mac@local:$ ./cde resource create --name custom-image-resource --image docker.my-company.registry.com/custom-dex/dex-spark-runtime-2.4.8-7.2.14.0:1.15.0-b117-custom --image-engine spark2 --type custom-runtime-image --image-credential docker-creds
      curl -X POST -k 'https://<dex-vc-host>/dex/api/v1/resources' \
        -H "Authorization: Bearer ${CDE_TOKEN}" \
        -H 'accept: application/json' \
        -H 'Content-Type: application/json' \
        --data '{
          "customRuntimeImage": {
            "credential": "docker-creds",
            "engine": "spark2",
            "image": "docker.my-company.registry.com/custom-dex/dex-spark-runtime-2.4.8-7.2.14.0:1.15.0-b117-custom"
          },
          "name": "custom-image-resource",
          "type": "custom-runtime-image"
        }'
- Submit a job by setting the custom-spark-dex-runtime image as a resource using the CDE CLI.
    mac@local:$ ./cde --user cdpuser1 spark submit /Users/my-username/spark-examples_2.11-2.4.4.jar --class org.apache.spark.examples.SparkPi 1000 --runtime-image-resource-name=custom-image-resource
    mac@local:$ ./cde --user cdpuser1 resource create --name spark-jar
    mac@local:$ ./cde --user cdpuser1 resource upload --name spark-jar --local-path spark-examples_2.11-2.4.4.jar
    mac@local:$ ./cde --user cdpuser1 job create --name spark-pi-job-cli --type spark --mount-1-resource spark-jar --application-file spark-examples_2.11-2.4.4.jar --class org.apache.spark.examples.SparkPi --user cdpuser1 --arg 22 --runtime-image-resource-name custom-image-resource
- The Spark driver and executor pods are expected to use this image. You can confirm this by opening a shell in those pods and verifying that the externally installed libraries or files exist.
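The same image reference appears in the build, push, and resource creation steps, so it can help to compose it once and reuse it. The following is a minimal sketch, assuming the naming pattern from the examples on this page; the custom_image_ref function is a hypothetical helper, and the -custom tag suffix is only the convention used in the example, not a requirement.

```shell
#!/bin/sh
# Hypothetical helper: compose a custom image reference that follows the
# naming pattern of the examples on this page. The "-custom" tag suffix is
# an assumption taken from the example, not something the service requires.
custom_image_ref() {
  registry=$1   # custom registry host, e.g. docker.my-company.registry.com
  namespace=$2  # registry namespace, e.g. custom-dex
  spark=$3      # Spark version of the base image, e.g. 2.4.8
  runtime=$4    # Cloudera Runtime version, e.g. 7.2.14.0
  cde=$5        # Cloudera Data Engineering version, e.g. 1.15.0-b117
  printf '%s/%s/dex-spark-runtime-%s-%s:%s-custom\n' \
    "$registry" "$namespace" "$spark" "$runtime" "$cde"
}

# Build the reference once so later commands can share it.
IMAGE=$(custom_image_ref docker.my-company.registry.com custom-dex 2.4.8 7.2.14.0 1.15.0-b117)
echo "$IMAGE"
```

The resulting value could then be passed to docker build -t, docker push, and the --image option of cde resource create, which keeps the three steps from drifting apart.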
Public Docker registries
Create the resource for registries that do not require authentication. You do not need to specify credentials.
mac@local:$ cde resource create --name custom-image-resource --image docker.my-company.registry.com/custom-dex/dex-spark-runtime-2.4.7-7.1.7.1000:1.18.2-b70-custom --image-engine spark2 --type custom-runtime-image
curl -X POST -k 'https://<dex-vc-host>/dex/api/v1/resources' \
  -H "Authorization: Bearer ${CDE_TOKEN}" \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  --data '{
    "customRuntimeImage": {
      "engine": "spark2",
      "image": "docker.my-company.registry.com/custom-dex/dex-spark-runtime-2.4.7-7.1.7.1000:1.18.2-b70-custom"
    },
    "name": "custom-image-resource",
    "type": "custom-runtime-image"
  }'
Proceed to step 4 to submit the job.
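In the request bodies shown on this page, the only difference between a public and an authenticated registry is the credential field inside customRuntimeImage. The sketch below illustrates that difference; resource_payload is a hypothetical helper, it uses only the field names that appear in the examples, and it does not escape special characters in its arguments.

```shell
#!/bin/sh
# Hypothetical helper: print the JSON body for POST /dex/api/v1/resources.
# Arguments: resource name, engine, image reference, optional credential name.
# Values are inserted verbatim, so they must not contain quotes or backslashes.
resource_payload() {
  name=$1
  engine=$2
  image=$3
  credential=${4:-}
  cred_field=""
  if [ -n "$credential" ]; then
    # Only authenticated registries carry the credential field.
    cred_field=$(printf '"credential": "%s", ' "$credential")
  fi
  printf '{ "customRuntimeImage": { %s"engine": "%s", "image": "%s" }, "name": "%s", "type": "custom-runtime-image" }\n' \
    "$cred_field" "$engine" "$image" "$name"
}

IMAGE="docker.my-company.registry.com/custom-dex/dex-spark-runtime-2.4.7-7.1.7.1000:1.18.2-b70-custom"
PUBLIC_BODY=$(resource_payload custom-image-resource spark2 "$IMAGE")
PRIVATE_BODY=$(resource_payload custom-image-resource spark2 "$IMAGE" docker-creds)
echo "$PUBLIC_BODY"
echo "$PRIVATE_BODY"
```

Either body could be supplied to the curl request with --data "$PUBLIC_BODY" or --data "$PRIVATE_BODY".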
Error: Custom image resource with missing or wrong credentials
Creating a custom image resource with missing or wrong credentials might result in the following error, which appears in the logs or in the Kubernetes pod events:
Failed to pull image "docker.my-company.registry.com/custom-dex/dex-spark-runtime-2.4.7-7.1.7.1000:1.18.2-b70-custom":
rpc error: code = Unknown desc = Error reading manifest 1.18.2-b70-custom in docker.my-company.registry.com/custom-dex/dex-spark-runtime-2.4.7-7.1.7.1000:
errors: denied: requested access to the resource is denied unauthorized: authentication required
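When scanning logs or pod events for this failure, matching on the authentication markers in the message above is usually enough. The following is a minimal sketch; is_registry_auth_error is a hypothetical helper, and the two patterns are taken from the sample error on this page, so other registries may word the failure differently.

```shell
#!/bin/sh
# Hypothetical helper: exit 0 if the text on stdin contains one of the
# registry authentication markers from the sample error, non-zero otherwise.
is_registry_auth_error() {
  grep -q -e 'unauthorized: authentication required' \
          -e 'requested access to the resource is denied'
}

# Sample line copied from the error message shown above.
EVENT='errors: denied: requested access to the resource is denied unauthorized: authentication required'
if printf '%s\n' "$EVENT" | is_registry_auth_error; then
  echo "Image pull failed: check the credential attached to the custom image resource"
fi
```

Assuming you have kubectl access to the cluster, the output of kubectl describe pod for the driver or executor pod could be piped into the same function.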
