Creating jobs in Cloudera Data Engineering

A job in Cloudera Data Engineering (CDE) consists of defined configurations and resources (including application code). Jobs can be run on demand or scheduled.

In CDE, jobs are associated with virtual clusters. Before you can create a job, you must create a virtual cluster that can run it. For more information, see Creating virtual clusters.

The following steps are required before users can submit jobs. Perform them for each user that needs to submit jobs to the virtual cluster.

  1. If you already downloaded the utility script and uploaded it to an ECS or HDFS gateway cluster host as documented in Creating virtual clusters, you can skip to step 8.
  2. Download the utility script to your local machine.
  3. Create a directory to store the files, and change to that directory:
    mkdir -p /tmp/cde-1.3.4 && cd /tmp/cde-1.3.4
  4. Embedded Container Service (ECS)
    Copy the extracted utility script to one of the Embedded Container Service (ECS) cluster hosts. To identify the ECS cluster hosts:
    1. Log in to the Cloudera Manager web interface.
    2. Go to Clusters > Experience Cluster > ECS > Hosts.
    3. Select one of the listed hosts, and copy the script to that host.
    Red Hat OpenShift Container Platform (OCP)
    Copy the extracted utility script and the OpenShift kubeconfig file to one of the HDFS service gateway hosts, and install the kubectl utility:
    1. Log in to the Cloudera Manager web interface.
    2. Go to Clusters > Base Cluster > HDFS > Instances.
    3. Select one of the Gateway hosts, and copy the script to that host.
    4. Copy the OCP kubeconfig file to the same host.
    5. On that host, install the kubectl utility following the instructions in the Kubernetes documentation.
  5. On the cluster host that you copied the script to, set the script permissions to be executable:
    chmod +x /path/to/
  6. Identify the virtual cluster endpoint:
    1. In the Cloudera Manager web UI, go to the Experiences page, and then click Open CDP Private Cloud Experiences.
    2. Click the Data Engineering tile.
    3. Select the CDE service containing the virtual cluster you want to activate.
    4. Click Cluster Details.
    5. Click JOBS API URL to copy the URL to your clipboard.
    6. Paste the URL into a text editor to identify the endpoint host. For example, the URL is similar to the following:

      The endpoint host is
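
In general, the endpoint host is the hostname portion of the JOBS API URL. It can also be extracted on the command line; the URL below is a made-up placeholder, not a real endpoint:

```shell
# Strip the scheme and path from a JOBS API URL to get the endpoint host.
# The URL here is a hypothetical example.
JOBS_API_URL="https://abc123.cde-vc1.example.com/dex/api/v1"
ENDPOINT_HOST=$(echo "$JOBS_API_URL" | sed -e 's|^[^/]*//||' -e 's|/.*$||')
echo "$ENDPOINT_HOST"   # prints only the hostname portion
```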

  7. On the ECS or HDFS gateway host, create a file containing the user principal, and generate a keytab. If you do not have the ktutil utility, you might need to install the krb5-workstation package. The following example commands assume the user principal is psherman@EXAMPLE.COM.
    1. Create a file named <username>.principal (for example, psherman.principal) containing the user principal (in this example, psherman@EXAMPLE.COM).
    2. Generate a keytab named <username>.keytab for the user using ktutil:
      sudo ktutil
      ktutil:  addent -password -p psherman@EXAMPLE.COM -k 1 -e aes256-cts
      Password for psherman@EXAMPLE.COM: 
      ktutil:  addent -password -p psherman@EXAMPLE.COM -k 2 -e aes128-cts
      Password for psherman@EXAMPLE.COM: 
      ktutil:  addent -password -p psherman@EXAMPLE.COM -k 3 -e rc4-hmac
      Password for psherman@EXAMPLE.COM: 
      ktutil:  wkt psherman.keytab
      ktutil:  q
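
The interactive ktutil session above can also be scripted. The sketch below pipes the same commands to ktutil on stdin, reading the password from a KRB_PASSWORD environment variable; this is a convenience sketch, not part of the documented procedure, and it defaults to a dry run that only prints the commands it would send:

```shell
# Hedged sketch: build the ktutil command sequence and either print it
# (DRY_RUN=1, the default here) or pipe it to ktutil. Export KRB_PASSWORD
# with the user's password before running for real.
PRINC="psherman@EXAMPLE.COM"
KEYTAB="psherman.keytab"
KTUTIL_INPUT=$(printf '%s\n' \
  "addent -password -p $PRINC -k 1 -e aes256-cts" \
  "${KRB_PASSWORD:-}" \
  "addent -password -p $PRINC -k 2 -e aes128-cts" \
  "${KRB_PASSWORD:-}" \
  "addent -password -p $PRINC -k 3 -e rc4-hmac" \
  "${KRB_PASSWORD:-}" \
  "wkt $KEYTAB" \
  "q")
if [ "${DRY_RUN:-1}" = "1" ]; then
  printf '%s\n' "$KTUTIL_INPUT"          # dry run: show what would be sent
else
  printf '%s\n' "$KTUTIL_INPUT" | ktutil  # real run: requires ktutil
fi
```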
  8. Validate the keytab using klist and kinit:
    klist -ekt psherman.keytab 
    Keytab name: FILE:psherman.keytab
    KVNO Timestamp           Principal
    ---- ------------------- ------------------------------------------------------
       1 08/01/2021 10:29:47 psherman@EXAMPLE.COM (aes256-cts-hmac-sha1-96) 
       1 08/01/2021 10:29:47 psherman@EXAMPLE.COM (aes128-cts-hmac-sha1-96) 
       1 08/01/2021 10:29:47 psherman@EXAMPLE.COM (arcfour-hmac) 
    kinit -kt psherman.keytab psherman@EXAMPLE.COM

    Make sure that the keytab is valid before continuing. If the kinit command fails, the user will not be able to run jobs in the virtual cluster. After verifying that the kinit command succeeds, you can destroy the Kerberos ticket by running kdestroy.
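
The validation can be wrapped in a small reusable check so that a bad keytab fails fast. This is a sketch, not part of the documented procedure; the KINIT and KDESTROY variables exist only so the commands can be overridden for dry runs:

```shell
# Validate that a keytab can obtain a ticket for a principal, then clean up.
# KINIT and KDESTROY are overridable for testing (an assumption of this
# sketch, not something the product requires).
KINIT="${KINIT:-kinit}"
KDESTROY="${KDESTROY:-kdestroy}"

validate_keytab() {
  keytab="$1"
  principal="$2"
  if "$KINIT" -kt "$keytab" "$principal"; then
    "$KDESTROY"                      # discard the test ticket
    echo "OK: $principal"
  else
    echo "FAILED: $principal (user will not be able to run jobs)" >&2
    return 1
  fi
}
# Example: validate_keytab psherman.keytab psherman@EXAMPLE.COM
```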

  9. Use the script to copy the user keytab to the virtual cluster hosts:
    ./ init-user-in-virtual-cluster -h <endpoint_host> -u <user> -p <principal_file> -k <keytab_file>
    For example, for the psherman user:
    ./ init-user-in-virtual-cluster -h <endpoint_host> -u psherman -p psherman.principal -k psherman.keytab
  10. Repeat these steps for all users that need to submit jobs to the virtual cluster.
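
Because every job-submitting user needs the same init step, the per-user command can be looped. This is a convenience sketch: the script path, endpoint host, and user list below are placeholders, and `echo` keeps it a dry run:

```shell
# Dry-run batch wrapper around the utility script's init-user-in-virtual-cluster
# action. Assumes each user has <user>.principal and <user>.keytab in the
# current directory. Substitute the real script path and endpoint host, and
# drop "echo" to execute for real.
CDE_UTILS="${CDE_UTILS:-/path/to/utility-script}"   # placeholder path
ENDPOINT_HOST="${ENDPOINT_HOST:-vc.example.com}"    # placeholder host
for user in psherman alice; do                      # example user list
  echo "$CDE_UTILS" init-user-in-virtual-cluster -h "$ENDPOINT_HOST" \
       -u "$user" -p "${user}.principal" -k "${user}.keytab"
done
```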
To create a job after the user setup is complete:

  1. Navigate to the Cloudera Data Engineering Overview page by clicking the Data Engineering tile in the Cloudera Data Platform (CDP) management console.
  2. In the Environments column, select the environment containing the virtual cluster where you want to create the job.
  3. In the Virtual Clusters column on the right, click the View Jobs icon on the virtual cluster where you want to create the application.
  4. In the left hand menu, click Jobs.
  5. Click the Create Job button.
  6. Provide the Job Details:
    1. Select Spark for the job type.
    2. Specify the Name.
    3. Select File or URL for your application file, and specify the file. You can upload a new file or select a file from an existing resource.
      If you select URL and specify an Amazon S3 URL, add the following configuration to the job:

      config_key: spark.hadoop.fs.s3a.delegation.token.binding


    4. If your application code is a JAR file, specify the Main Class.
    5. Specify arguments if required. You can click the Add Argument button to add multiple command arguments as necessary.
    6. Enter Configurations if needed. You can click the Add Configuration button to add multiple configuration parameters as necessary.
    7. If your application code is a Python file, select the Python Version, and optionally select a Python Environment.
  7. Click Advanced Configurations to display more customizations, such as additional files, initial executors, executor range, driver and executor cores and memory.
    By default, the executor range is set to match the range of CPU cores configured for the virtual cluster. This improves resource utilization and efficiency by allowing jobs to scale up to the maximum virtual cluster resources available, without manually tuning and optimizing the number of executors per job.
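
The advanced options correspond to standard Spark properties. As a rough, hedged mapping (the exact property names CDE sets are not documented here, and the values below are purely illustrative), an executor range with per-executor sizing looks like this in Spark terms:

```
spark.dynamicAllocation.enabled=true
spark.dynamicAllocation.minExecutors=1
spark.dynamicAllocation.maxExecutors=20
spark.executor.cores=2
spark.executor.memory=4g
spark.driver.cores=1
spark.driver.memory=2g
```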
  8. Click Schedule to display scheduling options.
    You can schedule the application to run periodically using the Basic controls or by specifying a Cron Expression.
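
If the Basic controls are not flexible enough, a cron expression gives full control. Assuming the field accepts standard five-field cron syntax (an assumption worth checking against the hint shown in the UI), some examples:

```
# minute  hour  day-of-month  month  day-of-week
0 5 * * 1-5     # every weekday at 05:00
30 0 1 * *      # 00:30 on the first day of each month
*/15 * * * *    # every 15 minutes
```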
  9. If you provided a schedule, click Schedule to create the job. If you did not specify a schedule, and you do not want the job to run immediately, click the drop-down arrow on Create and Run and select Create. Otherwise, click Create and Run to run the job immediately.