Creating jobs in Cloudera Data Engineering

A job in Cloudera Data Engineering (CDE) consists of defined configurations and resources (including application code). Jobs can be run on demand or scheduled.

In CDE, jobs are associated with virtual clusters. Before you can create a job, you must create a virtual cluster that can run it. For more information, see Creating virtual clusters.

  1. In the Cloudera Data Platform (CDP) management console, click the Data Engineering tile, and then click Overview.
  2. In the CDE Services column, select the service that contains the virtual cluster that you want to create a job for.
  3. In the Virtual Clusters column on the right, locate the virtual cluster that you want to use and click the View Jobs icon.
  4. In the left-hand menu, click Jobs.
  5. Click the Create Job button.
  6. Provide the Job Details:
    1. Select Spark for the job type. For Airflow job types, see Automating data pipelines using Apache Airflow DAG files in Cloudera Data Engineering.
    2. Specify the Name.
    3. Select Resources, URL, or Repository for your application file, and provide or specify the file. You can upload a new file or select a file from an existing resource.
      If you select URL and specify an Amazon S3 URL, add the following configuration to the job:

      Key: spark.hadoop.fs.s3a.delegation.token.binding

      Value: org.apache.knox.gateway.cloud.idbroker.s3a.IDBDelegationTokenBinding

      If you want to use files from a repository, you must first create a repository. After the repository is created, select Repository, click Add from Repository, select the file, and then click Select File.
    4. If your application code is a JAR file, specify the Main Class.
    5. Specify Arguments if required. Click Add Argument to add multiple command arguments as necessary.
    6. Enter Configurations if needed. Click Add Configuration to add multiple configuration parameters as necessary.
    7. If your application code is a Python file, select the Python Version, and optionally select a Python Environment.
  7. Click Advanced Configurations to display more customizations, such as additional files, initial executors, executor range, and driver and executor cores and memory.
    By default, the executor range is set to match the range of CPU cores configured for the virtual cluster. This improves resource utilization and efficiency by allowing jobs to scale up to the maximum virtual cluster resources available, without manually tuning and optimizing the number of executors per job.
  8. Click Schedule to display scheduling options.
    You can schedule the application to run periodically using the Basic controls or by specifying a Cron Expression.
  9. If you provided a schedule, click Schedule to create the job. If you did not specify a schedule, click Create and Run to run the job immediately, or click the drop-down arrow on Create and Run and select Create to create the job without running it.
  10. Optional: Toggle Alerts to send email alerts to the email address that you specify. Select Job Failure to send an email upon job failure, and Job SLA Miss to send an email when a job misses its service-level agreement (SLA).
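The Configurations entered in step 6 are plain Spark property key/value pairs. As a minimal illustrative sketch (Python is used here only to show the pairing; in practice, the key and value are entered in the Configuration fields in the UI), the S3 delegation token binding from step 6 looks like:

```python
# Job Configurations are key/value pairs of Spark properties. This entry is
# the S3 delegation token binding shown in the steps above; any other Spark
# property can be added the same way with the Add Configuration button.
s3_job_config = {
    "spark.hadoop.fs.s3a.delegation.token.binding":
        "org.apache.knox.gateway.cloud.idbroker.s3a.IDBDelegationTokenBinding",
}

for key, value in s3_job_config.items():
    print(f"{key}={value}")
```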
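The Cron Expression accepted in the Schedule step follows the standard five-field cron format (minute, hour, day of month, month, day of week). The sketch below shows that field layout with a few hypothetical example expressions; the helper function and examples are illustrative only, not part of CDE:

```python
# Hypothetical example cron expressions for the Cron Expression field.
EXAMPLES = {
    "0 0 * * *":    "daily at midnight",
    "30 2 * * 1":   "every Monday at 02:30",
    "*/15 * * * *": "every 15 minutes",
}

def split_fields(expr: str) -> dict:
    """Break a five-field cron expression into its named fields."""
    minute, hour, dom, month, dow = expr.split()
    return {"minute": minute, "hour": hour, "day_of_month": dom,
            "month": month, "day_of_week": dow}

print(split_fields("30 2 * * 1"))
```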