Creating a Job
This topic describes how to automate analytics workloads with a built-in job and pipeline scheduling system that supports real-time monitoring, job history, and email alerts.
A job automates the action of launching an engine, running a script, and tracking the results, all in one batch process. Jobs are created within the purview of a single project and can be configured to run on a recurring schedule. You can customize the engine environment for a job, set up email alerts for successful or failed job runs, and email the output of the job to yourself or a colleague.
When you create a job, you select a script for it to run and create a schedule for when it should run. Optionally, you can make the job dependent on another existing job, creating a pipeline of tasks that run in sequence. Note that the script files and any other job dependencies must exist within the scope of the same project.
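As a rough illustration of how dependent jobs form a pipeline, the sketch below models each job's dependency as a simple mapping and derives the order in which the jobs would run. The job names and the `run_order` helper are hypothetical and not part of the product; once dependencies are configured, the scheduler handles this ordering itself.

```python
def run_order(depends_on):
    """Return job names ordered so each job follows the job it depends on.

    depends_on maps a job name to the name of the job it depends on,
    or to None for a job that starts a pipeline (manual or scheduled).
    """
    order = []
    placed = set()

    def place(job, chain):
        if job in placed:
            return
        if job in chain:
            raise ValueError(f"circular dependency involving {job!r}")
        parent = depends_on[job]
        if parent is not None:
            # The parent must be placed before the job that depends on it.
            place(parent, chain | {job})
        placed.add(job)
        order.append(job)

    for job in depends_on:
        place(job, frozenset())
    return order


# Hypothetical three-step pipeline within one project.
pipeline = {
    "train-model": "clean-data",
    "clean-data": "ingest-data",
    "ingest-data": None,
}
print(run_order(pipeline))
```

In this sketch a job that names a parent missing from the mapping raises a `KeyError`, which loosely mirrors the requirement that all dependencies live in the same project.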
- Navigate to the project for which you want to create a job.
- On the left-hand sidebar, click Jobs.
- Click New Job.
- Enter a Name for the job.
- In Script, select a script to run for this job by clicking on the folder icon. You will be able to select a script from a list of files that are already part of the project. To upload more files to the project, see Managing Project Files.
- In Arguments, enter command-line arguments to provide to the script. This feature only works with R or Python engines.
- Depending on the code you are running, select an Engine Kernel for the job from the following options: Python 3. The resources you can choose depend on the default engine you have chosen: ML Runtimes or Legacy Engines. For ML Runtimes, you can also choose a Kernel Edition and Version.
- Select a Schedule for the job runs from one of the following options.
  - Manual - Select this option if you plan to run the job manually each time.
  - Recurring - Select this option if you want the job to run in a recurring pattern every X minutes, or on an hourly, daily, weekly, or monthly schedule. Set the recurrence interval with the drop-down buttons.
    As an alternative, select Use a cron expression to enter a Unix-style cron expression to set the interval. The expression must have five fields, specifying the minutes, hours, day of month, month, and day of week. If the cron expression option is deselected, the schedule indicated in the drop-down settings takes effect.
  - Dependent - Use this option when you are building a pipeline of jobs to run in a predefined sequence. From the drop-down list of existing jobs in this project, select the job that this one should depend on. Once you have configured a dependency, this job runs only after the preceding job in the pipeline has completed a successful run.
- Select a Resource Profile to specify the number of cores and memory available for the job.
- Enter an optional timeout value in minutes.
- Click Set environment variables if you want to set any values to override the overall project environment variables.
- Specify a list of Job Report Recipients to whom you can send email notifications with detailed job reports for job success, failure, or timeout. You can send these reports to yourself, your team (if the project was created under a team account), or any other external email addresses.
- Add any Attachments such as the console log to the job reports that will be emailed.
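The five-field cron format mentioned for recurring schedules can be sketched as follows. This minimal check only splits an expression into its five fields and labels them; it does not validate value ranges or special characters, and the `split_cron` helper is a hypothetical name used for illustration.

```python
CRON_FIELDS = ("minute", "hour", "day_of_month", "month", "day_of_week")


def split_cron(expression):
    """Split a Unix-style cron expression into its five named fields."""
    parts = expression.split()
    if len(parts) != len(CRON_FIELDS):
        raise ValueError(
            f"expected {len(CRON_FIELDS)} fields, got {len(parts)}: {expression!r}"
        )
    return dict(zip(CRON_FIELDS, parts))


# "30 2 * * 1" reads as: 02:30 every Monday.
print(split_cron("30 2 * * 1"))
```

Running a check like this before pasting an expression into the schedule field can catch a missing or extra field early.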
You can use the Jobs API to schedule jobs from third-party workflow tools. For details, see Cloudera Machine Learning Jobs API.
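As a hedged sketch of what a call from a workflow tool might look like, the example below builds (but does not send) an HTTP request that starts a job run. The URL path, the use of the API key as the basic-auth username, and the `environment` request body are all assumptions modeled on the legacy v1 Jobs API, and the host, user, project, job ID, and key are placeholders. Confirm every detail against the Jobs API documentation before relying on it.

```python
import base64
import json
import urllib.request


def build_start_request(host, username, project, job_id, api_key, env=None):
    """Build (but do not send) a POST request that starts a job run.

    ASSUMPTION: the URL path, the API key used as the basic-auth username,
    and the "environment" body key follow the legacy v1 Jobs API.
    """
    url = f"https://{host}/api/v1/projects/{username}/{project}/jobs/{job_id}/start"
    token = base64.b64encode(f"{api_key}:".encode()).decode()
    body = json.dumps({"environment": env or {}}).encode()
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
    )


# Hypothetical host, user, project, job ID, and key.
request = build_start_request(
    "cml.example.com", "alice", "churn-model", 42, "MY_API_KEY"
)
print(request.get_method(), request.full_url)
# A workflow tool would send it with urllib.request.urlopen(request).
```

Keeping request construction separate from sending makes the call easy to inspect and test without contacting a live deployment.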