Creating a Job

This topic describes how to automate analytics workloads with a built-in job and pipeline scheduling system that supports real-time monitoring, job history, and email alerts.

A job automates the action of launching an engine, running a script, and tracking the results, all in one batch process. Jobs are created within the purview of a single project and can be configured to run on a recurring schedule. You can customize the engine environment for a job, set up email alerts for successful or failed job runs, and email the output of the job to yourself or a colleague.

When you create a job, you select a script for the job to execute and define a schedule for when the job should run. Optionally, you can make a job dependent on another existing job, creating a pipeline of tasks that run in sequence. Note that the script files and any other job dependencies must exist within the same project.

  1. Navigate to the project for which you want to create a job.
  2. On the left-hand sidebar, click Jobs.
  3. Click New Job.
  4. Enter a Name for the job.
  5. Select a script to execute for this job by clicking on the folder icon. You will be able to select a script from a list of files that are already part of the project. To upload more files to the project, see Managing Project Files.
  6. Depending on the code you are running, select an Engine Kernel for the job from one of the following options: Python 2, Python 3, R, or Scala.
  7. Select a Schedule for the job runs from one of the following options.
    • Manual - Select this option if you plan to run the job manually each time.
    • Recurring - Select this option if you want the job to run in a recurring pattern every X minutes, or on an hourly, daily, weekly or monthly schedule.
    • Dependent - Use this option when you are building a pipeline of jobs to run in a predefined sequence. From a dropdown list of existing jobs in this project, select the job that this one should depend on. Once you have configured a dependency, this job will run only after the preceding job in the pipeline has completed a successful run.
  8. Select an Engine Profile to specify the number of cores and memory available for each session.
  9. Enter an optional timeout value in minutes.
  10. Click Set environment variables if you want to set values that override the project-level environment variables for this job.
  11. Specify a list of Job Report Recipients to receive email notifications with detailed job reports on job success, failure, or timeout. You can send these reports to yourself, your team (if the project was created under a team account), or any external email addresses.
  12. Add any Attachments, such as the console log, to the job reports that will be emailed.
  13. Click Create Job.
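
The Dependent schedule option described in step 7 can be illustrated with a small, self-contained Python model. This is only a sketch of the behavior described above (a dependent job runs only after the preceding job completes successfully); it is not the product's scheduler, and the `Job` class, `run_pipeline` function, and job names are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Job:
    """Hypothetical stand-in for a job; not a product class."""
    name: str
    script: Callable[[], bool]        # returns True if the run succeeds
    depends_on: Optional[str] = None  # name of the preceding job, if any


def run_pipeline(jobs: List[Job], start: str) -> Dict[str, str]:
    """Run `start`, then each dependent job whose predecessor succeeded."""
    by_name = {job.name: job for job in jobs}
    status = {job.name: "not run" for job in jobs}
    queue = [start]
    while queue:
        name = queue.pop(0)
        succeeded = by_name[name].script()
        status[name] = "success" if succeeded else "failure"
        if succeeded:
            # Only a successful run triggers the jobs that depend on it.
            queue.extend(j.name for j in jobs if j.depends_on == name)
    return status


# Illustrative three-job pipeline: extract -> train -> report.
pipeline = [
    Job("extract", lambda: True),
    Job("train", lambda: False, depends_on="extract"),  # simulated failure
    Job("report", lambda: True, depends_on="train"),
]
result = run_pipeline(pipeline, start="extract")
print(result)
```

In this model, the `report` job stays in the "not run" state because the `train` job it depends on did not complete successfully.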

    You can use the Jobs API to schedule jobs from third-party workflow tools. For details, see Cloudera Machine Learning Jobs API.
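
As a rough illustration of how a third-party workflow tool might trigger a job, the following Python sketch builds (but does not send) an HTTP request. The endpoint path, the use of the API key as the basic-authentication username, the `environment` request body, and the host `cml.example.com` are all assumptions made for illustration; confirm the actual request format in Cloudera Machine Learning Jobs API before relying on it. The `build_start_job_request` helper is hypothetical.

```python
import base64
import json
import urllib.request


def build_start_job_request(base_url, username, project, job_id, api_key, env=None):
    """Build a POST request that would start a job run.

    Assumed (not confirmed by this topic): the URL layout, basic auth with
    the API key as the username and an empty password, and an optional
    "environment" object in the JSON body.
    """
    url = (
        f"{base_url.rstrip('/')}/api/v1/projects/"
        f"{username}/{project}/jobs/{job_id}/start"
    )
    token = base64.b64encode(f"{api_key}:".encode("utf-8")).decode("ascii")
    body = json.dumps({"environment": env or {}}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
    )


# Placeholder values; replace them with your own deployment details.
request = build_start_job_request(
    "https://cml.example.com", "jdoe", "churn-model", 42, "abc",
    env={"START_DATE": "2024-01-01"},
)
print(request.get_method(), request.full_url)
# To actually send the request: urllib.request.urlopen(request)
```

Building the request separately from sending it makes it easy to inspect the URL and headers before wiring the call into a workflow tool.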