Creating a Job

Jobs are created within the scope of a project. When you create a job, you select a script to run as part of the job and create a schedule for when the job should run. Optionally, you can make a job dependent on another existing job, creating a pipeline of tasks that run in sequence.

  1. Navigate to the project for which you want to create a job.
  2. On the left-hand sidebar, click Jobs.
  3. Click New Job.
  4. Enter a Name for the job.
  5. Select a script to run for this job by clicking on the folder icon. You will be able to select a script from a list of files that are already part of the project. To upload more files to the project, see Managing Files.
  6. (Optional) In the Arguments field, specify any command-line arguments that the script running in your job requires.
  7. Depending on the code you are running, select an Engine Kernel for the job from one of the following options: Python 2, Python 3, R, or Scala.
  8. Select a Schedule for the job runs from one of the following options:
    • Manual - Select this option if you plan to run the job manually each time.
    • Recurring - Select this option if you want the job to run in a recurring pattern every X minutes, or on an hourly, daily, weekly or monthly schedule. Set the recurrence interval with the drop-down buttons.

      As an alternative, select Use a cron expression to enter a Unix-style cron expression that sets the interval. The expression must have five fields, specifying the minutes, hours, day of month, month, and day of week, in that order. For example, 0 2 * * 1 runs the job at 2:00 AM every Monday. If Use a cron expression is deselected, the schedule indicated in the drop-down settings takes effect.

    • Dependent - Use this option when you are building a pipeline of jobs to run in a predefined sequence. From the drop-down list of existing jobs in this project, select the job that this one should depend on. Once you have configured a dependency, this job runs only after the preceding job in the pipeline has completed a successful run.
  9. Select an Engine Profile to specify the number of cores and memory available for each session.
  10. Enter an optional timeout value in minutes.
  11. To override the overall project environment variables, under Environmental Variables enter the name and value of your new variable and click Add.
    You can also delete an existing environment variable by selecting it and clicking Delete.
  12. Specify a list of Job Report Recipients who should receive email notifications with detailed job reports on job success, failure, or timeout. You can send these reports to yourself, your team (if the project was created under a team account), or any other external email addresses.
  13. Add any Attachments, such as the console log, to the job reports that will be emailed.
  14. Click Create Job.
    Starting with version 1.1.x, you can use the Jobs API to schedule jobs from third-party workflow tools. For details, see Cloudera Data Science Workbench Jobs API.
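
The Arguments field (step 6) and the Environmental Variables setting (step 11) both feed values into the script at run time. The following is a minimal Python sketch of how a job script might read them. It assumes that the arguments reach the script as ordinary command-line arguments (sys.argv in Python) and that job-level variables appear in the process environment; verify both against your version of the product. The --date argument, the OUTPUT_DIR variable, and the parse_job_inputs helper are hypothetical names used only for illustration.

```python
import argparse
import os
import sys


def parse_job_inputs(argv, environ):
    """Collect the inputs a job script receives at run time.

    argv    -- values assumed to come from the job's Arguments field
    environ -- the process environment, assumed to include job-level overrides
    """
    parser = argparse.ArgumentParser(description="Example job script")
    # --date is a hypothetical argument, not one defined by the product.
    parser.add_argument("--date", default="latest")
    # parse_known_args ignores unrecognized arguments instead of exiting.
    args, _unknown = parser.parse_known_args(argv)

    # OUTPUT_DIR is a hypothetical variable name; fall back to a default
    # when the job does not define it.
    output_dir = environ.get("OUTPUT_DIR", "/tmp/job-output")
    return {"date": args.date, "output_dir": output_dir}


if __name__ == "__main__":
    inputs = parse_job_inputs(sys.argv[1:], os.environ)
    print("Running with inputs:", inputs)
```

With this sketch, entering --date 2020-01-01 in the Arguments field and adding an OUTPUT_DIR variable to the job would change the values the script sees without editing the script itself. Keeping the parsing in one function that takes the argument list and environment as parameters also makes the script easy to test outside of a job.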