Configuring Job Retry settings as an Administrator

The Job Retry feature enables automatic retries of jobs based on their terminal execution states, namely failed, timed out, or skipped. It also supports concurrent execution of job retry runs, ensuring that scheduled job runs remain unaffected and are not blocked by retry processes. Users have the flexibility to configure various options to define the retry behavior. These retries are fully automated, eliminating the need for manual intervention.

The Administrator can define default values for the Job Retry parameters. Only the Administrator can configure a hard limit on the maximum number of job retry runs that can execute alongside normal job runs. Enable this limit only if you want to manage and limit resource usage for job retry runs.

  1. In the Cloudera console, click the Cloudera AI tile.

    The Cloudera AI Workbenches page displays.

  2. Click the name of the workbench.
    The workbench Home page displays.
  3. Select Site Administration in the left Navigation pane.
  4. Select the Settings tab.
  5. Select Job Retry Configuration > Limit Concurrent Retries.
  6. Enable Limit Concurrent Retries by selecting the checkbox.

    Enabling this option limits the maximum number of job retry runs that can be active at the same time.

  7. Define the limit value for Maximum Concurrent Retry Limit.

    The Maximum Concurrent Retry Limit specifies the maximum number of job retry runs that can execute concurrently across the entire workbench, regardless of the total number of jobs running.

    If the number of active job retry runs reaches the Maximum Concurrent Retry Limit, any additional job retry runs are rescheduled until the number of active retry runs falls below the limit.

    Enable this hard limit only if job retry runs are consuming excessive resources; otherwise, avoid setting a hard limit.

    Administrators can set this value if the Limit Concurrent Retries option is enabled.

  8. Under Default Settings for all jobs, select Enable Retry to enable a retry run for the job.

    If the administrator configures these settings, the specified values automatically populate the fields in the new job creation form when a user creates a job. In this case, the Job Retry settings act as default values that the administrator can recommend to users. If the administrator does not configure these settings, the fields remain blank in the new job creation form. In both cases, users have the flexibility to customize these values during job creation or update them later through the job settings page.

    Define the following parameters for Job Retry:

    • Maximum Retry – The maximum number of retry attempts that can be triggered for a single job run if the retry job runs keep failing.

      The minimum value is 1.

    • Retry Delay (minutes) – The delay between two consecutive retry job runs for a failed instance of the run.

      The minimum value is 1 minute.

    • Retry Conditions – The terminal states of a job run that trigger a retry. The Retry configuration is complete as soon as at least one option is selected.

      If Retry is enabled, select at least one of the following Retry Conditions options. You can select any combination of them:

      • Script Failure – Runs the Retry process for user script failures, that is, when the user script exits with a non-zero exit code.

      • System Failure – Runs the Retry process for any system- or engine-related failure, excluding user script failures.

      • Timed-out Runs – Runs the Retry process for timed-out job runs.

      • Skipped Runs – Runs the Retry process for skipped job runs.

  9. Click Update to save the settings.
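
The rules described in the steps above can be summarized in a small Python sketch. This is only an illustration of the documented behavior, not Cloudera AI code or an API of the product: the names RetrySettings, TerminalState, should_retry, and can_start_retry are hypothetical, and the exact way the product counts attempts is an assumption.

```python
from dataclasses import dataclass
from enum import Enum


class TerminalState(Enum):
    """Hypothetical names for the terminal states of a job run."""
    SCRIPT_FAILURE = "script_failure"
    SYSTEM_FAILURE = "system_failure"
    TIMED_OUT = "timed_out"
    SKIPPED = "skipped"
    SUCCEEDED = "succeeded"


@dataclass(frozen=True)
class RetrySettings:
    """Illustrative model of the Job Retry parameters (not a product API)."""
    max_retry: int               # Maximum Retry, minimum 1
    retry_delay_minutes: int     # Retry Delay (minutes), minimum 1
    retry_conditions: frozenset  # selected Retry Conditions

    def __post_init__(self):
        # Mirror the documented minimum values and the
        # "select at least one condition" requirement.
        if self.max_retry < 1:
            raise ValueError("Maximum Retry must be at least 1")
        if self.retry_delay_minutes < 1:
            raise ValueError("Retry Delay must be at least 1 minute")
        if not self.retry_conditions:
            raise ValueError("Select at least one Retry Condition")


def should_retry(settings, state, attempts_so_far):
    """Return True if a run ending in `state` qualifies for another retry.

    Assumption: a retry is triggered only when the terminal state is one of
    the selected conditions and the attempt count is below Maximum Retry.
    """
    return (state in settings.retry_conditions
            and attempts_so_far < settings.max_retry)


def can_start_retry(active_retry_runs, limit_enabled, max_concurrent_retries):
    """Return True if a retry run may start now under the concurrency limit.

    When Limit Concurrent Retries is disabled, no hard limit applies.
    Otherwise the retry waits (is rescheduled) until the number of active
    retry runs falls below the Maximum Concurrent Retry Limit.
    """
    if not limit_enabled:
        return True
    return active_retry_runs < max_concurrent_retries


# Example: retry script failures and timed-out runs up to 3 times, 5 minutes apart.
settings = RetrySettings(
    max_retry=3,
    retry_delay_minutes=5,
    retry_conditions=frozenset(
        {TerminalState.SCRIPT_FAILURE, TerminalState.TIMED_OUT}
    ),
)
```

With these example settings, a timed-out run that has not yet been retried qualifies for a retry, whereas a skipped run does not, because Skipped Runs is not among the selected conditions. If the concurrency limit is enabled with a value of 2 and two retry runs are already active, can_start_retry returns False, which corresponds to the retry being rescheduled.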