The Job Retry feature automatically retries jobs whose runs end in one of
the following terminal execution states: failed, timed out, or skipped. Retry
runs execute concurrently with scheduled job runs, so scheduled runs remain
unaffected and are not blocked by retry processes. Users can configure several
options to define the retry behavior. Retries are fully automated and require
no manual intervention.
The Administrator can define default values for the Job Retry parameters. Only the
Administrator can configure a hard limit on the maximum number of job retry runs
that can execute alongside normal job runs. Enable this setting only if
you want to manage and limit resource usage for job retry runs.
-
In the Cloudera console, click the Cloudera AI tile.
The Cloudera AI Workbenches page displays.
-
Click on the name of the workbench.
The workbench Home page displays.
-
Select Site Administration in the left Navigation
pane.
-
Select the Settings tab.
-
Select .
-
Enable Limit Concurrent Retries by selecting the
checkbox.
Enabling this option limits the number of job retry runs that can
be active at the same time.
-
Define the limit value for Maximum Concurrent Retry
Limit.
The Maximum Concurrent Retry Limit specifies
the maximum number of job retry runs that can execute concurrently across
the entire workbench, regardless of the total number of jobs running.
If the number of active retry runs reaches the Maximum Concurrent
Retry Limit, any additional job retry runs are
rescheduled until the number of active retry runs falls below the limit.
Enable this hard limit only if job retry runs are consuming
excessive resources. Otherwise, avoid setting a hard limit.
Administrators can set this value if the Limit Concurrent
Retries option is enabled.
-
Under Default Settings for all jobs, select
Enable Retry to enable a retry run for the job.
If the administrator configures these settings, the specified values
automatically populate the fields in the new job creation form when a user
creates a job. In this case, the Job Retry settings act as default values
that the administrator can recommend to users. If the administrator does not
configure these settings, the fields remain blank in the new job creation
form. In both cases, users have the flexibility to customize these values
during job creation or update them later through the job settings page.
Define the following parameters for Job Retry:
- Maximum Retry – The maximum number of
retry attempts that can be triggered for a single job run if the
retry job runs keep failing.
The minimum value is 1.
- Retry Delay (minutes) – The delay
between two consecutive retry job runs for a failed instance of the
run.
The minimum value is 1 minute.
- Retry Conditions – The terminal states
of a job run that trigger a retry. A retry is triggered when a job run
ends in any of the selected states.
If Retry is enabled, select at least one of the
following Retry Conditions options. You can select
any combination of them:
-
Script Failure – Runs the
Retry process for user script failures if the user script
exits with a non-zero exit code after the execution of the
script.
-
System Failure – Runs the
Retry process for any kind of system- or engine-related
failure, not including user script failures.
-
Timed-out Runs – Runs the
Retry process for timed-out job runs.
-
Skipped Runs – Runs the Retry process
for skipped job runs.
-
Click on Update to save the settings.
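The way these settings interact can be illustrated with a short Python sketch. This is not a Cloudera AI API: the names TerminalState, RetryPolicy, should_retry, and can_start_retry are hypothetical, and the comparison rules (counting completed retries against Maximum Retry, and comparing active retry runs against Maximum Concurrent Retry Limit) are assumptions based on the descriptions in this procedure.

```python
from dataclasses import dataclass
from enum import Enum


class TerminalState(Enum):
    """Terminal states of a job run (hypothetical names for illustration)."""
    SUCCEEDED = "succeeded"
    SCRIPT_FAILURE = "script_failure"   # user script exited with non-zero code
    SYSTEM_FAILURE = "system_failure"   # system- or engine-related failure
    TIMED_OUT = "timed_out"
    SKIPPED = "skipped"


@dataclass(frozen=True)
class RetryPolicy:
    """Hypothetical model of the Job Retry settings, not a product API."""
    max_retry: int                 # Maximum Retry, minimum 1
    retry_delay_minutes: int       # Retry Delay (minutes), minimum 1
    retry_conditions: frozenset    # selected Retry Conditions

    def __post_init__(self):
        # Mirror the minimum values and the "at least one condition" rule.
        if self.max_retry < 1:
            raise ValueError("Maximum Retry must be at least 1")
        if self.retry_delay_minutes < 1:
            raise ValueError("Retry Delay must be at least 1 minute")
        if not self.retry_conditions:
            raise ValueError("Select at least one Retry Condition")


def should_retry(policy, state, retries_so_far):
    """Assumed rule: retry if the state is selected and attempts remain."""
    return state in policy.retry_conditions and retries_so_far < policy.max_retry


def can_start_retry(active_retry_runs, limit_enabled, max_concurrent_retries):
    """Assumed rule: with the limit enabled, a retry run starts only while the
    number of active retry runs is below the limit; otherwise it waits."""
    if not limit_enabled:
        return True
    return active_retry_runs < max_concurrent_retries


if __name__ == "__main__":
    policy = RetryPolicy(
        max_retry=3,
        retry_delay_minutes=5,
        retry_conditions=frozenset(
            {TerminalState.SCRIPT_FAILURE, TerminalState.TIMED_OUT}
        ),
    )
    print(should_retry(policy, TerminalState.TIMED_OUT, 0))
    print(can_start_retry(2, True, 2))
```

In this sketch, a job run that ends in a state outside the selected Retry Conditions is never retried, and a retry run that cannot start because the concurrency limit is reached would be rescheduled by the caller rather than dropped.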