Application upgrades

Learn more about Flink application upgrades.

When the job specification of a FlinkDeployment or FlinkSessionJob resource changes, the running application must be upgraded. During an upgrade, the Flink Operator automatically stops the currently running application, unless it is already suspended, and then redeploys it using the new specification. When redeploying stateful applications, their state is carried over from the previous run, and the job state is preserved: a suspended application remains suspended, and a running application is started again.

You can configure how states are managed when stopping and restarting stateful applications using the upgradeMode setting in spec.job. The following values are supported for upgradeMode:
  • stateless: the application is upgraded from an empty state; no state is carried over.
  • savepoint: a savepoint is created during the upgrade, which makes the upgrade safe and leaves a savepoint that can be used as a backup. The Flink application must be running for the savepoint to be created. If the application is in an unhealthy state, the last checkpoint is used instead, unless kubernetes.operator.job.upgrade.last-state-fallback.enabled is set to false. If the last checkpoint is not available, the job upgrade fails. For more information, see Savepoint management.
  • last-state: the latest checkpoint information is used, which allows quick upgrades in any application state, even for failing jobs; a healthy application is not required. Manual recovery might be necessary if the high availability metadata is lost. You can set kubernetes.operator.job.upgrade.last-state.max.allowed.checkpoint.age to limit how old the latest checkpoint may be when it is used for the upgrade. If the checkpoint is older than the configured value, a savepoint is created instead (for healthy applications only).
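As a minimal sketch of these settings, the fragment below sets upgradeMode in spec.job of a FlinkDeployment resource. It assumes that the operator option can be overridden per resource under spec.flinkConfiguration; the "30 min" value is purely illustrative, and all other required fields of the resource are omitted.

```yaml
# Fragment of a FlinkDeployment resource. Required fields such as the image,
# Flink version, and resource settings are omitted for brevity.
spec:
  flinkConfiguration:
    # Assumption: this operator option is set per resource here.
    # Illustrative value: checkpoints older than this are not used for the
    # upgrade; a savepoint is taken instead if the application is healthy.
    kubernetes.operator.job.upgrade.last-state.max.allowed.checkpoint.age: "30 min"
  job:
    # One of: stateless, savepoint, last-state
    upgradeMode: last-state
```

Changing upgradeMode to savepoint in the same place selects the savepoint-based upgrade; in that case kubernetes.operator.job.upgrade.last-state-fallback.enabled would be the relevant option to set instead.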
The upgradeMode configuration controls both the stop and restore mechanisms as shown in the following table:
Table 1.
                           Stateless                Last state                                  Savepoint
Configuration Requirement  None                     Checkpointing & HA Enabled                  Checkpoint/Savepoint directory defined
Job Status Requirement     None                     HA metadata available                       Job Running [1]
Suspend Mechanism          Cancel/Delete            Delete Flink deployment (keep HA metadata)  Cancel with savepoint
Restore Mechanism          Deploy from empty state  Recover last state using HA metadata        Restore from savepoint
Production Use             Not recommended          Recommended                                 Recommended

[1] When HA is enabled and the application is in an unhealthy state, the savepoint upgrade mode might fall back to the last-state behavior.
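The configuration requirements in the table can be illustrated with a hedged sketch of the Flink settings involved. The keys below are standard Flink options, but the paths and the checkpoint interval are placeholders, and the high-availability.type key assumes a recent Flink version (older versions use the high-availability key instead).

```yaml
# Fragment of a FlinkDeployment resource; other required fields are omitted.
spec:
  flinkConfiguration:
    # Checkpoint and savepoint directories (placeholder paths).
    state.checkpoints.dir: file:///flink-data/checkpoints
    state.savepoints.dir: file:///flink-data/savepoints
    # Periodic checkpointing; the interval is illustrative.
    execution.checkpointing.interval: "1 min"
    # Kubernetes-based high availability, needed for last-state upgrades.
    high-availability.type: kubernetes
    high-availability.storageDir: file:///flink-data/ha
```

With checkpointing and HA configured this way, both the last-state and savepoint modes have the storage they depend on; the stateless mode needs none of these settings.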