Application upgrades
Learn more about Flink application upgrades.
When the job specification of a FlinkDeployment or FlinkSessionJob resource changes, the running application must be upgraded. During an upgrade, the Flink Operator automatically stops the currently running application, unless it is already suspended. After stopping, the Flink Operator redeploys the application using the new specification. When redeploying stateful applications, their state is carried over from the previous run, and the job keeps its desired state (a suspended application remains suspended, a running application is started again).
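As an illustration, the following fragment sketches a specification change that would trigger an upgrade. The resource name and JAR path are hypothetical, and the sketch assumes the standard spec.job fields parallelism and state of the Flink Kubernetes Operator; fields unrelated to the upgrade are omitted.

```yaml
# Fragment of a FlinkDeployment; image, flinkVersion and resource settings are omitted.
apiVersion: flink.apache.org/v1beta1
kind: FlinkDeployment
metadata:
  name: example-app                                     # hypothetical resource name
spec:
  job:
    jarURI: local:///opt/flink/usrlib/example-app.jar   # hypothetical path
    parallelism: 4      # changed from a previous value: this spec change triggers an upgrade
    state: running      # set to "suspended" to keep the application stopped after the upgrade
```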
You can configure how state is managed when stopping and restarting stateful applications using the upgradeMode setting in spec.job. The following values are supported for upgradeMode:
- stateless: stateless application upgrades from empty state
- savepoint: a savepoint is created during the upgrade process, which provides safety and allows the savepoint to be used as a backup. The Flink application must be in a running state for the savepoint to be created. If the application is in an unhealthy state, the last checkpoint is used, unless kubernetes.operator.job.upgrade.last-state-fallback.enabled is set to false. If the last checkpoint is not available, the job upgrade fails. For more information, see Savepoint management.
- last-state: the latest checkpoint information is used for quick upgrades in any application state (even for failing jobs). A healthy application state is not required. Manual recovery might be necessary if the high availability metadata is lost. You can configure kubernetes.operator.job.upgrade.last-state.max.allowed.checkpoint.age to limit how far back in time the application may fall back when picking up the latest checkpoint. If the checkpoint is older than the configured value, a savepoint is created instead (for healthy applications only).
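The upgrade modes and related options described above could be set as in the following minimal sketch. It assumes that the two kubernetes.operator.* options can be overridden per resource under spec.flinkConfiguration (they might instead belong in the operator's default configuration in your installation). The resource name and the values shown are illustrative only.

```yaml
# Fragment of a FlinkDeployment; fields unrelated to the upgrade mode are omitted.
apiVersion: flink.apache.org/v1beta1
kind: FlinkDeployment
metadata:
  name: example-app        # hypothetical resource name
spec:
  flinkConfiguration:
    # Assumption: per-resource override of operator options.
    # With "false", a savepoint upgrade of an unhealthy application
    # does not fall back to the last checkpoint.
    kubernetes.operator.job.upgrade.last-state-fallback.enabled: "false"
    # Only relevant for upgradeMode: last-state; illustrative value.
    kubernetes.operator.job.upgrade.last-state.max.allowed.checkpoint.age: "30 min"
  job:
    upgradeMode: savepoint # one of: stateless, savepoint, last-state
```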
The upgradeMode configuration controls both the stop and restore mechanisms, as shown in the following table:

| | Stateless | Last state | Savepoint |
|---|---|---|---|
| Configuration Requirement | None | Checkpointing & HA Enabled | Checkpoint/Savepoint directory defined |
| Job Status Requirement | None | HA metadata available | Job Running ¹ |
| Suspend Mechanism | Cancel/Delete | Delete Flink deployment (keep HA metadata) | Cancel with savepoint |
| Restore Mechanism | Deploy from empty state | Recover last state using HA metadata | Restore from savepoint |
| Production Use | Not recommended | Recommended | Recommended |

¹ When HA is enabled and the application is in an unhealthy state, the savepoint upgrade mode might fall back to the last-state behavior.
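
The configuration requirements in the table can be sketched as follows. This is a fragment under stated assumptions: the storage locations are hypothetical, the checkpoint interval is illustrative, and the Kubernetes-based HA service is only one possible choice of HA backend.

```yaml
# Fragment of a FlinkDeployment spec; storage locations are hypothetical.
spec:
  flinkConfiguration:
    # Last state: checkpointing and HA must be enabled.
    execution.checkpointing.interval: "1 min"
    state.checkpoints.dir: s3://example-bucket/flink/checkpoints
    high-availability.type: kubernetes          # older Flink versions use the key "high-availability"
    high-availability.storageDir: s3://example-bucket/flink/ha
    # Savepoint: a savepoint directory must be defined.
    state.savepoints.dir: s3://example-bucket/flink/savepoints
  job:
    upgradeMode: last-state
```

With both directories and HA configured, switching upgradeMode between last-state and savepoint should not require further configuration changes.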