Savepoint management
Learn more about Flink savepoint management.
Savepoints are triggered automatically by the system during the upgrade process, as described in the previous section. You can also trigger savepoints manually or periodically, but user-created savepoints will not be used during the restoration process after the upgrade, and are not required for correct operation.
For savepoints to work, Flink requires durable storage to save its data. You can use any type of locally or network mounted volume (for example, NFS or Longhorn) or object storage (for example, S3). In this documentation we use an NFS volume type.
- Define a new volume to store the savepoints and mount it to the flink-main-container container.
- Enable savepoints by adding the savepoint directory (state.savepoints.dir) to spec.flinkConfiguration.
- Enable checkpoints by adding the checkpoint directory (state.checkpoints.dir) to spec.flinkConfiguration.
- Enable periodic savepoints triggered by the Flink Operator by adding kubernetes.operator.periodic.savepoint.interval: 2h.
- Set upgradeMode to savepoint to create savepoints and resume from them before each restart.
apiVersion: flink.apache.org/v1beta1
kind: FlinkDeployment
metadata:
  name: flink-kubernetes-tutorial
spec:
  image: [***REGISTRY HOST***]:[***PORT***]/[***PROJECT***]/flink-kubernetes-tutorial:latest
  flinkVersion: v1_18
  flinkConfiguration:
    taskmanager.numberOfTaskSlots: "4"
    state.savepoints.dir: file:///opt/flink/durable/savepoints
    state.checkpoints.dir: file:///opt/flink/durable/checkpoints
    high-availability.storageDir: file:///opt/flink/durable/ha
    kubernetes.operator.periodic.savepoint.interval: 2h
  serviceAccount: flink
  mode: native
  jobManager:
    resource:
      memory: "2048m"
      cpu: 1
  taskManager:
    resource:
      memory: "2048m"
      cpu: 1
  podTemplate:
    spec:
      containers:
        - name: flink-main-container
          volumeMounts:
            - mountPath: /opt/flink/durable
              name: flink-volume
      volumes:
        - name: flink-volume
          nfs:
            server: my-nfs-server.example.com
            path: /data/flink/
  job:
    args: ["--rowsPerSec", "10", "--outputPath", "/opt/flink/durable"]
    jarURI: local:///opt/flink/usrlib/flink-kubernetes-tutorial.jar
    parallelism: 4
    state: running
    upgradeMode: savepoint
Delete the previous deployment and apply the updated configuration:
kubectl -n flink delete FlinkDeployment flink-kubernetes-tutorial
kubectl -n flink apply -f flink-deployment.yaml
To trigger a savepoint manually, patch the deployment with a new savepointTriggerNonce value:
kubectl -n flink patch FlinkDeployment flink-kubernetes-tutorial \
  --type=merge \
  --patch='{"spec":{"job":{"savepointTriggerNonce":1234}}}'
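Once the Flink Operator has completed the triggered savepoint, it records the savepoint path in the status of the FlinkDeployment resource. As a sketch (assuming the deployment name and namespace used in this tutorial, and the savepoint status fields exposed by the operator's CRD), you can inspect the last completed savepoint with kubectl:

```shell
# Print the location of the last savepoint recorded by the operator.
# The path is under the configured state.savepoints.dir.
kubectl -n flink get flinkdeployment flink-kubernetes-tutorial \
  -o jsonpath='{.status.jobStatus.savepointInfo.lastSavepoint.location}'
```

This is useful for verifying that a manually triggered savepoint actually completed before you rely on it.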
If the application is suspended, the Flink Operator automatically creates a savepoint and resumes the application from that savepoint when it is restarted.
The Flink Operator also keeps track of the savepoint history and cleans up old savepoints based on the following properties:
kubernetes.operator.savepoint.history.max.age: 24h
kubernetes.operator.savepoint.history.max.count: 5
You can disable the savepoint cleanup completely by setting the kubernetes.operator.savepoint.cleanup.enabled property to false. In this case, the Flink Operator still collects and saves the savepoint history, but does not perform any cleanup operations.
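For example, to keep collecting the savepoint history while preventing any automatic deletion, the properties described above can be combined in spec.flinkConfiguration (a sketch based on the settings shown in this section):

```yaml
flinkConfiguration:
  # Track savepoints for up to one day, keeping at most five entries...
  kubernetes.operator.savepoint.history.max.age: 24h
  kubernetes.operator.savepoint.history.max.count: 5
  # ...but never delete the underlying savepoint data automatically.
  kubernetes.operator.savepoint.cleanup.enabled: "false"
```

With cleanup disabled, removing savepoints that are no longer needed becomes your responsibility.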
Additional savepoint operations
Even though savepoints are triggered automatically during the upgrade process, you can also trigger a savepoint manually or periodically. These configurations are optional, have no impact on the automatic savepoint triggering, and are not required for the correct operation of the Flink cluster.
- Manually triggering a savepoint
  You can use the savepointTriggerNonce property in spec.job to create a new savepoint by setting a new (different or random) value for the property:
  job:
    ...
    savepointTriggerNonce: 123
  This change is applied by the Flink Operator as described in the previous sections.
- Periodically triggering a savepoint
  You can use the kubernetes.operator.periodic.savepoint.interval property, on a per-job level, to trigger a savepoint after the specified period:
  flinkConfiguration:
    ...
    kubernetes.operator.periodic.savepoint.interval: 6h
  The timely execution of periodic savepoints is not guaranteed, as they can be delayed by an unhealthy job status or other user operations.
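To see which savepoints (manual or periodic) the operator has recorded for a job, you can query the savepoint history from the resource status. A sketch, assuming the deployment from this tutorial and the savepoint history fields exposed by the operator's CRD:

```shell
# List the locations of the savepoints tracked by the operator,
# one entry per line, oldest first.
kubectl -n flink get flinkdeployment flink-kubernetes-tutorial \
  -o jsonpath='{range .status.jobStatus.savepointInfo.savepointHistory[*]}{.location}{"\n"}{end}'
```

Combined with the history properties above, this lets you confirm that periodic savepoints are being taken and cleaned up as expected.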