Configuring out of memory recovery
You can optionally specify the step size by which memory increases when your pods crash with out of memory (OOM) errors. You can also specify an upper bound on that increase to prevent unbounded scaling.
The Cloudera Flow Management - Kubernetes Operator can detect an Out of Memory event in a NiFi cluster and, when configured for Out of Memory Recovery, scale up the memory footprint. This feature is responsive, not preventative. The NiFi cluster must first run out of memory and fail a Readiness check before a recovery attempt is made, which can impact flow performance. OOM Recovery is intended as a safeguard, not as a replacement for good cluster sizing. If OOM Recovery has triggered, we recommend that you reevaluate your NiFi resource sizing.
OOM Recovery has two fields to configure: stepSize and upperBound. stepSize defines the amount of memory added for each OOM event. upperBound defines the maximum NiFi container memory that the OOM Recovery process is allowed to grow to.
spec:
  outOfMemoryRecovery:
    stepSize: [***DEFINES THE MEMORY INCREASE EVERY TIME PODS ARE OOMKILLED***]
    upperBound: [***SPECIFIES THE UPPER LIMIT OF MEMORY INCREASE FOR MEMORY PROTECTION***]
For example:
spec:
  outOfMemoryRecovery:
    stepSize: 1Gi
    upperBound: 8Gi
  resources:
    nifi:
      requests:
        cpu: "1"
        memory: 4Gi
This spec starts the NiFi containers at 4Gi of memory and grows them by 1Gi for every OOM event until the NiFi container memory reaches 8Gi. When only memory requests are provided, the NiFi container memory request grows. If memory limits are provided, only the memory limit grows.
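The growth rule above can be sketched as a small simulation. This is a hypothetical model, not the operator's code: memory is tracked in MiB, the helper names (memory_target, grow_resources, simulate) are invented for illustration, and a step that would overshoot upperBound is assumed to be capped at upperBound rather than skipped.

```python
# Hypothetical sketch of the OOM Recovery growth rule; not the operator's code.
# Assumptions: memory is tracked in MiB, and a step that would overshoot
# upperBound is capped at upperBound.

MIB_PER_GIB = 1024


def memory_target(resources: dict) -> str:
    """Return which section grows: limits if a memory limit is set, else requests."""
    return "limits" if "memory" in resources.get("limits", {}) else "requests"


def grow_resources(resources: dict, step_mib: int, upper_mib: int) -> dict:
    """Return a copy of the NiFi container resources after one OOM event."""
    grown = {section: dict(values) for section, values in resources.items()}
    target = memory_target(grown)
    grown[target]["memory"] = min(grown[target]["memory"] + step_mib, upper_mib)
    return grown


def simulate(resources: dict, step_mib: int, upper_mib: int, ooms: int) -> list:
    """Return the grown memory value (MiB) after each of `ooms` OOM events."""
    history = []
    for _ in range(ooms):
        resources = grow_resources(resources, step_mib, upper_mib)
        history.append(resources[memory_target(resources)]["memory"])
    return history


# The example spec: 4Gi memory request, stepSize 1Gi, upperBound 8Gi.
requests_only = {"requests": {"cpu": "1", "memory": 4 * MIB_PER_GIB}}
print(simulate(requests_only, 1 * MIB_PER_GIB, 8 * MIB_PER_GIB, ooms=6))
# prints [5120, 6144, 7168, 8192, 8192, 8192]

# With a memory limit set, only the limit grows; the request stays at 4Gi.
with_limit = {"requests": {"memory": 4096}, "limits": {"memory": 4096}}
print(grow_resources(with_limit, 1024, 8192))
# prints {'requests': {'memory': 4096}, 'limits': {'memory': 5120}}
```

With these values, (8Gi - 4Gi) / 1Gi = 4 OOM events exhaust the headroom; further OOM events leave the memory at 8Gi.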
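Note: Growing only the memory limit can change the Quality of Service (QoS) class of the Pod, for example from Guaranteed to Burstable once requests and limits no longer match. In the future, requests and limits will grow proportionately.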
Once OOM Recovery has taken effect, it never scales down automatically. The accumulated OOM Recovery growth is removed when a change to the NiFi resource spec is detected or when OOM Recovery is removed from the NiFi spec.
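The persistence and reset rules above can be modeled as a small state object. This is a hypothetical sketch: the OOMRecoveryState class and its fields are invented for illustration, only the memory part of the resource spec is modeled (a real change to any field of the resources block would also count), and keeping the growth when only stepSize or upperBound changes is an assumption the description above does not settle.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class OOMRecoveryState:
    """Hypothetical model of OOM Recovery growth persistence (not operator code)."""

    base_memory_mib: int      # memory from the NiFi resource spec
    step_mib: Optional[int]   # stepSize; None when OOM Recovery is removed
    upper_mib: Optional[int]  # upperBound; None when OOM Recovery is removed
    growth_mib: int = 0       # mirrors status.outOfMemoryRecoveryGrowth

    @property
    def effective_memory_mib(self) -> int:
        return self.base_memory_mib + self.growth_mib

    def on_oom(self) -> None:
        if self.step_mib is None or self.upper_mib is None:
            return  # OOM Recovery is not configured, so nothing grows
        headroom = max(0, self.upper_mib - self.base_memory_mib)
        # Growth never decreases here: there is no automatic scale-down path.
        self.growth_mib = max(self.growth_mib,
                              min(self.growth_mib + self.step_mib, headroom))

    def on_spec_update(self, base_memory_mib: int,
                       step_mib: Optional[int], upper_mib: Optional[int]) -> None:
        resources_changed = base_memory_mib != self.base_memory_mib
        recovery_removed = step_mib is None
        # Either event discards the accumulated growth. Changing only stepSize
        # or upperBound keeps it here, which is an assumption of this sketch.
        if resources_changed or recovery_removed:
            self.growth_mib = 0
        self.base_memory_mib = base_memory_mib
        self.step_mib = step_mib
        self.upper_mib = upper_mib


state = OOMRecoveryState(base_memory_mib=4096, step_mib=1024, upper_mib=8192)
state.on_oom()
state.on_oom()
print(state.effective_memory_mib)  # prints 6144
state.on_spec_update(6144, 1024, 8192)  # the resource spec changed
print(state.growth_mib)  # prints 0
```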
NiFi Resource Conditions
The following status field and condition have been added to track the OOM Recovery process:
status:
  conditions:
  - lastTransitionTime: "2025-04-15T16:16:15Z"
    message: NiFi has vertically scaled for OOM recovery
    observedGeneration: 2
    reason: OOMRecoveryScaleUp
    status: "False"
    type: VerticallyScaleUp
  outOfMemoryRecoveryGrowth: 500Mi
The outOfMemoryRecoveryGrowth field tracks how much the NiFi memory has already grown. The VerticallyScaleUp condition records the last time the cluster scaled up and whether the scaling action is complete. While the status of VerticallyScaleUp is "True", scaling is in progress. Once the scaling action is complete, the status is set to "False".
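To check these fields programmatically, a script can read the status object of the NiFi resource, for example after parsing the output of `kubectl get <resource> <name> -o json`. The sketch below assumes the status has already been loaded into a Python dict. The helper names are hypothetical, and quantity_to_bytes handles only a simple subset of Kubernetes quantity suffixes (no exponent forms or `E`/`P` units).

```python
from typing import Optional

# A subset of the binary and decimal suffixes used by Kubernetes quantities.
_SUFFIXES = {
    "Ki": 1024, "Mi": 1024**2, "Gi": 1024**3, "Ti": 1024**4,
    "k": 1000, "M": 1000**2, "G": 1000**3, "T": 1000**4,
}


def quantity_to_bytes(q: str) -> int:
    """Parse a simple memory quantity such as '500Mi' or '1G' into bytes."""
    for suffix in sorted(_SUFFIXES, key=len, reverse=True):
        if q.endswith(suffix):
            return int(float(q[: -len(suffix)]) * _SUFFIXES[suffix])
    return int(q)  # no suffix: plain bytes


def oom_recovery_summary(status: dict) -> dict:
    """Summarize the OOM Recovery fields of a NiFi resource status."""
    scale_up: Optional[dict] = next(
        (c for c in status.get("conditions", []) if c.get("type") == "VerticallyScaleUp"),
        None,
    )
    return {
        "growth_bytes": quantity_to_bytes(status.get("outOfMemoryRecoveryGrowth", "0")),
        # "True" means scaling is in progress; "False" means it has completed.
        "scaling_in_progress": scale_up is not None and scale_up.get("status") == "True",
        "last_transition": scale_up.get("lastTransitionTime") if scale_up else None,
    }


# The status shown above, as a Python dict.
status = {
    "conditions": [{
        "lastTransitionTime": "2025-04-15T16:16:15Z",
        "message": "NiFi has vertically scaled for OOM recovery",
        "observedGeneration": 2,
        "reason": "OOMRecoveryScaleUp",
        "status": "False",
        "type": "VerticallyScaleUp",
    }],
    "outOfMemoryRecoveryGrowth": "500Mi",
}
print(oom_recovery_summary(status))
# prints {'growth_bytes': 524288000, 'scaling_in_progress': False, 'last_transition': '2025-04-15T16:16:15Z'}
```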