Pod scheduling
Learn about the default affinity rules and tolerations that Strimzi sets for pod scheduling. Additionally, learn what affinity rules Cloudera recommends for making pod scheduling stricter.
The scheduling of Kafka broker, KRaft controller, and ZooKeeper pods can be customized in the Kafka and KafkaNodePool resources through various configurations such as storage configurations, affinity rules, and tolerations. Strimzi by default only sets a few of the pod scheduling configurations. It is your responsibility to ensure that pod scheduling configurations are customized correctly for your environment and use case.
Both storage and rack awareness configuration might have an impact on pod scheduling. For storage, depending on the configuration, it is possible that a pod is bound to a node or a group of nodes and cannot be scheduled elsewhere.
If rack awareness is configured, your pods by default get preferred and required affinity rules, which influence pod scheduling.
Default tolerations
The Strimzi Cluster Operator does not set any tolerations on the Kafka broker, KRaft controller, and ZooKeeper pods by default. The pods get default tolerations from the Kubernetes platform.
The default tolerations are as follows.
#...
kind: Kafka
spec:
  kafka:
    template:
      pod:
        tolerations:
          - effect: NoExecute
            key: node.kubernetes.io/not-ready
            operator: Exists
            tolerationSeconds: 300
          - effect: NoExecute
            key: node.kubernetes.io/unreachable
            operator: Exists
            tolerationSeconds: 300
This means that whenever the Kubernetes node running the pod is tainted as unreachable or not-ready, the pod is evicted only after five minutes. As a result, even if you lose an entire Kubernetes node, the pods running on it are terminated and rescheduled only after five minutes.
Depending on your platform and the type of failure of the Kubernetes worker node, it is also possible that pods are not rescheduled from a dead worker node at all and remain in a terminating state indefinitely. In this case, manual intervention is needed to move forward.
Pod scheduling recommendations
Learn about the pod scheduling configurations recommended by Cloudera.
Tolerations
Instead of using the default tolerations with a 300-second timeout, consider setting tolerations with smaller timeouts if a five-minute downtime of Kafka brokers, KRaft controllers, or ZooKeeper nodes is not acceptable for you.
For Kafka brokers, it is possible to set tolerations globally using spec.kafka.template.pod.tolerations in the Kafka resource. Alternatively, you can set tolerations for a group of broker nodes only using spec.template.pod.tolerations in the KafkaNodePool resource.
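For example, a configuration similar to the following in the Kafka resource applies tolerations with a shorter timeout to all broker pods. The 60-second timeout is only an illustrative value; choose a timeout that fits your availability requirements.
#...
kind: Kafka
spec:
  kafka:
    template:
      pod:
        tolerations:
          - effect: NoExecute
            key: node.kubernetes.io/not-ready
            operator: Exists
            tolerationSeconds: 60
          - effect: NoExecute
            key: node.kubernetes.io/unreachable
            operator: Exists
            tolerationSeconds: 60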
For KRaft controllers, configuration of tolerations is the same as for Kafka brokers. You can set tolerations globally using spec.kafka.template.pod.tolerations in the Kafka resource. Alternatively, you can set tolerations for a group of controller nodes only using spec.template.pod.tolerations in the KafkaNodePool resource.
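Similarly, a KafkaNodePool configuration like the following sketch sets tolerations for the broker or controller nodes of that pool only. Again, the 60-second timeout is an example value, not a recommendation.
#...
kind: KafkaNodePool
spec:
  template:
    pod:
      tolerations:
        - effect: NoExecute
          key: node.kubernetes.io/not-ready
          operator: Exists
          tolerationSeconds: 60
        - effect: NoExecute
          key: node.kubernetes.io/unreachable
          operator: Exists
          tolerationSeconds: 60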
For ZooKeeper, it is only possible to set tolerations globally, using spec.zookeeper.template.pod.tolerations in the Kafka resource.
Other affinity rules
You can use required and preferred rules to fine-tune scheduling according to your needs.
If you use required rules, it is your platform’s responsibility to always have enough resources (for example, enough nodes) to satisfy the rules. Otherwise, the scheduler will not be able to schedule pods and they will be in a pending state.
If you use preferred rules with any weight, ensure that the rule weights are set correctly. The scheduler considers rules with a higher weight more important than rules with a lower weight.
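For illustration, a preferred rule similar to the following in a KafkaNodePool resource asks the scheduler to spread the pods of the cluster across availability zones where possible, while still allowing co-location if no other placement is available. The weight of 100 and the topology.kubernetes.io/zone topology key are example values only.
#...
kind: KafkaNodePool
spec:
  template:
    pod:
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 100
              podAffinityTerm:
                labelSelector:
                  matchExpressions:
                    - key: strimzi.io/cluster
                      operator: In
                      values:
                        - [***CLUSTER NAME***]
                topologyKey: topology.kubernetes.io/zone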
For Kafka brokers, it is possible to set affinity rules globally using spec.kafka.template.pod.affinity in the Kafka resource. Alternatively, you can set affinity rules for a group of broker nodes only using spec.template.pod.affinity in the KafkaNodePool resource.
For KRaft controllers, configuration of affinity rules is the same as for Kafka brokers. You can set affinity rules globally using spec.kafka.template.pod.affinity in the Kafka resource. Alternatively, you can set affinity rules for a group of controller nodes only using spec.template.pod.affinity in the KafkaNodePool resource.
For ZooKeeper, it is only possible to set affinity rules globally, using spec.zookeeper.template.pod.affinity in the Kafka resource.
The following examples collect a number of required rules for typical use cases. Each example includes a comment describing what the rule ensures.
#...
# Pods of this node pool are not scheduled on nodes that already run a broker pod of the cluster
kind: KafkaNodePool
spec:
  template:
    pod:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchExpressions:
                  - key: strimzi.io/cluster
                    operator: In
                    values:
                      - [***CLUSTER NAME***]
                  - key: strimzi.io/broker-role
                    operator: In
                    values:
                      - "true"
              topologyKey: kubernetes.io/hostname
#...
# Pods of this node pool are not scheduled on nodes that already run a controller pod of the cluster
kind: KafkaNodePool
spec:
  template:
    pod:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchExpressions:
                  - key: strimzi.io/cluster
                    operator: In
                    values:
                      - [***CLUSTER NAME***]
                  - key: strimzi.io/controller-role
                    operator: In
                    values:
                      - "true"
              topologyKey: kubernetes.io/hostname
#...
# ZooKeeper pods of the cluster do not share a Kubernetes node
kind: Kafka
spec:
  zookeeper:
    template:
      pod:
        affinity:
          podAntiAffinity:
            requiredDuringSchedulingIgnoredDuringExecution:
              - labelSelector:
                  matchExpressions:
                    - key: strimzi.io/component-type
                      operator: In
                      values:
                        - zookeeper
                    - key: strimzi.io/cluster
                      operator: In
                      values:
                        - [***CLUSTER NAME***]
                topologyKey: kubernetes.io/hostname
#...
# Kafka broker and ZooKeeper pods are not scheduled on nodes that run any other pod of the same cluster
kind: Kafka
spec:
  kafka:
    template:
      pod:
        affinity:
          podAntiAffinity:
            requiredDuringSchedulingIgnoredDuringExecution:
              - labelSelector:
                  matchExpressions:
                    - key: strimzi.io/cluster
                      operator: In
                      values:
                        - [***CLUSTER NAME***]
                topologyKey: kubernetes.io/hostname
  zookeeper:
    template:
      pod:
        affinity:
          podAntiAffinity:
            requiredDuringSchedulingIgnoredDuringExecution:
              - labelSelector:
                  matchExpressions:
                    - key: strimzi.io/cluster
                      operator: In
                      values:
                        - [***CLUSTER NAME***]
                topologyKey: kubernetes.io/hostname
#...
# Kafka broker pods are not scheduled on nodes that run any other pod of the same cluster
kind: Kafka
spec:
  kafka:
    template:
      pod:
        affinity:
          podAntiAffinity:
            requiredDuringSchedulingIgnoredDuringExecution:
              - labelSelector:
                  matchExpressions:
                    - key: strimzi.io/cluster
                      operator: In
                      values:
                        - [***CLUSTER NAME***]
                topologyKey: kubernetes.io/hostname