Pod scheduling

Learn about the default affinity rules and tolerations that Strimzi sets for pod scheduling. Additionally, learn what affinity rules Cloudera recommends for making pod scheduling stricter.

The scheduling of Kafka broker, KRaft controller, and ZooKeeper pods can be customized in the Kafka and KafkaNodePool resources through various configurations such as storage configurations, affinity rules, and tolerations. By default, Strimzi sets only a few of the pod scheduling configurations. It is your responsibility to ensure that pod scheduling is configured correctly for your environment and use case.

Both storage and rack awareness configurations might have an impact on pod scheduling. Depending on the storage configuration, it is possible that a pod is bound to a node or a group of nodes and cannot be scheduled elsewhere.
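
For example, with persistent storage, the PersistentVolume backing a pod can carry node affinity, which is typical for local volumes; in that case, the pod can only run on the node that holds its volume. The following is a minimal sketch; the local-storage class name is an assumption and depends on your platform.

#...
kind: KafkaNodePool
spec:
  storage:
    type: persistent-claim
    size: 100Gi
    # Assumed example class: a storage class backed by local volumes creates
    # PersistentVolumes with node affinity, pinning each pod to the node
    # that holds its volume.
    class: local-storage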

If rack awareness is configured, your pods by default get preferred and required affinity rules, which influence pod scheduling.
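
For example, rack awareness is enabled by setting a topology key in the Kafka resource. The following is a minimal sketch using the standard zone label:

#...
kind: Kafka
spec:
  kafka:
    rack:
      # Node label used as the topology key; Strimzi adds the preferred and
      # required affinity rules that spread broker pods across these zones.
      topologyKey: topology.kubernetes.io/zone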

Default tolerations

The Strimzi Cluster Operator does not set any tolerations on the Kafka broker, KRaft controller, and ZooKeeper pods by default. Instead, the pods get default tolerations from the Kubernetes platform.

The default tolerations are as follows.

#...
kind: Kafka
spec:
  kafka:
    template:
      pod:
        tolerations:
          - effect: NoExecute
            key: node.kubernetes.io/not-ready
            operator: Exists
            tolerationSeconds: 300
          - effect: NoExecute
            key: node.kubernetes.io/unreachable
            operator: Exists
            tolerationSeconds: 300

This means that whenever the Kubernetes node running the pod is tainted as unreachable or not-ready, the pod is terminated after five minutes. As a result, even if you lose an entire Kubernetes node, the pod is terminated and rescheduled only after five minutes.

Depending on your platform and the type of worker node failure, it is possible that pods are not rescheduled from a dead worker node and remain in a Terminating state indefinitely. In this case, manual intervention (for example, force deleting the pod or removing the dead node from the cluster) is needed to move forward.

Pod scheduling recommendations

Learn about the pod scheduling configurations recommended by Cloudera.

Tolerations

If a five-minute downtime of Kafka brokers, KRaft controllers, or ZooKeeper nodes is not acceptable for you, consider setting tolerations with timeouts shorter than the default 300 seconds.

For Kafka brokers, you can set tolerations globally using spec.kafka.template.pod.tolerations in the Kafka resource, or for a group of broker nodes only using spec.template.pod.tolerations in the KafkaNodePool resource.

For KRaft controllers, configuration is the same as for Kafka brokers: set tolerations globally using spec.kafka.template.pod.tolerations in the Kafka resource, or for a group of controller nodes only using spec.template.pod.tolerations in the KafkaNodePool resource.

For ZooKeeper, tolerations can only be set globally using spec.zookeeper.template.pod.tolerations in the Kafka resource.
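
For example, the following is a minimal sketch that sets a 120-second timeout for a group of broker nodes. The 120-second value is illustrative; choose a timeout that matches your availability requirements.

#...
kind: KafkaNodePool
spec:
  template:
    pod:
      tolerations:
        # Overrides the 300-second platform defaults for these taints
        - effect: NoExecute
          key: node.kubernetes.io/not-ready
          operator: Exists
          tolerationSeconds: 120
        - effect: NoExecute
          key: node.kubernetes.io/unreachable
          operator: Exists
          tolerationSeconds: 120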

Other affinity rules

You can use required and preferred rules to fine-tune scheduling according to your needs.

If you use required rules, you must ensure that your platform always has enough resources (for example, enough nodes) to satisfy the rules. Otherwise, the scheduler cannot schedule the pods and they remain in Pending state.

If you use preferred rules, ensure that the rule weights are set correctly. Weights range from 1 to 100, and the scheduler considers rules with higher weight more important than rules with lower weight.
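
For example, the following is a minimal sketch of a preferred rule that spreads broker pods across nodes. The weight of 100 is the maximum and is illustrative; the sketch assumes the rule is set in a KafkaNodePool resource, as described below.

#...
kind: KafkaNodePool
spec:
  template:
    pod:
      affinity:
        podAntiAffinity:
          # Preferred rules are best-effort: pods are still scheduled even
          # if no node satisfies the rule.
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 100
              podAffinityTerm:
                labelSelector:
                  matchExpressions:
                    - key: strimzi.io/cluster
                      operator: In
                      values:
                        - [***CLUSTER NAME***]
                topologyKey: kubernetes.io/hostname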

For Kafka brokers, you can set affinity rules globally using spec.kafka.template.pod.affinity in the Kafka resource, or for a group of broker nodes only using spec.template.pod.affinity in the KafkaNodePool resource.

For KRaft controllers, configuration is the same as for Kafka brokers: set affinity rules globally using spec.kafka.template.pod.affinity in the Kafka resource, or for a group of controller nodes only using spec.template.pod.affinity in the KafkaNodePool resource.

For ZooKeeper, affinity rules can only be set globally using spec.zookeeper.template.pod.affinity in the Kafka resource.

The following examples show required rules for typical use cases.

Run each Kafka broker pod on different nodes
#...
kind: KafkaNodePool
spec:
  template:
    pod:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchExpressions:
                  - key: strimzi.io/cluster
                    operator: In
                    values:
                      - [***CLUSTER NAME***]
                  - key: strimzi.io/broker-role
                    operator: In
                    values:
                      - "true"
              topologyKey: kubernetes.io/hostname
Run each KRaft controller pod on different nodes
#...
kind: KafkaNodePool
spec:
  template:
    pod:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchExpressions:
                  - key: strimzi.io/cluster
                    operator: In
                    values:
                      - [***CLUSTER NAME***]
                  - key: strimzi.io/controller-role
                    operator: In
                    values:
                      - "true"
              topologyKey: kubernetes.io/hostname
Run each ZooKeeper pod on different nodes
#...
kind: Kafka
spec:
  zookeeper:
    template:
      pod:
        affinity:
          podAntiAffinity:
            requiredDuringSchedulingIgnoredDuringExecution:
              - labelSelector:
                  matchExpressions:
                    - key: strimzi.io/component-type
                      operator: In
                      values:
                        - zookeeper
                    - key: strimzi.io/cluster
                      operator: In
                      values:
                        - [***CLUSTER NAME***]
                topologyKey: kubernetes.io/hostname
Run ZooKeeper and Kafka broker pods on different nodes
#...
kind: Kafka
spec:
  kafka:
    template:
      pod:
        affinity:
          podAntiAffinity:
            requiredDuringSchedulingIgnoredDuringExecution:
              - labelSelector:
                  matchExpressions:
                    - key: strimzi.io/cluster
                      operator: In
                      values:
                        - [***CLUSTER NAME***]
                topologyKey: kubernetes.io/hostname
  zookeeper:
    template:
      pod:
        affinity:
          podAntiAffinity:
            requiredDuringSchedulingIgnoredDuringExecution:
              - labelSelector:
                  matchExpressions:
                    - key: strimzi.io/cluster
                      operator: In
                      values:
                        - [***CLUSTER NAME***]
                topologyKey: kubernetes.io/hostname
Run KRaft controller and Kafka broker pods on different nodes
#...
kind: Kafka
spec:
  kafka:
    template:
      pod:
        affinity:
          podAntiAffinity:
            requiredDuringSchedulingIgnoredDuringExecution:
              - labelSelector:
                  matchExpressions:
                    - key: strimzi.io/cluster
                      operator: In
                      values:
                        - [***CLUSTER NAME***]
                topologyKey: kubernetes.io/hostname