Pod scheduling

Learn about the default affinity rules and tolerations that apply to Kafka and ZooKeeper pods deployed by Strimzi. Additionally, learn what affinity rules Cloudera recommends for making pod scheduling stricter.

The scheduling of Kafka and ZooKeeper pods can be customized in the Kafka and KafkaNodePool resources through various settings, such as storage configuration, affinity rules, and tolerations. By default, Strimzi sets only a few of the pod scheduling configurations. It is your responsibility to ensure that pod scheduling is configured correctly for your environment and use case.

Both the storage and the rack awareness configuration can have an impact on pod scheduling. Depending on the storage configuration, a pod might be bound to a node or a group of nodes and cannot be scheduled elsewhere.

If rack awareness is configured, your pods by default get preferred and required affinity rules, which influence pod scheduling.
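
Rack awareness is enabled through the rack property of the Kafka resource. The following snippet is a minimal sketch of that configuration; the topology key shown is the commonly used zone label and might differ in your environment.

#...
kind: Kafka
spec:
  kafka:
    rack:
      topologyKey: topology.kubernetes.io/zone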

Default tolerations

The Strimzi Cluster Operator does not set any tolerations on the Kafka broker and ZooKeeper pods by default. Instead, the pods get default tolerations from the Kubernetes platform.

The default tolerations are as follows.

#...
kind: Kafka
spec:
  kafka:
    template:
      pod:
        tolerations:
          - effect: NoExecute
            key: node.kubernetes.io/not-ready
            operator: Exists
            tolerationSeconds: 300
          - effect: NoExecute
            key: node.kubernetes.io/unreachable
            operator: Exists
            tolerationSeconds: 300

This means that whenever the Kubernetes node running the pod is tainted as unreachable or not-ready, the pod is terminated after five minutes. As a result, even if you lose an entire Kubernetes node, the pod is terminated and rescheduled only after five minutes.

Depending on your platform and the type of failure of a Kubernetes worker node, it is possible that pods are not rescheduled from a dead worker node and remain in a terminating state indefinitely. In this case, manual intervention is needed to move forward.
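
If a pod is stuck in the terminating state, one possible manual intervention is to force delete the pod so that it can be rescheduled. The following command is a generic Kubernetes sketch with placeholder values; verify that force deletion is safe for your platform and storage setup before running it.

kubectl delete pod [***POD NAME***] --namespace [***NAMESPACE***] --grace-period=0 --force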

Pod scheduling recommendations

Learn about the pod scheduling configurations recommended by Cloudera.

Tolerations

Instead of using the default tolerations with 300 seconds, consider setting tolerations with smaller timeouts if a five-minute downtime of a Kafka broker or ZooKeeper node is not acceptable for you.

For Kafka brokers, you can set tolerations globally in spec.kafka.template.pod.tolerations in the Kafka resource, or only for a group of nodes in spec.template.pod.tolerations in the KafkaNodePool resource.

For ZooKeeper, you can only set tolerations globally in spec.zookeeper.template.pod.tolerations in the Kafka resource.
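
For example, the following snippet is a minimal sketch of setting tolerations with a reduced timeout in the Kafka resource. The 120 second value is only an illustration; choose a value that matches your availability requirements.

#...
kind: Kafka
spec:
  kafka:
    template:
      pod:
        tolerations:
          - effect: NoExecute
            key: node.kubernetes.io/not-ready
            operator: Exists
            tolerationSeconds: 120
          - effect: NoExecute
            key: node.kubernetes.io/unreachable
            operator: Exists
            tolerationSeconds: 120
  zookeeper:
    template:
      pod:
        tolerations:
          - effect: NoExecute
            key: node.kubernetes.io/not-ready
            operator: Exists
            tolerationSeconds: 120
          - effect: NoExecute
            key: node.kubernetes.io/unreachable
            operator: Exists
            tolerationSeconds: 120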

Other affinity rules

You can use required and preferred rules to fine-tune scheduling according to your needs.

If you use required rules, it is your platform’s responsibility to always have enough resources (for example, enough nodes) to satisfy the rules. Otherwise, the scheduler is not able to schedule the pods and they remain in a pending state.

If you use preferred rules, ensure that the rule weights are set correctly. The scheduler considers rules with a higher weight more important than rules with a lower weight.

For Kafka brokers, you can set affinity rules globally in spec.kafka.template.pod.affinity in the Kafka resource, or only for a group of nodes in spec.template.pod.affinity in the KafkaNodePool resource.

For ZooKeeper, you can only set affinity rules globally in spec.zookeeper.template.pod.affinity in the Kafka resource.
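
The following snippet is a minimal sketch of setting an affinity rule for a single group of nodes in a KafkaNodePool resource. The rule shown mirrors the broker anti-affinity example later in this section and is only an illustration.

#...
kind: KafkaNodePool
spec:
  template:
    pod:
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
            - podAffinityTerm:
                labelSelector:
                  matchExpressions:
                    - key: strimzi.io/name
                      operator: In
                      values:
                        - [***CLUSTER NAME***]-kafka
                topologyKey: "kubernetes.io/hostname"
              weight: 99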

The following examples collect preferred and required rules for typical use cases. Use either the preferred or the required rule from each example, not both.

Run each Kafka broker pod on different nodes
#...
kind: Kafka
spec:
 kafka:
   template:
     pod:
       affinity:
         podAntiAffinity:
           preferredDuringSchedulingIgnoredDuringExecution:
             - podAffinityTerm:
                 labelSelector:
                   matchExpressions:
                     - key: strimzi.io/name
                       operator: In
                       values:
                         - [***CLUSTER NAME***]-kafka
                 topologyKey: "kubernetes.io/hostname"
               weight: 99
#...
kind: Kafka
spec:
 kafka:
   template:
     pod:
       affinity:
         podAntiAffinity:
           requiredDuringSchedulingIgnoredDuringExecution:
             - labelSelector:
                 matchExpressions:
                   - key: strimzi.io/name
                     operator: In
                     values:
                       - [***CLUSTER NAME***]-kafka
               topologyKey: "kubernetes.io/hostname"
Run each ZooKeeper pod on different nodes
#...
kind: Kafka
spec:
 zookeeper:
   template:
     pod:
       affinity:
         podAntiAffinity:
           preferredDuringSchedulingIgnoredDuringExecution:
             - podAffinityTerm:
                 labelSelector:
                   matchExpressions:
                     - key: strimzi.io/name
                       operator: In
                       values:
                         - [***CLUSTER NAME***]-zookeeper
                 topologyKey: "kubernetes.io/hostname"
               weight: 99
#...
kind: Kafka
spec:
 zookeeper:
   template:
     pod:
       affinity:
         podAntiAffinity:
           requiredDuringSchedulingIgnoredDuringExecution:
             - labelSelector:
                 matchExpressions:
                   - key: strimzi.io/name
                     operator: In
                     values:
                       - [***CLUSTER NAME***]-zookeeper
               topologyKey: "kubernetes.io/hostname"
Run ZooKeeper and Kafka broker pods on different nodes
#...
kind: Kafka
spec:
 kafka:
   template:
     pod:
       affinity:
         podAntiAffinity:
           preferredDuringSchedulingIgnoredDuringExecution:
             - podAffinityTerm:
                 labelSelector:
                   matchExpressions:
                     - key: strimzi.io/cluster
                       operator: In
                       values:
                         - [***CLUSTER NAME***]
                 topologyKey: "kubernetes.io/hostname"
               weight: 99
#...
 zookeeper:
   template:
     pod:
       affinity:
         podAntiAffinity:
           preferredDuringSchedulingIgnoredDuringExecution:
             - podAffinityTerm:
                 labelSelector:
                   matchExpressions:
                     - key: strimzi.io/cluster
                       operator: In
                       values:
                         - [***CLUSTER NAME***]
                 topologyKey: "kubernetes.io/hostname"
               weight: 99
#...
kind: Kafka
spec:
 kafka:
   template:
     pod:
       affinity:
         podAntiAffinity:
           requiredDuringSchedulingIgnoredDuringExecution:
             - labelSelector:
                 matchExpressions:
                   - key: strimzi.io/cluster
                     operator: In
                     values:
                       - [***CLUSTER NAME***]
               topologyKey: "kubernetes.io/hostname"
#...
 zookeeper:
   template:
     pod:
       affinity:
         podAntiAffinity:
           requiredDuringSchedulingIgnoredDuringExecution:
             - labelSelector:
                 matchExpressions:
                   - key: strimzi.io/cluster
                     operator: In
                     values:
                       - [***CLUSTER NAME***]
               topologyKey: "kubernetes.io/hostname"