Autoscaling Overview

The autoscaler used by Amazon Elastic Kubernetes Service (EKS) for cluster autoscaling is the Kubernetes Cluster Autoscaler, which works together with Amazon EC2 Auto Scaling to resize the cluster dynamically. The cluster autoscaler changes the desired capacity of an autoscaling group to expand or contract the cluster based on pod resource requests.
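The sizing decision described above can be sketched as a small function: given the CPU requested by unschedulable pods, work out how many nodes to add and cap the result at the group's maximum. This is a minimal illustration of the idea, not the real Cluster Autoscaler algorithm; the function and parameter names are invented for this sketch.

```python
from math import ceil

def new_desired_capacity(pending_cpu_millicores, node_cpu_millicores,
                         current_capacity, max_capacity):
    """Return the autoscaling-group desired capacity needed to fit
    the pending CPU requests (illustrative sketch).

    Adds just enough nodes to cover the unschedulable requests,
    capped at the group's configured maximum capacity.
    """
    nodes_needed = ceil(pending_cpu_millicores / node_cpu_millicores)
    return min(current_capacity + nodes_needed, max_capacity)
```

For example, 3500m of pending CPU on 2000m nodes with 2 nodes running bumps the desired capacity to 4; the cap ensures the group never exceeds its maximum.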

Scaling Up

The primary trigger for scaling up (expanding) an autoscaling group is the Kubernetes pod scheduler failing to find a node that meets a pod's resource requirements. Concretely, for Cloudera Machine Learning (CML), if the scheduler cannot place an engine pod because no node has sufficient CPU or memory, the engine pod remains in the "Pending" state. When the autoscaler notices this, it increases the desired capacity of the relevant autoscaling group (CPU or GPU) to provision a new node for the cluster. As soon as the new node is ready, the scheduler places the session/engine pod on it. In addition to the engine pod, certain CML daemonset pods are also scheduled on the new node.
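The trigger condition above can be sketched as a filter over pod status. The plain dicts below stand in for the Kubernetes pod status fields the autoscaler inspects: "phase" mirrors status.phase, and "scheduled_reason" mirrors the reason on the PodScheduled condition ("Unschedulable" when the scheduler found no node with enough CPU or memory). The field names are illustrative.

```python
def unschedulable_pods(pods):
    """Return the names of pods stuck in Pending because the scheduler
    found no node that satisfies their resource requests (sketch).

    Each pod is a dict standing in for the Kubernetes pod status.
    """
    return [p["name"] for p in pods
            if p["phase"] == "Pending"
            and p.get("scheduled_reason") == "Unschedulable"]
```

Pods that are Pending for other reasons (for example, an image still being pulled) would not carry the "Unschedulable" reason and so do not trigger a scale-up.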

The time taken to schedule an engine pod on a new node is the time the autoscaler takes to add the node to the cluster, plus the time taken to pull the engine's Docker image onto that node.

Scaling Down

The autoscaler scales a cluster down by removing a node when resource utilization on the candidate ("victim") node falls below a pre-defined threshold, currently 20% CPU utilization, provided the node is not running any non-evictable pods. That is, a node will be removed when all of the following criteria are met:
  • The node does not have any non-evictable pods
  • The node's CPU utilization is less than 20%
  • The number of active nodes in the autoscaling group is more than the configured minimum capacity
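The three removal criteria above can be expressed as a single eligibility check. This is a sketch of the decision only; the dict fields and function name are illustrative and do not correspond to the real Cluster Autoscaler API.

```python
def can_scale_down(node, active_nodes, min_capacity, cpu_threshold=0.20):
    """Return True when a candidate ("victim") node meets all three
    removal criteria (illustrative sketch).

    `node` is a dict with the fields the decision depends on:
    whether any non-evictable pods are running on it, and its
    current CPU utilization as a fraction (0.0 - 1.0).
    """
    return (not node["has_non_evictable_pods"]      # criterion 1
            and node["cpu_utilization"] < cpu_threshold  # criterion 2
            and active_nodes > min_capacity)        # criterion 3
```

Note that all three conditions must hold: a nearly idle node is still kept if it hosts a non-evictable pod, or if removing it would drop the group below its minimum capacity.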

Evictable pods running on the victim node may be rescheduled onto other nodes during the scale-down process.