Refreshing the YuniKorn configuration

Occasionally the scheduler state can fall out of sync with the cluster state. This can leave pods stuck in the Pending or ApplicationRejected state, with pod events showing Placement Rule-related errors. To recover, refresh the YuniKorn configuration.
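
For example, you can list Pending pods and inspect a pod's events for Placement Rule-related errors with commands like the following (the pod and namespace names are placeholders):

    kubectl get pods -A --field-selector=status.phase=Pending
    kubectl describe pod <pod-name> -n <namespace>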

  1. Run the following commands to scale down the YuniKorn pods:
    kubectl scale deployment yunikorn-admission-controller --replicas=0 -n yunikorn
    kubectl scale deployment yunikorn-scheduler --replicas=0 -n yunikorn

    The yunikorn-scheduler and yunikorn-admission-controller pods are managed by deployments of the same name in the yunikorn namespace, so scaling those deployments to 0 stops the pods.
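
    To confirm that both pods have terminated, you can list the pods in the namespace:

    kubectl get pods -n yunikorn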

  2. Run the following command to delete the yunikorn-configs ConfigMap:
    kubectl delete cm yunikorn-configs -n yunikorn
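    You can confirm the deletion with the following command, which should report NotFound (the ConfigMap should be recreated after the resource-pool-manager restarts in the next step):

    kubectl get cm yunikorn-configs -n yunikorn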
  3. Run the following commands to restart the resource-pool-manager pod:
    kubectl scale deployment cdp-release-resource-pool-manager --replicas=0 -n <cdp-namespace>
    kubectl scale deployment cdp-release-resource-pool-manager --replicas=1 -n <cdp-namespace>

    The resource-pool-manager pod is managed by the cdp-release-resource-pool-manager deployment in your CDP control plane namespace, so you can scale that deployment down to 0 and then scale it back up to 1.
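
    To verify the restart, you can wait for the rollout to finish:

    kubectl rollout status deployment cdp-release-resource-pool-manager -n <cdp-namespace>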

  4. Run the following commands to scale up the YuniKorn pods:
    kubectl scale deployment yunikorn-scheduler --replicas=1 -n yunikorn
    kubectl scale deployment yunikorn-admission-controller --replicas=1 -n yunikorn

    The yunikorn-scheduler and yunikorn-admission-controller pods are managed by deployments of the same name in the yunikorn namespace, so scaling those deployments back to 1 restarts the pods.
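
    As a final check, you can confirm that the YuniKorn pods are running again and that the yunikorn-configs ConfigMap has been recreated:

    kubectl get pods -n yunikorn
    kubectl get cm yunikorn-configs -n yunikorn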

The preceding steps will refresh the YuniKorn configuration for the applicable control plane.

After the YuniKorn restart, Pending pods are picked up and recovered automatically, but pods left in the ApplicationRejected state may need to be redeployed. If a pod is managed by a deployment, simply delete the pod and the deployment recreates it. If the pod is unmanaged, you must delete and redeploy it manually.
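
For example, to redeploy a deployment-managed pod (the pod and namespace names are placeholders):

    kubectl delete pod <pod-name> -n <namespace>

The owning deployment then creates a replacement pod, which is scheduled through the refreshed YuniKorn configuration.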