Validating Sessions on New Worker Host

After adding one or more worker hosts or nodes to your cluster, validate that CDSW/CML sessions run properly on the new hosts. If you skip this validation, you might encounter intermittent issues that are difficult to debug whenever a session is scheduled on a new worker host.

You can validate that sessions work on the new host by forcing a new session to be scheduled on that host. Perform the following procedure on the master node:
  1. List the nodes in the cluster:
    kubectl get nodes
  2. Validate that the new node is displayed.
  3. Cordon all the CDSW nodes to prevent new pods from being scheduled on them:
    kubectl get nodes | awk '{if (NR!=1) {print $1}}' | xargs -I {} kubectl cordon {}
  4. Uncordon the new worker node:
    kubectl uncordon [new cdsw worker node]
  5. Start a new session in CDSW, then verify that the new session is scheduled on the new worker node:
    kubectl get pods -A -o wide
    You should see your session pod in this list, with the new worker node shown in the NODE column.
  6. Verify that the CDSW session starts on the new node without any issues.
  7. Uncordon all of the CDSW nodes after testing the new worker node:
    kubectl get nodes | awk '{if (NR!=1) {print $1}}' | xargs -I {} kubectl uncordon {}
  8. If the session does not start properly, check the following:
    1. Ensure that the new worker node has the same system-level configuration as all other worker nodes, including files such as /etc/resolv.conf.
    2. Ensure that any folders listed in the CDSW > Admin > Mounts section exist on the new worker host.
    3. If you see errors when starting the session, such as an ImagePullBackOff error, validate that any custom Docker images also exist on the new worker node.
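
The cordon and uncordon steps above could be wrapped in a small script, roughly as sketched below. This is a minimal sketch, not a supported tool: the function names node_names and for_each_node are hypothetical helpers introduced here for illustration, and the sketch assumes that kubectl is on the PATH of the master node and has access to the cluster.

```shell
#!/bin/sh
# Sketch only: node_names and for_each_node are hypothetical helper names,
# not part of CDSW or kubectl.

# Read `kubectl get nodes` output on stdin and print one node name per line.
# The first line of that output is the column header, so it is skipped.
node_names() {
  awk 'NR != 1 { print $1 }'
}

# Run `kubectl cordon` or `kubectl uncordon` against every node in the cluster.
# Usage: for_each_node cordon | for_each_node uncordon
for_each_node() {
  action=$1
  case "$action" in
    cordon|uncordon) ;;
    *)
      echo "usage: for_each_node cordon|uncordon" >&2
      return 1
      ;;
  esac
  kubectl get nodes | node_names | while read -r node; do
    # Detach stdin so the inner command cannot consume the node list.
    kubectl "$action" "$node" < /dev/null
  done
}

# Example sequence, mirroring steps 3, 4, and 7 of the procedure:
#   for_each_node cordon
#   kubectl uncordon [new cdsw worker node]
#   ... start a session in CDSW and check where it was scheduled ...
#   for_each_node uncordon
```

As with step 7 of the procedure, the final uncordon applies to every node, so a node that was deliberately cordoned before the test would need to be cordoned again afterwards.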