After adding one or more worker hosts or nodes to your cluster, you should validate
that CDSW/CML sessions run properly on the new hosts. If you skip this validation,
you might encounter intermittent issues that are difficult to debug when a session
is scheduled on the new worker host.
You can validate that sessions work on the new host by
forcing a new session to be scheduled on that host with the following procedure on the master
node.
-
Enter the following command:
kubectl get nodes
-
Validate that the new node is displayed.
-
Cordon all the CDSW nodes to prevent them from being scheduled:
kubectl get nodes | awk '{if (NR!=1) {print $1}}' | xargs -I {} kubectl cordon {}
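The cordon pipeline above depends on awk stripping the header row from the kubectl get nodes output before the node names are passed to xargs. A small sketch of that filtering step, using mock output so no cluster is required (the node names are placeholders):

```shell
# Mock `kubectl get nodes` output: one header row plus two nodes.
mock_nodes='NAME     STATUS   ROLES    AGE   VERSION
worker1  Ready    <none>   10d   v1.13.0
worker2  Ready    <none>   1h    v1.13.0'

# NR!=1 skips the header row; $1 is the node name column.
# Each printed name is what xargs would feed to `kubectl cordon`.
echo "$mock_nodes" | awk 'NR!=1 {print $1}'
# prints:
# worker1
# worker2
```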
-
Uncordon the new worker node:
kubectl uncordon [new cdsw worker node]
-
Start a new session in CDSW and verify that this new session is scheduled on the
worker node.
kubectl get pods -A -o wide
You should see your session in this list.
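On a busy cluster the wide pod listing can be long, so filtering on the NODE column makes the check quicker. A sketch using mock output, since no cluster is assumed here (the node and pod names are placeholders):

```shell
# Mock `kubectl get pods -A -o wide` output; NODE is the last column here.
mock_pods='NAMESPACE   NAME            READY   STATUS    NODE
default     engine-abc123   1/1     Running   new-worker
default     web-xyz789      1/1     Running   worker1'

# $NF is the last field on each line; keep only pods on the node under test.
echo "$mock_pods" | awk '$NF == "new-worker"'
# prints only the engine-abc123 row
```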
-
Verify that the CDSW session starts on the new node without any issues.
-
Uncordon all of the CDSW nodes after testing the new worker node:
kubectl get nodes | awk '{if (NR!=1) {print $1}}' | xargs -I {} kubectl uncordon {}
-
If the session does not start properly, check the following:
-
Ensure that the new worker node has the same system-level configuration as
all other worker nodes, including files such as
/etc/resolv.conf.
-
Ensure that any folders listed under Admin > Mounts in CDSW
exist on the new worker host.
-
If you see errors when starting the session, such as an
ImagePullBackOff error, validate that any custom Docker
images also exist on the new worker node.
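One quick way to catch the configuration drift described above is to diff a system file between a known-good worker and the new one. A minimal sketch with the file contents inlined locally; in practice you would fetch each copy over ssh (for example, ssh worker1 cat /etc/resolv.conf), and the hostnames, paths, and nameserver values here are placeholders:

```shell
# Stand-ins for resolv.conf copies pulled from two workers.
cat > /tmp/resolv.worker1 <<'EOF'
nameserver 10.0.0.2
search example.internal
EOF
cat > /tmp/resolv.newworker <<'EOF'
nameserver 10.0.0.2
search example.internal
EOF

# diff exits 0 when the files match; any output marks a drifted line.
if diff /tmp/resolv.worker1 /tmp/resolv.newworker; then
    echo "resolv.conf matches"
fi
```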