Troubleshooting ML Workspaces on Azure
You can collect logs to troubleshoot issues that occur in ML Workspaces on Azure.
How to access Azure logs
Logs from the AKS control plane can be found in the "Logs" blade of the liftie-xxxxxxx resource group (not to be confused with the "Logs" blade of the AKS cluster itself or of the Log Analytics Workspace in that resource group). You can query the logs using Kusto Query Language (KQL), the query language developed by Microsoft.
Cluster fails to scale down
If a worker node is idle but is not being scaled down, check the cluster autoscaler logs.
AzureDiagnostics | where Category == "cluster-autoscaler"
The logs list the pods scheduled on a given node that prevent it from being scaled down, along with other reasons for the autoscaler's scaling decisions. Services running in the kube-system namespace (such as tunnelfront or metrics-server) have been known to delay scale-down when scheduled on an otherwise idle node.
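When the autoscaler log is large, it can help to narrow the query to a recent time window and to a search term such as a node or pod name. The following is a minimal sketch in Python that only builds the KQL text to paste into the "Logs" blade. The helper name build_autoscaler_query is hypothetical, and the sketch assumes the standard AzureDiagnostics columns TimeGenerated and log_s are present in your workspace; adjust the column names if your schema differs.

```python
import re


def build_autoscaler_query(lookback="1h", search_term=None, limit=200):
    """Return KQL text for cluster-autoscaler control-plane logs.

    Assumes the AzureDiagnostics table exposes TimeGenerated and log_s
    (an assumption about the workspace schema, not taken from the source).
    """
    # Accept only simple KQL timespan literals such as 30m, 6h, or 2d.
    if not re.fullmatch(r"\d+[dhms]", lookback):
        raise ValueError(f"unsupported lookback: {lookback!r}")
    if not isinstance(limit, int) or limit <= 0:
        raise ValueError("limit must be a positive integer")

    lines = [
        "AzureDiagnostics",
        '| where Category == "cluster-autoscaler"',
        f"| where TimeGenerated > ago({lookback})",
    ]
    if search_term:
        # Escape backslashes and double quotes for a KQL string literal.
        escaped = search_term.replace("\\", "\\\\").replace('"', '\\"')
        lines.append(f'| where log_s contains "{escaped}"')
    lines.append("| project TimeGenerated, log_s")
    lines.append("| order by TimeGenerated desc")
    lines.append(f"| take {limit}")
    return "\n".join(lines)


# Example: look for autoscaler messages mentioning tunnelfront in the last 6 hours.
query = build_autoscaler_query(lookback="6h", search_term="tunnelfront")
print(query)
```

Searching for a kube-system service name such as tunnelfront or metrics-server, or for the name of the idle node, is one way to find the log entries that explain why that node was not removed.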
Delete ML Workspace fails
If you delete a workspace and the delete operation fails, you can use Force delete to remove the workspace.
In this case, CML attempts to delete the cloud resources associated with the workspace, including metadata files. However, you should verify that all such resources have been deleted, and manually delete any that remain.