MachineLearningKubernetesCriticalPodNotHealthy

Reference information for the MachineLearningKubernetesCriticalPodNotHealthy alert, which is used as part of Pulse in CDP Private Cloud Data Services.

Summary
Machine Learning Pod has been in a non-ready state for longer than an hour
PromQL Expression
min_over_time(sum by (namespace, pod) (kube_pod_status_phase{phase=~"Pending|Unknown|Failed", appType="ml", pod=~"{{ .Values.alerts.mlCriticalPodsRegexp }}"})[10m:]) > 0
Time Period
0s
Severity
critical
Source
environments/ml
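
The PromQL expression above can be read in three steps: select the kube_pod_status_phase series for ML pods whose phase is Pending, Unknown, or Failed and whose name matches the configured regular expression; sum them per (namespace, pod); then take the minimum of that sum over the 10m subquery window. The result is greater than 0 only if the pod was in one of those phases at every evaluation step in the window. The following Python sketch models that logic on in-memory samples. It is an illustration under stated assumptions, not how Prometheus or Pulse evaluates the rule: the function names, the sample layout, and the pod regular expression "api-.*|web-.*" are hypothetical, and the sketch assumes the metric reports 1 for a pod's current phase and 0 for the others.

```python
import re
from collections import defaultdict

# Phases selected by the expression's phase=~"Pending|Unknown|Failed" matcher.
UNHEALTHY_PHASES = re.compile(r"Pending|Unknown|Failed")
ALL_PHASES = ("Pending", "Running", "Succeeded", "Failed", "Unknown")


def phase_samples(namespace, pod, current_phase, app_type="ml"):
    """Hypothetical helper: build one step's samples for a pod, assuming the
    metric is 1 for the current phase and 0 for every other phase."""
    return [
        ({"namespace": namespace, "pod": pod, "phase": phase, "appType": app_type},
         1.0 if phase == current_phase else 0.0)
        for phase in ALL_PHASES
    ]


def firing_pods(steps, critical_pod_regexp):
    """Return the sorted (namespace, pod) pairs that satisfy the alert condition.

    steps: one list of (labels, value) samples per evaluation step in the window.
    critical_pod_regexp: stand-in for the rendered mlCriticalPodsRegexp value.
    """
    pod_re = re.compile(critical_pod_regexp)
    totals_per_pod = defaultdict(list)
    for samples in steps:
        # sum by (namespace, pod) over the series that pass the label matchers.
        sums = defaultdict(float)
        for labels, value in samples:
            if labels.get("appType") != "ml":
                continue
            # PromQL regex matchers are anchored, so fullmatch is used here.
            if not UNHEALTHY_PHASES.fullmatch(labels["phase"]):
                continue
            if not pod_re.fullmatch(labels["pod"]):
                continue
            sums[(labels["namespace"], labels["pod"])] += value
        for key, total in sums.items():
            totals_per_pod[key].append(total)
    # min_over_time(...) > 0: unhealthy at every step that produced a sample.
    return sorted(key for key, totals in totals_per_pod.items() if min(totals) > 0)


# Three evaluation steps standing in for the 10m window.
steps = [
    phase_samples("ml-ns", "api-0", "Pending") + phase_samples("ml-ns", "web-0", "Pending")
    + phase_samples("ml-ns", "other-0", "Failed"),
    phase_samples("ml-ns", "api-0", "Pending") + phase_samples("ml-ns", "web-0", "Running")
    + phase_samples("ml-ns", "other-0", "Failed"),
    phase_samples("ml-ns", "api-0", "Pending") + phase_samples("ml-ns", "web-0", "Pending")
    + phase_samples("ml-ns", "other-0", "Failed"),
]
print(firing_pods(steps, r"api-.*|web-.*"))  # [('ml-ns', 'api-0')]
```

In this example api-0 stays Pending for the whole window and is reported, web-0 is Running at one step so its minimum drops to 0, and other-0 is ignored because its name does not match the hypothetical regular expression.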