This section describes how to run the configure-observability-pipeline.sh
onboarding script to start the monitoring pods.
For a non-GPU-based cluster, this is the reference command:
./configure-observability-pipeline.sh \
  --storage-metrics-ingestion-url "[***metrics-ingestion-url***]" \
  --storage-dbus-api-url "[***dbus-url***]" \
  --storage-credentials-file "[***access-credentials***]" \
  --controlplane-namespace [***control-plane-namespace***]
For a GPU-based cluster, this is the reference command:
OpenShift Container Platform Cluster
./configure-observability-pipeline.sh \
  --storage-metrics-ingestion-url "[***metrics-ingestion-url***]" \
  --storage-dbus-api-url "[***dbus-url***]" \
  --storage-credentials-file "[***access-credentials***]" \
  --controlplane-namespace [***control-plane-namespace***] \
  --dcgm-exporter-enable \
  --dcgm-exporter-enable-privileged "true" \
  --dcgm-exporter-runtime-class nvidia \
  --dcgm-exporter-enable-scc "true"
Embedded Container Service Cluster
./configure-observability-pipeline.sh \
  --storage-metrics-ingestion-url "[***metrics-ingestion-url***]" \
  --storage-dbus-api-url "[***dbus-url***]" \
  --storage-credentials-file "[***access-credentials***]" \
  --controlplane-namespace [***control-plane-namespace***] \
  --dcgm-exporter-enable \
  --dcgm-exporter-enable-privileged "true" \
  --dcgm-exporter-runtime-class nvidia
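The three commands differ only in their DCGM exporter flags: both GPU variants enable the DCGM exporter in privileged mode with the nvidia runtime class, and only the OpenShift variant adds --dcgm-exporter-enable-scc. The following POSIX shell sketch assembles the flags for each variant in one place. The function name run_onboarding, the cluster-type labels (non-gpu, ocp-gpu, ecs-gpu), and the DRY_RUN switch are hypothetical and are not part of the onboarding script; replace the placeholder values with your own before running it.

```shell
# Placeholder values from the commands above; replace them with your own.
METRICS_INGESTION_URL="[***metrics-ingestion-url***]"
DBUS_URL="[***dbus-url***]"
ACCESS_CREDENTIALS="[***access-credentials***]"
CONTROL_PLANE_NAMESPACE="[***control-plane-namespace***]"

# Hypothetical helper: builds the onboarding flags for one cluster type and
# runs the script with them. Set DRY_RUN=1 to print the command instead.
run_onboarding() {
  cluster_type="$1"   # hypothetical labels: non-gpu, ocp-gpu, ecs-gpu

  # Flags shared by every cluster type.
  set -- \
    --storage-metrics-ingestion-url "$METRICS_INGESTION_URL" \
    --storage-dbus-api-url "$DBUS_URL" \
    --storage-credentials-file "$ACCESS_CREDENTIALS" \
    --controlplane-namespace "$CONTROL_PLANE_NAMESPACE"

  case "$cluster_type" in
    non-gpu) ;;
    ocp-gpu|ecs-gpu)
      # Both GPU variants enable the DCGM exporter.
      set -- "$@" \
        --dcgm-exporter-enable \
        --dcgm-exporter-enable-privileged "true" \
        --dcgm-exporter-runtime-class nvidia
      # Only the OpenShift command adds the SCC flag.
      if [ "$cluster_type" = "ocp-gpu" ]; then
        set -- "$@" --dcgm-exporter-enable-scc "true"
      fi
      ;;
    *)
      echo "unknown cluster type: $cluster_type" >&2
      return 1
      ;;
  esac

  if [ "${DRY_RUN:-0}" = "1" ]; then
    # Print the script path and each argument on its own line.
    printf '%s\n' ./configure-observability-pipeline.sh "$@"
  else
    ./configure-observability-pipeline.sh "$@"
  fi
}
```

For example, DRY_RUN=1 run_onboarding ocp-gpu prints the OpenShift GPU command, one argument per line, without running it, and run_onboarding ecs-gpu runs the Embedded Container Service variant.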
After the script completes, the metrics collector, workload collector, and
observability agent pods start running in the observability namespace. These
pods collect Real-Time Monitoring metrics and Cloudera AI workload logs.
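To confirm from the command line that these pods are running, you can filter the output of kubectl get pods. The following sketch is a hypothetical helper, not part of the Cloudera tooling. It assumes the default kubectl get pods column layout (NAME, READY, STATUS, RESTARTS, AGE) and that you know the name of the observability namespace on your cluster.

```shell
# Hypothetical helper: reads `kubectl get pods --no-headers` output on stdin
# and prints every pod whose STATUS column is not "Running".
# Returns 1 if any such pod is found, 0 otherwise. Checks STATUS only.
report_not_running() {
  awk '
    # Expected columns: NAME READY STATUS RESTARTS AGE
    NF > 0 && $3 != "Running" { print $1 ": " $3; bad = 1 }
    END { exit (bad ? 1 : 0) }
  '
}
```

For example: kubectl get pods -n <observability-namespace> --no-headers | report_not_running, where <observability-namespace> is an assumed placeholder for the observability namespace on your cluster. Pods that have finished (status Completed) are also reported, because their status is not Running.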
To verify the setup, run several Cloudera AI workloads, and then search for
your workbench name on the Cloudera Observability page in the AI workbench
environment. The search results show whether the environment correctly
displays your workload data.
Note: Contact Cloudera Support if you encounter issues such as frequent pod
restarts or resource overallocation.
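To check for frequent restarts before contacting support, you can inspect the RESTARTS column of kubectl get pods. This sketch is a hypothetical helper; its default threshold of 3 restarts is an arbitrary example value, not a Cloudera recommendation.

```shell
# Hypothetical helper: reads `kubectl get pods --no-headers` output on stdin
# and prints pods whose restart count exceeds a threshold.
# Returns 1 if any pod exceeds it, 0 otherwise.
report_frequent_restarts() {
  threshold="${1:-3}"   # arbitrary example default, not a Cloudera value
  awk -v max="$threshold" '
    # RESTARTS is the 4th column. Newer kubectl versions append "(<age> ago)",
    # which awk splits into separate fields, so $4 still holds the count.
    NF > 0 && ($4 + 0) > (max + 0) { print $1 ": " $4 " restarts"; bad = 1 }
    END { exit (bad ? 1 : 0) }
  '
}
```

For example: kubectl get pods -n <observability-namespace> --no-headers | report_frequent_restarts 5, where <observability-namespace> is an assumed placeholder for the observability namespace on your cluster.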