Accessing Cloudera AI Inference service Metrics

Cloudera AI Inference service exposes Prometheus metrics for the deployed model endpoints. The UI displays plots of a selected subset of these metrics for each model endpoint.

For additional metrics, you can query the Prometheus server directly at https://${DOMAIN_NAME}/prometheus by passing an authorization bearer token with each request.
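As a minimal sketch, the Prometheus HTTP API can be queried with curl. The domain name and token values below are placeholders; substitute your own deployment's domain and a valid bearer token (the metric query `up` is just an illustrative PromQL expression).

```shell
# Placeholder values -- replace with your own domain and bearer token.
DOMAIN_NAME="ml-inference.example.com"
TOKEN="<your-bearer-token>"

# The Prometheus HTTP API is served under /api/v1; an instant query
# looks like this (here querying the built-in "up" metric):
PROM_QUERY_URL="https://${DOMAIN_NAME}/prometheus/api/v1/query?query=up"

# -s silences progress output; the Authorization header carries the token.
curl -s -H "Authorization: Bearer ${TOKEN}" "${PROM_QUERY_URL}" \
  || echo "request failed (placeholder domain)"
```

A successful response is a JSON document with a "status":"success" field and the matching time series under data.result.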

Predictive models deployed with the NVIDIA Triton Inference Server, as well as NVIDIA NIM embedding models, export metrics prefixed with nv_. For the metrics exported by the NVIDIA NIM Large Language Model (LLM) runtimes for text generation, refer to the NVIDIA NIM documentation.
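Because these metrics share the nv_ prefix, a single PromQL regex matcher on the metric name can retrieve all of them at once. This is a sketch with placeholder domain and token values, using the standard Prometheus HTTP API.

```shell
# Placeholder values -- replace with your own domain and bearer token.
DOMAIN_NAME="ml-inference.example.com"
TOKEN="<your-bearer-token>"

# A regex matcher on the reserved __name__ label selects every series
# whose metric name starts with nv_.
QUERY='{__name__=~"nv_.*"}'

# -G sends the request as GET; --data-urlencode safely encodes the
# PromQL expression into the query string.
curl -s -G -H "Authorization: Bearer ${TOKEN}" \
  --data-urlencode "query=${QUERY}" \
  "https://${DOMAIN_NAME}/prometheus/api/v1/query" \
  || echo "request failed (placeholder domain)"
```

The same matcher works in the Prometheus expression browser if you prefer to explore the nv_ metrics interactively.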