Accessing Cloudera AI Inference service Metrics
Cloudera AI Inference service exposes Prometheus metrics for the deployed Model Endpoints. The UI displays plots of a few chosen metrics for each model endpoint.
For additional metrics, the Prometheus server can be accessed directly at https://${DOMAIN_NAME}/prometheus by passing an authorization bearer token with each request.
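As a sketch of how such a request can be built, the following Python snippet constructs an authenticated call to the Prometheus endpoint using only the standard library. The /prometheus path comes from the documentation above; the api/v1/query suffix is the standard Prometheus HTTP API and is assumed to be reachable here, and the domain and token values are placeholders you must supply from your own environment.

```python
import urllib.parse
import urllib.request

def prometheus_request(domain: str, token: str,
                       promql: str = "up") -> urllib.request.Request:
    """Build an authenticated request against the Prometheus HTTP API.

    The api/v1/query path is the standard Prometheus query endpoint;
    whether it is exposed under /prometheus is an assumption here.
    """
    url = (f"https://{domain}/prometheus/api/v1/query?"
           + urllib.parse.urlencode({"query": promql}))
    req = urllib.request.Request(url)
    # Pass the access token as an authorization bearer token,
    # as described in the documentation.
    req.add_header("Authorization", f"Bearer {token}")
    return req

# Placeholder domain and token for illustration only:
req = prometheus_request("example.cloudera.site", "MY_TOKEN")
# urllib.request.urlopen(req) would then return the JSON query result.
```

Sending the request with urllib.request.urlopen (or any HTTP client) returns the usual Prometheus JSON response for the given PromQL query.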
Predictive models deployed with the NVIDIA Triton Inference Server, as well as NVIDIA NIM embedding models, export metrics prefixed with nv_. Refer to the NVIDIA NIM documentation for the metrics exported by the NVIDIA NIM Large Language Model (LLM) runtimes for text generation.
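To illustrate working with the nv_ prefix, here is a minimal sketch that filters Prometheus text exposition output down to metrics with a given name prefix. The sample input uses nv_inference_count, one of the counters Triton documents; the parsing is simplified (it assumes label values contain no spaces) and is meant only as a starting point.

```python
def filter_metrics(exposition_text: str, prefix: str = "nv_") -> dict:
    """Return {metric_with_labels: value} for samples whose metric name
    starts with the given prefix, parsed from Prometheus text format."""
    metrics = {}
    for line in exposition_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip HELP/TYPE comments and blank lines
        # Simplified split: assumes no spaces inside label values.
        name_and_labels, _, value = line.rpartition(" ")
        name = name_and_labels.split("{", 1)[0]
        if name.startswith(prefix):
            metrics[name_and_labels] = float(value)
    return metrics

sample = """\
# HELP nv_inference_count Number of inferences performed
# TYPE nv_inference_count counter
nv_inference_count{model="resnet50",version="1"} 42
process_cpu_seconds_total 12.5
"""
print(filter_metrics(sample))
# -> {'nv_inference_count{model="resnet50",version="1"}': 42.0}
```

The non-nv_ sample (process_cpu_seconds_total) is dropped, leaving only the Triton/NIM metrics of interest.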