Viewing details of a Model Endpoint using the UI
The Model Endpoint configuration settings in the Cloudera AI console display deployment summaries, access controls, resource profiles, and vLLM engine arguments for served models.
- In the Cloudera console, click the Cloudera AI tile.
The Cloudera AI Workbenches page is displayed.
- Click Model Endpoints under Deployments in the left navigation menu.
The Model Endpoints landing page is displayed.
- Select a model endpoint to view its details.
- On the Model Endpoint Details page, go to the Configurations tab, which is selected by default. Use the left-side navigator to select a configuration view:
- Served Models: Displays deployment summaries, including model names, active traffic allocation percentages, desired replicas, and live running replicas. Use this view to audit multi-revision, canary, or blue-green endpoint deployments.
  - Viewing optimization profiles for NGC models: For Model Endpoints that serve NGC (NVIDIA GPU Cloud) model versions with an optimization specification, you can view the optimization profile details directly from the Served Models configuration view. An optimization profile defines the hardware and engine optimizations applied to a specific NGC model version, such as the TensorRT-LLM engine configuration, GPU type, precision settings, and other performance-tuning parameters.
    Figure 1. Optimization Profiles View
- Access Control: Outlines the security configurations and user or group permissions mapped to the endpoint.
- Resource Profile: Displays the computing resources (CPU, GPU, Memory) provisioned for the endpoint's execution.
- Environment Variables: Lists the environment-specific key-value configurations injected into the container runtime.
- vLLM Arguments: Displays the vLLM engine flags configured for the endpoint. Note: This panel is hidden automatically if the model runtime configuration is not eligible for vLLM optimization.
- Tags: Displays the metadata and organizational labels assigned to the endpoint.
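The checks you typically make in the Served Models view during a canary or blue-green rollout can be sketched in Python. This is a minimal, illustrative sketch only: it assumes you have copied the values shown in the view (model name, traffic percentage, desired replicas, running replicas) into the hypothetical `ServedModel` records below. It does not call any Cloudera API, and the class, function, and model names are made up for the example.

```python
from dataclasses import dataclass


@dataclass
class ServedModel:
    # Hypothetical record mirroring the columns shown in the Served Models view.
    name: str
    traffic_percent: int
    desired_replicas: int
    running_replicas: int


def audit(served_models: list[ServedModel]) -> list[str]:
    """Return human-readable findings; an empty list means no issues were found."""
    findings = []
    # Traffic across all served model revisions is expected to add up to 100%.
    total = sum(m.traffic_percent for m in served_models)
    if total != 100:
        findings.append(f"traffic allocation sums to {total}%, expected 100%")
    for m in served_models:
        if m.traffic_percent > 0 and m.running_replicas == 0:
            # A revision receiving traffic with nothing running is the most urgent problem.
            findings.append(
                f"{m.name}: receives {m.traffic_percent}% of traffic but has no running replicas"
            )
        elif m.running_replicas < m.desired_replicas:
            findings.append(f"{m.name}: {m.running_replicas}/{m.desired_replicas} replicas running")
    return findings


if __name__ == "__main__":
    # Hypothetical canary rollout: 90% to the stable revision, 10% to the candidate.
    models = [
        ServedModel("llama-stable", 90, 2, 2),
        ServedModel("llama-canary", 10, 1, 0),
    ]
    for finding in audit(models):
        print(finding)
```

In this example, the stable revision is healthy, so the only finding reported is that the canary revision receives traffic without any running replicas.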
