Limitations and Restrictions

Lists the limitations and restrictions when using Cloudera AI Inference service.

API Stability: Both the Cloudera AI Control Plane and Cloudera AI Inference service workload APIs and CLIs are under active development and are subject to change in a backward-incompatible way.
Cloud Platforms: Cloudera AI Inference service is available on AWS and Azure.
Supported Instance Types: Cloudera AI Inference service supports the same cloud instance types as those of Cloudera AI Workbenches with a few exceptions. See Known Issues for information on unsupported instance types. The type or size of the model you want to deploy determines the cloud compute instance type. Some highly optimized versions of Large Language Models, for instance, work only on specific GPU architectures.
No Non-Transparent Proxy Support: Cloudera AI Inference service has not been tested with a non-transparent proxy (NTP) setup in a private cluster. However, it works in a standard private cluster that does not use such a proxy.
User-Defined Route (UDR) Support in Azure: Cloudera AI Inference service provides support for the UDR setup in Azure Kubernetes Service (AKS) clusters. Currently, the compute clusters UI does not support specification of subnets attached to UDRs. As a result, compute clusters utilizing UDR-attached subnets must be created using the CLI.
Note: When creating a cluster, ensure that the specified subnet is not used by another AKS cluster.
Example payload for creating a compute cluster with a UDR-attached subnet using the CLI:
```
{
    "environment": "[***ENVIRONMENT_NAME***]",
    "name": "[***CLUSTER_NAME***]",
    "network": {
        "subnets": [
            "[***SUBNET_NAME***]"
        ],
        "outboundType": "udr"
    },
    "skipValidation": false
}
```
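If you generate this payload from a script rather than editing it by hand, a small helper can keep the structure consistent. The following is a minimal sketch in Python that builds the same JSON shown above. The function name `build_udr_cluster_payload` and the sample values are hypothetical and chosen for illustration only. How the payload is passed to the CLI is not covered here.

```python
import json


def build_udr_cluster_payload(environment, cluster_name, subnets, skip_validation=False):
    """Build the compute cluster creation payload for a UDR-attached subnet.

    Hypothetical helper: it only mirrors the field names of the example
    payload in this section and does not call any Cloudera API or CLI.
    """
    if not subnets:
        raise ValueError("at least one UDR-attached subnet is required")
    return {
        "environment": environment,
        "name": cluster_name,
        "network": {
            "subnets": list(subnets),
            # Same value as in the example payload above.
            "outboundType": "udr",
        },
        "skipValidation": skip_validation,
    }


if __name__ == "__main__":
    # Placeholder values; replace them with your own environment, cluster,
    # and subnet names.
    payload = build_udr_cluster_payload("my-environment", "my-cluster", ["my-udr-subnet"])
    print(json.dumps(payload, indent=4))
```

The printed JSON can be saved to a file and reviewed before use. Remember that the subnet must not be in use by another AKS cluster.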
Public Load Balancer: By default, Cloudera AI Inference service uses a private load balancer for cluster ingress. To use a public load balancer instead, set the usePublicLoadBalancer parameter to true in the creation payload.
If you are on AWS and use a private load balancer for cluster ingress, you must have a VPN connection between your corporate network and the Virtual Private Cloud (VPC) in which the Cloudera AI Inference service is deployed. The Cloudera AI Inference service UI requires a VPN connection.
Logging: All Kubernetes pod logs, including those from pods that run model servers, are scraped by the platform log aggregator service (fluentd). Model endpoint logs can be viewed from the Cloudera AI Inference service GUI. To view the logs of other pods, you must first obtain the kubeconfig of the cluster and then use the kubectl command. Historical logs can be retrieved using the Generate Log Archive feature on the Cloudera AI Inference service administration UI.
Namespace: Model endpoints can only be deployed in the serving-default namespace.
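As an illustration of the Logging item above, the following Python sketch assembles a kubectl logs command for a pod once you have the cluster kubeconfig. The helper name `kubectl_logs_command`, the kubeconfig path, and the pod name are hypothetical. The `--kubeconfig`, `--namespace`, and `--tail` options are standard kubectl flags. The serving-default namespace is used here only as an example, so substitute the namespace of the pod you want to inspect.

```python
import shlex


def kubectl_logs_command(kubeconfig_path, pod_name, namespace, tail=None):
    """Return the argument list for a `kubectl logs` call.

    Hypothetical helper: it only builds the command and does not run it.
    """
    cmd = [
        "kubectl",
        "--kubeconfig", kubeconfig_path,
        "logs", pod_name,
        "--namespace", namespace,
    ]
    if tail is not None:
        # Limit the output to the most recent `tail` lines.
        cmd.append(f"--tail={tail}")
    return cmd


if __name__ == "__main__":
    # Placeholder values; use your own kubeconfig file and pod name.
    cmd = kubectl_logs_command("./kubeconfig.yaml", "example-pod", "serving-default", tail=100)
    # Print a shell-safe command line that can be copied into a terminal.
    print(shlex.join(cmd))
```

To run the command directly from Python instead of printing it, the returned list can be passed to `subprocess.run`.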