Known issues with Cloudera AI Registry standalone API

These are some of the known issues you might run into while using the Cloudera AI Registry standalone API.

NGC model download timeout: The NGC model import might time out, in which case the corresponding model version status is shown as “failed”. To inspect the logs in the API v2 pod, follow the steps in the Debugging the model import failure troubleshooting section. To work around the issue, retry the model import request.
Cloudera AI Inference service is unable to discover the Cloudera AI Registry: In certain cases, the Cloudera AI Inference service cannot discover the Cloudera AI Registry. After upgrading Cloudera AI Registry, follow the Manually Updating Cloudera AI Registry Configuration steps listed in the Troubleshooting section, or delete and re-create model-registry, to ensure that Cloudera AI Inference service continues to work.
Model import failure: You can download models concurrently only if their combined size is below approximately 400 GB. Exceeding this limit may result in import failures and unexpected behavior.
Request throttling: No request throttling mechanism is currently implemented, so excessive concurrent requests may lead to model import failures. To minimize the risk, limit concurrent requests to a maximum of 5, which is considered a safe threshold.
Model Import progress indicator: A progress bar is not available for model imports. For reference, importing a 70 GB model typically takes approximately 1 hour. Users should plan accordingly and monitor the process through alternative options, if necessary.
Lack of Model Import failure details: Currently, the UI and API do not provide specific reasons for model import failures. You have to use Kubernetes logs to diagnose the issues.
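
Because the registry does not throttle requests itself, the limits listed above have to be enforced on the client side. The following is a minimal sketch of one way to do that, not an official client: it groups models into batches of at most 5 whose combined size stays below 400 GB, runs each batch concurrently, and retries a failed import a few times. The `import_fn` callable, the retry count, and the delay are assumptions for illustration; `import_fn` stands in for whatever code submits the import request and waits for the model version to finish, raising an exception on failure.

```python
import time
from concurrent.futures import ThreadPoolExecutor

MAX_CONCURRENT_IMPORTS = 5   # safe threshold stated in the known issues
MAX_BATCH_SIZE_GB = 400      # approximate combined-size limit stated in the known issues


def plan_batches(models, max_count=MAX_CONCURRENT_IMPORTS, max_gb=MAX_BATCH_SIZE_GB):
    """Group (name, size_gb) pairs into batches that respect both limits.

    A model that alone reaches the size limit is placed in a batch by itself.
    """
    batches, current, current_gb = [], [], 0.0
    for name, size_gb in sorted(models, key=lambda m: m[1], reverse=True):
        over_count = len(current) >= max_count
        over_size = current_gb + size_gb >= max_gb
        if current and (over_count or over_size):
            batches.append(current)
            current, current_gb = [], 0.0
        current.append((name, size_gb))
        current_gb += size_gb
    if current:
        batches.append(current)
    return batches


def import_with_retry(import_fn, name, attempts=3, delay_s=60):
    """Call the hypothetical import_fn(name), retrying when it raises."""
    for attempt in range(1, attempts + 1):
        try:
            return import_fn(name)
        except Exception:
            if attempt == attempts:
                raise  # give up; check the Kubernetes logs for the cause
            time.sleep(delay_s)


def run_imports(models, import_fn, attempts=3, delay_s=60):
    """Import all models batch by batch, never exceeding the limits."""
    results = {}
    for batch in plan_batches(models):
        with ThreadPoolExecutor(max_workers=MAX_CONCURRENT_IMPORTS) as pool:
            futures = {
                name: pool.submit(import_with_retry, import_fn, name, attempts, delay_s)
                for name, _ in batch
            }
            for name, future in futures.items():
                results[name] = future.result()
    return results
```

The batching is greedy, so it may produce more batches than strictly necessary. That is acceptable here because the aim is to stay under the limits rather than to minimize total import time.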
