Known Issues and Limitations

Known Issues with Model Builds and Deployed Models
- Re-deploying or re-building models results in model downtime (usually brief).
- Re-starting Cloudera Machine Learning does not automatically restart active models. These models must be manually restarted so they can serve requests again.
  
  Cloudera Bug: DSE-4950
- Model builds will fail if your project filesystem includes a .git directory (likely hidden or nested). Typical build stage errors include:
```
Error: 2 UNKNOWN: Unable to schedule build: [Unable to create a checkpoint of current source: [Unable to push sources to git server: ...
```
  To work around this, rename the .git directory (for example, NO.git) and re-build the model.
  
  Cloudera Bug: DSE-4657
- JSON requests made to active models should not be more than 5 MB in size. This is because JSON is not suitable for very large requests and has high overhead for binary objects such as images or video. Call the model with a reference to the image or video, such as a URL, instead of the object itself.
- Any external connections, for example, a database connection or a Spark context, must be managed by the model's code. Models that require such connections are responsible for their own setup, teardown, and refresh.
- Model logs and statistics are only preserved so long as the individual replica is active. Cloudera Machine Learning may restart a replica at any time it is deemed necessary (such as bad input to the model).
- (MLLib) The MLLib model.save() function fails with the following sample error. This occurs because the Spark executors on CML all share a mount of /home/cdsw which results in a race condition as multiple executors attempt to write to it at the same time.
```
Caused by:
                  java.io.IOException: Mkdirs failed to create
                  file:/home/cdsw/model.mllib/metadata/_temporary ....
```
  Recommended workarounds:
  - Save the model to /tmp, then move it into /home/cdsw on the driver/session.
  - Save the model to either an S3 URL or any other explicit external URL.
Limitations
- Scala models are not supported.
- Spawning worker threads are not supported with models.
- Models deployed using Cloudera Machine Learning Private Cloud are highly available subject to the following limitations:
  - Model high availability is dependent on the high availability of the Kubernetes service. If using a third-party Kubernetes service to host CDP Private Cloud, please refer to your chosen provider for precise SLAs.
  - In the event that the Kubernetes pod running either the model proxy service or the load balancer becomes unavailable, the Model may be unavailable for multiple seconds during failover.
- Dynamic scaling and auto-scaling are not currently supported. To change the number of replicas in service, you will have to re-deploy the build.

We want your opinion

How can we improve this page?

What kind of feedback do you have?