Known Issues and Limitations
Known Issues with Model Builds and Deployed Models
- Re-deploying or re-building models results in model downtime (usually brief).
- Re-starting Cloudera Machine Learning does not automatically restart active models. These models must be manually restarted so they can serve requests again.
  Cloudera Bug: DSE-4950
- Model builds will fail if your project filesystem includes a .git directory (likely hidden or nested). Typical build stage errors include:
  Error: 2 UNKNOWN: Unable to schedule build: [Unable to create a checkpoint of current source: [Unable to push sources to git server: ...
  To work around this, rename the .git directory (for example, to NO.git) and re-build the model.
  Cloudera Bug: DSE-4657
- JSON requests made to active models should not exceed 5 MB. JSON is not suitable for very large requests and has high overhead for binary objects such as images or video. Instead of sending the object itself, call the model with a reference to the image or video, such as a URL; an example request is sketched after this list.
- Any external connections, for example a database connection or a Spark context, must be managed by the model's code. Models that require such connections are responsible for their own setup, teardown, and refresh; a connection-handling sketch appears after this list.
- Model logs and statistics are only preserved as long as the individual replica is active. Cloudera Machine Learning may restart a replica at any time it deems necessary (for example, in response to bad input to the model).
- (MLlib) The MLlib model.save() function fails with the following sample error. This occurs because the Spark executors on CML all share a mount of /home/cdsw, which results in a race condition as multiple executors attempt to write to it at the same time.
  Caused by: java.io.IOException: Mkdirs failed to create file:/home/cdsw/model.mllib/metadata/_temporary ...
  Recommended workarounds (a sketch of the first option appears after this list):
  - Save the model to /tmp, then move it into /home/cdsw on the driver/session.
  - Save the model to an S3 URL or any other explicit external URL.
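For the JSON request size limit above, a minimal Python sketch of calling a model with a URL reference rather than an embedded binary payload might look like the following. The endpoint, access key, and image URL are placeholders, and the accessKey/request wrapper is assumed to match the usual CML model REST calling convention for your deployment.

```python
# Illustrative only: pass a reference (URL) to large binary inputs instead of
# embedding the bytes in the JSON body. Endpoint, access key, and URL below are
# placeholders for your own deployment.
import requests

MODEL_ENDPOINT = "https://modelservice.<your-cml-domain>/model"  # placeholder
ACCESS_KEY = "<your-model-access-key>"                           # placeholder

payload = {
    "accessKey": ACCESS_KEY,
    # A small JSON body: the model fetches the image itself from the URL,
    # keeping the request well under the 5 MB limit.
    "request": {"image_url": "https://example.com/images/sample.jpg"},
}

response = requests.post(MODEL_ENDPOINT, json=payload, timeout=30)
print(response.status_code, response.json())
```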
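For the external-connections note above, the following hypothetical sketch shows one way a model script can own its connection lifecycle: create it lazily, reuse it across requests, and rebuild it if it goes stale. sqlite3 is only a stand-in for whatever external database or service client your model actually uses.

```python
# Hypothetical sketch: the model script owns the lifecycle of its external
# connection. sqlite3 is merely a stand-in for a real external database client.
import sqlite3

_conn = None  # one connection per replica, created lazily and reused


def _get_connection():
    """Open the connection on first use; callers share the same handle."""
    global _conn
    if _conn is None:
        _conn = sqlite3.connect("/tmp/example.db")
    return _conn


def _reset_connection():
    """Tear down a stale handle so the next request re-creates it."""
    global _conn
    if _conn is not None:
        _conn.close()
    _conn = None


def predict(args):
    """Model entry point: reuse the managed connection across requests."""
    try:
        row = _get_connection().execute("SELECT 1").fetchone()
    except sqlite3.Error:
        # The connection went away (for example, the database dropped it):
        # refresh the handle and retry once.
        _reset_connection()
        row = _get_connection().execute("SELECT 1").fetchone()
    return {"db_ok": row[0] == 1, "echo": args}
```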
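For the MLlib model.save() issue above, a minimal PySpark sketch of the first workaround might look like the following. It assumes a CML Python session with Spark available; the tiny training set exists only so the snippet is self-contained, and the paths are illustrative.

```python
# Illustrative sketch of the "/tmp then move" workaround for MLlib model.save().
# Assumes a CML Python session with PySpark; the model and paths are placeholders.
import shutil

from pyspark import SparkContext
from pyspark.mllib.classification import LogisticRegressionWithLBFGS
from pyspark.mllib.regression import LabeledPoint

sc = SparkContext.getOrCreate()

# Tiny training set purely so the example produces a saveable model.
data = sc.parallelize([LabeledPoint(0.0, [0.0, 1.0]),
                       LabeledPoint(1.0, [1.0, 0.0])])
model = LogisticRegressionWithLBFGS.train(data, iterations=10)

# Save to /tmp instead of /home/cdsw to avoid the shared-mount race ...
model.save(sc, "file:///tmp/model.mllib")

# ... then move the saved model into the project filesystem on the driver/session.
shutil.move("/tmp/model.mllib", "/home/cdsw/model.mllib")

# Alternative workaround: save directly to an external store such as S3.
# model.save(sc, "s3a://<your-bucket>/models/model.mllib")
```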
Limitations
- Scala models are not supported.
- Spawning worker threads is not supported with models.
- Models deployed using Cloudera Machine Learning Private Cloud are highly available, subject to the following limitations:
  - Model high availability is dependent on the high availability of the Kubernetes service. If you use a third-party Kubernetes service to host CDP Private Cloud, refer to your chosen provider for precise SLAs.
  - If the Kubernetes pod running either the model proxy service or the load balancer becomes unavailable, the model may be unavailable for several seconds during failover.
- Dynamic scaling and auto-scaling are not currently supported. To change the number of replicas in service, you will have to re-deploy the build.