Model Request and Response Formats
Every model function in Cloudera Machine Learning takes a single argument in the form of a JSON-encoded object, and returns another JSON-encoded object as output. This format ensures compatibility with any application accessing the model using the API, and gives you the flexibility to define how JSON data types map to your model's datatypes.
Model Requests
When making calls to a model, keep in mind that JSON is not suitable for very large requests and has high overhead for binary objects such as images or video. Consider calling the model with a reference to the image or video such as a URL instead of the object itself. Requests to models should not be more than 5 MB in size. Performance may degrade and memory usage increase for larger requests.
Ensure that the JSON request represents all objects in the request or response of a model call. For example, JSON does not natively support dates. In such cases consider passing dates as strings, for example in ISO-8601 format, instead.
For a simple example of how to pass JSON arguments to the model function and make calls to deployed model, see Creating and Deploying a Model.
Model Responses
Models return responses in the form of a JSON-encoded object. Model response times depend on how long it takes the model function to perform the computation needed to return a prediction. Model replicas can only process one request at a time. Concurrent requests are queued until a replica is available to process them.
When Cloudera Machine Learning receives a call request for a model,
it attempts to find a free replica that can answer the call. If the first arbitrarily selected
replica is busy, Cloudera Machine Learning will keep trying to contact a free replica
for 30 seconds. If no replica is available, Cloudera Machine Learning will return a
model.busy
error with HTTP status code 429 (Too Many Requests). If you see
such errors, re-deploy the model build with a higher number of replicas.
Model request timeout
You can set the model request timeout duration to a custom value. The default value is 30 seconds. The timeout can be changed if model requests might take more than 30 seconds.
- As an Admin user, open a CLI.
- At the prompt, execute the following command. Substitute
<value>
with the number of seconds to set.kubectl set env deployment model-proxy MODEL_REQUEST_TIMEOUT_SECONDS=<value> -n mlx
This edits the kubeconfig file and sets a new value for the timeout duration.