Models

Model request timeout

You can set the model request timeout to a custom value. The default is 30 seconds. Increase the timeout if model requests might take longer than 30 seconds.

To set the timeout value:

As an Admin user, open a CLI.
At the prompt, execute the following command. Substitute <value> with the number of seconds to set.
```
kubectl set env deployment model-proxy MODEL_REQUEST_TIMEOUT_SECONDS=<value> -n mlx
```
This updates the environment of the model-proxy deployment in the mlx namespace and sets a new value for the timeout duration.

At build time, the model or experiment does not have access to user-level environment variables.

Workaround: Add the environment variables at the Administrative or Project level instead.

Cloudera Bug: DSE-19067

Model creation fails with an ambiguous "Failed to create model" message when a model with the same name already exists. The issue also applies when using APIv2 directly.

Cloudera Bug: DSE-7509

(If quotas are enabled) Models that are stuck in the Scheduled state due to lack of resources do not automatically start even if you free up existing resources.

Workaround: Stop the Model that is stuck in the Scheduled state. Then manually reschedule that Model.

Cloudera Bug: DSE-6886

You cannot create a model with the same name as a deleted model.

Workaround: For now, model names must be unique across the lifespan of the cluster installation.

Cloudera Bug: DSE-4237

Re-deploying or re-building models results in model downtime (usually brief).

Model deployment will fail if your project filesystem is too large for the Git snapshot process. As a general rule, any project files (code, generated model artifacts, dependencies, etc.) larger than 50 MB must be part of your project's .gitignore file so that they are not included in snapshots for model builds.
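As a rough aid for applying the 50 MB guideline above, the following sketch lists project files that exceed the limit so that you can add them to .gitignore. The helper name find_large_files and the interpretation of 50 MB as 50 * 1024 * 1024 bytes are assumptions made for illustration; they are not part of Cloudera Data Science Workbench.

```python
import os

# Assumption: "50 MB" is read as 50 * 1024 * 1024 bytes.
SIZE_LIMIT_BYTES = 50 * 1024 * 1024


def find_large_files(project_root, limit_bytes=SIZE_LIMIT_BYTES):
    """Return sorted paths, relative to project_root, of files larger than limit_bytes."""
    large = []
    for dirpath, dirnames, filenames in os.walk(project_root):
        # Do not descend into Git metadata directories.
        dirnames[:] = [d for d in dirnames if d != ".git"]
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                continue  # skip symbolic links, including broken ones
            try:
                size = os.path.getsize(path)
            except OSError:
                continue  # file disappeared or is unreadable
            if size > limit_bytes:
                large.append(os.path.relpath(path, project_root))
    return sorted(large)


if __name__ == "__main__":
    # Print candidates for .gitignore, one path per line.
    for rel_path in find_large_files("."):
        print(rel_path)
```

Run the script from the project root and review the printed paths before adding them to .gitignore.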

Model builds will fail if your project filesystem includes a .git directory (likely hidden or nested). Typical build stage errors include:

Error: 2 UNKNOWN: Unable to schedule build: [Unable to create a checkpoint of current
              source: [Unable to push sources to git server: ...

To work around this, rename the .git directory (for example, NO.git) and re-build the model.
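Because the offending .git directory may be hidden or nested several levels deep, a small search can help locate it before you rename it. The sketch below only lists matching directories and does not rename anything; the helper name find_git_dirs is hypothetical.

```python
import os


def find_git_dirs(project_root):
    """Return sorted paths of every directory named .git under project_root."""
    found = []
    for dirpath, dirnames, _filenames in os.walk(project_root):
        if ".git" in dirnames:
            found.append(os.path.join(dirpath, ".git"))
            # No need to search inside the .git directory itself.
            dirnames.remove(".git")
    return sorted(found)


if __name__ == "__main__":
    for git_dir in find_git_dirs("."):
        print(git_dir)
```

Each printed path is a candidate for renaming as described above.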

JSON requests made to active models should not be more than 5 MB in size. This is because JSON is not suitable for very large requests and has high overhead for binary objects such as images or video. Call the model with a reference to the image or video, such as a URL, instead of the object itself.
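One way to stay under the limit is to measure the serialized request on the client side before calling the model. The following is a minimal sketch, assuming the request is serialized with the standard json module and that 5 MB means 5 * 1024 * 1024 bytes; the names check_payload and image_url are hypothetical, and the actual client may serialize the request slightly differently.

```python
import json

# Assumption: "5 MB" is read as 5 * 1024 * 1024 bytes.
MAX_REQUEST_BYTES = 5 * 1024 * 1024


def check_payload(payload, limit_bytes=MAX_REQUEST_BYTES):
    """Return the UTF-8 size of the JSON-encoded payload, or raise ValueError if too large."""
    size = len(json.dumps(payload).encode("utf-8"))
    if size > limit_bytes:
        raise ValueError(
            "Request is %d bytes, which exceeds the %d byte limit; "
            "send a reference such as a URL instead." % (size, limit_bytes)
        )
    return size


# Pass a reference to the image rather than the encoded image itself.
request = {"image_url": "https://example.com/cat.png"}  # hypothetical URL
check_payload(request)
```

The model code is then responsible for fetching the object from the referenced location.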

Any external connections, for example, a database connection or a Spark context, must be managed by the model's code. Models that require such connections are responsible for their own setup, teardown, and refresh.

Model logs and statistics are preserved only as long as the individual replica is active. Cloudera Data Science Workbench may restart a replica whenever it deems it necessary (for example, after bad input to the model).

The use_model_metrics.py file available in the CDSW Templates is out of date and is missing the code that sets user_api_key. Use the following code instead:

import cdsw
import time
from sklearn import datasets
import numpy as np

# This script demonstrates the usage of several model metrics-
# related functions:
# - call_model: Calls a model deployed on CDSW as an HTTP endpoint.
# - read_metrics: Reads metrics tracked for all model predictions
#   made within a time window. This is useful for doing analytics
#   on the tracked metrics.
# - track_delayed_metrics: Adds metrics for a given prediction 
#   retrospectively, after the prediction has already been made.
#   Common examples of such metrics are ground truth and various
#   per-prediction accuracy metrics.
# - track_aggregate_metrics: Adds metrics for a set or batch of
#   predictions within a given time window, not an individual 
#   prediction. Common examples of such metrics are mean or 
#   median accuracy, and various measures of drift.

# This script can be used in a local development mode, or in
# deployment mode. To use it in deployment mode, please: 
# - Set dev = False
# - Create a model deployment from the function 'predict' in
#   predict_with_metrics.py 
# - Obtain the model deployment's CRN from the model's overview
#   page and the model's access key from its settings page and 
#   paste them below.
# - If you selected "Enable Authentication" when creating the
#   model, then create a model API key from your user settings 
#   page and paste it below as well.

dev = True

# Import the predict function only if we are in dev mode, where the
# model is called locally instead of through a deployed endpoint.
if dev:
    from predict_with_metrics import predict

if dev:
    model_deployment_crn=cdsw.dev_model_deployment_crn # update modelDeploymentCrn
    model_access_key=None
else: 
    # The model deployment CRN can be obtained from the model overview
    # page.
    model_deployment_crn=None 
    if model_deployment_crn is None:
        raise ValueError("Please set a valid model deployment Crn")

    # The model access key can be obtained from the model settings page.
    model_access_key=None
    if model_access_key is None:
        raise ValueError("Please set the model's access key")

    # You can create a models API key from your user settings page.
    # Not required if you did not select "Enable Authentication"
    # when deploying the model. In that case, anyone with the
    # model's access key can call the model.
    user_api_key = None

# First, we use the call_model function to make predictions for 
# the held-out portion of the dataset in order to populate the 
# metrics database.
iris = datasets.load_iris()
test_size = 20

# This is the input data for which we want to make predictions.
# Ground truth is generally not yet known at prediction time.
score_x = iris.data[:test_size, 2].reshape(-1, 1) # Petal length

# Record the current time so we can retrieve the metrics
# tracked for these calls.
start_timestamp_ms=int(round(time.time() * 1000))

uuids = []
predictions = []
for i in range(len(score_x)):
    if model_access_key is not None:
        output = cdsw.call_model(model_access_key, {"petal_length": score_x[i][0]}, api_key=user_api_key)["response"]
    else:
        output = predict({"petal_length": score_x[i][0]})
    # Record the UUID of each prediction for correlation with ground truth.
    uuids.append(output["uuid"])
    predictions.append(output["prediction"])

# Record the current time.
end_timestamp_ms=int(round(time.time() * 1000))

# We can now use the read_metrics function to read the metrics we just
# generated into the current session, by querying by time window.
data = cdsw.read_metrics(model_deployment_crn=model_deployment_crn,
            start_timestamp_ms=start_timestamp_ms,
            end_timestamp_ms=end_timestamp_ms, dev=dev)
data = data['metrics']

# Now, ground truth is known and we want to track the true value
# corresponding to each prediction above.
score_y = iris.data[:test_size, 3].reshape(-1, 1) # Observed petal width

# Track the true values alongside the corresponding predictions using
# track_delayed_metrics. At the same time, calculate the mean absolute
# prediction error.
mean_absolute_error = 0
n = len(score_y)
for i in range(n):
    ground_truth = score_y[i][0]
    cdsw.track_delayed_metrics({"actual_result":ground_truth}, uuids[i], dev=dev)

    absolute_error = np.abs(ground_truth - predictions[i])
    mean_absolute_error += absolute_error / n

# Use the track_aggregate_metrics function to record the mean absolute
# error within the time window where we made the model calls above.
cdsw.track_aggregate_metrics(
    {"mean_absolute_error": mean_absolute_error}, 
    start_timestamp_ms, 
    end_timestamp_ms, 
    model_deployment_crn=model_deployment_crn,
    dev=dev
)

Limitations

Scala models are not supported.
Spawning worker threads is not supported with models.
Models deployed using Cloudera Data Science Workbench are not highly available.
Dynamic scaling and auto-scaling are not currently supported. To change the number of replicas in service, you must re-deploy the build.