Models

Starting with version 1.4, Cloudera Data Science Workbench allows data scientists to build, deploy, and manage models as REST APIs to serve predictions.

Purpose

Challenge

Data scientists often develop models using a variety of Python/R open source packages. The challenge lies in actually exposing those models to stakeholders so they can test the model. In most organizations, the model deployment process requires assistance from a separate DevOps team, which likely has its own policies about deploying new code.

For example, a model that has been developed in Python by data scientists might be rebuilt in another language by the DevOps team before it is actually deployed. This process can be slow and error-prone. It can take months to deploy new models, if they are deployed at all. This also introduces compliance risk, because the re-developed model might not even be an accurate reproduction of the original model.

Once a model has been deployed, you then need to ensure that the DevOps team has a way to roll back the model to a previous version if needed. This means the data science team also needs a reliable way to retain a history of the models they build and to ensure that they can rebuild a specific version if needed. At any time, data scientists (or any other stakeholders) must have a way to accurately identify which version of a model is or was deployed.

Solution

Starting with version 1.4, Cloudera Data Science Workbench allows data scientists to build and deploy their own models as REST APIs. Data scientists can now select a Python or R function within a project file, and Cloudera Data Science Workbench will:
  • Create a snapshot of model code, model parameters, and dependencies.
  • Package a trained model into an immutable artifact and provide basic serving code.
  • Add a REST endpoint that automatically accepts input parameters matching the function, and that returns a data structure that matches the function’s return type.
  • Save the model along with some metadata.
  • Deploy a specified number of model API replicas, automatically load balanced.

Introduction to Production Machine Learning

Machine learning (ML) has become one of the most critical capabilities for modern businesses to grow and stay competitive today. From automating internal processes to optimizing the design, creation, and marketing processes behind virtually every product consumed, ML models have permeated almost every aspect of our work and personal lives.

Each CDSW installation enables teams of data scientists to develop, test, train and ultimately deploy machine learning models for building predictive applications all on the data under management within the enterprise data cloud. Each ML workspace supports fully-containerized execution of Python, R, Scala, and Spark workloads through flexible and extensible engines.

Core capabilities

  • Seamless portability across private cloud, public cloud, and hybrid cloud powered by Kubernetes

  • Fully containerized workloads - including Python and R - for scale-out data engineering and machine learning with seamless distributed dependency management

  • High-performance deep learning with distributed GPU scheduling and training

  • Secure data access across HDFS, cloud object stores, and external databases



CDSW users

CDSW users are:
  • Data management and data science executives at large enterprises who want to empower teams to develop and deploy machine learning at scale.

  • Data scientist developers (using open source languages like Python, R, and Scala) who want fast access to compute and corporate data, the ability to work collaboratively and share, and an agile path to production model deployment.

  • IT architects and administrators who need a scalable platform to enable data scientists in the face of shifting cloud strategies while maintaining security, governance, and compliance. They can easily provision environments and enable resource scaling so they - and the teams they support - can spend less time on infrastructure and more time on innovation.

Challenges with model deployment and serving

After models are trained and ready to deploy in a production environment, a lack of consistency in model deployment and serving workflows can make it difficult to scale your model deployments to meet the increasing number of ML use cases across your business.

Many model serving and deployment workflows have repeatable, boilerplate aspects that you can automate using modern DevOps techniques like high-frequency deployment and microservices architectures. This approach can enable machine learning engineers to focus on the model instead of the surrounding code and infrastructure.

Challenges with model monitoring

Machine learning (ML) models make predictions about the world around them, and that world is constantly changing. The unique and complex nature of model behavior and the model lifecycle present challenges after the models are deployed.

You can monitor the performance of a model on two levels: technical performance (latency, throughput, and so on, similar to what an Application Performance Management tool tracks) and mathematical performance (is the model predicting correctly, is the model biased, and so on).

There are two types of metrics that are collected from the models:

Time series metrics: Metrics measured in-line with each model prediction. It can be useful to track the changes in these values over time. They are the finest-grained data for the most recent measurements. To improve performance, older data is aggregated to reduce the number of data records and the storage required.

Post-prediction metrics: Metrics that are calculated after prediction time, based on ground truth and/or batches (aggregates) of time series metrics.

To collect metrics from the models, the Python SDK has been extended to include the following functions that you can use to store different types of metrics:
  • track_metrics: Tracks the metrics generated by experiments and models.

  • read_metrics: Reads the metrics already tracked for a deployed model, within a given window of time.

  • track_delayed_metrics: Tracks metrics that correspond to individual predictions, but aren’t known at the time the prediction is made. The most common instances are ground truth and metrics derived from ground truth such as error metrics.

  • track_aggregate_metrics: Registers metrics that are not associated with any particular prediction. This function can be used to track metrics accumulated and/or calculated over a longer period of time.

The following two use-cases show how you can use these functions:

  • Tracking accuracy of a model over time

  • Tracking drift

Use case 1: Tracking accuracy of a model over time

Consider the case of a large telco. When a customer service representative takes a call from a customer, a web application presents an estimate of the risk that the customer will churn. The service representative takes this risk into account when evaluating whether to offer promotions.

The web application obtains the risk of churn by calling into a model hosted on CDSW. For each prediction thus obtained, the web application records the prediction's UUID in a datastore alongside the customer ID. The prediction itself is tracked in CDSW using the track_metrics function.

At some point in the future, some customers do in fact churn. When a customer churns, the customer or a customer service representative closes the account in a web application. That web application records the churn event, which is the ground truth for this example, in a datastore.

An ML engineer who works at the telco wants to continuously evaluate the suitability of the risk model. To do this, they create a recurring CDSW job. At each run, the job uses the read_metrics function to read all the predictions that were tracked in the last interval. It also reads in recent churn events from the ground truth datastore. It joins the churn events to the predictions and customer IDs using the recorded UUIDs, and computes a Receiver Operating Characteristic (ROC) metric for the risk model. The ROC metric is tracked in the metrics store using the track_aggregate_metrics function.

The following diagram illustrates use cases like this one:


The ground truth can be stored in an external datastore, such as Cloudera Data Warehouse or in the metrics store.
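
As a rough sketch, the recurring job described above might look like the following Python snippet. The deployment CRN, the helper functions fetch_churn_events and lookup_customer_id, and the field names of the records returned by read_metrics are placeholders or assumptions for illustration, not part of the product API.

import time
import cdsw
from sklearn.metrics import roc_auc_score

# Placeholder: copy the deployment CRN from the model's Overview page.
MODEL_DEPLOYMENT_CRN = "crn:..."

def fetch_churn_events(start_ms, end_ms):
    # Hypothetical helper: query the ground truth datastore and return
    # {customer_id: 1 if churned else 0} for the window. Stubbed here.
    raise NotImplementedError

def lookup_customer_id(prediction_uuid):
    # Hypothetical helper: map the prediction UUID recorded by the web
    # application back to a customer ID. Stubbed here.
    raise NotImplementedError

def evaluate_churn_model(interval_ms=24 * 60 * 60 * 1000):
    end_ms = int(round(time.time() * 1000))
    start_ms = end_ms - interval_ms

    # Read all predictions tracked during the last interval.
    tracked = cdsw.read_metrics(model_deployment_crn=MODEL_DEPLOYMENT_CRN,
                                start_timestamp_ms=start_ms,
                                end_timestamp_ms=end_ms)["metrics"]

    churned = fetch_churn_events(start_ms, end_ms)

    labels, scores = [], []
    for record in tracked:
        # Record field names ("predictionUuid", "predict_result") are
        # illustrative assumptions.
        customer_id = lookup_customer_id(record["predictionUuid"])
        labels.append(churned.get(customer_id, 0))
        scores.append(record["metrics"]["predict_result"])

    # Compute ROC AUC and register it as an aggregate metric for the
    # same time window.
    roc = roc_auc_score(labels, scores)
    cdsw.track_aggregate_metrics({"roc_auc": roc}, start_ms, end_ms,
                                 model_deployment_crn=MODEL_DEPLOYMENT_CRN)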

Use case 2: Tracking drift

Instead of or in addition to computing ROC, the ML engineer may need to track various types of drift. Drift metrics are especially useful in cases where ground truth is unavailable or is difficult to obtain.

The definition of drift is broad and somewhat nebulous, and practical approaches to handling it are evolving, but drift is always about changing distributions. The distribution of the input data seen by the model may change over time and deviate from the distribution in the training dataset, the distribution of the output variable may change, and/or the relationship between input and output may change.

All drift metrics are computed by aggregating batches of predictions in some way. As in the use case above, batches of predictions can be read into recurring jobs using the read_metrics function, and the drift metrics computed by the job can be tracked using the track_aggregate_metrics function.
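
Under the same caveats as the previous sketch (placeholder CRN and illustrative record field names), a drift job might compare the distribution of a tracked input feature against a saved sample of the training data using a two-sample Kolmogorov-Smirnov test:

import time
import cdsw
import numpy as np
from scipy.stats import ks_2samp

# Placeholders: deployment CRN from the model's Overview page, and a sample
# of the training feature saved during training (hypothetical file name).
MODEL_DEPLOYMENT_CRN = "crn:..."
TRAINING_PETAL_LENGTHS = np.load("training_petal_lengths.npy")

def track_input_drift(interval_ms=24 * 60 * 60 * 1000):
    end_ms = int(round(time.time() * 1000))
    start_ms = end_ms - interval_ms

    # Read the batch of predictions tracked during the interval.
    tracked = cdsw.read_metrics(model_deployment_crn=MODEL_DEPLOYMENT_CRN,
                                start_timestamp_ms=start_ms,
                                end_timestamp_ms=end_ms)["metrics"]

    # Collect the input values recorded with cdsw.track_metric("input", ...);
    # the record field names are illustrative.
    recent = [float(r["metrics"]["input"]["petal_length"]) for r in tracked]

    # Two-sample KS statistic: larger values suggest the serving
    # distribution has drifted away from the training distribution.
    ks_stat, p_value = ks_2samp(TRAINING_PETAL_LENGTHS, recent)

    cdsw.track_aggregate_metrics({"petal_length_ks_stat": float(ks_stat),
                                  "petal_length_ks_pvalue": float(p_value)},
                                 start_ms, end_ms,
                                 model_deployment_crn=MODEL_DEPLOYMENT_CRN)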

Concepts and Terminology

Model

'Model' is a high-level, abstract term used to describe several possible incarnations of objects created during the model deployment process. For the purposes of this discussion, note that 'model' does not always refer to a specific artifact. Use the more precise terms defined later in this section whenever possible.

Stages of the Model Deployment Process



The rest of this section contains supplemental information that describes the model deployment process in detail.
Create
  • File - The R or Python file containing the function to be invoked when the model is started.
  • Function - The function to be invoked inside the file. This function should take a single JSON-encoded object (for example, a Python dictionary) as input and return a JSON-encodable object as output to ensure compatibility with any application accessing the model using the API. JSON decoding and encoding for model input/output is built into Cloudera Data Science Workbench.
    The function will likely include the following components (a combined sketch follows this list):
    • Model Implementation

      The code for implementing the model (e.g. decision trees, k-means). This might originate with the data scientist or might be provided by the engineering team. This code implements the model's predict function, along with any setup and teardown that may be required.

    • Model Parameters

      A set of parameters obtained as a result of model training/fitting (using experiments). For example, a specific decision tree or the specific centroids of a k-means clustering, to be used to make a prediction.
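
    For illustration only, a minimal Python file combining these two components might look like the following sketch. The file name, the pickled estimator, and the petal_length input are placeholders, not requirements.

    # predict.py (illustrative; names are placeholders)
    import pickle

    # Model parameters: a fitted estimator produced by training/experiments,
    # loaded once when the model replica starts.
    with open("model.pkl", "rb") as f:
        model = pickle.load(f)

    # Model implementation: the function invoked for every request. It accepts
    # a JSON-decoded dictionary and returns a JSON-encodable value.
    def predict(args):
        petal_length = float(args["petal_length"])
        return {"prediction": float(model.predict([[petal_length]])[0])}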

Build

This stage takes as input the file that calls the function and returns an artifact that implements a single concrete model, referred to as a model build.

  • Built Model

    A built model is a static, immutable artifact that includes the model implementation, its parameters, any runtime dependencies, and its metadata. If any of these components needs to change, for example, if the implementation code changes or the parameters need to be retrained, a new build must be created for the model. Model builds are versioned using build numbers.

    To create the model build, Cloudera Data Science Workbench creates a Docker image based on the engine designated as the project's default engine. This image provides an isolated environment where the model implementation code will run.

    To configure the image environment, you can specify a list of dependencies to be installed in a build script called cdsw-build.sh.

    For details about the build process and examples on how to install dependencies, see Engines for Experiments and Models.
  • Build Number:

    Build numbers are used to track different versions of builds within the scope of a single model. They start at 1 and are incremented with each new build created for the model.

Deploy

This stage takes as input the memory/CPU resources required to power the model and the number of replicas needed, and it deploys the model build created in the previous stage as a REST API.

  • Deployed Model

    A deployed model is a model build in execution. A built model is deployed in a model serving environment, likely with multiple replicas.

  • Environment Variables

    You can set environment variables each time you deploy a model. Note that models also inherit any environment variables set at the project and global levels. However, in case of any conflicts, variables set per-model take precedence.

  • Model Replicas

    The engines that serve incoming requests to the model. Note that each replica can only process one request at a time. Multiple replicas are essential for load balancing, fault tolerance, and serving concurrent requests. Cloudera Data Science Workbench allows you to deploy a maximum of 9 replicas per model.

  • Deployment ID

    Deployment IDs are numeric IDs used to track models deployed across Cloudera Data Science Workbench. They are not bound to a model or project.

Creating and Deploying a Model (QuickStart)

Using Cloudera Data Science Workbench, you can create any function within a script and deploy it as a REST API. In a machine learning project, this will typically be a predict function that accepts an input and returns a prediction based on the model's parameters.

For the purpose of this quick start demo we are going to create a very simple function that adds two numbers and deploy it as a model that returns the sum of the numbers. This function will accept two numbers in JSON format as input and return the sum.

  1. Create a new project. Note that models are always created within the context of a project.
  2. Click Open Workbench and launch a new Python 3 session.
  3. Create a new file within the project called add_numbers.py. This is the file where we define the function that will be called when the model is run. For example:

    add_numbers.py

    def add(args):
      result = args["a"] + args["b"]
      return result
  4. Before deploying the model, test it by running the add_numbers.py script, and then calling the add function directly from the interactive workbench session. For example:
    add({"a": 3, "b": 5})


  5. Deploy the add function to a REST endpoint.
    1. Go to the project Overview page.
    2. Click Models > New Model.
    3. Give the model a Name and Description.
    4. Enter details about the model that you want to build. In this case:
      • File: add_numbers.py
      • Function: add
      • Example Input: {"a": 3, "b": 5}
      • Example Output: 8


    5. Select the resources needed to run this model, including any replicas for load balancing.
    6. Click Deploy Model.
  6. Click on the model to go to its Overview page. Click Builds to track real-time progress as the model is built and deployed. This process essentially creates a Docker container where the model will live and serve requests.



  7. Once the model has been deployed, go back to the model Overview page and use the Test Model widget to make sure the model works as expected.

    If you entered example input when creating the model, the Input field will be pre-populated with those values. Click Test. The result returned includes the output response from the model, as well as the ID of the replica that served the request.

    Model response times depend largely on your model code. That is, how long it takes the model function to perform the computation needed to return a prediction. It is worth noting that model replicas can only process one request at a time. Concurrent requests will be queued until the model can process them.

Calling a Model

This section lists some requirements for model requests and how to test a model using Cloudera Data Science Workbench.

(Requirement) JSON for Model Requests/Responses

Every model function in Cloudera Data Science Workbench takes a single argument in the form of a JSON-encoded object, and returns another JSON-encoded object as output. This format ensures compatibility with any application accessing the model using the API, and gives you the flexibility to define how JSON data types map to your model's datatypes.

Model Requests

When making calls to a model, keep in mind that JSON is not suitable for very large requests and has high overhead for binary objects such as images or video. Consider calling the model with a reference to the image or video such as a URL instead of the object itself. Requests to models should not be more than 5 MB in size. Performance may degrade and memory usage increase for larger requests.

Ensure that all objects in the request or response of a model call can be represented in JSON. For example, JSON does not natively support dates. In such cases, consider passing dates as strings, for example in ISO-8601 format, instead.
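
For instance, a client might serialize a date to an ISO-8601 string before sending it, and the model function can parse it back. This is a minimal sketch using only the Python standard library; the field names are arbitrary.

import datetime

# Client side: serialize the date to an ISO-8601 string before placing it
# in the JSON request.
request = {"customer_id": 1234,
           "signup_date": datetime.date(2019, 5, 17).isoformat()}  # "2019-05-17"

# Model side: parse the string back into a date object inside the function.
def predict(args):
    signup_date = datetime.datetime.strptime(args["signup_date"], "%Y-%m-%d").date()
    tenure_days = (datetime.date.today() - signup_date).days
    return {"tenure_days": tenure_days}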

For a simple example of how to pass JSON arguments to the model function and make calls to a deployed model, see Creating and Deploying a Model (QuickStart).

Model Responses

Models return responses in the form of a JSON-encoded object. Model response times depend on how long it takes the model function to perform the computation needed to return a prediction. Model replicas can only process one request at a time. Concurrent requests are queued until a replica is available to process them.

When Cloudera Data Science Workbench receives a call request for a model, it attempts to find a free replica that can answer the call. If the first arbitrarily selected replica is busy, Cloudera Data Science Workbench will keep trying to contact a free replica for 30 seconds. If no replica is available, Cloudera Data Science Workbench will return a model.busy error with HTTP status code 429 (Too Many Requests). If you see such errors, re-deploy the model build with a higher number of replicas.
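
If you expect only occasional bursts, a client can also retry briefly when it receives the model.busy error. The following sketch assumes you have copied the model endpoint URL and JSON payload from the sample request strings on the model's Overview page; re-deploying with more replicas remains the fix for sustained 429 responses.

import time
import requests

def call_model_with_retry(model_url, payload, retries=3, backoff_seconds=2):
    """POST a model request, retrying briefly if all replicas are busy (HTTP 429)."""
    for attempt in range(retries):
        response = requests.post(model_url, json=payload)
        if response.status_code != 429:
            response.raise_for_status()
            return response.json()
        # All replicas were busy; wait and try again.
        time.sleep(backoff_seconds * (attempt + 1))
    raise RuntimeError("Model still busy after retries; consider re-deploying "
                       "the build with more replicas.")

# model_url and payload are copied from the sample request string on the
# model's Overview page (placeholders here).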

(Requirement) Access Key

Each model in CDSW has a unique access key associated with it. This access key is a unique identifier for the model.

Models deployed using CDSW are not public. In order to call an active model, your request must include the model's access key for authentication (as demonstrated in the sample request strings described later in this topic).

To locate the access key for a model, go to the model Overview page and click Settings.
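
From Python, you can pass the access key as the first argument to the cdsw.call_model helper used elsewhere in this topic. A minimal sketch, with a placeholder key value:

import cdsw

# Placeholder: the access key copied from the model's Settings page.
MODEL_ACCESS_KEY = "your-model-access-key"

# Call the deployed add_numbers model from the QuickStart. If authentication
# is enabled for the model, also pass a Model API Key via the api_key argument.
result = cdsw.call_model(MODEL_ACCESS_KEY, {"a": 3, "b": 5})["response"]
print(result)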



Testing Calls to a Model

Cloudera Data Science Workbench provides two ways to test calls to a model:
  • Test Model Widget

    On each model's Overview page, Cloudera Data Science Workbench provides a widget that makes a sample call to the deployed model to ensure it is receiving input and returning results as expected.



  • Sample Request Strings

    On the model Overview page, Cloudera Data Science Workbench also provides sample curl and POST request strings that you can use to test calls to the model. Copy/paste the curl request directly into a Terminal to test the call.

    Note that these sample requests already include the example input values you entered while building the model, and the access key required to query the model.



Updating Active Models

Active Model - A model that is in the Deploying, Deployed, or Stopping stages.

You can make changes to a model even after it has been deployed and is actively serving requests. Depending on business factors and changing resource requirements, such changes will likely range from changes to the model code itself to simply modifying the number of CPUs/GPUs requested for the model. In addition, you can also stop and restart active models.

Depending on your requirement, you can perform one of the following actions:

Re-deploy an Existing Build

Re-deploying a model involves re-publishing a previously-deployed model in a new serving environment, that is, with an updated number of replicas or memory/CPU/GPU allocation. For example, circumstances that require a re-deployment might include:
  • An active model that previously requested a large number of CPUs/GPUs that are not being used efficiently.
  • An active model that is dropping requests because it does not have enough replicas.
  • An active model needs to be rolled back to one of its previous versions.
To re-deploy an existing model:
  1. Go to the model Overview page.
  2. Click Deployments.
  3. Select the version you want to deploy and click Re-deploy this Build.


  4. Modify the model serving environment as needed.
  5. Click Deploy Model.

Deploy a New Build for a Model

Deploying a new build for a model involves both rebuilding the Docker image for the model and deploying this new build. Note that this is not required if you only need to update the resources allocated to the model. As an example, changes that require a new build might include:
  • Code changes to the model implementation.
  • Renaming the function that is used to invoke the model.
To create a new build and deploy it:
  1. Go to the model Overview page.
  2. Click Deploy New Build.



  3. Complete the form and click Deploy Model.

Stop a Model

To stop a model (all replicas), go to the model Overview page and click Stop. Click OK to confirm.

Restart a Model

To restart a model (all replicas), go to the model Overview page and click Restart. Click OK to confirm.

Restarting a model does not let you make any code changes to the model. It should primarily be used as a way to quickly re-initialize or re-connect to resources.

Securing Models using Model API Key

You can prevent unauthorized access to your models by requiring the user to specify a Model API Key in the “Authorization” header of your model HTTP request. This topic covers how to create, test, and use a Model API Key in Cloudera Data Science Workbench.

The Model API Key governs the authentication part of the process, while authorization is based on the privileges the user already has on the project. For example, if a user or application has read-only access to a project, then the authorization is based on their current access level to that project, which is read-only. Once a user has been authenticated for a project, they can make a request to a model with the Model API Key. This is different from the previously described Access Key, which is only used to identify which model should serve a request.
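
For example, a Python client that uses cdsw.call_model passes the Model API Key through the api_key argument in addition to the model's access key (the key values below are placeholders):

import cdsw

MODEL_ACCESS_KEY = "your-model-access-key"  # identifies which model serves the request
MODEL_API_KEY = "your-model-api-key"        # authenticates the caller

response = cdsw.call_model(MODEL_ACCESS_KEY,
                           {"petal_length": 3},
                           api_key=MODEL_API_KEY)["response"]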

Enabling authentication

Restricting access using Model API Keys is an optional feature. By default, the Enable Authentication option is turned on for new models. However, it is turned off by default for existing models for backward compatibility. You can enable authentication for all your existing models.

To enable authentication, go to Projects > Models > Settings and check the Enable Authentication option.

Generating a Model API Key

If you have enabled authentication, then you need a Model API Key to call a model. If you are not a collaborator on a particular project, then you cannot access the models within that project using the Model API Key that you generate. You need to be added as a collaborator by the admin or the owner of the project to use the Model API Key to access a model.

There are two types of API keys used in Cloudera Data Science Workbench:

  • API Key: This is used in the CDSW-specific internal APIs for CLI automation. It cannot be deleted and does not expire. This API Key is not required when sending requests to a model.

  • Model API Key: These are used to authenticate requests to a model. You can choose the expiration period and delete them when no longer needed.

You can generate more than one Model API Key to use with your model, depending on the number of clients that you are using to call the models.

  1. Sign in to Cloudera Data Science Workbench.

  2. Click Settings from the left navigation pane.

  3. On the User Settings page, click the API Keys tab.

  4. Select an expiry date for the Model API Key, and click Create API keys.

    An API key is generated along with a Key ID. If you do not specify an expiry date, then the generated key is active for one year from the current date.

  5. To test the Model API Key:

    1. Navigate to your project and click Models from the left navigation pane.

    2. On the Overview page, paste the Model API Key that you generated in the previous step into the Model API Key field, and click Test.

      The test results, along with the HTTP response code and the Replica ID are displayed in the Results table.

      If the test fails and you see the following message, then you must get added as a collaborator on the respective project by the admin or the creator of the project:
      "User APikey not authorized to access model": "Check APIKEY permissions or model authentication permissions"

Managing Model API Keys

The admin user can access the list of all the users who are accessing the workspace and can delete the Model API Keys for a user. To manage users and their keys:

  1. Sign in to Cloudera Data Science Workbench as an admin user.

  2. From the left navigation pane, click Admin.

    The Site Administration page is displayed.

  3. On the Site Administration page, click on the Users tab.

    All the users registered under this workspace are displayed.

    The Model API Keys column displays the number of Model API Keys granted to a user.

  4. To delete a Model API Key for a particular user:
    1. Select the user whose Model API Key you want to delete.

      A page containing the user’s information is displayed.

    2. To delete a key, click Delete under the Action column corresponding to the Key ID.

    3. Click Delete all keys to delete all the keys for that user.

Enabling Model Metrics

Metrics are used to track the performance of models. When you enable model metrics, the metrics are stored in a scalable metrics store. You can track individual model predictions and analyze metrics using custom code. You can enable model metrics in CDSW through Cloudera Manager.
  1. Sign in to the Cloudera Manager web UI.
  2. Go to Clusters > CDSW service > Configurations.
  3. Select the Enable Model Metrics Support option and click Save Changes.

    Cloudera Manager detects stale configuration and prompts for a restart.

  4. Select the Re-deploy client configuration option and restart the CDSW service.

    It can typically take 10-20 minutes for the CDSW service to restart.

Tracking Model Metrics

Tracking model metrics without deploying a model

We recommend that you develop and test model metrics in a workbench session before actually deploying the model. This workflow avoids the need to rebuild and redeploy a model to test every change.

Metrics tracked in this way are stored in a local, in-memory datastore instead of the metrics database, and are forgotten when the session exits. You can access these metrics in the same session using the regular metrics API in the cdsw.py file.

The following example demonstrates how to track metrics locally within a session, and how to use the read_metrics function to read those metrics in the same session by querying a time window.

To try this feature in the local development mode, use the following files from the Python template project:
  • use_model_metrics.py
  • predict_with_metrics.py

The predict function from the predict_with_metrics.py file shown in the following example is similar to the function with the same name in the predict.py file. It takes input and returns output, and can be deployed as a model. But unlike the function in the predict.py file, the predict function from the predict_with_metrics.py file tracks mathematical metrics. These metrics can include information such as input, output, feature values, convergence metrics, and error estimates. In this simple example, only input and output are tracked. The function is equipped to track metrics by applying the decorator cdsw.model_metrics.

@cdsw.model_metrics
def predict(args):
  # Track the input.
  cdsw.track_metric("input", args)

  # If this model involved features, i.e., transformations of the
  # raw input, they could be tracked as well.
  # cdsw.track_metric("feature_vars", {"a":1,"b":23})

  petal_length = float(args.get('petal_length'))
  result = model.predict([[petal_length]])

  # Track the output.
  cdsw.track_metric("predict_result", result[0][0])
  return result[0][0]
You can directly call this function in a workbench session, as shown in the following example:
predict({"petal_length": 3})
You can fetch the metrics from the local, in-memory datastore by using the regular metrics API. To fetch the metrics, set the dev keyword argument to True in the use_model_metrics.py file. You can query the metrics by model, model build, or model deployment using the variables cdsw.dev_model_crn, cdsw.dev_model_build_crn, or cdsw.dev_model_deployment_crn, respectively. For example:
end_timestamp_ms = int(round(time.time() * 1000))
cdsw.read_metrics(model_deployment_crn=cdsw.dev_model_deployment_crn,
                  start_timestamp_ms=0,
                  end_timestamp_ms=end_timestamp_ms,
                  dev=True)

where CRN denotes Cloudera Resource Name, which is a unique identifier from CDP, analogous to Amazon's ARN.

Tracking metrics for deployed models

When you have finished developing your metrics tracking code and the code that consumes the metrics, simply deploy the predict function from predict_with_metrics.py as a model. No code changes are necessary.

Calls to read_metrics, track_delayed_metrics, and track_aggregate_metrics need to be changed to take the CRN of the deployed model, build, or deployment. These CRNs can be found on the model's Overview page.

Calls to call_model will also require the model’s access key (model_access_key in use_model_metrics.py) from the model’s Settings page. If authentication has been enabled for the model (the default), a model API key for the user (model_api_token in use_model_metrics.py) is also required. This can be obtained from the user’s Settings page.

Usage Guidelines

This section calls out some important guidelines you should keep in mind when you start deploying models with Cloudera Data Science Workbench.
Model Code

Models in Cloudera Data Science Workbench are designed to execute any code that is wrapped into a function. This means you can potentially deploy a model that returns the result of a SELECT * query on a very large table. However, Cloudera strongly recommends against using the models feature for such use cases.

As a best practice, your models should return simple JSON responses at near-real-time speeds (within a fraction of a second). If you have a long-running operation that requires extensive computing and takes more than 15 seconds to complete, consider using batch jobs instead.

Model Artifacts

Once you start building larger models, make sure you are storing these model artifacts in HDFS, S3, or any other external storage. Do not use the project filesystem to store large output artifacts.

In general, any project files larger than 50 MB must be part of your project's .gitignore file so that they are not included in snapshots for future experiments/model builds. Note that in case your models require resources that are stored outside the model itself, it is up to you to ensure that these resources are available and immutable as model replicas may be restarted at any time.

Resource Consumption and Scaling

Models should be treated as any other long-running applications that are continuously consuming memory and computing resources. If you are unsure about your resource requirements when you first deploy the model, start with a single replica, monitor its usage, and scale as needed.

If you notice that your models are getting stuck in various stages of the deployment process, check the monitoring page to make sure that the cluster has sufficient resources to complete the deployment operation.

Security Considerations

As stated previously, models do not impose any limitations on the code they can execute. Additionally, models run with the permissions of the user that creates the model (same as sessions and jobs). Therefore, be conscious of potential data leaks especially when querying underlying data sets to serve predictions.

Cloudera Data Science Workbench models are not public by default. Each model has an access key associated with it. Only users/applications who have this key can make calls to the model. Be careful with who has permission to view this key.

Cloudera Data Science Workbench also prints stderr/stdout logs from models to an output pane in the UI. Make sure you are not writing any sensitive information to these logs.

Deployment Considerations

Cloudera Data Science Workbench does not currently support high availability for models. Additionally, there can only be one active deployment per model at any given time. This means you should plan for model downtime if you want to deploy a new build of the model or re-deploy with more or fewer replicas.

Keep in mind that models that have been developed and trained using Cloudera Data Science Workbench are essentially Python/R code that can easily be persisted and exported to external environments using popular serialization formats such as Pickle, PMML, ONNX, and so on.

Known Issues and Limitations

  • Known Issues with Model Builds and Deployed Models
    • (If quotas are enabled) Models that are stuck in the Scheduled state due to lack of resources do not automatically start even if you free up existing resources.

      Workaround: Stop the Model that is stuck in the Scheduled state. Then manually reschedule that Model.

      Cloudera Bug: DSE-6886

    • Unable to create a model with the name of a deleted Model.

      Workaround: For now, model names must be unique across the lifespan of the cluster installation.

      Cloudera Bug: DSE-4237

    • Re-deploying or re-building models results in model downtime (usually brief).

    • Model deployment will fail if your project filesystem is too large for the Git snapshot process. As a general rule, any project files (code, generated model artifacts, dependencies, etc.) larger than 50 MB must be part of your project's .gitignore file so that they are not included in snapshots for model builds.

    • Model builds will fail if your project filesystem includes a .git directory (likely hidden or nested). Typical build stage errors include:
      Error: 2 UNKNOWN: Unable to schedule build: [Unable to create a checkpoint of current source: [Unable to push sources to git server: ...

      To work around this, rename the .git directory (for example, NO.git) and re-build the model.

      Cloudera Bug: DSE-4657

    • JSON requests made to active models should not be more than 5 MB in size. This is because JSON is not suitable for very large requests and has high overhead for binary objects such as images or video. Call the model with a reference to the image or video, such as a URL, instead of the object itself.

    • Any external connections, for example, a database connection or a Spark context, must be managed by the model's code. Models that require such connections are responsible for their own setup, teardown, and refresh.

    • Model logs and statistics are only preserved as long as the individual replica is active. Cloudera Data Science Workbench may restart a replica at any time it deems necessary (for example, in response to bad input to the model).

  • The use_model_metrics.py file that is available within the CDSW Templates is missing the code for setting the user_api_key and is not up to date. Use the following code instead:
    import cdsw
    import time
    from sklearn import datasets
    import numpy as np
    
    # This script demonstrates the usage of several model metrics-
    # related functions:
    # - call_model: Calls a model deployed on CDSW as an HTTP endpoint.
    # - read_metrics: Reads metrics tracked for all model predictions
    #   made within a time window. This is useful for  doing analytics 
    #   on the tracked metrics.
    # - track_delayed_metrics: Adds metrics for a given prediction 
    #   retrospectively, after the prediction has already been made.
    #   Common examples of such metrics are ground truth and various
    #   per-prediction accuracy metrics.
    # - track_aggregate_metrics: Adds metrics for a set or batch of
    #   predictions within a given time window, not an individual 
    #   prediction. Common examples of such metrics are mean or 
    #   median accuracy, and various measures of drift.
    
    # This script can be used in a local development mode, or in
    # deployment mode. To use it in deployment mode, please: 
    # - Set dev = False
    # - Create a model deployment from the function 'predict' in
    #   predict_with_metrics.py 
    # - Obtain the model deployment's CRN from the model's overview
    #   page and the model's access key from its settings page and 
    #   paste them below.
    # - If you selected "Enable Authentication" when creating the
    #   model, then create a model API key from your user settings 
    #   page and paste it below as well.
    
    dev = True
    
    # Conditionally import the predict function only if we are in
    # dev mode
    try:
        if dev:
            raise RuntimeError("In dev mode")
    except:
        from predict_with_metrics import predict
    
    if dev:
        model_deployment_crn=cdsw.dev_model_deployment_crn # update modelDeploymentCrn
        model_access_key=None
    else: 
        # The model deployment CRN can be obtained from the model overview
        # page.
        model_deployment_crn=None 
        if model_deployment_crn is None:
            raise ValueError("Please set a valid model deployment Crn")
    
        # The model access key can be obtained from the model settings page.
        model_access_key=None
        if model_access_key is None:
            raise ValueError("Please set the model's access key")
    
        # You can create a models API key from your user settings page.
        # Not required if you did not select "Enable Authentication"
        # when deploying the model. In that case, anyone with the
        # model's access key can call the model.
        user_api_key = None
    
    # First, we use the call_model function to make predictions for 
    # the held-out portion of the dataset in order to populate the 
    # metrics database.
    iris = datasets.load_iris()
    test_size = 20
    
    # This is the input data for which we want to make predictions.
    # Ground truth is generally not yet known at prediction time.
    score_x = iris.data[:test_size, 2].reshape(-1, 1) # Petal length
    
    # Record the current time so we can retrieve the metrics
    # tracked for these calls.
    start_timestamp_ms=int(round(time.time() * 1000))
    
    uuids = []
    predictions = []
    for i in range(len(score_x)):
        if model_access_key is not None:
            output = cdsw.call_model(model_access_key, {"petal_length": score_x[i][0]}, api_key=user_api_key)["response"]
        else:
            output = predict({"petal_length": score_x[i][0]})
        # Record the UUID of each prediction for correlation with ground truth.
        uuids.append(output["uuid"])
        predictions.append(output["prediction"])
    
    # Record the current time.
    end_timestamp_ms=int(round(time.time() * 1000))
    
    # We can now use the read_metrics function to read the metrics we just
    # generated into the current session, by querying by time window.
    data = cdsw.read_metrics(model_deployment_crn=model_deployment_crn,
                start_timestamp_ms=start_timestamp_ms,
                end_timestamp_ms=end_timestamp_ms, dev=dev)
    data = data['metrics']
    
    # Now, ground truth is known and we want to track the true value
    # corresponding to each prediction above.
    score_y = iris.data[:test_size, 3].reshape(-1, 1) # Observed petal width
    
    # Track the true values alongside the corresponding predictions using
    # track_delayed_metrics. At the same time, calculate the mean absolute
    # prediction error.
    mean_absolute_error = 0
    n = len(score_y)
    for i in range(n):
        ground_truth = score_y[i][0]
        cdsw.track_delayed_metrics({"actual_result":ground_truth}, uuids[i], dev=dev)
    
        absolute_error = np.abs(ground_truth - predictions[i])
        mean_absolute_error += absolute_error / n
    
    # Use the track_aggregate_metrics function to record the mean absolute
    # error within the time window where we made the model calls above.
    cdsw.track_aggregate_metrics(
        {"mean_absolute_error": mean_absolute_error}, 
        start_timestamp_ms, 
        end_timestamp_ms, 
        model_deployment_crn=model_deployment_crn,
        dev=dev
    )
  • Limitations
    • Scala models are not supported.

    • Spawning worker threads is not supported with models.

    • Models deployed using Cloudera Data Science Workbench are not highly-available.

    • Dynamic scaling and auto-scaling are not currently supported. To change the number of replicas in service, you will have to re-deploy the build.