AMP Project Specification
AMP projects include a project metadata file that provides configuration and setup details. These details may include environment variables and tasks to be run on startup.
YAML File Specification ‒ Version 1.0
The project metadata file is a YAML file. It must be placed in your project's root directory, and must be named .project-metadata.yaml. The specifications for this file are listed below. You can also look at an example for one of the Cloudera AMPs, such as:.project-metadata.yaml.
Fields
Fields for this YAML file are in snake_case
. String fields are generally
constrained by a fixed character size, for example string(64) is constrained to contain at
most 64 characters. Click Show to see the list of fields.
Field Name | Type | Example | Description |
---|---|---|---|
name |
string(200) |
ML Demo |
Required: The name of this project prototype. Prototype names do not need to be unique. |
description |
string(2048) |
This demo shows off some cool applications of ML. |
Required: A description for this project prototype. |
author |
string(64) |
Cloudera Engineer |
Required: The author of this prototype (can be the name of an individual, team, or organization). |
date |
date string |
"2020-08-11" |
The date this project prototype was last modified. It should be in the format: "YYYY-MM-DD" (quotation marks are required). |
specification_version |
string(16) |
0.1 |
Required: The version of the YAML file specification to use. |
prototype_version |
string(16) |
1.0 |
Required: The version of this project prototype. |
shared_memory_limit |
number |
0.0625 |
Additional shared memory in GB available to sessions running in this project. The default is 0.0625 GB (64MB). |
environment_variables |
See below |
Global environment variables for this project prototype. |
|
feature_dependencies | feature_dependencies | See below | A list of feature dependencies of this AMP. A missing dependency in workspace blocks the creation of the AMP. |
engine_images | engine_images | See below | Engine images to be used with the AMP. What's specified here is a recommendation and it does not prevent the user from launching an AMP with non recommended engine images. |
runtimes | runtimes | See below | Runtimes to be used with the AMP. What's specified here is a recommendation and it does not prevent the user from launching an AMP with non recommended runtimes. |
tasks |
See below |
A sequence of tasks, such as running Jobs or deploying Models, to be run after project import. |
Example
name: ML Demo
description: >-
This demo shows off some cool applications of ML.
author: Cloudera Engineer
date: '2020-08-11T17:40:00.839Z'
specification_version: 1.0
environment_variables:
...
tasks:
...
Environment variables object
The YAML file can optionally define any number of global environment variables for the project under the environment field. This field is an object, containing keys representing the names of the environment variables, and values representing details about those environment variables. Click Show to see the list of fields in the Environment variables object.
Field Name |
Type |
Example |
Description |
---|---|---|---|
default |
string |
"3" |
The default value for this environment variable. Users may override this value when importing this project prototype. |
description |
string |
The number of Model replicas, 3 is standard for redundancy. |
A short description explaining this environment variable. |
required |
boolean |
true |
Whether the environment variable is required to have a non-empty value, the
default is |
Example: This example creates four environment variables.
environment_variables:
AWS_ACCESS_KEY:
default: ""
description: "Access Key ID for accessing S3 bucket"
AWS_SECRET_KEY:
default: ""
description: "Secret Access Key for accessing S3 bucket"
required: true
HADOOP_DATA_SOURCE:
default: ""
description: "S3 URL to large data set"
required: false
MODEL_REPLICAS:
default: "3"
description: "Number of model replicas, 3 is standard for redundancy"
required: true
Feature Dependencies
AMPs might depend on some optional features of a workspace. The
feature_dependencies
field accepts a list of such features. Unsatisified
feature dependencies prevent the AMP from being launched in a workspace, and display an
appropriate error message. The supported feature dependencies are as follows:
model_metrics
Runtimes Specification
The runtimes
field accepts a list of runtimes
objects
defined as follows. This Runtimes specification can be added per task or per project.
- editor: the_name_of_the_editor # case-sensitive string required. e.g. Workbench, Jupyter, etc. (how it appears in the UI)
kernel: the_kernel # case-sensitive string required. e.g. Python 3.6, Python 3.8, R 3.6, etc. (how it appears in the UI)
edition: the_edition # case-sensitive string required. e.g. Standard, Nvidia GPU, etc. (how it appears in the UI)
version: the_short_version # case-sensitive string optional. e.g. 2021.03, 2021.05, etc. (how it appears in the UI)
addons: the_list_addons_needed # list of case-sensitive strings optional. e.g Spark 2.4.7 - CDP 7.2.11 - CDE 1.13, etc. (how it appears in the UI)
This example specifies the Runtimes the Workbench version for Python 3.8.
runtimes:
- editor: Workbench
kernel: Python 3.8
edition: Standard
addons: ['Spark 2.4.7 - CDP 7.2.11 - CDE 1.13']
Engine Images Specification
The engine_images
field accepts a list of engine_image
objects defined as follows:
- image_name: the_name_of_the_engine_image # string (required)
tags: # list of strings (optional)
- the_tag_of_engine_image
- ...
This example specifies the official engine image with version 11 or 12:
engine_images:
- image_name: engine
tags:
- 12
- 11
This example specifies the most recent version of the dataviz
engine image
in the workspace:
engine_images:
- image_name: cmldataviz
- image_name: cdswdataviz
Note that when specifying CDV images, both cmldataviz
and
cdswdataviz
must be specified. When tags
are not
specified, the most recent version of the engine image with the matching name is
recommended. The following rule is used to determine the most recent
engine_image
with the matching name:
Official Engine (
) and CDV
(engine
and
cmldataviz
) imagescdswdataviz
Since the officially released engine images follow semantic versioning (where a newer
version is always larger than any older version, when compared with >
), the
most recent engine image is the one with the largest tag. For example,
engine:14
will be recommended over engine:13
and
cmldataviz:6.3.4-b13
is recommended over
cmldataviz:6.2.1-b12
.
Custom engine images
There is no way for Cloudera Data Science Workbench to determine the rules for
customer custom engine image tags, and therefore there is no reliable way to determine the
most recent custom engine image. You should use the engine image that has the correct
matching name and has the newest id
. The newest id
means
that the engine image is the most recently added engine image.
Task list
This defines a list of tasks that can be automatically run on project import. Each task will be run sequentially in the order they are specified in this YAML file. Click Show to see the list of fields.
Field Name |
Type |
Example |
Description |
---|---|---|---|
type |
string |
create_job |
Required: The type of task to be executed. See below for a list of allowed types. |
short_summary |
string |
Creating a Job that will do a task. |
A short summary of what this task is doing. |
long_summary |
string |
Creating a Job that will do this specific task. This is important because it leads up to this next task. |
A long summary of what this task is doing. |
Jobs
Create Job
Example
- type: create_job
name: howdy
entity_label: howdy
script: greeting.py
arguments: Ofek 21
short_summary: Creating a job that will greet you.
environment_variables:
SAMPLE_ENVIRONMENT_VARIABLE: CREATE/RUN_JOB
kernel: python3
Click Show to see the list of fields.
Field Name |
Type |
Example |
Description |
---|---|---|---|
type |
string |
create_job |
Required: Must be |
name |
string |
howdy |
Required: Job name. |
entity_label |
string |
howdy |
Required: Uniquely identifies this job for future tasks, i.e.
|
script |
string |
greeting.py |
Required: Script for this Job to run. |
kernel |
string |
python3 |
Required: What kernel this Job should use. Acceptable values are
|
arguments |
string |
Ofek 21 |
Command line arguments to be given to this Job when running. |
environment_variables |
See above |
See above |
|
cpu |
number |
1.0 |
The amount of CPU virtual cores to allocate for this Job, the default is
|
memory |
number |
1.0 |
The amount of memory in GB to allocate for this Job, the default is
|
gpu |
integer |
0 |
The amount of GPU to allocate for this Job, the default is
|
timeout |
integer |
10 |
The amount of time in minutes to wait before timing out this Job, the default
is |
timeout_kil |
boolean |
false |
Whether or not to stop this Job when it times out, the default is |
Run Job
Example run job task:
- type: run_job
entity_label: howdy
short_summary: Running the job that will greet you.
long_summary: >-
Running the job that will greet you. It will greet you by the name
which is the first and only command line argument.
Most Job run tasks should just contain the type
and
entity_label
fields. Click Show to see the list of fields.
Field Name |
Type |
Example |
Description |
---|---|---|---|
type |
string |
run_job |
Required: Must be |
entity_label |
string |
howdy |
Required: Must match an |
However, they can optionally override previously defined fields. Click Show to see the list of fields.
Field Name |
Type |
Example |
Description |
---|---|---|---|
script |
string |
greeting.py |
Required: Script for this Job to run. |
kernel |
string |
python3 |
Required: What kernel this Job should use. Acceptable values are
|
arguments |
string |
Ofek 21 |
Command line arguments to be given to this Job when running. |
environment_variables |
See above |
See above |
|
cpu |
number |
1.0 |
The amount of CPU virtual cores to allocate for this Job, the default is
|
memory |
number |
1.0 |
The amount of memory in GB to allocate for this Job, the default is
|
gpu |
integer |
0 |
The amount of GPU to allocate for this Job, the default is
|
shared_memory_limit |
number |
0.0625 |
Limits the additional shared memory in GB that can be used by this Job, the default is 0.0625 GB (64MB). |
Models
Note: All models have authentication disabled, so their access key alone is enough to interact with them.
Resources object
Models may define a resources object which overrides the amount of resources to allocate per Model deployment.
Click Show to see the list of fields.
Field Name |
Type |
Example |
Description |
---|---|---|---|
cpu |
number |
1.0 |
The number of CPU virtual cores to allocate per Model deployment. |
memory |
number |
2.0 |
The amount of memory in GB to allocate per Model deployment. |
gpu |
integer |
0 |
The amount of GPU to allocate per Model deployment. |
For example:
resources:
cpu: 1
memory: 2
Replication policy object
Models may define a replication policy object which overrides the default replication policy for Model deployments.
Field Name |
Type |
Example |
Description |
---|---|---|---|
type | string | fixed |
Must be fixed if present. |
num_replicas | integer | 1 |
The number of replicas to create per Model deployment. |
For example:
replication_policy:
type: fixed
num_replicas: 1
Model examples list
Models may include examples, which is a list of objects containing a request and response field, each containing a valid object inside, as shown in the example:
examples:
- request:
name: Ofek
age: 21
response:
greeting: Hello Ofek (21)
- request:
name: Jimothy
age: 43
response:
greeting: Hello Coy (43)
Click Show to see the list of fields.
Field Name |
Type |
Example |
Description |
---|---|---|---|
request |
string |
See above |
Required: An example request object. |
response |
string |
See above |
Required: The response to the above example request object. |
Create Model
Example:
- type: create_model
name: Say hello to me
entity_label: says-hello
description: This model says hello to you
short_summary: Deploying a sample model that you can use to greet you
access_key_environment_variable: SHTM_ACCESS_KEY
default_resources:
cpu: 1
memory: 2
Click Show to see the list of fields.
Field Name |
Type |
Example |
Description |
---|---|---|---|
type |
string |
create_model |
Required: Must be |
name |
string |
Say hello to me |
Required: Model name |
entity_label |
string |
says-hello |
Required: Uniquely identifies this model for future tasks, i.e.
|
access_key_environment_variable |
string |
SHTM_ACCESS_KEY |
Saves the model's access key to an environment variable with the specified name. |
default_resources |
See above |
The default amount of resources to allocate per Model deployment. |
|
default_replication_policy |
See above |
The default replication policy for Model deployments. |
|
description |
string |
This model says hello to you |
Model description. |
visibility |
string |
private |
The default visibility for this Model. |
Build Model
Example
- type: build_model
entity_label: says-hello
comment: Some comment about the model
examples:
- request:
name: Ofek
age: 21
response:
greeting: Hello Ofek (21)
target_file_path: greeting.py
target_function_name: greet_me
kernel: python3
environment_variables:
SAMPLE_ENVIRONMENT_VARIABLE: CREATE/BUILD/DEPLOY_MODEL
Field Name |
Type |
Example |
Description |
---|---|---|---|
type |
string |
build_model |
Required: Must be |
entity_label |
string |
says-hello |
Required: Must match an |
target_file_path |
string |
greeting.py |
Required: Path to file that will be run by Model. |
target_function_name |
string |
greet_me |
Required: Name of function to be called by Model. |
kernel |
string |
python3 |
What kernel this Model should use. Acceptable values are
|
comment |
string |
Some comment about the model |
A comment about the Model. |
examples |
See above |
A list of request/response example objects. |
|
environment_variables |
See above |
See above |
Deploy Model
Example:
- type: deploy_model
entity_label: says-hello
environment_variables:
SAMPLE_ENVIRONMENT_VARIABLE: CREATE/BUILD/DEPLOY_MODEL
Most deploy model tasks should just contain the type and entity_label fields. Click Show to see the list of fields.
Field Name |
Type |
Example |
Description |
---|---|---|---|
type |
string |
deploy_model |
Required: Must be |
entity_label |
string |
says-hello |
Required: Must match an |
However, they can optionally override previously defined fields. Click Show to see the list of fields.
Field Name |
Type |
Example |
Description |
---|---|---|---|
cpu |
number |
1.0 |
The number of CPU virutal cores to allocate for this Model deployment. |
memory |
number |
2.0 |
The amount of memory in GB to allocate for this Model deployment. |
gpu |
integer |
0 |
The amount of GPU to allocate for this Model deployment. |
replication_policy |
See above |
The replication policy for this Model deployment. |
|
environment_variables |
See above |
Overrides environment variables for this Model deployment. |
Applications
Start Application
Example:
- type: start_application
subdomain: greet
script: greeting.py
environment_variables:
SAMPLE_ENVIRONMENT_VARIABLE: START_APPLICATION
kernel: python3
Click Show to see the list of fields.
Field Name |
Type |
Example |
Description |
---|---|---|---|
type |
string |
start_application |
Required: Must be |
subdomain |
string |
greet |
Required: Application subdomain, which must be unique per Application, and must be alphanumeric and hyphen-delimited. Application subdomains are also converted to lowercase. |
kernel |
string |
python3 |
Required: What kernel this Application should use. Acceptable values are
|
entity_label |
string |
greeter |
Uniquely identifies this application for future tasks. Entity labels must be lowercase alphanumeric, and may contain hyphens or underscores. |
script |
string |
greeting.py |
Script for this Application to run. |
name |
string |
Greeter |
Application name, defaults to 'Untitled application'. |
description |
string |
Some description about the Application |
Application description, defaults to 'No description for the app'. |
cpu |
number |
1.0 |
The number of CPU virutal cores to allocate for this Application. |
memory |
number |
1.0 |
The amount of memory in GB to allocate for this Application. |
gpu |
integer |
0 |
The amount of GPU to allocate for this Application. |
shared_memory_limit |
number |
0.0625 |
Limits the additional shared memory in GB that can be used by this application, the default is 0.0625 GB (64MB). |
environment_variables |
See above |
See above |
Experiments
Run Experiment
Example:
- type: run_experiment
script: greeting.py
arguments: Ofek 21
kernel: python3
Click Show to see the list of fields.
Field Name |
Type |
Example |
Description |
---|---|---|---|
type |
string |
run_experiment |
Required: Must be run_experiment . |
script |
string |
greeting.py |
Required: Script for this Experiment to run. |
entity_label |
string |
test-greeter |
Uniquely identifies this experiment for future tasks. Entity labels must be lowercase alphanumeric, and may contain hyphens or underscores. |
arguments |
string |
Ofek 21 |
Command line arguments to be given to this Experiment when running. |
kernel |
string |
python3 |
What kernel this Experiment should use. Acceptable values are
|
comment |
string |
Comment about the experiment |
A comment about the Experiment. |
cpu |
number |
1.0 |
The amount of CPU virtual cores to allocate for this Experiment. |
memory |
number |
1.0 |
The amount of memory in GB to allocate for this Experiment. |
gpu |
number |
0 |
The amount of GPU to allocate for this Experiment. |
Sessions
Run Sessions
Example:
- type: run_session
name: How to be greeted interactively
code: |
import os
os.environ['SAMPLE_ENVIRONMENT_VARIABLE'] = 'SESSION'
!python3 greeting.py Ofek 21
import greeting
greeting.greet_me({'name': 'Ofek', 'age': 21})
kernel: python3
memory: 1
cpu: 1
gpu: 0
Click Show to see the list of fields.
Field Name |
Type |
Example |
Description |
---|---|---|---|
type |
string |
run_session |
Required: Must be |
string |
See above for code, greeting.py for script |
Required: Either the |
|
kernel |
string |
python3 |
Required: What kernel this Session should use. Acceptable values are
|
cpu |
number |
1.0 |
Required: The amount of CPU virtual cores to allocate for this Session. |
memory |
number |
1.0 |
Required: The amount of memory in GB to allocate for this Session. |
entity_label |
string |
greeter |
Uniquely identifies this session for future tasks. Entity labels must be lowercase alphanumeric, and may contain hyphens or underscores. |
name |
string |
How to be greeted interactively |
Session name. |
gpu |
integer |
0 |
The amount of GPU to allocate for this Session. |