Provisioning Cloudera Machine Learning Workspaces
This topic describes how to provision Cloudera Machine Learning Workspaces.
- Log in to the Cloudera Data Platform web interface.
On Public Cloud, log in to https://console.cdp.cloudera.com using your corporate credentials or any other credentials that you received from your Cloudera Data Platform administrator.
- Click Cloudera Machine Learning Workspaces.
- Click Provision Workspace.
- Fill out the following fields.
- Workspace Name - Give the Cloudera Machine Learning Workspace a name. For example, user1_dev. Do not use capital letters in the workspace name.
- Select Environment - From the dropdown, select the environment where the Cloudera Machine Learning Workspace will be provisioned. If no environments are available in the dropdown, contact your Cloudera Data Platform administrator to gain access.
- Existing NFS - (Azure only) Enter the mount path from the environment creation procedure.
- NFS Protocol version - (Azure only) Specify the protocol version to use when communicating with the existing NFS server.
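If you prefer to script provisioning rather than use the web interface, these basic fields can be expressed as a request document for the CDP CLI's cdp ml create-workspace command. The sketch below is only illustrative: the field names are assumptions inferred from the UI labels above and the values are placeholders, so confirm the exact schema with cdp ml create-workspace help before relying on it.

```python
# Basic workspace fields, mirroring the UI labels above.
# Field names are assumptions, not a confirmed API schema.
basic_fields = {
    "workspaceName": "user1_dev",            # lowercase only, as noted above
    "environmentName": "my-environment",     # hypothetical environment name
    "existingNFS": "nfs.example.com:/cml",   # Azure only: existing NFS mount path (placeholder)
    "nfsVersion": "4.1",                     # Azure only: NFS protocol version
}
```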
- Switch the toggle to display Advanced Settings.
- CPU Settings - From the dropdown, select the following:
- Instance Type: You must select an instance type that is supported by Cloudera Machine Learning, or the associated validation check will fail (See Other Settings, below).
- Autoscale Range: Set the minimum and maximum number of CPU worker nodes that the group can scale between.
- Root Volume Size: If necessary, you can also change the default size of the root volume disk for the nodes in the group.
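In a scripted request, the CPU settings above would typically be described as one instance group. A minimal sketch, again assuming key names based on the UI labels:

```python
# Hypothetical CPU worker group; key names mirror the UI labels above.
cpu_group = {
    "name": "cpu-workers",
    "instanceType": "m5.2xlarge",                            # must be a type supported by CML
    "autoscaling": {"minInstances": 1, "maxInstances": 10},  # the autoscale range
    "rootVolume": {"size": 128},                             # example root volume size in GiB
}
```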
- GPU Settings - Click the GPU Instances toggle to enable GPUs for the cluster, and set the following:
- Instance Type: You must select an instance type that is supported by Cloudera Machine Learning, or the associated validation check will fail (See Other Settings, below).
- Autoscale Range: Set the minimum and maximum number of GPU worker nodes that the group can scale between.
- Root Volume Size: If necessary, you can also change the default size of the root volume disk for the nodes in the group.
- Kubernetes Config - Upload or directly enter the Kubernetes config information.
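The GPU settings follow the same shape as the CPU group sketched above, using a GPU-capable instance type; both groups would then be passed together in the provisioning request. Key names remain assumptions:

```python
# Hypothetical GPU worker group, mirroring the CPU group sketched above.
gpu_group = {
    "name": "gpu-workers",
    "instanceType": "p3.2xlarge",                            # must be a GPU type supported by CML
    "autoscaling": {"minInstances": 0, "maxInstances": 2},   # the autoscale range
    "rootVolume": {"size": 256},                             # example root volume size in GiB
}

# Both groups would be listed together, e.g. {"instanceGroups": [cpu_group, gpu_group]},
# in the Kubernetes provisioning portion of the request.
```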
- Network Settings
- Subnets for Worker Nodes: (AWS only) Optionally select one or more subnets to use for Kubernetes worker nodes.
- Subnets for Load Balancer: Optionally select one or more subnets to use for the Load Balancer.
- Load Balancer Source Ranges: (Azure only) Enter a CIDR range of IP addresses allowed to access the cluster.
- If the Cloudera Machine Learning Workspace is provisioned with public access, enter the allowed public IP address range.
- If the Cloudera Machine Learning Workspace is provisioned with private access, enter the allowed private IP address range.
- Enable Fully Private Cluster: This Preview Feature provides a simple way to create a secure cluster. Only available in AWS environments in Cloudera Data Platform.
- Enable Public IP Address for Load Balancer
(AWS only) You can create a load balancer with a public IP address for the private cluster. This is useful when there is no VPN between the Cloudera Machine Learning VPC and the customer network; in that case, the connection is made over the internet.
- Restrict access to Kubernetes API server to authorized IP ranges
You can specify a range of IP addresses in CIDR format that are allowed to access the Kubernetes API server. By default, the Kubernetes API services of Cloudera Machine Learning Workspaces are accessible to all public IP addresses (0.0.0.0/0) that have proper credentials.
To specify an address to authorize, enter an address in CIDR format (for example, 1.0.0.0/0) in API Server Authorized IP Ranges, and click the plus (+) icon. In this case, the API server is accessible by the user-provided address as well as control-plane-exit-ips over the public internet.
If the feature is enabled and no IP authorized addresses are specified, then the Kubernetes API server is only accessible by control-plane-exit-ips from the public internet.
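Scripted, the network choices in this step reduce to CIDR lists and boolean flags. The names below are assumptions based on the UI labels, and the subnet IDs and CIDR ranges are placeholders:

```python
# Hypothetical network settings; names follow the UI labels, values are placeholders.
network_settings = {
    "subnetsForWorkerNodes": ["subnet-0abc1234"],      # AWS only, optional
    "subnetsForLoadBalancers": ["subnet-0def5678"],    # optional
    "loadBalancerSourceRanges": ["203.0.113.0/24"],    # Azure only: allowed client range
    "fullyPrivateCluster": True,                       # AWS only: preview feature
    "enablePublicIPForLoadBalancer": False,            # AWS only: public IP for the load balancer
    "authorizedIPRanges": ["198.51.100.0/24"],         # Kubernetes API server access
}
```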
- Use hostname for a non-transparent proxy
Enter a CIDR range allowed for non-transparent proxy server access to the cluster.
- Production Cloudera Machine Learning
- Enable Governance - Must be enabled to capture and view information about your Cloudera Machine Learning projects, models, and builds from Apache Atlas for a given environment. If you do not select this option, then integration with Atlas will not work.
- Enable Model Metrics - When enabled, stores metrics in a scalable metrics store, enables you to track individual model predictions, and also track and analyze metrics using custom code.
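In a scripted request these production options are simple boolean flags. A minimal sketch, with names assumed from the UI labels:

```python
# Hypothetical production options (names assumed from the UI labels above).
production_settings = {
    "enableGovernance": True,    # capture project, model, and build information in Apache Atlas
    "enableModelMetrics": True,  # store model prediction metrics in a scalable metrics store
}
```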
- Other Settings
- Enable TLS - Select this checkbox if you want the workspace to use HTTPS for web communication.
- Enable public Internet access - When enabled, the Cloudera Machine Learning Workspace will be available on the public Internet. When disabled, it is assumed that connectivity is achieved through a corporate VPC.
- Enable Monitoring - Administrators (users with the MLAdmin role) can use a Grafana dashboard to monitor resource usage in the provisioned workspace.
- Skip Validation - If selected, validation checks are not performed before a workspace is provisioned. Select this only if validation checks are failing incorrectly.
- Tags - Tags added to cloud infrastructure, compute, and storage resources associated with this Cloudera Machine Learning Workspace.
Note that these tags are propagated to your cloud service provider account. See Related information for links to AWS and Azure tagging strategies.
- Cloudera Machine Learning Static Subdomain - This is a custom name for the workspace endpoint, and it is also used for the URLs of models, applications, and experiments. You can create or restore a workspace to this same endpoint name, so that external references to the workspace do not have to be changed. Only one workspace with the specific subdomain endpoint name can be running at a time.
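These remaining options also map naturally onto a scripted request. As before, the key names are assumptions based on the UI labels, and the tag keys and subdomain are only examples:

```python
# Hypothetical "other settings" (names assumed; tag keys/values and subdomain are examples).
other_settings = {
    "disableTLS": False,                   # keep HTTPS enabled for web communication
    "usePublicLoadBalancer": True,         # expose the workspace on the public Internet
    "enableMonitoring": True,              # Grafana dashboards for users with the MLAdmin role
    "skipValidation": False,               # leave pre-provisioning validation checks enabled
    "tags": {"owner": "user1", "team": "data-science"},  # propagated to the cloud provider
    "staticSubdomain": "user1-dev",        # stable endpoint name reused across restores
}
```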
- Click Provision Workspace.
Note that the domain name for the provisioned workspace is randomly generated and cannot be changed.
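For reference, a scripted equivalent of this final step could assemble the pieces sketched in the earlier steps and submit them through the CDP CLI. This is only a sketch: it assumes the CLI is installed and configured, that cdp ml create-workspace accepts a JSON request document via --cli-input-json, and that the field names match the assumptions above, so verify against cdp ml create-workspace help.

```python
import json
import subprocess

# A compact end-to-end request. In practice you would merge in the per-step
# dictionaries sketched above (instance groups, network, production, and other settings).
request = {
    "workspaceName": "user1_dev",
    "environmentName": "my-environment",   # hypothetical environment name
    "provisionK8sRequest": {
        "instanceGroups": [
            {"name": "cpu-workers", "instanceType": "m5.2xlarge",
             "autoscaling": {"minInstances": 1, "maxInstances": 10}},
        ]
    },
    "enableMonitoring": True,
    "enableGovernance": True,
    "enableModelMetrics": True,
}

# Submit the request through the CDP CLI. The --cli-input-json parameter is an
# assumption based on the CLI's AWS-style interface; check the command help first.
subprocess.run(
    ["cdp", "ml", "create-workspace", "--cli-input-json", json.dumps(request)],
    check=True,
)
```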