AWS Account Prerequisites for ML Workspaces
Review the AWS Account Prerequisites for CDP
Verify that the AWS account that you would like to use for CDP has the required resources and that you have the permissions required to manage these resources.
Instructions: AWS Account Requirements
Review the Cloudera Machine Learning-Specific AWS Resource Requirements
Provisioning an ML workspace will require access to the following AWS resources. Make sure your AWS account has access to these resources.
AWS Services used by Cloudera Machine Learning (CML)
- Compute - Amazon Elastic Kubernetes Service (EKS)
- Load Balancing - Amazon ELB Classic Load Balancer
- Key Management - AWS Key Management Service (KMS)
- DNS - Amazon Route 53, hosted by Cloudera
- Persistent Storage - Amazon Elastic Block Store (EBS)
- Project File Storage - Amazon Elastic File System (EFS) for project file storage
VPC Requirements - You can either use an existing VPC or allow CDP to create one for you.
Option 1. Using your own VPC
- Recommended requirements: Divide the address space according to the following
- 3 x /19 private subnets. Each subnet should be created in a separate Availability Zone for the EKS worker nodes.
- 3 x /24 public subnets. These should also be created in three separate Availability Zones, using the same zones as the private subnets.
- Ensure the CIDR block for the subnets is sized appropriately.
- You must enable Amazon DNS with the VPC. Corporate DNS is not supported. For guidelines on how verify your DNS settings, refer to sections 1-3 in AWS environment requirements checklist for the Data Warehouse service.
Private subnets should have routable IPs over your internal VPN. If IPs are not routable, private CML endpoints will need to be accessed via a SOCKS proxy. This is not recommended.
Tag the VPC and the subnets as
sharedso that Kubernetes can find them. For load balancers to be able to choose the subnets correctly, you are also required to tag private subnets with the
kubernetes.io/role/internal-elb:1tag, and public subnets with the
- Recommended requirements: Divide the address space according to the following recommended sizes:
Option 2. CDP creates a new VPC
If you choose to allow CDP to create a new VPC, three subnets will be created automatically. One subnet is created for each availability zone assuming three AZs per region; If a region has two AZs instead of three, then still three subnets are created, two in the same AZ.
You will be asked to specify a valid CIDR in IPv4 range that will be used to define the range of private IPs for EC2 instances provisioned into these subnets.
- Ports RequirementsHTTPS access to ML workspaces is available over port 443 for the following cases:
- internal only - should be accessible from your organization's network, but not the public internet
- internet facing - should be accessible from the public internet as well as your internal organization's network
Review the default AWS service limits and your current AWS account limits
By default, AWS imposes certain default limits for AWS services, per-user account. Make sure you review your account's current usage status and resource limits before you start provisioning additional resources for CDP and CML.
For example, depending on your AWS account, you might only be allowed to provision a certain number of CPU instances, or you might not have default access to GPU instances at all. Make sure to review your AWS service limits before your proceed.Related AWS documentation: AWS Service Limits, Amazon EC2 Resource Limits.
Review supported AWS regions
CDP supports the following AWS regions: Supported AWS regions. However, the CML service requires AWS Elastic Kubernetes Service (EKS). Make sure you select a region that includes EKS.
Related AWS documentation: Region Table (AWS Documentation).
Set up an AWS Cloud Credential
Create a role-based AWS credential that allows CDP to authenticate with your AWS account and has authorization to provision AWS resources on your behalf. Role-based authentication uses an IAM role with an attached IAM policy that has the minimum permissions required to use CDP.
Once you have created this IAM policy, register it in CDP as a cloud credential. Then, reference this credential when you are registering the environment in the next step.
Instructions: CDP Cloud Credential for AWS
Register an AWS Environment
A CDP User with the role of Power User must register an environment for their organization. An environment determines the specific cloud provider region and virtual network in which resources can be provisioned, and includes the credential that should be used to access the cloud provider account.
Instructions: Register an AWS Environment
CML Role Requirements
There are two CDP user roles associated with the CML service: MLAdmin and MLUser. Any CDP user with the EnvironmentAdmin (or higher) access level must assign these roles to users who require access to the Cloudera Machine Learning service within their environment.
Furthermore, if you want to allow users to log in to provisioned workspaces and run workloads on them, this will need to be configured separately.
Instructions: Configuring User Access to ML Workspaces