AWS Account Prerequisites for ML Workspaces

  1. Review the AWS Account Prerequisites for CDP

    Verify that the AWS account that you would like to use for CDP has the required resources and that you have the permissions required to manage these resources.

    Instructions: AWS Account Requirements

  2. Review the Cloudera Machine Learning-Specific AWS Resource Requirements

    Provisioning an ML workspace requires access to the following AWS resources. Make sure your AWS account has access to them.

    • AWS Services used by Cloudera Machine Learning (CML)
      1. Compute - Amazon Elastic Kubernetes Service (EKS)
      2. Load Balancing - Amazon ELB Classic Load Balancer
      3. Key Management - AWS Key Management Service (KMS)
      4. DNS - Amazon Route 53, hosted by Cloudera
      5. Persistent Storage - Amazon Elastic Block Store (EBS)
      6. Project File Storage - Amazon Elastic File System (EFS)
    • Networking & Security Requirements
      • VPC Requirements

        The VPC must have at least three public subnets, each in a different availability zone. The recommendation is to use one subnet per availability zone. If you choose to have internet-accessible endpoints, at least two public subnets are needed.

        If you are using your own existing VPC, you must tag the VPC and the subnets as shared so that Kubernetes can find them. For load balancers to be able to choose the subnets correctly, you are also required to tag private subnets with the kubernetes.io/role/internal-elb:1 tag, and public subnets with the kubernetes.io/role/elb:1 tag.

        If you have CDP create the VPC and subnets for you, they will be automatically tagged as needed.

        Related AWS documentation: Amazon EKS - Cluster VPC Considerations

      • Ports
        HTTPS access to ML workspaces is available over port 443 for the following cases:
        • internal only - should be accessible from your organization's network, but not from the public internet
        • internet facing - should be accessible from the public internet as well as from your organization's internal network
        This is in addition to the port requirements for CDP's default security group, which are described in: Management Console - Security groups.
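
    The subnet tagging rule under VPC Requirements can be checked before provisioning. The following is a minimal sketch, not an official tool: it assumes you already have subnet metadata shaped like the "Subnets" list returned by the EC2 DescribeSubnets API, and that you know which subnet IDs you treat as public. The function name and the sample subnet IDs are hypothetical. The "shared" tag is not checked because its key is not spelled out above.

```python
# Sketch only: validates the load balancer role tags described above.
# Input is assumed to look like the "Subnets" list from EC2 DescribeSubnets.

INTERNAL_ELB_TAG = "kubernetes.io/role/internal-elb"  # expected on private subnets
PUBLIC_ELB_TAG = "kubernetes.io/role/elb"             # expected on public subnets


def check_subnet_tags(subnets, public_ids):
    """Return a list of problems; an empty list means every subnet is tagged.

    subnets    -- list of dicts with "SubnetId" and an optional "Tags" list
    public_ids -- set of subnet IDs that you treat as public (hypothetical input)
    """
    problems = []
    for subnet in subnets:
        subnet_id = subnet["SubnetId"]
        tags = {tag["Key"]: tag["Value"] for tag in subnet.get("Tags", [])}
        wanted = PUBLIC_ELB_TAG if subnet_id in public_ids else INTERNAL_ELB_TAG
        if tags.get(wanted) != "1":
            problems.append(f"{subnet_id}: missing tag {wanted}=1")
    return problems


# Hypothetical sample data: one correctly tagged public subnet and one
# private subnet that has no tags yet.
sample = [
    {"SubnetId": "subnet-public-a", "Tags": [{"Key": PUBLIC_ELB_TAG, "Value": "1"}]},
    {"SubnetId": "subnet-private-a", "Tags": []},
]

for problem in check_subnet_tags(sample, public_ids={"subnet-public-a"}):
    print(problem)
```

    With the sample data, only the untagged private subnet is reported. If CDP creates the VPC and subnets for you, this check should not be needed.
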
  3. Review the default AWS service limits and your current AWS account limits

    AWS imposes default limits on its services for each account. Review your account's current usage and resource limits before you start provisioning additional resources for CDP and CML.

    For example, depending on your AWS account, you might only be allowed to provision a certain number of CPU instances, or you might not have default access to GPU instances at all. Make sure to review your AWS service limits before you proceed.

    Related AWS documentation: AWS Service Limits, Amazon EC2 Resource Limits.
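
    The comparison of limits against planned usage can be written down as a small check. This is a sketch under stated assumptions: the resource names and all numbers are hypothetical, and you would take the real limits and current usage from your own AWS account.

```python
# Sketch only: compares planned resources against account limits.
# Resource names and numbers below are hypothetical placeholders.

def find_shortfalls(limits, in_use, requested):
    """Return {resource: shortfall} for each request that exceeds what is left."""
    shortfalls = {}
    for resource, wanted in requested.items():
        # A resource that is missing from `limits` is treated as a limit of 0.
        available = limits.get(resource, 0) - in_use.get(resource, 0)
        if wanted > available:
            shortfalls[resource] = wanted - available
    return shortfalls


limits = {"cpu_instances": 20, "gpu_instances": 0}    # account limits
in_use = {"cpu_instances": 12}                        # current usage
requested = {"cpu_instances": 6, "gpu_instances": 1}  # planned for CDP and CML

print(find_shortfalls(limits, in_use, requested))  # {'gpu_instances': 1}
```

    In this example the CPU request fits within the remaining capacity, while the GPU request does not, which corresponds to the case of an account without default access to GPU instances.
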
  4. Review supported AWS regions

    CDP supports the AWS regions listed in: Supported AWS regions. However, the CML service requires Amazon Elastic Kubernetes Service (EKS), so make sure you select a region where EKS is available.

    Related AWS documentation: Region Table (AWS Documentation).

  5. Set up an AWS Cloud Credential

    Create a role-based AWS credential that allows CDP to authenticate with your AWS account and has authorization to provision AWS resources on your behalf. Role-based authentication uses an IAM role with an attached IAM policy that has the minimum permissions required to use CDP.

    Once you have created this IAM role and attached the policy, register the role in CDP as a cloud credential. Then reference this credential when you register the environment in the next step.

    Instructions: CDP Cloud Credential for AWS
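
    For orientation, a role that another AWS account is allowed to assume carries a trust policy of the following general shape. This is a sketch of the standard AWS cross-account pattern, not the exact policy that CDP requires: it assumes that the credential setup gives you an account ID and an external ID, and both values below are placeholders. The permissions policy attached to the role is not shown; take it from the linked instructions.

```python
import json

# Sketch only: builds a generic cross-account IAM trust policy document.
# The account ID and external ID used below are placeholders, not CDP values.

def build_trust_policy(trusted_account_id, external_id):
    """Return a trust policy that lets the given account assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": f"arn:aws:iam::{trusted_account_id}:root"},
                "Action": "sts:AssumeRole",
                # The external ID condition restricts who can assume the role.
                "Condition": {"StringEquals": {"sts:ExternalId": external_id}},
            }
        ],
    }


policy = build_trust_policy("123456789012", "example-external-id")
print(json.dumps(policy, indent=2))
```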

  6. Register an AWS Environment

    A CDP user with the Power User role must register an environment for their organization. An environment determines the specific cloud provider region and virtual network in which resources can be provisioned, and includes the credential that should be used to access the cloud provider account.

    Instructions: Register an AWS Environment

  7. CML Role Requirements

    There are two CDP user roles associated with the CML service: MLAdmin and MLUser. A CDP user with the EnvironmentAdmin (or higher) access level must assign these roles to users who require access to the Cloudera Machine Learning service within their environment.

    Assigning these roles does not by itself let users log in to provisioned workspaces and run workloads on them. That access must be configured separately.

    Instructions: Configuring User Access to ML Workspaces