AWS Account Prerequisites for Cloudera AI Workbenches
To successfully provision an Cloudera AI Workbench, there are many prerequisites that you must ensure are met. Carefully go through this section step by step.
-
Review the AWS Account Prerequisites for Cloudera
Verify that the AWS account that you would like to use for Cloudera has the required resources and that you have the permissions required to manage these resources.
Instructions: AWS Account Requirements
-
Review the Cloudera AI-Specific AWS Resource Requirements
Provisioning an Cloudera AI Workbench will require access to the following AWS resources. Make sure your AWS account has access to these resources.
-
AWS Services used by Cloudera AI
- Compute - Amazon Elastic Kubernetes Service (EKS)
- Load Balancing - Amazon Network Load Balancer (NLB)
- Key Management - AWS Key Management Service (KMS)
- DNS - Amazon Route 53, hosted by Cloudera
- Persistent Storage - Amazon Elastic Block Store (EBS)
- Project File Storage - Amazon Elastic File System (EFS) for project file storage
- Command Line Interface - AWS Command Line Interface (CLI).
- Security Token Service - AWS Security Token Service (STS)
-
VPC Requirements - You can either use an existing VPC or allow Cloudera to create one for you.
-
Option 1. Using your own VPC
- Recommended requirements: Divide the address space according to the
following recommended sizes:
- 3 x /19 private subnets. Each subnet should be created in a separate Availability Zone for the EKS worker nodes.
- 3 x /24 public subnets. These should also be created in three separate Availability Zones, using the same zones as the private subnets.
- Ensure the CIDR block for the subnets is sized appropriately.
- You must enable Amazon DNS with the VPC. Corporate DNS is not supported. For guidelines on how verify your DNS settings, refer to sections 1-3 in AWS environment requirements checklist for the Data Warehouse service.
Private subnets should have routable IPs over your internal VPN. If IPs are not routable, private Cloudera AI endpoints will need to be accessed via a SOCKS proxy. Cloudera recommends creating routable IPs by setting up VPN connections between networks, and not using any public load balancers. If a fully-private network configuration is not feasible, use of a SOCKS proxy to access Cloudera AI is possible, but is not recommended.
Tag the VPC and the subnets as
shared
so that Kubernetes can find them. For load balancers to be able to choose the subnets correctly, you are also required to tag private subnets with thekubernetes.io/role/internal-elb:1
tag, and public subnets with thekubernetes.io/role/elb:1
tag.
- Recommended requirements: Divide the address space according to the
following recommended sizes:
-
Option 2. Cloudera creates a new VPC
If you choose to allow Cloudera to create a new VPC, three subnets will be created automatically. One subnet is created for each availability zone assuming three AZs per region; If a region has two AZs instead of three, then still three subnets are created, two in the same AZ.
You will be asked to specify a valid CIDR in IPv4 range that will be used to define the range of private IPs for EC2 instances provisioned into these subnets.
-
Related AWS documentation: Amazon EKS - Cluster VPC Considerations, Creating a VPC for your Amazon EKS Cluster
-
- Ports RequirementsHTTPS access to Cloudera AI Workbenches is available over port 443 for the following cases:
- internal only - should be accessible from your organization's network, but not the public internet
- internet facing - should be accessible from the public internet as well as your internal organization's network
-
Firewall requirements
Installations must comply with firewall requirements set by cloud providers at all times. Ensure that ports required by the provider are not closed. For example, Kubernetes services have requirements documented in Amazon EKS security group considerations.
Also, for information on repositories that must be accessible to set up Cloudera AI Workbenches, see Outbound network access destinations for AWS.
-
Review the default AWS service limits and your current AWS account limits
By default, AWS imposes certain default limits for AWS services, per-user account. Make sure you review your account's current usage status and resource limits before you start provisioning additional resources for Cloudera and Cloudera AI.
For example, depending on your AWS account, you might only be allowed to provision a certain number of CPU instances, or you might not have default access to GPU instances at all. Make sure to review your AWS service limits before your proceed.
Related AWS documentation: AWS Service Limits, Amazon EC2 Resource Limits.-
Review supported AWS regions
Cloudera supports the following AWS regions: Supported AWS regions. However, the Cloudera AI service requires AWS Elastic Kubernetes Service (EKS). Make sure you select a region that includes EKS.
Related AWS documentation: Region Table (AWS Documentation).
-
Set up an AWS Cloud Credential
Create a role-based AWS credential that allows Cloudera to authenticate with your AWS account and has authorization to provision AWS resources on your behalf. Role-based authentication uses an IAM role with an attached IAM policy that has the minimum permissions required to use Cloudera.
Once you have created this IAM policy, register it in Cloudera as a cloud credential. Then, reference this credential when you are registering the environment in the next step.
Instructions: Introduction to the role-based provisioning credential for AWS
-
Register an AWS Environment
A Cloudera User with the role of Power User must register an environment for their organization. An environment determines the specific cloud provider region and virtual network in which resources can be provisioned, and includes the credential that should be used to access the cloud provider account.
Instructions: Register an AWS Environment
-
Ensure private subnets have outbound internet connectivity
Also, ensure that your private subnets have outbound internet connectivity. Check the route tables of private subnets to verify the internet routing. Worker nodes must be able to download Docker images for Kubernetes, billing and metering information, and to perform API server registration.
-
Ensure the Amazon Security Token Service (STS) is activated
To successfully activate an environment in the Data Warehouse service, you must ensure the Amazon STS is activated in your AWS VPC:- In the AWS Management Console home page, select IAM under Security, Identity, & Compliance.
- In the Identity and Access Management (IAM) dashboard, select Account settings in the left navigation menu.
- On the Account settings page, scroll down to the section for Security Token Service (STS).
- In the Endpoints section, locate the region in which your environment is located and make sure that the STS service is activated.
-
Cloudera AI Role Requirements
There are two Cloudera user roles associated with the Cloudera AI service: MLAdmin and MLUser. Any Cloudera user with the EnvironmentAdmin (or higher) access level must assign these roles to users who require access to the Cloudera AI service within their environment.
Furthermore, if you want to allow users to log in to provisioned Cloudera AI Workbenches and run workloads on them, this will need to be configured separately.
Instructions: Configuring User Access to Cloudera AI Workbenches