Activating an AWS environment from CDW

To use an AWS environment for Cloudera Data Warehouse (CDW) Public Cloud, you must first activate it.

When you activate an environment, CDP creates an EKS cluster to host Kubernetes-based resources. The underlying compute and network resources are managed by AWS:
  • Resource group
  • Compute instances, which are virtual machine scale sets
  • Load balancer(s)
  • Public IP address(es)
  • Network security group
  • Disk(s)
CDW uses EC2 instances as cluster nodes. CDP supports the following AWS compute instance types for Hive and Impala executors, which you select during environment activation:
Table 1. Compute Instance Types
Instance type    Processor    Usage
r6id.4xlarge     Intel        Compute
r5d.4xlarge      Intel        Compute (default)
r5ad.4xlarge     AMD          Compute
r5dn.4xlarge     Intel        Compute
m5.2xlarge       Intel        Shared services

Instances are added to the cluster as needed for shared services (always-on components). Initially, three shared m5.2xlarge instances run the CDW service in your environment. Additionally, CDW activates one db.r5.large Amazon Relational Database Service (RDS) instance that uses Postgres to manage Hue and Data Visualization user metadata. For more information, see Always active, shared services.

  1. In the CDW service, in Environments, locate the environment that you want to activate.
  2. Click Activate.
  3. In Activate Environment, select the Compute Instance Types and Additional Compute Instance Types based on the following rules:
    Instance Pairing Rules
    • Select r5d.4xlarge, r5ad.4xlarge, or r5dn.4xlarge as primary Compute Instance Types or secondary Additional Compute Instance Types.
    • Do not mix r6id.4xlarge with any other types.

      For example, selecting r5d.4xlarge in Compute Instance Types and r5ad.4xlarge, r5dn.4xlarge in Additional Compute Instance Types is allowed. Selecting r5d.4xlarge in Compute Instance Types and r6id.4xlarge in Additional Compute Instance Types is not allowed.

  4. In Deployment Mode, select load balancers.
    For more information, see Load balancers for AWS environments.
    To view or configure the public and private subnets that have been specified for your CDP environment, click Advanced Settings.
    • Private Subnets: Accept the selected subnets you configured during AWS environment registration for load balancer and workload pods, or deselect subnets. Cloudera recommends three subnets for each load balancer to enable high availability (HA).
    • Enable IP CIDR for Kubernetes cluster: Enter the IP Classless Inter-Domain Routing (CIDR) ranges from which the Kubernetes cluster should accept incoming connections. Connections from other IP ranges are dropped. Use the CIDR ranges of the internal network addresses that need access to endpoints on the Kubernetes cluster. For more information, see Restricting access to endpoints in AWS.
    • Enable IP CIDRs for the load balancer: Enter the IP CIDR ranges from which the load balancer should accept incoming connections. Connections from other IP ranges are dropped. Use the CIDR ranges of the internal network addresses that need access to the load-balanced endpoints. For more information, see Restricting access to endpoints in AWS.
    • Use Overlay Network: If you have an existing Virtual Private Cloud (VPC), an overlay network can increase the number of IP addresses available to your CDW deployments in an AWS environment. Use this feature if your VPC subnet has fewer than 1,024 IP addresses. Cloudera recommends that you configure no more than 200 executor nodes when you use an overlay network.
    • Attach Managed Policy ARN to Node Role: If you do not want to provide PutRolePolicy permission in your cross account role, you can attach a managed policy ARN to a node role to provide the cross account role permissions. You must create a new NodeInstanceRole manually, and provide the ARN during activation of the environment from CDW.
    • Use Reduced Permissions Mode: If you cannot provide the standard set of IAM permissions required by CDW for environment activation, you can use reduced permissions mode to activate an AWS environment with fewer than half of these permissions. To use this feature, a minimum set of IAM permissions is required.
    • Enable CloudWatch Logs: Enable CloudWatch logs if you use Amazon CloudWatch. In your AWS account, you can then find the logs in /aws/eks/<cluster name>/cluster. Before enabling CloudWatch, you must add required permissions to your IAM policy to access CloudWatch logs; otherwise, you cannot activate the environment.
  5. Click Activate.
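
The instance pairing rules in step 3 and the subnet-size guidance for Use Overlay Network in step 4 can be checked before you open the activation dialog. The following Python sketch is illustrative only and is not part of CDW or CDP. The function names check_instance_selection and overlay_suggested are hypothetical, the sketch assumes that r6id.4xlarge may be selected on its own, and it counts every address in a subnet CIDR without subtracting the addresses that AWS reserves.

```python
import ipaddress

# Compute instance types from Table 1, grouped by the pairing rules in step 3.
MIXABLE_TYPES = {"r5d.4xlarge", "r5ad.4xlarge", "r5dn.4xlarge"}
EXCLUSIVE_TYPE = "r6id.4xlarge"

# Step 4 suggests an overlay network when a subnet has fewer addresses than this.
OVERLAY_THRESHOLD = 1024


def check_instance_selection(primary, additional=()):
    """Return a list of rule violations; an empty list means the selection is allowed.

    Hypothetical helper: it only encodes the rules written in this document.
    """
    problems = []
    chosen = [primary, *additional]
    for name in chosen:
        if name not in MIXABLE_TYPES and name != EXCLUSIVE_TYPE:
            problems.append(f"{name} is not listed in Table 1 with Compute usage")
    # r6id.4xlarge must not be combined with any other type.
    if EXCLUSIVE_TYPE in chosen and len(set(chosen)) > 1:
        problems.append(f"{EXCLUSIVE_TYPE} cannot be mixed with other instance types")
    return problems


def overlay_suggested(subnet_cidr):
    """Return True when the subnet CIDR holds fewer than 1,024 addresses.

    Assumption: the total address count is used as an approximation of the
    addresses available to the deployment.
    """
    network = ipaddress.ip_network(subnet_cidr, strict=True)
    return network.num_addresses < OVERLAY_THRESHOLD
```

For example, check_instance_selection("r5d.4xlarge", ["r5ad.4xlarge", "r5dn.4xlarge"]) returns an empty list, while pairing r5d.4xlarge with r6id.4xlarge returns one violation. Because ipaddress.ip_network is called with strict=True, a malformed CIDR raises ValueError, which can also serve as a basic format check for the CIDR ranges entered in Advanced Settings.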