Activating an AWS environment from CDW

To use an AWS environment for Cloudera Data Warehouse (CDW) Public Cloud, you must first activate it.

When you activate an environment, CDP creates an Amazon Elastic Kubernetes Service (EKS) cluster to host Kubernetes-based resources. The underlying compute and network resources are managed by AWS:
  • Resource group
  • Compute instances, which are virtual machine scale sets
  • Load balancer(s)
  • Public IP address(es)
  • Network security group
  • Disk(s)
CDW uses EC2 instances as cluster nodes. CDP supports the following AWS instance types. You select the compute instance types, which run the Hive and Impala executors, during environment activation:
Table 1. Compute Instance Types

Instance type   Processor   Usage               Virtual Warehouse support
-------------   ---------   -----------------   -------------------------
r7gd.4xlarge    ARM         Compute             Impala
r6gd.4xlarge    ARM         Compute             Impala
r6id.4xlarge    Intel       Compute             Hive and Impala
r5d.4xlarge     Intel       Compute (default)   Hive and Impala
r5ad.4xlarge    AMD         Compute             Hive and Impala
r5dn.4xlarge    Intel       Compute             Hive and Impala
m5.2xlarge      Intel       Shared services     Hive and Impala
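
When scripting checks around environment activation, it can help to hold Table 1 as data. The following is a minimal Python sketch that transcribes the table and filters it by Virtual Warehouse engine. The names INSTANCE_TYPES and compute_types_for are hypothetical helpers introduced for illustration; they are not part of any CDP API or CLI.

```python
# Table 1 transcribed as data:
# instance type -> (processor, usage, supported Virtual Warehouse engines).
INSTANCE_TYPES = {
    "r7gd.4xlarge": ("ARM", "Compute", {"Impala"}),
    "r6gd.4xlarge": ("ARM", "Compute", {"Impala"}),
    "r6id.4xlarge": ("Intel", "Compute", {"Hive", "Impala"}),
    "r5d.4xlarge": ("Intel", "Compute (default)", {"Hive", "Impala"}),
    "r5ad.4xlarge": ("AMD", "Compute", {"Hive", "Impala"}),
    "r5dn.4xlarge": ("Intel", "Compute", {"Hive", "Impala"}),
    "m5.2xlarge": ("Intel", "Shared services", {"Hive", "Impala"}),
}


def compute_types_for(engine):
    """Hypothetical helper: list the compute instance types Table 1 gives for an engine.

    Shared-services rows are skipped because they are not chosen as compute types.
    """
    return [
        name
        for name, (_processor, usage, engines) in INSTANCE_TYPES.items()
        if usage.startswith("Compute") and engine in engines
    ]
```

For example, compute_types_for("Hive") omits the two ARM types, because Table 1 lists them for Impala only.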

In the Cloudera Data Warehouse environment, instances for shared service components are set up within a Kubernetes (K8s) cluster. The setup begins with three m5.2xlarge instances running the CDW service, and the K8s cluster autoscales, adding instances when demand increases. Additionally, an Amazon Relational Database Service (RDS) instance (db.r5.large) running PostgreSQL is created to store user metadata for Hue and Data Visualization services. In total, three shared db.r5.large nodes are used for this purpose. These shared services are always active.

  1. In the CDW service, in Environments, locate the environment that you want to activate.
  2. Click Activate.
  3. In Activate Environment, select the Compute Instance type and Additional Compute Instance Types based on the following rules:
    Instance Pairing Rules
    • Select r5d.4xlarge, r5ad.4xlarge, or r5dn.4xlarge as primary Compute Instance Types or secondary Additional Compute Instance Types.
    • Do not mix r6id.4xlarge with any other types.

      For example, selecting r5d.4xlarge in Compute Instance Types and r5ad.4xlarge and r5dn.4xlarge in Additional Compute Instance Types is allowed. Selecting r5d.4xlarge in Compute Instance Types and r6id.4xlarge in Additional Compute Instance Types is not allowed.

  4. In Deployment Mode, select load balancers.
    For more information, see Load balancers for AWS environments.
    To view or configure the public and private subnets that have been specified for your CDP environment, click Advanced Settings.
    • Private Subnets: Accept the selected subnets you configured during AWS environment registration for load balancer and workload pods, or deselect subnets. Cloudera recommends three subnets for each load balancer to enable high availability (HA).
    • Enable IP CIDR for Kubernetes cluster: Enter the IP Classless Inter-Domain Routing (CIDR) ranges from which the Kubernetes cluster should accept incoming connections. Connections from other IP ranges are dropped. Obtain the CIDR ranges for the addresses in your internal network that need access to endpoints on the Kubernetes cluster. For more information, see Restricting access to endpoints in AWS.
    • Enable IP CIDRs for the load balancer: Enter the IP CIDR ranges from which the load balancer should accept incoming connections. Connections from other IP ranges are dropped. Obtain the CIDR ranges for the addresses in your internal network that need access to load-balanced endpoints. For more information, see Restricting access to endpoints in AWS.
    • Use Overlay Network: An overlay network can increase the number of IP addresses available to your CDW deployments if you use an existing Virtual Private Cloud (VPC). Use this feature if your VPC subnet has fewer than 1,024 IP addresses. Cloudera recommends that you configure no more than 200 executor nodes when you use an overlay network.
    • Attach Managed Policy ARN to Node Role: If you do not want to provide PutRolePolicy permission in your cross account role, you can attach a managed policy ARN to a node role to provide the cross account role permissions. You must create a new NodeInstanceRole manually, and provide the ARN during activation of the environment from CDW.
    • Use Reduced Permissions Mode: If you cannot provide the standard set of IAM permissions required by CDW for environment activation, you can use reduced permissions mode to activate an AWS environment with fewer than half of these permissions. A minimum set of IAM permissions is still required to use this feature.
    • Enable CloudWatch Logs: Enable CloudWatch logs if you use Amazon CloudWatch. In your AWS account, you can then find the logs in /aws/eks/<cluster name>/cluster. Before enabling CloudWatch, you must add required permissions to your IAM policy to access CloudWatch logs; otherwise, you cannot activate the environment.
  5. Click Activate.
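
Several of the choices in the steps above can be checked before you open the activation dialog. The following is a minimal Python sketch, not a CDP tool: check_pairing encodes only the two stated Instance Pairing Rules, parse_cidrs validates CIDR entries, and overlay_suggested applies the 1,024-address threshold from the Use Overlay Network setting. All three function names are hypothetical, and the comma-separated CIDR input format is an assumption made for illustration.

```python
import ipaddress

# Types the pairing rules name as valid in either selection field.
R5_FAMILY = {"r5d.4xlarge", "r5ad.4xlarge", "r5dn.4xlarge"}
# Type the pairing rules say must not be mixed with any other type.
EXCLUSIVE = "r6id.4xlarge"


def check_pairing(primary, additional=()):
    """Return 'allowed', 'not allowed', or 'unspecified' for a selection.

    'unspecified' means the stated rules do not cover the combination
    (for example, the ARM types); it is not a verdict either way.
    """
    selected = {primary, *additional}
    if EXCLUSIVE in selected:
        # r6id.4xlarge on its own is not a mix; anything else with it is.
        return "allowed" if selected == {EXCLUSIVE} else "not allowed"
    if selected <= R5_FAMILY:
        return "allowed"
    return "unspecified"


def parse_cidrs(text):
    """Parse a comma-separated CIDR list (assumed format).

    Raises ValueError on a malformed entry or one with host bits set.
    """
    return [
        ipaddress.ip_network(part.strip())
        for part in text.split(",")
        if part.strip()
    ]


def overlay_suggested(subnet_cidr):
    """True if the subnet's CIDR block holds fewer than 1,024 addresses.

    This counts the size of the block, not the addresses currently free.
    """
    return ipaddress.ip_network(subnet_cidr).num_addresses < 1024
```

For instance, check_pairing("r5d.4xlarge", ["r6id.4xlarge"]) reports the disallowed combination from step 3, and overlay_suggested("10.0.0.0/23") is true because a /23 block holds 512 addresses.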