Activating an AWS environment from CDW

To use an AWS environment for Cloudera Data Warehouse (CDW) Public Cloud, you must first activate it.

When you activate an environment, CDP creates an EKS cluster to host Kubernetes-based resources. The underlying compute and network resources are managed by AWS:
  • Resource group
  • Load balancer(s)
  • Public IP address(es)
  • Network security group
  • Disk(s)

In the Cloudera Data Warehouse environment, instances for shared service components are set up within a Kubernetes (K8s) cluster. These shared services are always active. The setup begins with three m5.2xlarge instances running the CDW service, and the K8s cluster can autoscale, adding more instances when demand increases. Additionally, an Amazon Relational Database Service (RDS) instance (db.r5.large) running PostgreSQL is created to store user metadata for the Hue and Data Visualization services. In total, three shared db.r5.large nodes are used for this purpose.

  1. In the CDW service, in Environments, locate the environment that you want to activate.
  2. Click Activate.
  3. In Deployment Mode, select the load balancer option to use.
    For more information, see Load balancers for AWS environments.
    To view or configure the public and private subnets that have been specified for your CDP environment, click Advanced Settings.
    • Private Subnets: Accept the selected subnets you configured during AWS environment registration for load balancer and workload pods, or deselect subnets. Cloudera recommends three subnets for each load balancer to enable high availability (HA).
    • Enable IP CIDR for Kubernetes cluster: Enter the IP Classless Inter-Domain Routing (CIDR) ranges from which the Kubernetes cluster should accept incoming connections. Connections from other IP ranges are dropped. Before you begin, obtain the CIDR ranges of the IP addresses on your internal network that need access to endpoints on the Kubernetes cluster. For more information, see Restricting access to endpoints in AWS.
    • Enable IP CIDRs for the load balancer: Enter the IP CIDR ranges from which the load balancer should accept incoming connections. Connections from other IP ranges are dropped. Before you begin, obtain the CIDR ranges of the IP addresses on your internal network that need access to load-balanced endpoints. For more information, see Restricting access to endpoints in AWS.
    • Use Overlay Network: If you use an existing Virtual Private Cloud (VPC), an overlay network can increase the number of IP addresses available to your CDW deployments. Use this feature if your VPC subnet has fewer than 1,024 IP addresses. Cloudera recommends that you configure no more than 200 executor nodes when you use an overlay network.
    • Attach Managed Policy ARN to Node Role: If you do not want to grant the PutRolePolicy permission in your cross-account role, you can instead attach a managed policy ARN to a node role. To do so, you must create a new NodeInstanceRole manually and provide the ARN when you activate the environment from CDW.
    • Use Reduced Permissions Mode: If you cannot provide the standard set of IAM permissions required by CDW for environment activation, you can use reduced permissions mode to activate an AWS environment with fewer than half of these permissions. This feature still requires a minimum set of IAM permissions.
    • Enable CloudWatch Logs: Enable CloudWatch logs if you use Amazon CloudWatch. In your AWS account, you can then find the logs in /aws/eks/<cluster name>/cluster. Before enabling CloudWatch, you must add required permissions to your IAM policy to access CloudWatch logs; otherwise, you cannot activate the environment.
  4. Click Activate.
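
Before you enter values in Advanced Settings, it can help to check them locally. The following is a minimal sketch in Python, not part of CDW or the CDP CLI: it uses only the standard ipaddress module, and the function names (parse_cidrs, is_allowed, overlay_suggested, eks_log_group) are hypothetical helpers introduced for illustration. It assumes that CIDR ranges are supplied as a comma-separated string and that the 1,024-address guidance for overlay networks refers to the total size of the subnet CIDR block; AWS reserves some addresses in every subnet, so the usable count is slightly lower.

```python
import ipaddress

# Threshold taken from the overlay network guidance in step 3.
OVERLAY_THRESHOLD = 1024


def parse_cidrs(raw):
    """Parse a comma-separated string of CIDR ranges (assumed input format).

    Raises ValueError if an entry is not a valid network, for example
    when host bits are set (10.0.0.1/16 instead of 10.0.0.0/16).
    """
    return [
        ipaddress.ip_network(part.strip())
        for part in raw.split(",")
        if part.strip()
    ]


def is_allowed(client_ip, allowed_networks):
    """Return True if client_ip falls inside any of the allowed ranges."""
    addr = ipaddress.ip_address(client_ip)
    return any(addr in net for net in allowed_networks)


def overlay_suggested(subnet_cidr):
    """Return True if the subnet CIDR block has fewer than 1,024 addresses.

    This counts every address in the block, including those AWS reserves.
    """
    return ipaddress.ip_network(subnet_cidr).num_addresses < OVERLAY_THRESHOLD


def eks_log_group(cluster_name):
    """Build the CloudWatch location described for the Enable CloudWatch Logs option."""
    return f"/aws/eks/{cluster_name}/cluster"


if __name__ == "__main__":
    # Example values only; replace them with your own ranges and names.
    allowed = parse_cidrs("10.0.0.0/16, 192.168.1.0/24")
    print(is_allowed("10.0.5.7", allowed))
    print(overlay_suggested("10.0.0.0/23"))
    print(eks_log_group("my-cluster"))
```

Running is_allowed against a few addresses from your internal network is a quick way to confirm that the ranges you plan to enter cover the clients that need access, because connections from any other range are dropped.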