To use an AWS environment for Cloudera Data Warehouse (CDW) Public Cloud, you must first
activate it.
When you activate an environment, CDP creates an EKS cluster to host Kubernetes-based
resources. The underlying compute and network resources are managed by AWS:
Resource group
Load balancer(s)
Public IP address(es)
Network security group
Disk(s)
In the Cloudera Data Warehouse environment, instances for shared service
components are set up within a Kubernetes (K8s) cluster. The setup begins with three
m5.2xlarge instances running the CDW service, but the K8s cluster is capable of autoscaling,
automatically adding more instances if necessary to handle increased demand. Additionally,
an Amazon Relational Database Service (RDS) (db.r5.large) running PostgreSQL is created to
store user metadata for Hue and Data Visualization services. In total, three shared
db.r5.large nodes are used for this purpose. These shared services are always
active.
To view or configure the public and private subnets that have been specified for your
CDP environment, click Advanced Settings.
Private Subnets: Accept the selected subnets you configured during AWS environment registration for load
balancer and workload pods, or deselect subnets. Cloudera recommends three subnets
for each load balancer to enable high availability (HA).
Enable IP CIDR for Kubernetes cluster: Enter the IP
Classless Inter-Domain Routing (CIDR) ranges from which the Kubernetes cluster should
accept incoming connections. Connections from other IP ranges are dropped. Before you
begin, obtain the CIDR ranges of the IP addresses on your internal network that need
access to endpoints on the Kubernetes cluster. For more information, see Restricting
access to endpoints in AWS.
Enable IP CIDRs for the load balancer: Enter the IP CIDR ranges
from which the load balancer should accept incoming connections. Connections from
other IP ranges are dropped. Before you begin, obtain the CIDR ranges of the IP
addresses on your internal network that need access to load-balanced endpoints. For
more information, see Restricting access to endpoints in AWS.
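Before entering CIDR ranges for the Kubernetes cluster or the load balancer, it can help to check them locally. The following is a minimal sketch using Python's standard ipaddress module. The comma-separated input format, the helper names parse_cidrs and is_allowed, and the sample ranges are assumptions made for illustration; they are not part of CDW.

```python
import ipaddress


def parse_cidrs(text):
    """Parse a comma-separated string of CIDR ranges.

    Raises ValueError if an entry is malformed or has host bits set
    (for example 10.0.0.1/16 instead of 10.0.0.0/16).
    """
    return [
        ipaddress.ip_network(part.strip(), strict=True)
        for part in text.split(",")
        if part.strip()
    ]


def is_allowed(address, networks):
    """Return True if the address falls inside any of the given networks."""
    addr = ipaddress.ip_address(address)
    return any(addr in net for net in networks)


# Hypothetical internal ranges, used only as sample data.
allowed = parse_cidrs("10.0.0.0/16, 192.168.1.0/24")

print(is_allowed("10.0.12.7", allowed))   # inside 10.0.0.0/16
print(is_allowed("172.16.0.1", allowed))  # outside both ranges
```

A check like this only confirms that the ranges are well formed and cover the client addresses you expect. It does not reproduce how the cluster or load balancer enforces the restriction.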
Use Overlay Network: Overlay Networks for AWS environments can
increase the number of available IP addresses for your deployments of CDW if you
have an existing Virtual Private Cloud (VPC). Use this feature if your VPC subnet
has fewer than 1,024 IP addresses. Cloudera recommends configuring no more than
200 executor nodes when you use an overlay network.
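To see whether a subnet falls under the 1,024-address threshold mentioned above, you can compute the size of its CIDR block. This is a rough sketch, assuming the threshold applies to the total size of the block; it does not subtract addresses that AWS reserves or that are already in use. The function names and sample CIDRs are hypothetical.

```python
import ipaddress

# Threshold taken from the text above.
OVERLAY_THRESHOLD = 1024


def subnet_size(cidr):
    """Return the total number of addresses in a CIDR block."""
    return ipaddress.ip_network(cidr).num_addresses


def overlay_suggested(cidr):
    """Return True if the block has fewer addresses than the threshold."""
    return subnet_size(cidr) < OVERLAY_THRESHOLD


# Sample subnets, used only for illustration.
for cidr in ("10.0.0.0/22", "10.0.4.0/23", "10.0.6.0/24"):
    print(cidr, subnet_size(cidr), overlay_suggested(cidr))
```

A /22 block holds exactly 1,024 addresses, so under this reading only blocks of /23 and smaller fall below the threshold.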
Attach Managed Policy ARN to Node Role: If you do not want
to grant the PutRolePolicy permission in your cross-account role, you can instead
attach a managed policy ARN to a node role to provide the cross-account role
permissions. To do so, you must manually create a new NodeInstanceRole and provide
the ARN when you activate the environment from CDW.
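Because the ARN is entered by hand during activation, a quick format check can catch copy-and-paste mistakes. The sketch below is an assumption-laden illustration, not a CDW or AWS validator: it only checks that a string has the general shape of an IAM managed policy ARN (arn:<partition>:iam::<account>:policy/<name>). The account ID and the policy name cdw-node-policy are made up.

```python
import re

# General shape of an IAM managed policy ARN. The account field is either a
# 12-digit account ID or the literal "aws" for AWS managed policies.
POLICY_ARN = re.compile(
    r"arn:(aws|aws-cn|aws-us-gov):iam::(\d{12}|aws):policy/[\w+=,.@/-]+"
)


def looks_like_policy_arn(arn):
    """Return True if the string is shaped like a managed policy ARN."""
    return POLICY_ARN.fullmatch(arn) is not None


# Hypothetical values, used only as sample data.
print(looks_like_policy_arn("arn:aws:iam::123456789012:policy/cdw-node-policy"))
print(looks_like_policy_arn("arn:aws:iam::123456789012:role/cdw-node-role"))
```

A passing check says nothing about whether the policy exists or grants the permissions that CDW needs.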
Use Reduced Permissions Mode: If you cannot provide the
standard set of IAM permissions required by CDW for environment activation, you can
use reduced permissions mode to activate an
AWS environment with fewer than half of these permissions. This feature still
requires a minimum set of IAM permissions.
Enable CloudWatch Logs: Enable CloudWatch logs if you use
Amazon CloudWatch. In your AWS account, you can then find the logs in
/aws/eks/<cluster name>/cluster. Before enabling CloudWatch logs, you must add the required
permissions for accessing them to your IAM policy; otherwise, you cannot activate the environment.
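The log location above follows a fixed pattern, so it can be derived from the cluster name. The helper below is a small sketch of that pattern only; the function name and the sample cluster name are hypothetical, and the code does not call AWS.

```python
def eks_log_group(cluster_name):
    """Build the log path /aws/eks/<cluster name>/cluster for a cluster."""
    name = cluster_name.strip()
    if not name:
        raise ValueError("cluster name must not be empty")
    return f"/aws/eks/{name}/cluster"


# Hypothetical cluster name, used only as sample data.
print(eks_log_group("example-cdw-cluster"))
```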