Activating an AWS environment from Cloudera Data Warehouse
To use an AWS environment for Cloudera Data Warehouse Public Cloud, you must first activate it.
When you activate an environment, Cloudera creates an EKS
cluster to host Kubernetes-based resources. The following underlying compute and network
resources are managed by AWS:
Resource group
Load balancer(s)
Public IP address(es)
Network security group
Disk(s)
In the Cloudera Data Warehouse environment, instances
for shared service components are set up within a Kubernetes (K8s) cluster. The setup begins
with three m5.2xlarge instances running the Cloudera Data Warehouse service.
The K8s cluster can autoscale, automatically adding more instances when
needed to handle increased demand. Additionally, an Amazon Relational Database Service
(RDS) instance (db.r5.large) running PostgreSQL is created to store user metadata for the Hue and Data
Visualization services. In total, three shared db.r5.large nodes are used for this purpose.
Always active, shared services.
To view or configure the public and private subnets that have been specified for your
Cloudera environment, click Advanced Settings.
Private Subnets: Accept the selected subnets that you configured
during AWS environment registration for the load
balancer and workload pods, or deselect subnets. Cloudera recommends three subnets for each
load balancer to enable high availability (HA).
Enable IP CIDR for Kubernetes cluster: Enter the IP
Classless Inter-Domain Routing (CIDR) ranges from which the Kubernetes cluster should
accept incoming connections. Connections from other IP ranges are dropped. Before you
begin, obtain the CIDR ranges of the IP addresses on your internal network that need access to endpoints
on the Kubernetes cluster. For more information, see Restricting access to endpoints in AWS.
Enable IP CIDRs for the load balancer: Enter the IP CIDR ranges
from which the load balancer should accept incoming connections. Connections from
other IP ranges are dropped. Before you begin, obtain the CIDR ranges of the IP
addresses on your internal network that need access to load-balanced endpoints. For more
information, see Restricting access to endpoints in AWS.
Use Overlay Network: Overlay networks for AWS environments can
increase the number of IP addresses available to your Cloudera Data Warehouse deployments if you have an existing Virtual Private Cloud
(VPC). Use this feature if your VPC subnet has fewer than 1,024 IP
addresses. Cloudera recommends that you
configure no more than 200 executor nodes when you use an overlay network.
Attach Managed Policy ARN to Node Role: If you do not want
to grant the PutRolePolicy permission in your cross-account role, you can attach a managed policy ARN to a node role
to provide the cross-account role permissions. You must create a new
NodeInstanceRole manually and provide its ARN when you activate the environment
from Cloudera Data Warehouse.
Use Reduced Permissions Mode: If you cannot provide the
standard set of IAM permissions required by Cloudera Data Warehouse for
environment activation, you can use reduced permissions mode to activate an
AWS environment with fewer than half of these permissions. This feature still
requires a minimum set of IAM permissions.
Enable CloudWatch Logs: Enable CloudWatch logs if you use
Amazon CloudWatch. In your AWS account, you can then find the logs in the log group
/aws/eks/<cluster name>/cluster. Before enabling CloudWatch logs, you must add the required permissions to your IAM policy
to access CloudWatch logs; otherwise, you cannot activate the environment.
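
The CIDR and overlay network settings described above can be checked before you open the activation dialog. The following is a minimal Python sketch that uses only the standard library. It is not part of Cloudera Data Warehouse or its CLI. The function names, the comma-separated input format, and the sample address ranges are assumptions made for illustration. The 1,024 threshold is taken from the overlay network guidance above and is compared against the raw size of the subnet.

```python
import ipaddress

# Threshold from the overlay network guidance above: consider the overlay
# network when a subnet has fewer than 1,024 IP addresses.
OVERLAY_THRESHOLD = 1024


def parse_cidrs(text):
    """Parse a comma-separated string of CIDR ranges (assumed input format).

    Raises ValueError if an entry is not a valid network address, for
    example when host bits are set (10.0.0.1/16).
    """
    return [
        ipaddress.ip_network(part.strip(), strict=True)
        for part in text.split(",")
        if part.strip()
    ]


def is_allowed(ip, allowed_networks):
    """Return True if the address falls inside any of the allowed ranges."""
    address = ipaddress.ip_address(ip)
    return any(address in network for network in allowed_networks)


def overlay_suggested(subnet_cidr):
    """Return True if the subnet is smaller than the overlay threshold."""
    return ipaddress.ip_network(subnet_cidr).num_addresses < OVERLAY_THRESHOLD


if __name__ == "__main__":
    # Hypothetical internal ranges that should reach the endpoints.
    allowed = parse_cidrs("10.0.0.0/16, 192.168.1.0/24")
    print(is_allowed("10.0.5.9", allowed))
    print(is_allowed("172.16.0.1", allowed))
    print(overlay_suggested("10.0.0.0/23"))
```

Running a check like this first helps catch malformed ranges, or a client address that would be dropped, before the same values are entered under Advanced Settings.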