Activating an AWS environment from CDW
To use an AWS environment for Cloudera Data Warehouse (CDW) Public Cloud, you must first activate it.
- Resource group
- Compute instances, which are virtual machine scale sets
- Load balancer(s)
- Public IP address(es)
- Network security group
- Disk(s)
Instance type | Processor | Usage | Virtual Warehouse Support |
---|---|---|---|
r7gd.4xlarge | ARM | Compute | Impala |
r6gd.4xlarge | ARM | Compute | Impala |
r6id.4xlarge | Intel | Compute | Hive and Impala |
r5d.4xlarge | Intel | Compute (default) | Hive and Impala |
r5ad.4xlarge | AMD | Compute | Hive and Impala |
r5dn.4xlarge | Intel | Compute | Hive and Impala |
m5.2xlarge | Intel | Shared services | Hive and Impala |
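The table above can also be read as a simple lookup from Virtual Warehouse type to eligible compute instance types. The following is a minimal sketch of that reading, not a Cloudera API: the `InstanceType` class and the `compute_types_for` and `default_compute_type` helpers are hypothetical names introduced here for illustration, and the data is copied from the table.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class InstanceType:
    """One row of the instance type table (hypothetical helper class)."""
    name: str
    processor: str
    usage: str                      # "Compute" or "Shared services"
    warehouses: FrozenSet[str]      # Virtual Warehouse types supported
    is_default: bool = False        # True for the row marked "Compute (default)"


HIVE_AND_IMPALA = frozenset({"Hive", "Impala"})
IMPALA_ONLY = frozenset({"Impala"})

# Rows copied from the table, in the same order.
INSTANCE_TYPES = [
    InstanceType("r7gd.4xlarge", "ARM", "Compute", IMPALA_ONLY),
    InstanceType("r6gd.4xlarge", "ARM", "Compute", IMPALA_ONLY),
    InstanceType("r6id.4xlarge", "Intel", "Compute", HIVE_AND_IMPALA),
    InstanceType("r5d.4xlarge", "Intel", "Compute", HIVE_AND_IMPALA, is_default=True),
    InstanceType("r5ad.4xlarge", "AMD", "Compute", HIVE_AND_IMPALA),
    InstanceType("r5dn.4xlarge", "Intel", "Compute", HIVE_AND_IMPALA),
    InstanceType("m5.2xlarge", "Intel", "Shared services", HIVE_AND_IMPALA),
]


def compute_types_for(warehouse: str) -> List[str]:
    """Return the compute instance type names that support the given warehouse type."""
    return [
        t.name
        for t in INSTANCE_TYPES
        if t.usage == "Compute" and warehouse in t.warehouses
    ]


def default_compute_type() -> str:
    """Return the name of the instance type marked as the compute default."""
    return next(t.name for t in INSTANCE_TYPES if t.is_default)


if __name__ == "__main__":
    print("Hive:", compute_types_for("Hive"))
    print("Impala:", compute_types_for("Impala"))
    print("Default:", default_compute_type())
```

Read this way, the ARM-based types (r7gd.4xlarge and r6gd.4xlarge) appear only for Impala, while the Intel and AMD compute types appear for both Hive and Impala.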
In the Cloudera Data Warehouse environment, shared service components run on instances within a Kubernetes (K8s) cluster. The cluster starts with three m5.2xlarge instances running the CDW service and can autoscale, automatically adding instances when demand increases. In addition, an Amazon Relational Database Service (RDS) instance (db.r5.large) running PostgreSQL is created to store user metadata for the Hue and Data Visualization services. In total, three shared db.r5.large nodes are used for this purpose. These shared services are always active.
- Obtain the DWAdmin role.
- Review the AWS environment requirements.