Introduction to AWS environments

In Cloudera, an environment is a logical subset of your cloud provider account including a specific virtual private network. You can register as many environments as you require.

The “environment” concept of Cloudera is closely related to the virtual private network in your cloud provider account. Registering an environment provides Cloudera with access to your cloud provider account and identifies the resources in your cloud provider account that Cloudera services can access or provision. A single environment is contained within a single cloud provider region, so all resources deployed by Cloudera are deployed within that region within one specific virtual network. Once you’ve registered an environment in Cloudera, you can start provisioning Cloudera resources such as clusters, which run on the physical infrastructure in an AWS data center.

The following diagram enumerates the components of an environment:

The diagram illustrates all major user-created and Cloudera-created components of an environment:

  • The items in dark blue boxes with orange outlines can either be automatically provisioned by Cloudera on your AWS account, or your can optionally pre-create them and provide them when registering an environment.
  • The items in dark blue boxes must be pre-created by your Cloudera administrator prior to environment registration and then provided when registering an environment.
  • The items in orange boxes are automatically provisioned on AWS by Cloudera as part of environment provisioning.

As shown in the diagram, an environment consists of the following resources:

Environment component Description
Virtual network with subnets An environment corresponds to one specific virtual network and subnets in which Cloudera resources are provisioned.

The network is located within one specified region; therefore, you also need to select a region for each environment.

Security groups Security groups act as a virtual firewall for your instances to control inbound and outbound traffic.

All VM instances provisioned within an environment use your specified security access settings allowing inbound access to your instances from your organization’s computers.

Cross-account role for provisioning credential Cloudera uses a provisioning credential for authorization to provision resources (such as compute instances) within your cloud provider account.

On AWS, the credential uses a cross-account IAM role with an attached IAM policy listing all required permissions.

SSH public key When registering an environment on a public cloud, a Cloudera administrator provides an SSH public key. This way, the administrator has root-level access to the Data Lake instance and Cloudera Data Hub cluster instances.
Cloud storage location and AIM roles and instance profiles allowing access to that location When registering an environment, you must provide one or more S3 buckets for storing:
  • All workload cluster data
  • Ranger audits, and FreeIPA logs
  • Cluster service logs

You must also provide IAM instance profiles that control access to the bucket(s).

Data lake A data lake is automatically provisioned when an environment is created. It provides a mechanism for storing, accessing, organizing, securing, and managing data.
FreeIPA A FreeIPA server is automatically provisioned when an environment is created. It is responsible for synchronizing your users and making them available to Cloudera services, Kerberos service principal management, and more.

You may want to register multiple environments corresponding to different regions that your organization would like to use. Once your environment is running, you can provision Cloudera Data Hub clusters, Cloudera Data Warehouses, and other resources in it.