Introduction to Google Cloud environments

In CDP, an environment is a logical subset of your cloud provider account, including a specific virtual private network. You can register as many environments as you require.

Registering your GCP environment in CDP provides CDP with limited access to your GCP account and identifies a set of resources in your GCP account that CDP services can access. Once you’ve registered your GCP environment in CDP, you can start provisioning CDP resources such as clusters, which run on the physical infrastructure in a GCP data center.

The following diagram enumerates the components of a GCP environment:

The diagram illustrates all major user-created and CDP-created components of an environment:

  • The items in dark blue boxes must be pre-created by your CDP administrator prior to environment registration and then specified when registering an environment.
  • The items in orange boxes are automatically provisioned on GCP by CDP as part of environment provisioning.

As shown in the diagram, an environment consists of the following resources:

Environment component: Description

Virtual network with subnets: An environment corresponds to one specific VPC network and one or more subnets in which CDP resources are provisioned.

Firewall rules: Firewall rules (similar to security groups on other cloud providers) act as a virtual firewall for your instances, controlling inbound and outbound traffic. All VM instances provisioned within an environment use the security access settings that you specify, which allow inbound access to your instances from your organization’s computers.

Service account for provisioning credential: CDP uses a provisioning credential for authorization to provision resources (such as compute instances) within your cloud provider account. In GCP, credential creation involves creating a service account, assigning a set of minimum IAM permissions to it, and providing CDP with the access key generated for the service account.

SSH public key: When registering an environment on a public cloud, a CDP administrator provides an SSH public key. This gives the administrator root-level access to the Data Lake instance and Data Hub cluster instances.

Google storage location and a service account controlling access to it: When registering an environment, you must provide a Google storage location for storing:
  • All workload cluster data
  • Cluster service logs and Ranger audits
You must also create a service account and assign it at the scope of this storage location so that CDP can access it.

Data Lake cluster: A Data Lake is automatically provisioned when an environment is created. It provides a mechanism for storing, accessing, organizing, securing, and managing data.

FreeIPA server: A FreeIPA server is automatically provisioned when an environment is created. It is responsible for synchronizing your users and making them available to CDP services, Kerberos service principal management, and more.
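
The provisioning credential described in the table above relies on an access key generated for a service account. Before handing such a key to CDP, it can help to confirm that the file looks like a service account key at all. The following is a minimal sketch, assuming the key was downloaded in Google's JSON key format; the function name `check_service_account_key` and the choice of fields to check are illustrative assumptions, not part of CDP or of any Google library.

```python
import json
from typing import List

# Fields normally present in a Google-issued JSON service account key.
# Checking only this subset is an assumption made for illustration.
REQUIRED_KEY_FIELDS = (
    "type",
    "project_id",
    "private_key_id",
    "private_key",
    "client_email",
)


def check_service_account_key(raw_json: str) -> List[str]:
    """Return a list of problems found in a JSON key (empty if none)."""
    try:
        key = json.loads(raw_json)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(key, dict):
        return ["top-level JSON value is not an object"]

    # Report every required field that is absent or empty.
    problems = [
        f"missing field: {name}"
        for name in REQUIRED_KEY_FIELDS
        if not key.get(name)
    ]

    # A user credential or other key type is not a service account key.
    key_type = key.get("type")
    if key_type and key_type != "service_account":
        problems.append(f"unexpected key type: {key_type}")
    return problems
```

This check only inspects the shape of the file. It does not verify that the service account exists or that it holds the minimum IAM permissions CDP requires.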

Once your environment is running, you can provision Data Hub clusters in it.
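
As a rough illustration of the user-supplied items listed above, the sketch below models them as a simple checklist that can be reviewed before registration. Everything in it is hypothetical: the class `GcpEnvironmentInputs`, its field names, and the function `missing_inputs` are not part of CDP. The set of required fields is an assumption drawn from the component descriptions and should be adjusted to match the diagram. The check that the storage location starts with `gs://` assumes the location is given as a Cloud Storage URI.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class GcpEnvironmentInputs:
    """Hypothetical container for items supplied at environment registration."""
    vpc_network: str = ""
    subnets: List[str] = field(default_factory=list)
    provisioning_credential: str = ""   # credential backed by the service account key
    ssh_public_key: str = ""
    storage_location: str = ""          # assumed form: gs://bucket/path
    storage_service_account: str = ""   # service account scoped to the storage location


def missing_inputs(env: GcpEnvironmentInputs) -> List[str]:
    """Return human-readable problems; an empty list means nothing is missing."""
    # Any empty string or empty list counts as "not set".
    problems = [
        f"{name} is not set"
        for name, value in vars(env).items()
        if not value
    ]
    if env.storage_location and not env.storage_location.startswith("gs://"):
        problems.append("storage_location should be a gs:// URI")
    return problems


if __name__ == "__main__":
    draft = GcpEnvironmentInputs(vpc_network="example-vpc", subnets=["example-subnet"])
    for problem in missing_inputs(draft):
        print(problem)
```

The Data Lake cluster and FreeIPA server are deliberately absent from the checklist, because CDP provisions them automatically when the environment is created.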