Overview of GCP resources used by CDP
The following Google Cloud resources are used by CDP and CDP services.
GCP resources created for a CDP environment
When a CDP environment is created, a FreeIPA cluster and a Data Lake cluster are created.
The following Google Cloud resources are created for FreeIPA (one per environment):
Resource | Description |
---|---|
Service account for credential | To allow CDP to access and provision resources in your Google Cloud project, you must create a service account in your Google Cloud project, assign required roles, and generate a JSON access key that can later be provided to CDP. |
VPC network and subnets | During environment creation you provide your own existing VPC network and subnets. All compute resources that CDP provisions for the environment and CDP services are provisioned into the VPC network specified during environment creation. |
Firewall rules | Firewall rules define inbound and outbound access to the instances. If during environment creation you choose to have new firewall rules created, then they are created on your GCP account. Alternatively, you can provide your own existing firewall rules. |
VM instances | During environment creation, two or three e2-standard-2 VM
instances are provisioned for the FreeIPA HA server. The number of VMs depends on the
selected Data Lake type. |
OS disk | An OS disk is provisioned for the FreeIPA VM. |
Attached disk | An attached disk (pd-standard) is provisioned for each VM. |
Public IP address (if required) | If you choose to use public IPs, your VM is assigned a public IP address. |
GCS bucket for storing operating system images | By default, CDP creates a storage bucket that is used solely for storing operating
system images. If required, you can optionally pre-create this account and copy the required images. |
In addition, the following resources are created for each Data Lake (one per environment):
Resource | Description |
---|---|
VM instances | VM instances are provisioned for the Data Lake nodes.
|
Attached disk | An attached disk (pd-standard ) is provisioned for each
VM. |
OS disk | An OS disk is provisioned for each VM. |
PostgreSQL database | A custom PostgreSQL database instance (100GB SSD, 2vCPU, 13 GB RAM) is provisioned for the Data Lake. This databse instance is used for Cloudera Manager, Ranger, and Hive MetaStore. |
Firewall rules | Firewall rules define inbound and outbound access to VM instances. If during environment creation you choose to have new firewall rules created, then they are created on your GCP project. |
Google storage buckets | The existing Google Storage bucket that you provide during environment creation for the Data Lake is used for Data Lake log storage and workload data storage. |
Service accounts | Prior to registering your environment in CDP, during Google storage setup, you should create service accounts and assign roles to them as instructed in CDP documentation. |
Public IP address (if required) | If you choose to use public IPs, your VM is assigned a public IP address. |
GCP resources created for Data Hub
The following Google Cloud resources are created for each Data Hub cluster:
Resource | Description |
---|---|
VM instances and attached storage | A VM is created for each cluster node. The VM type varies depending on what you selected during Data Hub cluster creation. For a list of supported VM types, refer to Cloudera Data Platform (CDP) Public Cloud service rates. |
Firewall rules | Firewall rules define inbound and outbound access to VM instances. If during environment creation you choose to have new firewall rules created, then they are created on your GCP project. |
OS disk | An OS disk is provisioned for each VM. |
Attached disk | An attached disk (pd-standard) is provisioned for each VM, as specified during Data Hub cluster creation. The disk size is selected during cluster creation. |
Public IP address (if required) | If you choose to use public IPs, each of the VMs is assigned a public IP address. |