Overview of GCP resources used by Cloudera

The following Google Cloud resources are used by Cloudera and Cloudera services.

GCP resources created for a Cloudera environment

When a Cloudera environment is created, a FreeIPA cluster and a Data Lake cluster are created.

The following Google Cloud resources are created for FreeIPA (one per environment):

Resource Description
Service account for credential To allow Cloudera to access and provision resources in your Google Cloud project, you must create a service account in your Google Cloud project, assign required roles, and generate a JSON access key that can later be provided to Cloudera.
VPC network and subnets During environment creation you provide your own existing VPC network and subnets.

All compute resources that Cloudera provisions for the environment and Cloudera services are provisioned into the VPC network specified during environment creation.

Firewall rules Firewall rules define inbound and outbound access to the instances. If during environment creation you choose to have new firewall rules created, then they are created on your GCP account. Alternatively, you can provide your own existing firewall rules.
VM instances During environment creation, two or three e2-standard-2 VM instances are provisioned for the FreeIPA HA server. The number of VMs depends on the selected Data Lake type.
OS disk An OS disk is provisioned for the FreeIPA VM.
Attached disk An attached disk (pd-standard) is provisioned for each VM.
Public IP address (if required) If you choose to use public IPs, your VM is assigned a public IP address.
GCS bucket for storing operating system images By default, Cloudera creates a storage bucket that is used solely for storing operating system images.

If required, you can optionally pre-create this account and copy the required images.

In addition, the following resources are created for each Data Lake (one per environment):

Resource Description
VM instances VM instances are provisioned for the Data Lake nodes.
  • Light duty: Two VM instances are provisioned: One e2-standard-2 VM instance (for IDBroker) and one e2-standard-8 VM instance (for master) are created.
  • Medium duty: Ten VM instances are provisioned: Two e2-standard-2 (IDBroker), three e2-standard-4 (two Data Lake Master nodes and one Auxiliary node), and five e2-standard-8 (three DataLake Core nodes and two Gateway nodes).
Attached disk An attached disk (pd-standard) is provisioned for each VM.
OS disk An OS disk is provisioned for each VM.
PostgreSQL database A custom PostgreSQL database instance (100GB SSD, 2vCPU, 13 GB RAM) is provisioned for the Data Lake. This databse instance is used for Cloudera Manager, Ranger, and Hive MetaStore.
Firewall rules Firewall rules define inbound and outbound access to VM instances. If during environment creation you choose to have new firewall rules created, then they are created on your GCP project.
Google storage buckets The existing Google Storage bucket that you provide during environment creation for the Data Lake is used for Data Lake log storage and workload data storage.
Service accounts Prior to registering your environment in Cloudera, during Google storage setup, you should create service accounts and assign roles to them as instructed in Cloudera documentation.
Public IP address (if required) If you choose to use public IPs, your VM is assigned a public IP address.

GCP resources created for Cloudera Data Hub

The following Google Cloud resources are created for each Cloudera Data Hub cluster:

Resource Description
VM instances and attached storage A VM is created for each cluster node. The VM type varies depending on what you selected during Cloudera Data Hub cluster creation. For a list of supported VM types, refer to Cloudera Public Cloud service rates.
Firewall rules Firewall rules define inbound and outbound access to VM instances. If during environment creation you choose to have new firewall rules created, then they are created on your GCP project.
OS disk An OS disk is provisioned for each VM.
Attached disk An attached disk (pd-standard) is provisioned for each VM, as specified during Cloudera Data Hub cluster creation. The disk size is selected during cluster creation.
Public IP address (if required) If you choose to use public IPs, each of the VMs is assigned a public IP address.