Overview of GCP resources used by Cloudera
The following Google Cloud resources are used by Cloudera and Cloudera services.
GCP resources created for a Cloudera environment
When a Cloudera environment is created, a FreeIPA cluster and a Data Lake cluster are created.
The following Google Cloud resources are created for FreeIPA (one per environment):
Resource | Description |
---|---|
Service account for credential | To allow Cloudera to access and provision resources in your Google Cloud project, you must create a service account in your Google Cloud project, assign required roles, and generate a JSON access key that can later be provided to Cloudera. |
VPC network and subnets | During environment creation you provide your own existing VPC network and subnets. All compute resources that Cloudera provisions for the environment and Cloudera services are provisioned into the VPC network specified during environment creation. |
Firewall rules | Firewall rules define inbound and outbound access to the VM instances. If you choose during environment creation to have new firewall rules created, they are created in your GCP project. Alternatively, you can provide your own existing firewall rules. |
VM instances | During environment creation, two or three e2-standard-2 VM instances are provisioned for the FreeIPA HA server. The number of VMs depends on the selected Data Lake type. |
OS disk | An OS disk is provisioned for each FreeIPA VM. |
Attached disk | An attached disk (pd-standard) is provisioned for each VM. |
Public IP address (if required) | If you choose to use public IPs, each VM is assigned a public IP address. |
GCS bucket for storing operating system images | By default, Cloudera creates a storage bucket that is used solely for storing operating system images. Optionally, you can pre-create this bucket and copy the required images to it. |
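
The service account row above describes three steps: create the account, assign roles, and generate a JSON key. As a minimal sketch, the Python helper below builds the corresponding `gcloud` command lines without running them. The function name, project ID, account name, and key path are hypothetical placeholders, and the role value is deliberately a placeholder because this section does not list the required roles; take those from the Cloudera documentation.

```python
import shlex
from typing import List


def service_account_commands(
    project_id: str, account_name: str, role: str, key_path: str
) -> List[List[str]]:
    """Build (but do not run) gcloud commands for the three credential steps.

    All argument values are supplied by the caller; nothing here is a
    Cloudera-mandated name or role.
    """
    # Standard email form for a user-managed service account.
    email = f"{account_name}@{project_id}.iam.gserviceaccount.com"
    return [
        # Step 1: create the service account in the project.
        ["gcloud", "iam", "service-accounts", "create", account_name,
         f"--project={project_id}"],
        # Step 2: grant one role; repeat this command for each required role.
        ["gcloud", "projects", "add-iam-policy-binding", project_id,
         f"--member=serviceAccount:{email}", f"--role={role}"],
        # Step 3: generate the JSON key that is later provided to Cloudera.
        ["gcloud", "iam", "service-accounts", "keys", "create", key_path,
         f"--iam-account={email}"],
    ]


if __name__ == "__main__":
    # Placeholder values for illustration only.
    for cmd in service_account_commands(
        "my-project", "cloudera-credential", "roles/PLACEHOLDER", "key.json"
    ):
        print(shlex.join(cmd))
```

Printing the commands rather than executing them lets you review each one, and substitute the documented roles, before anything changes in the project.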
In addition, the following resources are created for each Data Lake (one per environment):
Resource | Description |
---|---|
VM instances | VM instances are provisioned for the Data Lake nodes. |
Attached disk | An attached disk (pd-standard) is provisioned for each VM. |
OS disk | An OS disk is provisioned for each VM. |
PostgreSQL database | A custom PostgreSQL database instance (100 GB SSD, 2 vCPUs, 13 GB RAM) is provisioned for the Data Lake. This database instance is used for Cloudera Manager, Ranger, and Hive Metastore. |
Firewall rules | Firewall rules define inbound and outbound access to VM instances. If you choose during environment creation to have new firewall rules created, they are created in your GCP project. |
Google storage buckets | The existing Google Storage bucket that you provide during environment creation for the Data Lake is used for Data Lake log storage and workload data storage. |
Service accounts | Prior to registering your environment in Cloudera, during Google storage setup, you should create service accounts and assign roles to them as instructed in Cloudera documentation. |
Public IP address (if required) | If you choose to use public IPs, each VM is assigned a public IP address. |
GCP resources created for Cloudera Data Hub
The following Google Cloud resources are created for each Cloudera Data Hub cluster:
Resource | Description |
---|---|
VM instances and attached storage | A VM is created for each cluster node. The VM type varies depending on what you selected during Cloudera Data Hub cluster creation. For a list of supported VM types, refer to Cloudera Public Cloud service rates. |
Firewall rules | Firewall rules define inbound and outbound access to VM instances. If you choose during environment creation to have new firewall rules created, they are created in your GCP project. |
OS disk | An OS disk is provisioned for each VM. |
Attached disk | An attached disk (pd-standard) is provisioned for each VM. The disk size is selected during Cloudera Data Hub cluster creation. |
Public IP address (if required) | If you choose to use public IPs, each of the VMs is assigned a public IP address. |
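
The tables above repeat one per-VM pattern: each VM gets an OS disk and an attached disk, plus a public IP address if public IPs are enabled. As a rough sketch, the snippet below tallies those per-VM resources for a given node count. It assumes exactly one attached disk per VM, as the tables state, and it excludes firewall rules, buckets, and the database because their counts do not scale with the number of VMs. The names `VmFootprint` and `vm_footprint` are hypothetical and not part of any Cloudera tooling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VmFootprint:
    """Per-VM resources implied by the tables for one cluster."""
    vm_instances: int
    os_disks: int
    attached_disks: int
    public_ips: int


def vm_footprint(node_count: int, use_public_ips: bool) -> VmFootprint:
    """Tally VM-scoped resources for a cluster with node_count VMs.

    Assumes one OS disk and one attached disk per VM, and one public IP
    per VM only when public IPs are enabled.
    """
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    return VmFootprint(
        vm_instances=node_count,
        os_disks=node_count,
        attached_disks=node_count,
        public_ips=node_count if use_public_ips else 0,
    )


if __name__ == "__main__":
    # FreeIPA uses two or three VMs depending on the Data Lake type.
    print(vm_footprint(3, use_public_ips=False))
    # A Data Hub cluster: the node count here is an arbitrary example.
    print(vm_footprint(5, use_public_ips=True))
```

The same function applies to FreeIPA (two or three VMs) and to a Cloudera Data Hub cluster of any size, which can help when checking project quotas before provisioning.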