Azure resources used by CDP

The following Azure resources are used by CDP and CDP services.

Azure resources created for an environment

The following Azure resources are created for each Azure environment:

Resource | Description | Naming convention
Virtual Network (VNet) | If you choose during environment creation to have a new VNet and subnets created, they are created in your Azure account. Alternatively, you can provide your own existing VNet and subnets. In both cases (new and existing VNet), all compute resources that CDP provisions for the environment and CDP services are provisioned into the VNet specified during environment creation. | Specified by customer
Resource group for FreeIPA resources | A resource group is created to group all the resources created for FreeIPA. | <env-name>-freeipa-<numeric-id>
Virtual Machine (VM) | During environment creation, a VM (Standard_D3_v2) is provisioned for the FreeIPA server node. | <env-name>-freeipa-<numeric-id>m0
OS disk | An OS disk is provisioned for the FreeIPA VM. | <env-name>-freeipa-<numeric-id>m0
Network interface | One network interface card (NIC) is provisioned for the FreeIPA VM. | <env-name>-freeipa-<numeric-id>m0
Public IP address | If you choose to use public IPs, the VM is assigned a public IP address. | <env-name>-freeipa-<numeric-id>m0
Network security group | Network security groups define inbound and outbound access to the instances. If you choose during environment creation to have new security groups created, they are created in your Azure account. Alternatively, you can provide your own existing security groups. | master0-<env-name>freeipa<numeric-id>sg
ADLS Gen2 storage account for storing images | By default, CDP creates an ADLS Gen2 storage account that is used solely for image storage. If required, you can pre-create this account and copy the required images. | Concatenation of three elements, described below this table
Resource group for the ADLS account used for storing images | A separate resource group is created for the ADLS Gen2 account mentioned above. | cloudbreak-images

The name of the ADLS Gen2 storage account for storing images is the concatenation of the following three elements:
  • “cbimg”
  • Region identifier: The starting letters of the region where the storage account is located. For example, “eu” for East US, or “eu2” for East US 2.
  • Subscription ID, without hyphens ('-') and all lowercase. For example, if your subscription ID is a9d4456e-349f-46f5-bc73-54a8d523e504, you should convert it to a9d4456e349f46f5bc7354a8d523e504.

For example, the name of a storage account in the East US 2 region in subscription a9d4456e-349f-46f5-bc73-54a8d523e504 would be cbimgeu2a9d4456e349f46f5bc7354a8d523e504.
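
The naming rule for the image storage account (“cbimg”, then the region identifier, then the normalized subscription ID) can be sketched in Python as follows. This is only an illustration of the rule as described in this section, not CDP's actual implementation. The function names are hypothetical, and the handling of region names other than the two documented examples (first letter of each word, numbers kept whole) is an assumption. Note also that Azure limits storage account names to 24 characters, so the name that is actually created may be shortened; the sketch returns the full concatenation and does not model that.

```python
def region_identifier(region_display_name: str) -> str:
    """Build the region identifier from a region display name.

    Assumption: take the first letter of each word and keep purely
    numeric words whole, so "East US" -> "eu" and "East US 2" -> "eu2".
    """
    words = region_display_name.split()
    return "".join(w if w.isdigit() else w[0] for w in words).lower()


def image_storage_account_name(region_display_name: str, subscription_id: str) -> str:
    """Concatenate "cbimg", the region identifier, and the subscription ID
    with hyphens removed and all letters lowercased."""
    normalized_subscription = subscription_id.replace("-", "").lower()
    return "cbimg" + region_identifier(region_display_name) + normalized_subscription


# Reproduces the example from this section.
print(image_storage_account_name("East US 2", "a9d4456e-349f-46f5-bc73-54a8d523e504"))
# prints cbimgeu2a9d4456e349f46f5bc7354a8d523e504
```

A helper like this may be useful when you pre-create the storage account and need to derive the name that CDP is expected to look for.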

In addition, the following resources are created for each Data Lake (one per environment):

Resource | Description | Naming convention
Resource group | Two resource groups are created: one to group all the resources created for the Data Lake and another for the database used by the Data Lake. | <dl-name><numeric-id>
Virtual Machines (VMs) | During light duty Data Lake creation, two VMs (Standard_D8s_v3 for master and Standard_D2s_v3 for IDBroker) are provisioned for the Data Lake nodes. | <dl-name><numeric-id>[m0|i1]
OS disks | An OS disk is provisioned for each VM. | <env-name>-freeipa-<numeric-id>m0
Attached disks | An attached disk (StandardSSD_LRS) is provisioned for each VM. | <dl-name><numeric-id>-[m0|i1]<index>-<timestamp>
Network interface | One network interface card (NIC) is provisioned for each VM. | <dl-name><numeric-id>[m0|i1]
Public IP address | If you choose to use public IPs, each of the VMs is assigned a public IP address. | <dl-name><numeric-id>[m0|i1]
Network security groups | Network security groups define inbound and outbound access to the instances. If you choose during environment creation to have new security groups created, they are created in your Azure account. | [master|idbroker]<dl-name><numeric-id>sg
Availability set | Two availability sets are created, one for each VM. | <dl-name>-[master|idbroker]-as
Resource group for external DB | A resource group is created for the external database. | <env-name>-dbstck-<numeric-id>
Azure Database for PostgreSQL server | A database instance MO_Gen5_4, 100 GB (General Purpose, Gen5, 4 vCore & 100GB Storage) is provisioned for Cloudera Manager, Ranger, and Hive MetaStore. | dbsrv-<numeric-id>
ADLS Gen2 storage | Prior to registering your environment in CDP, you should create ADLS Gen2 storage containers as instructed in CDP documentation. | Specified by customer
Managed identities | Prior to registering your environment in CDP, you should create managed identities as instructed in CDP documentation. | Specified by customer
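
To make the bracketed alternatives in the naming conventions above concrete, the following Python sketch expands a few of the Data Lake patterns into the names you might expect to see in the Azure portal. It is an illustration only: the function name and the sample values `mydl` and `12345` are hypothetical, it assumes that the same numeric ID is shared by all resources of one Data Lake, and it covers only the patterns that have no index or timestamp component.

```python
def data_lake_resource_names(dl_name: str, numeric_id: str) -> dict:
    """Expand some Data Lake naming patterns for the master and IDBroker nodes."""
    base = f"{dl_name}{numeric_id}"
    # Node role -> suffix, taken from the [m0|i1] and [master|idbroker] alternatives.
    node_suffixes = {"master": "m0", "idbroker": "i1"}
    per_node_names = [base + suffix for suffix in node_suffixes.values()]
    return {
        "resource_group": base,
        # VMs, network interfaces, and public IPs share the same pattern.
        "virtual_machines": per_node_names,
        "network_interfaces": per_node_names,
        "public_ip_addresses": per_node_names,
        "network_security_groups": [f"{role}{base}sg" for role in node_suffixes],
        "availability_sets": [f"{dl_name}-{role}-as" for role in node_suffixes],
    }


# Hypothetical Data Lake name and numeric ID, for illustration only.
for kind, names in data_lake_resource_names("mydl", "12345").items():
    print(kind, names)
```

A list like this could serve as a checklist when auditing which resources in a subscription belong to a given Data Lake.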

Azure resources used by Data Hub

The following Azure resources are used by the Data Hub service:

Resource | Description | Naming convention
Resource group | For each Data Hub cluster, a new resource group is created to group all the resources created for the cluster. | <dh-name><numeric-id>
Virtual Machines (VMs) | A VM is created for each cluster node. The VM type varies depending on what you selected during Data Hub cluster creation. For a list of supported VM types, refer to Cloudera Data Platform (CDP) Public Cloud service rates. | <dh-name><numeric-id><hostgroup abbr.><node index>
OS disk | An OS disk is provisioned for each VM. | <dh-name><numeric-id>-osDisk<hostgroup abbr.><node index>
Attached disks | An attached disk is provisioned for each VM, as specified during Data Hub cluster creation. The disk size is selected during cluster creation. | <dh-name><numeric-id>-<hostgroup abbr.><node index>-<disk counter>-<timestamp>
Network interface | One network interface card (NIC) is provisioned for each VM. | <dh-name><numeric-id><hostgroup abbr.><node index>
Public IP address | If you choose to use public IPs, each of the VMs is assigned a public IP address. | <dh-name><numeric-id><hostgroup abbr.><node index>
Network security group | Network security groups define inbound and outbound access to the instances. If you choose during environment creation to have new security groups created, they are created in your Azure account. | <hostgroup nm><dh-name><numeric-id>sg
Availability set | One availability set is created for each host group. | <dh-name>-<hostgroup>-as

Azure resources used by Data Warehouse

The following Azure resources are used by the Data Warehouse (DW) service:

Resource | Description
Resource group | One resource group is created with the naming convention “<environment-id>-dwx-rg”. The environment ID can be found in the web UI.
Azure Kubernetes Service (AKS) | CDP creates an AKS cluster for each activated DW environment to host Kubernetes-based resources. The underlying compute and network resources are managed by Azure, including:
  • Virtual machine scale sets
  • Load balancer(s)
  • Public IP address(es)
  • Network security group
  • Disk(s)

For a list of supported VM types, refer to Cloudera Data Platform (CDP) Public Cloud service rates.

Azure Database for PostgreSQL server | A PostgreSQL database (General Purpose, Gen5, 4 vCore) is created for DW to store configuration data.

Azure resources used by Machine Learning

The following Azure resources are used by the Machine Learning (ML) service:

Resource | Description
Resource groups | Two resource groups are created:
  • liftie-<unique string>, which has an AKS cluster of the same name
  • MC_liftie-<unique string>_<region>, which has the resources needed to run the AKS cluster
Azure Kubernetes Service (AKS) | CDP creates an AKS cluster for each ML workspace to host Kubernetes-based resources. The underlying compute and network resources are managed by Azure, including:
  • Virtual machine scale sets
  • Load balancer(s)
  • Public IP address(es)
  • Route table
  • Network security group
  • Azure disk(s) (Premium_LRS)

For a list of supported VM types, refer to Cloudera Data Platform (CDP) Public Cloud service rates.

Log Analytics workspace | A Log Analytics workspace is created for storing log data.

Azure resources used by Operational Database

The following Azure resources are used by the Operational Database (OD) service:

Resource | Description
Resource group | One resource group is created, which contains all of the nodes that comprise the OD database.
Virtual Machines (VMs) | A compute VM is created for each node in an OD database. The instance type and managed storage are automatically determined by OD. Azure network security groups are automatically configured as a part of environment creation to define inbound and outbound network access to the created instances.
ADLS Gen2 storage | The existing blob storage account that you provided for the Data Lake to use for workload data storage is automatically used by the OD database for data storage.
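
Because several of the resource groups listed on this page follow fixed naming conventions, a resource group name can often be mapped back to the CDP component that created it. The Python sketch below shows one possible way to do this with regular expressions. It is an illustration under stated assumptions, not a CDP tool: the function name and labels are hypothetical, `<numeric-id>` is assumed to consist of digits only, and Data Lake, Data Hub, and Operational Database resource groups are deliberately left unclassified because their documented names carry no distinguishing prefix or suffix.

```python
import re

# Ordered patterns: the MC_liftie- pattern is checked before the plain
# liftie- pattern so that the more specific rule is tried first.
RESOURCE_GROUP_PATTERNS = [
    (re.compile(r"^cloudbreak-images$"), "environment: image storage"),
    (re.compile(r"^MC_liftie-.+_.+$"), "Machine Learning: resources for the AKS cluster"),
    (re.compile(r"^liftie-.+$"), "Machine Learning: AKS cluster"),
    (re.compile(r"^.+-dwx-rg$"), "Data Warehouse"),
    (re.compile(r"^.+-freeipa-\d+$"), "environment: FreeIPA"),
    (re.compile(r"^.+-dbstck-\d+$"), "Data Lake: external database"),
]


def classify_resource_group(name: str):
    """Return a label for a known naming convention, or None if nothing matches."""
    for pattern, label in RESOURCE_GROUP_PATTERNS:
        if pattern.match(name):
            return label
    return None


# Hypothetical resource group names, for illustration only.
for rg in ["cloudbreak-images", "liftie-abc123", "myenv-freeipa-42", "mydl12345"]:
    print(rg, "->", classify_resource_group(rg))
```

Names that return None are not necessarily unrelated to CDP; they simply do not match one of the distinctive conventions described here.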