Azure resources used by CDP

The following Azure resources are used by CDP and CDP services.

Azure resources used by a CDP environment

The following Azure resources are used by each Azure environment:

Resource Description Naming convention
Virtual Private Network (VNet) If during environment creation you select to have a new VNet and subnets created, then a new VNet and subnets are created on your Azure account. Alternatively, you can provide your own existing VNet and subnets.

In both cases (new and existing VPC), all compute resources that CDP provisions for the environment and CDP services are provisioned into the VNet specified during environment creation.

Specified by customer
Resource group for FreeIPA resources If you chose for CDP to create multiple resource groups, a resource group is created to group all the resources created for FreeIPA. This resource group is not created if you chose to use a single existing resource group. <env-name>-freeipa-<numeric-id>
Virtual Machine (VM) During environment creation, a VM (Standard_D3_v2) is provisioned for the FreeIPA server node by default. If you choose to deploy FreeIPA in HA mode, up to three VMs of that same type are provisioned. <env-name>-freeipa-<numeric-id>m0
OS disk An OS disk is provisioned for the FreeIPA VM. <env-name>-freeipa-<numeric-id>m0
Network interface One network interface card (NIC) is provisioned for the FreeIPA VM. <env-name>-freeipa-<numeric-id>m0
Public IP address If you choose to use public IPs, your VM is assigned a public IP address. <env-name>-freeipa-<numeric-id>m0
Network security group Network security groups define inbound and outbound access to the instances.If during environment creation you choose to have new security groups created, then they are created on your Azure account. Alternatively, you can provide your own existing security groups. master0-<env-name>freeipa<numeric-id>sg
ADLS Gen2 storage account for storing operating system images By default, CDP creates an ADLS Gen2 storage account that is used solely for storing operating system images.

If required, you can optionally pre-create this account and copy the required images.

The name is the concatenation of the following three elements:
  • “cbimg”
  • Region identifier: The starting letters of the region where the SA is. For example, “eu” for East US, or “eu2” for East US 2.
  • Subscription ID, without hyphens ('-') and all lowercase. For example, if your subscription ID is a9d4456e-349f-46f5-bc73-54a8d523e504,you should convert it to a9d4456e349f46f5bc7354a8d523e504.

For example, the name of a storage account in East US 2 region in subscription a9d4456e-349f-46f5-bc73-54a8d523e504 would be cbimgeu2a9d4456e349f46f5bc7354a8d523e504.

Resource group for the ADLS account used for storing operating system images If you chose for CDP to create multiple resource groups, a separate resource group is created for the ADLS Gen2 account mentioned above.This resource group is not created if you chose to use a single existing resource group. cloudbreak-images

In addition, the following resources are created for each Data Lake (one per environment):

Resource Description Naming convention
Resource group If you chose for CDP to create multiple resource groups, two resource groups are created: One new resource group is created to group all the resources created for the Data Lake and another resource group is created for the database used by the Data Lake. These resource groups are not created if you coose to use a single existing resource group. <dl-name><numeric-id>
Virtual Machines (VMs) During light duty Data Lake creation, two VMs (Standard_D8s_v3 for master and Standard_D2s_v3 for IDBroker) are provisioned for the Data Lake nodes. <dl-name><numeric-id>[m0|i1]
OS disks An OS disk is provisioned for each VM. <env-name>-freeipa-<numeric-id>m0
Attached disks An attached disk (StandardSSD_LRS) is provisioned for each VM. <dl-name><numeric-id>-[m0|i1]<index>-<timestamp>
Network interface One network interface card (NIC) is provisioned for each VM. <dl-name><numeric-id>[m0|i1]
Public IP address If you choose to use public IPs, each of the VMs is assigned a public IP address. <dl-name><numeric-id>[m0|i1]
Network security groups Network security groups define inbound and outbound access to the instances.If during environment creation you choose to have new security groups created, then they are created on your Azure account. [master|idbroker]<dl-name><numeric-id>sg
Availability set Two availability sets are created, one for each VM. <dl-name>-[master|idbroker]-as
Resource group for external DB If you chose for CDP to create multiple resource groups, a resource group is created for the external database. This resource group is not created if you chooe to use a single existing resource group. <env-name>-dbstck-<numeric-id>
Azure Database for PostgreSQL server A database instance MO_Gen5_4, 100 GB (General Purpose, Gen5, 4 vCore & 100GB Storage) is provisioned for Cloudera Manager, Ranger, and Hive MetaStore. dbsrv-<numeric-id>
ADLS Gen2 storage Prior to registering your environment in CDP, you should create ADLS Gen2 storage containers as instructed in CDP documentation. Specified by customer
Managed identities Prior to registering your environment in CDP, you should create managed identities as instructed in CDP documentation. Specified by customer

Azure resources used by Data Hub

The following Azure resources are used by the Data Hub service:

Resource Description Naming convention
Resource group If you chose for CDP to create multiple resource groups, for each Data Hub cluster, a new resource group is created to group all the resources created for the cluster. This resource group is not created if you chose to use a single existing resource group. <dh-name><numeric-id>
Virtual Machines (VMs) A VM is created for each cluster node. The VM type varies depending on what you selected during Data Hub cluster creation. For a list of supported VM types, refer to Cloudera Data Platform (CDP) Public Cloud service rates. <dh-name><numeric-id><hostgroup abbr.><node index>
OS disk An OS disk is provisioned for each VM. <dh-name><numeric-id>-osDisk<hostgroup abbr.><node index>
Attached Disks An attached disk is provisioned for each VM, as specified during Data Hub cluster creation. The disk size is selected during cluster creation. <dh-name><numeric-id>-<hostgroup abbr.><node index>-<disk counter>-<timestamp>
Network interface One network interface card (NIC) is provisioned for each VM. <dh-name><numeric-id><hostgroup abbr.><node index>
Public IP address If you choose to use public IPs, each of the VMs is assigned a public IP address. <dh-name><numeric-id><hostgroup abbr.><node index>
Network security group Network security groups define inbound and outbound access to the instances. If during environment creation you choose to have new security groups created, then they are created on your Azure account. <hostgroup nm><dh-name><numeric-id>sg
Availability set One availability set is created for each host group. <dh-name>-<hostgroup>-as

Azure resources used by Data Warehouse

The following Azure resources are used by the Cloudera Data Warehouse (CDW) service:

Resource Description
Resource Group If you chose for CDP to create multiple resource groups, one resource group is created with the naming convention “<environment-id>-dwx-rg”. This resource group is not created if you chose to use a single existing resource group.
Azure Kubernetes Service (AKS) CDP creates an AKS cluster for each activated DW environment to host Kubernetes-based resources. The underlying compute, network resources are managed by Azure, including:
  • Resource group
  • Virtual machine scale sets
  • Load balancer(s)
  • Public IP address(es)
  • Network security group
  • Disk(s)

For a list of supported VM types, refer to Cloudera Data Platform (CDP) Public Cloud service rates.

Azure Database for PostgreSQL server PostgreSQL database (General Purpose, Gen5, 4 vCore) is created for DW to store configuration data.

Azure resources used by Machine Learning

The following Azure resources are used by the Cloudera Machine Learning (CML) service:

Resource Description
Resource groups If you chose for CDP to create multiple resource groups, one resource group is created with the naming convention “liftie-<unique string>" (which has an AKS cluster of the same name). This resource group is not created if you chose to use a single existing resource group.
Azure Kubernetes Service (AKS) CDP creates an AKS cluster for each CML workspace to host Kubernetes-based resources. The underlying compute, network resources are managed by Azure, including:
  • Resource group
  • Virtual machine scale sets
  • Load balancer(s)
  • Public IP address(es)
  • Route table
  • Network security group
  • Azure disk(s) (Premium_LRS)

For a list of supported VM types, refer to Cloudera Data Platform (CDP) Public Cloud service rates.

Log analytics workspace A logs analytics workspace is created for storing log data.
NetApp Files account CML workspaces deployed on Azure require an existing Azure NetApp Files account with a pre-created 4TB capacity pool. For instructions on how to create it, see Create Azure NetApp Files Account and Pool Note that even if the Azure NetApp Files is available in your selected Azure region, you may need to manually request it from Azure Support to enable it for use in your Azure subscription.

Azure resources used by Operational Database

The following Azure resources are used by the Cloudera Operational Database (COD) service:

Resource Description
Resource Group If you chose for CDP to create multiple resource groups, a resource group is created which contains all of the nodes that comprise the OD database. This resource group is not created if you chose to use a single existing resource group.
Virtual Machines (VMs) A compute VM is created for each node in a COD database. The instance type and managed storage are automatically determined by OD. Azure network security groups are automatically configured as a part of environment creation to define inbound and outbound network access to the created instances.
ADLS Gen2 storage This existing blob storage account that you provided for the Data Lake to use for workload data storage is automatically used by the COD database for storage of data.