Overview of Azure resources used by CDP

The following Azure resources are used by CDP and CDP services.

Azure resources created for a CDP environment

When a CDP environment is created, a FreeIPA cluster and a Data Lake cluster are created.

The following Azure resources are created for FreeIPA (one per environment):

Resource Description Naming convention
Virtual Private Network (VNet) If during environment creation you select to have a new VNet and subnets created, then a new VNet and subnets are created on your Azure account. Alternatively, you can provide your own existing VNet and subnets.

In both cases (new and existing VPC), all compute resources that CDP provisions for the environment and CDP services are provisioned into the VNet specified during environment creation.

Specified by customer
Resource group for FreeIPA resources If you chose for CDP to create multiple resource groups, a resource group is created to group all the resources created for FreeIPA. This resource group is not created if you chose to use a single existing resource group. <env-name>-freeipa-<numeric-id>
Virtual Machines (VMs) During environment creation, two or three Standard_DS3_v2 VMs are provisioned for the FreeIPA HA server by default. The number of VMs depends on the selected Data Lake type. <env-name>-freeipa-<numeric-id>m0
OS disk An OS disk is provisioned for the FreeIPA VM. <env-name>-freeipa-<numeric-id>m0
Network interface One network interface card (NIC) is provisioned for the FreeIPA VM. <env-name>-freeipa-<numeric-id>m0
Public IP address If you choose to use public IPs, your VM is assigned a public IP address. <env-name>-freeipa-<numeric-id>m0
Network security group Network security groups define inbound and outbound access to the instances.If during environment creation you choose to have new security groups created, then they are created on your Azure account. Alternatively, you can provide your own existing security groups. master0-<env-name>freeipa<numeric-id>sg
ADLS Gen2 storage account for storing operating system images By default, CDP creates an ADLS Gen2 storage account that is used solely for storing operating system images.

If required, you can optionally pre-create this account and copy the required images.

The name is the concatenation of the following three elements:
  • “cbimg”
  • Region identifier: The starting letters of the region where the SA is. For example, “eu” for East US, or “eu2” for East US 2.
  • Subscription ID, without hyphens ('-') and all lowercase. For example, if your subscription ID is a9d4456e-349f-46f5-bc73-54a8d523e504,you should convert it to a9d4456e349f46f5bc7354a8d523e504.

For example, the name of a storage account in East US 2 region in subscription a9d4456e-349f-46f5-bc73-54a8d523e504 would be cbimgeu2a9d4456e349f46f5bc7354a8d523e504.

Resource group for the ADLS account used for storing operating system images If you chose for CDP to create multiple resource groups, a separate resource group is created for the ADLS Gen2 account mentioned above.This resource group is not created if you chose to use a single existing resource group. cloudbreak-images

In addition, the following resources are created for each Data Lake (one per environment):

Resource Description Naming convention
Resource group If you chose for CDP to create multiple resource groups, two resource groups are created: One new resource group is created to group all the resources created for the Data Lake and another resource group is created for the database used by the Data Lake. These resource groups are not created if you coose to use a single existing resource group. <dl-name><numeric-id>
Virtual Machines (VMs) VMs are provisioned for the Data Lake nodes:
  • Light duty: Two instances are provisioned: One Standard_D8s_v3 instance (Data Lake Master node) and one Standard_D2s_v3 instance (IDBroker).
  • Medium duty: Ten instances are provisioned: One Standard_D2s_v3 instance (IDBroker), three Standard_D4s_v3 instances (two Data Lake Master nodes and one Auxiliary node), and five Standard_D8s_v3 instances (three DataLake Core and two Gateway nodes).
<dl-name><numeric-id>[m0|i1]
OS disks An OS disk is provisioned for each VM. <env-name>-freeipa-<numeric-id>m0
Attached disks An attached disk (StandardSSD_LRS) is provisioned for each VM. <dl-name><numeric-id>-[m0|i1]<index>-<timestamp>
Network interface One network interface card (NIC) is provisioned for each VM. <dl-name><numeric-id>[m0|i1]
Public IP address If you choose to use public IPs, each of the VMs is assigned a public IP address. <dl-name><numeric-id>[m0|i1]
Network security groups Network security groups define inbound and outbound access to the instances.If during environment creation you choose to have new security groups created, then they are created on your Azure account. [master|idbroker]<dl-name><numeric-id>sg
Availability set One availability set is created for the master host group only. <dl-name>-master-as
Resource group for external DB If you chose for CDP to create multiple resource groups, a resource group is created for the external database. This resource group is not created if you chooe to use a single existing resource group. <env-name>-dbstck-<numeric-id>
Azure Database for PostgreSQL server

An RDS instance is provisioned for Cloudera Manager, Ranger, and Hive MetaStore.

When creating Flexible Server, CDP automatically chooses the latest generation of Standard_E4s instance family that is supported in the given region, for example, Standard_E4ds_v5, Standard_E4ds_v4 or Standard_E4s_v3 with 128 GB of storage. For more information, see Azure regions.

If you choose to use Single Server, a database instance MO_Gen5_4 with 100 GB of storage) is provisioned.
dbsrv-<numeric-id>
ADLS Gen2 storage Prior to registering your environment in CDP, you should create ADLS Gen2 storage containers as instructed in CDP documentation. Specified by customer
Managed identities Prior to registering your environment in CDP, you should create managed identities as instructed in CDP documentation. Specified by customer

Azure resources created for Data Hub

The following Azure resources are created for the Data Hub service:

Resource Description Naming convention
Resource group If you chose for CDP to create multiple resource groups, for each Data Hub cluster, a new resource group is created to group all the resources created for the cluster. This resource group is not created if you chose to use a single existing resource group. <dh-name><numeric-id>
Virtual Machines (VMs) A VM is created for each cluster node. The VM type varies depending on what you selected during Data Hub cluster creation. For a list of supported VM types, refer to Cloudera Data Platform (CDP) Public Cloud service rates. <dh-name><numeric-id><hostgroup abbr.><node index>
OS disk An OS disk is provisioned for each VM. <dh-name><numeric-id>-osDisk<hostgroup abbr.><node index>
Attached Disks An attached disk is provisioned for each VM, as specified during Data Hub cluster creation. The disk size is selected during cluster creation. <dh-name><numeric-id>-<hostgroup abbr.><node index>-<disk counter>-<timestamp>
Network interface One network interface card (NIC) is provisioned for each VM. <dh-name><numeric-id><hostgroup abbr.><node index>
Public IP address If you choose to use public IPs, each of the VMs is assigned a public IP address. <dh-name><numeric-id><hostgroup abbr.><node index>
Network security group Network security groups define inbound and outbound access to the instances. If during environment creation you choose to have new security groups created, then they are created on your Azure account. <hostgroup nm><dh-name><numeric-id>sg
Availability set If the "Hardware and Storage" Advanced Options were used, one availability set is created for each host group. Otherwise, one availability set is created only for the host groups that contain KNOX and/or OOZIE service. <dh-name>-<hostgroup>-as

Azure resources created for Data Warehouse

The following Azure resources are created for the Cloudera Data Warehouse (CDW) service:

Resource Description
Resource Group If you chose for CDP to create multiple resource groups, one resource group is created with the naming convention “<environment-id>-dwx-rg”. This resource group is not created if you chose to use a single existing resource group.
Azure Kubernetes Service (AKS) CDP creates an AKS cluster for each activated DW environment to host Kubernetes-based resources. The underlying compute, network resources are managed by Azure, including:
  • Resource group
  • Virtual machine scale sets
  • Load balancer(s)
  • Public IP address(es)
  • Network security group
  • Disk(s)

For a list of supported VM types, refer to Cloudera Data Platform (CDP) Public Cloud service rates.

Azure Database for PostgreSQL server PostgreSQL database (General Purpose, Gen5, 4 vCore) is created for DW to store configuration data.

Azure resources created for Machine Learning

The following Azure resources are created for the Cloudera Machine Learning (CML) service:

Resource Description
Resource groups If you chose for CDP to create multiple resource groups, one resource group is created with the naming convention “liftie-<unique string>" (which has an AKS cluster of the same name). This resource group is not created if you chose to use a single existing resource group.
Azure Kubernetes Service (AKS) CDP creates an AKS cluster for each CML workspace to host Kubernetes-based resources. The underlying compute, network resources are managed by Azure, including:
  • Resource group
  • Virtual machine scale sets
  • Load balancer(s)
  • Public IP address(es)
  • Route table
  • Network security group
  • Azure disk(s) (Premium_LRS)

For a list of supported VM types, refer to Cloudera Data Platform (CDP) Public Cloud service rates.

Log analytics workspace A logs analytics workspace is created for storing log data.
Azure Files Storage account If you choose Azure Files NFS, you will need an existing Azure Files Storage account.

Azure resources created for DataFlow

The following Azure resources are created for the DataFlow service:

Resource Description
Resource groups If you chose for CDP to create multiple resource groups, one resource group is created with the naming convention “liftie-<unique string>" (which has an AKS cluster of the same name). This resource group is not created if you chose to use a single existing resource group.
Azure Kubernetes Service (AKS) CDP creates an AKS cluster for the DataFlow service. The underlying compute, network resources are managed by Azure, including:
  • Load balancer: Azure Load Balancer
  • Network security group
  • Public IP address(es)
  • Route table
  • Virtual machine scale set

For a list of supported VM types, refer to Cloudera Data Platform (CDP) Public Cloud service rates.

Log analytics workspace A logs analytics workspace is created for storing log data.
Azure Database for PostgreSQL Azure Database for PostgreSQL is used for storing job-related metadata and histories.

Azure resources created for Data Engineering

The following Azure resources are created for the Data Engineering (CDE) service:

Resource Description
Resource groups If you chose for CDP to create multiple resource groups, one resource group is created with the naming convention “liftie-<unique string>" (which has an AKS cluster of the same name). This resource group is not created if you chose to use a single existing resource group.
Azure Kubernetes Service (AKS) CDP creates an AKS cluster for each Data Engineering Service. The underlying compute, network resources are managed by Azure, including:
  • Load balancer: Azure Load Balancer
  • Network security group
  • Public IP address(es)
  • Route table
  • Virtual machine scale set

For a list of supported VM types, refer to Cloudera Data Platform (CDP) Public Cloud service rates.

Log analytics workspace A logs analytics workspace is created for storing log data.
Azure Files Azure Files | Microsoft Azure contains job resources, application code, Apache Airflow DAG files and any other uploaded files.
Azure Database for MySQL Server Azure Database for MySQL is used for storing job related metadata, histories.

Azure resources created for Operational Database

The following Azure resources are created for the Cloudera Operational Database (COD) service:

Resource Description
Resource Group If you chose for CDP to create multiple resource groups, a resource group is created which contains all of the nodes that comprise the OD database. This resource group is not created if you chose to use a single existing resource group.
Virtual Machines (VMs) A compute VM is created for each node in a COD database. The instance type and managed storage are automatically determined by OD. Azure network security groups are automatically configured as a part of environment creation to define inbound and outbound network access to the created instances.
ADLS Gen2 storage This existing blob storage account that you provided for the Data Lake to use for workload data storage is automatically used by the COD database for storage of data.