Overview of Azure resources used by Cloudera

The following Azure resources are used by Cloudera and Cloudera services.

Azure resources created for a Cloudera environment

When a Cloudera environment is created, a FreeIPA cluster and a Data Lake cluster are created.

The following Azure resources are created for FreeIPA (one per environment):

Resource Description Naming convention
Virtual Network (VNet) If you choose during environment creation to have a new VNet and subnets created, they are created in your Azure account. Alternatively, you can provide your own existing VNet and subnets.

In both cases (new and existing VNet), all compute resources that Cloudera provisions for the environment and Cloudera services are provisioned into the VNet specified during environment creation.

Specified by customer
Resource group for FreeIPA resources If you chose for Cloudera to create multiple resource groups, a resource group is created to group all the resources created for FreeIPA. This resource group is not created if you chose to use a single existing resource group. <env-name>-freeipa-<numeric-id>
Virtual Machines (VMs) During environment creation, two or three Standard_DS3_v2 VMs are provisioned for the FreeIPA HA server by default. The number of VMs depends on the selected Data Lake type. <env-name>-freeipa-<numeric-id>m0
OS disk An OS disk is provisioned for the FreeIPA VM. <env-name>-freeipa-<numeric-id>m0
Network interface One network interface card (NIC) is provisioned for the FreeIPA VM. <env-name>-freeipa-<numeric-id>m0
Public IP address If you choose to use public IPs, your VM is assigned a public IP address. <env-name>-freeipa-<numeric-id>m0
Network security group Network security groups define inbound and outbound access to the instances. If you choose during environment creation to have new security groups created, they are created in your Azure account. Alternatively, you can provide your own existing security groups. master0-<env-name>freeipa<numeric-id>sg
ADLS Gen2 storage account for storing operating system images By default, Cloudera creates an ADLS Gen2 storage account that is used solely for storing operating system images.

If required, you can optionally pre-create this account and copy the required images.

The name is the concatenation of the following three elements:
  • “cbimg”
  • Region identifier: The starting letters of the region where the storage account is located. For example, “eu” for East US, or “eu2” for East US 2.
  • Subscription ID, without hyphens ('-') and all lowercase. For example, if your subscription ID is a9d4456e-349f-46f5-bc73-54a8d523e504, you should convert it to a9d4456e349f46f5bc7354a8d523e504.

For example, the name of a storage account in the East US 2 region in subscription a9d4456e-349f-46f5-bc73-54a8d523e504 would be cbimgeu2a9d4456e349f46f5bc7354a8d523e504.

Resource group for the ADLS account used for storing operating system images If you chose for Cloudera to create multiple resource groups, a separate resource group is created for the ADLS Gen2 account mentioned above. This resource group is not created if you chose to use a single existing resource group. cloudbreak-images
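
The naming rule for the operating system image storage account can be sketched in a few lines of Python. This is an illustrative sketch only, not Cloudera's implementation: the function names are hypothetical, and the region abbreviation rule (first letter of each word, digits kept whole) is an assumption inferred from the two examples given above ("eu" and "eu2"), so it may not match other regions.

```python
def region_identifier(region_name: str) -> str:
    """Abbreviate a region display name such as "East US 2".

    Assumption: take the first letter of each word and keep purely
    numeric words whole, which reproduces "eu" and "eu2".
    """
    parts = []
    for word in region_name.split():
        parts.append(word if word.isdigit() else word[0])
    return "".join(parts).lower()


def image_storage_account_name(region_name: str, subscription_id: str) -> str:
    """Concatenate "cbimg", the region identifier, and the normalized subscription ID."""
    normalized_subscription = subscription_id.replace("-", "").lower()
    return "cbimg" + region_identifier(region_name) + normalized_subscription


if __name__ == "__main__":
    print(image_storage_account_name(
        "East US 2", "a9d4456e-349f-46f5-bc73-54a8d523e504"))
```

The sketch reproduces only the concatenation described above. Azure limits storage account names to 24 characters, so the name of the deployed account may be shorter than the concatenated string.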

In addition, the following resources are created for each Data Lake (one per environment):

Resource Description Naming convention
Resource group If you chose for Cloudera to create multiple resource groups, two resource groups are created: one to group all the resources created for the Data Lake and another for the database used by the Data Lake. These resource groups are not created if you choose to use a single existing resource group. <dl-name><numeric-id>
Virtual Machines (VMs) VMs are provisioned for the Data Lake nodes:
  • Light duty: Two instances are provisioned: One Standard_D8s_v3 instance (Data Lake Master node) and one Standard_D2s_v3 instance (IDBroker).
  • Medium duty: Ten instances are provisioned: One Standard_D2s_v3 instance (IDBroker), three Standard_D4s_v3 instances (two Data Lake Master nodes and one Auxiliary node), and five Standard_D8s_v3 instances (three Data Lake Core and two Gateway nodes).
<dl-name><numeric-id>[m0|i1]
OS disks An OS disk is provisioned for each VM. <env-name>-freeipa-<numeric-id>m0
Attached disks An attached disk (StandardSSD_LRS) is provisioned for each VM. <dl-name><numeric-id>-[m0|i1]<index>-<timestamp>
Network interface One network interface card (NIC) is provisioned for each VM. <dl-name><numeric-id>[m0|i1]
Public IP address If you choose to use public IPs, each of the VMs is assigned a public IP address. <dl-name><numeric-id>[m0|i1]
Network security groups Network security groups define inbound and outbound access to the instances. If you choose during environment creation to have new security groups created, they are created in your Azure account. [master|idbroker]<dl-name><numeric-id>sg
Availability set One availability set is created for the master host group only. <dl-name>-master-as
Resource group for external DB If you chose for Cloudera to create multiple resource groups, a resource group is created for the external database. This resource group is not created if you choose to use a single existing resource group. <env-name>-dbstck-<numeric-id>
Azure Database for PostgreSQL server

An Azure Database for PostgreSQL instance is provisioned for Cloudera Manager, Ranger, and Hive Metastore.

When creating a Flexible Server, Cloudera automatically chooses the latest generation of the Standard_E4s instance family that is supported in the given region (for example, Standard_E4ds_v5, Standard_E4ds_v4, or Standard_E4s_v3), with 128 GB of storage. For more information, see Azure regions.

If you choose to use Single Server, a database instance (MO_Gen5_4 with 100 GB of storage) is provisioned.
dbsrv-<numeric-id>
ADLS Gen2 storage Prior to registering your environment in Cloudera, you should create ADLS Gen2 storage containers as instructed in Cloudera documentation. Specified by customer
Managed identities Prior to registering your environment in Cloudera, you should create managed identities as instructed in Cloudera documentation. Specified by customer
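
As a rough illustration of the resource group rows in the two tables above, the following Python sketch lists the resource group names that would be expected for one environment. It is a sketch under stated assumptions, not Cloudera's logic: the function and parameter names are hypothetical, and because the tables do not say whether the numeric IDs are shared between resources, each one is passed in separately.

```python
def environment_resource_groups(
    env_name: str,
    dl_name: str,
    freeipa_id: str,
    dl_id: str,
    db_id: str,
    single_existing_group: bool = False,
) -> list[str]:
    """Return the resource group names described in the tables above.

    Assumption: the numeric IDs are independent values, so each is
    supplied by the caller.
    """
    if single_existing_group:
        # No new resource groups are created when a single existing
        # resource group is used.
        return []
    return [
        f"{env_name}-freeipa-{freeipa_id}",  # FreeIPA resources
        "cloudbreak-images",                 # OS image storage account
        f"{dl_name}{dl_id}",                 # Data Lake resources
        f"{env_name}-dbstck-{db_id}",        # external database
    ]


if __name__ == "__main__":
    for name in environment_resource_groups("prod", "prod-dl", "123", "456", "789"):
        print(name)
```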

Azure resources created for Cloudera Data Hub

The following Azure resources are created for the Cloudera Data Hub service:

Resource Description Naming convention
Resource group If you chose for Cloudera to create multiple resource groups, for each Cloudera Data Hub cluster, a new resource group is created to group all the resources created for the cluster. This resource group is not created if you chose to use a single existing resource group. <dh-name><numeric-id>
Virtual Machines (VMs) A VM is created for each cluster node. The VM type varies depending on what you selected during Cloudera Data Hub cluster creation. For a list of supported VM types, refer to Cloudera Public Cloud service rates. <dh-name><numeric-id><hostgroup abbr.><node index>
OS disk An OS disk is provisioned for each VM. <dh-name><numeric-id>-osDisk<hostgroup abbr.><node index>
Attached Disks An attached disk is provisioned for each VM, as specified during Cloudera Data Hub cluster creation. The disk size is selected during cluster creation. <dh-name><numeric-id>-<hostgroup abbr.><node index>-<disk counter>-<timestamp>
Network interface One network interface card (NIC) is provisioned for each VM. <dh-name><numeric-id><hostgroup abbr.><node index>
Public IP address If you choose to use public IPs, each of the VMs is assigned a public IP address. <dh-name><numeric-id><hostgroup abbr.><node index>
Network security group Network security groups define inbound and outbound access to the instances. If during environment creation you choose to have new security groups created, then they are created on your Azure account. <hostgroup nm><dh-name><numeric-id>sg
Availability set If the "Hardware and Storage" Advanced Options were used, one availability set is created for each host group. Otherwise, an availability set is created only for the host groups that contain the Knox and/or Oozie services. <dh-name>-<hostgroup>-as
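
The availability set rule in the last row can be expressed as a short Python sketch. The function name, the shape of the `host_group_services` mapping, and the service identifiers are hypothetical choices made for illustration; only the rule itself and the `<dh-name>-<hostgroup>-as` pattern come from the table above.

```python
def data_hub_availability_sets(
    dh_name: str,
    host_group_services: dict[str, list[str]],
    advanced_options_used: bool,
) -> list[str]:
    """Return the availability set names expected for a Data Hub cluster.

    host_group_services maps a host group name to the services it runs
    (a hypothetical input format, used only for this illustration).
    """
    triggering_services = {"KNOX", "OOZIE"}
    names = []
    for host_group, services in host_group_services.items():
        runs_trigger = bool({s.upper() for s in services} & triggering_services)
        # With the advanced options, every host group gets an availability set;
        # otherwise only host groups running Knox and/or Oozie do.
        if advanced_options_used or runs_trigger:
            names.append(f"{dh_name}-{host_group}-as")
    return names


if __name__ == "__main__":
    groups = {"master": ["KNOX", "HDFS"], "worker": ["YARN"], "gateway": ["OOZIE"]}
    print(data_hub_availability_sets("dh1", groups, advanced_options_used=False))
```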

Azure resources created for Cloudera Data Warehouse

The following Azure resources are created for the Cloudera Data Warehouse service:

Resource Description
Resource Group If you chose for Cloudera to create multiple resource groups, one resource group is created with the naming convention “<environment-id>-dwx-rg”. This resource group is not created if you chose to use a single existing resource group.
Azure Kubernetes Service (AKS) Cloudera creates an AKS cluster for each activated Cloudera Data Warehouse environment to host Kubernetes-based resources. The underlying compute and network resources are managed by Azure, including:
  • Resource group
  • Virtual machine scale sets
  • Load balancer(s)
  • Public IP address(es)
  • Network security group
  • Disk(s)

For a list of supported VM types, refer to Cloudera Public Cloud service rates.

Azure Database for PostgreSQL server A PostgreSQL database (General Purpose, Gen5, 4 vCore) is created for Cloudera Data Warehouse to store configuration data.

Azure resources created for Cloudera AI

The following Azure resources are created for the Cloudera AI service:

Resource Description
Resource groups If you chose for Cloudera to create multiple resource groups, one resource group is created with the naming convention “liftie-<unique string>” (which has an AKS cluster of the same name). This resource group is not created if you chose to use a single existing resource group.
Azure Kubernetes Service (AKS) Cloudera creates an AKS cluster for each Cloudera AI workbench to host Kubernetes-based resources. The underlying compute and network resources are managed by Azure, including:
  • Resource group
  • Virtual machine scale sets
  • Load balancer(s)
  • Public IP address(es)
  • Route table
  • Network security group
  • Azure disk(s) (Premium_LRS)

For a list of supported VM types, refer to Cloudera Public Cloud service rates.

Log Analytics workspace A Log Analytics workspace is created for storing log data.
Azure Files Storage account If you choose Azure Files NFS, you will need an existing Azure Files Storage account.

Azure resources created for Cloudera DataFlow

The following Azure resources are created for the Cloudera DataFlow service:

Resource Description
Resource groups If you chose for Cloudera to create multiple resource groups, one resource group is created with the naming convention “liftie-<unique string>” (which has an AKS cluster of the same name). This resource group is not created if you chose to use a single existing resource group.
Azure Kubernetes Service (AKS) Cloudera creates an AKS cluster for the Cloudera DataFlow service. The underlying compute and network resources are managed by Azure, including:
  • Load balancer: Azure Load Balancer
  • Network security group
  • Public IP address(es)
  • Route table
  • Virtual machine scale set

For a list of supported VM types, refer to Cloudera Public Cloud service rates.

Log Analytics workspace A Log Analytics workspace is created for storing log data.
Azure Database for PostgreSQL Azure Database for PostgreSQL is used for storing job-related metadata and histories.

Azure resources created for Cloudera Data Engineering

The following Azure resources are created for the Cloudera Data Engineering service:

Resource Description
Resource groups If you chose for Cloudera to create multiple resource groups, one resource group is created with the naming convention “liftie-<unique string>” (which has an AKS cluster of the same name). This resource group is not created if you chose to use a single existing resource group.
Azure Kubernetes Service (AKS) Cloudera creates an AKS cluster for each Cloudera Data Engineering service. The underlying compute and network resources are managed by Azure, including:
  • Load balancer: Azure Load Balancer
  • Network security group
  • Public IP address(es)
  • Route table
  • Virtual machine scale set

For a list of supported VM types, refer to Cloudera Public Cloud service rates.

Log Analytics workspace A Log Analytics workspace is created for storing log data.
Azure Files Azure Files contains job resources, application code, Apache Airflow DAG files, and any other uploaded files.
Azure Database for MySQL Server Azure Database for MySQL is used for storing job-related metadata and histories.

Azure resources created for Cloudera Operational Database

The following Azure resources are created for the Cloudera Operational Database service:

Resource Description
Resource Group If you chose for Cloudera to create multiple resource groups, a resource group is created which contains all of the nodes that comprise the Cloudera Operational Database database. This resource group is not created if you chose to use a single existing resource group.
Virtual Machines (VMs) A compute VM is created for each node in a Cloudera Operational Database database. The instance type and managed storage are automatically determined by Cloudera Operational Database. Azure network security groups are automatically configured as a part of environment creation to define inbound and outbound network access to the created instances.
ADLS Gen2 storage The existing storage account that you provided for the Data Lake to use for workload data storage is automatically used by the Cloudera Operational Database database to store its data.