Overview of Azure resources used by Cloudera
The following Azure resources are used by Cloudera and Cloudera services.
Azure resources created for a Cloudera environment
When a Cloudera environment is created, a FreeIPA cluster and a Data Lake cluster are created.
The following Azure resources are created for FreeIPA (one per environment):
Resource | Description | Naming convention |
---|---|---|
Virtual Network (VNet) | If you choose during environment creation to have a new VNet and subnets created, they are created in your Azure account. Alternatively, you can provide your own existing VNet and subnets. In both cases (new and existing VNet), all compute resources that Cloudera provisions for the environment and Cloudera services are provisioned into the VNet specified during environment creation. | Specified by customer |
Resource group for FreeIPA resources | If you chose for Cloudera to create multiple resource groups, a resource group is created to group all the resources created for FreeIPA. This resource group is not created if you chose to use a single existing resource group. | <env-name>-freeipa-<numeric-id> |
Virtual Machines (VMs) | During environment creation, two or three Standard_DS3_v2 VMs are provisioned for the FreeIPA HA server by default. The number of VMs depends on the selected Data Lake type. | <env-name>-freeipa-<numeric-id>m0 |
OS disk | An OS disk is provisioned for the FreeIPA VM. | <env-name>-freeipa-<numeric-id>m0 |
Network interface | One network interface card (NIC) is provisioned for the FreeIPA VM. | <env-name>-freeipa-<numeric-id>m0 |
Public IP address | If you choose to use public IPs, your VM is assigned a public IP address. | <env-name>-freeipa-<numeric-id>m0 |
Network security group | Network security groups define inbound and outbound access to the instances. If you choose during environment creation to have new security groups created, they are created in your Azure account. Alternatively, you can provide your own existing security groups. | master0-<env-name>freeipa<numeric-id>sg |
ADLS Gen2 storage account for storing operating system images | By default, Cloudera creates an ADLS Gen2 storage account that is used solely for storing operating system images. If required, you can pre-create this account and copy the required images. | The name is a concatenation of three elements. For example, the name of a storage account in the East US 2 region in subscription a9d4456e-349f-46f5-bc73-54a8d523e504 would be cbimgeu2a9d4456e349f46f5bc7354a8d523e504. |
Resource group for the ADLS account used for storing operating system images | If you chose for Cloudera to create multiple resource groups, a separate resource group is created for the ADLS Gen2 account mentioned above. This resource group is not created if you chose to use a single existing resource group. | cloudbreak-images |
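
The naming example for the image storage account can be reproduced with a short sketch. The three elements used here (a fixed `cbimg` prefix, a region abbreviation such as `eu2`, and the subscription ID with hyphens removed) are inferred from the single example in the table and are an assumption, not a confirmed specification. The function name is hypothetical.

```python
def image_storage_account_name(region_abbr: str, subscription_id: str) -> str:
    """Build an image storage account name from its assumed three elements."""
    # Assumption: the prefix "cbimg" and the hyphen-stripped subscription ID
    # are inferred from the documented example, not from a published rule.
    prefix = "cbimg"
    return prefix + region_abbr + subscription_id.replace("-", "")


# Reproduces the example from the table (East US 2, abbreviated here as "eu2").
print(image_storage_account_name("eu2", "a9d4456e-349f-46f5-bc73-54a8d523e504"))
```

A helper like this may be handy when searching a subscription for the account that Cloudera created, but the abbreviation for any region other than East US 2 would need to be checked against the actual account name.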
In addition, the following resources are created for each Data Lake (one per environment):
Resource | Description | Naming convention |
---|---|---|
Resource group | If you chose for Cloudera to create multiple resource groups, two resource groups are created: one resource group for all the resources created for the Data Lake and another resource group for the database used by the Data Lake. These resource groups are not created if you chose to use a single existing resource group. | <dl-name><numeric-id> |
Virtual Machines (VMs) | VMs are provisioned for the Data Lake nodes. | <dl-name><numeric-id>[m0|i1] |
OS disks | An OS disk is provisioned for each VM. | <env-name>-freeipa-<numeric-id>m0 |
Attached disks | An attached disk (StandardSSD_LRS) is provisioned for each VM. | <dl-name><numeric-id>-[m0|i1]<index>-<timestamp> |
Network interface | One network interface card (NIC) is provisioned for each VM. | <dl-name><numeric-id>[m0|i1] |
Public IP address | If you choose to use public IPs, each of the VMs is assigned a public IP address. | <dl-name><numeric-id>[m0|i1] |
Network security groups | Network security groups define inbound and outbound access to the instances. If you choose during environment creation to have new security groups created, they are created in your Azure account. | [master|idbroker]<dl-name><numeric-id>sg |
Availability set | One availability set is created for the master host group only. | <dl-name>-master-as |
Resource group for external DB | If you chose for Cloudera to create multiple resource groups, a resource group is created for the external database. This resource group is not created if you chose to use a single existing resource group. | <env-name>-dbstck-<numeric-id> |
Azure Database for PostgreSQL server | A database instance is provisioned for Cloudera Manager, Ranger, and Hive Metastore. When creating a Flexible Server, Cloudera automatically chooses the latest generation of the Standard_E4s instance family that is supported in the given region (for example, Standard_E4ds_v5, Standard_E4ds_v4, or Standard_E4s_v3) with 128 GB of storage. For more information, see Azure regions. If you choose to use Single Server, a database instance (MO_Gen5_4 with 100 GB of storage) is provisioned. | dbsrv-<numeric-id> |
ADLS Gen2 storage | Prior to registering your environment in Cloudera, you should create ADLS Gen2 storage containers as instructed in Cloudera documentation. | Specified by customer |
Managed identities | Prior to registering your environment in Cloudera, you should create managed identities as instructed in Cloudera documentation. | Specified by customer |
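
The Flexible Server row describes choosing the latest instance generation that a region supports. A minimal sketch of that kind of selection follows. It assumes the preference order is exactly the order of the three examples given in the table; the function name and the way the list of supported types is obtained are hypothetical, and this is not Cloudera's actual implementation.

```python
from typing import Iterable, Optional

# Assumption: newest generation first, in the order the examples are listed.
PREFERRED_INSTANCE_TYPES = [
    "Standard_E4ds_v5",
    "Standard_E4ds_v4",
    "Standard_E4s_v3",
]


def pick_flexible_server_instance(supported_in_region: Iterable[str]) -> Optional[str]:
    """Return the first preferred instance type that the region supports."""
    supported = set(supported_in_region)
    for instance_type in PREFERRED_INSTANCE_TYPES:
        if instance_type in supported:
            return instance_type
    # None of the example instance types is available in this region.
    return None


# Hypothetical region that does not yet offer the v5 generation.
print(pick_flexible_server_instance(["Standard_E4s_v3", "Standard_E4ds_v4"]))
```

The caller supplies the list of instance types available in the region, which keeps the sketch independent of any particular Azure API.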
Azure resources created for Cloudera Data Hub
The following Azure resources are created for the Cloudera Data Hub service:
Resource | Description | Naming convention |
---|---|---|
Resource group | If you chose for Cloudera to create multiple resource groups, for each Cloudera Data Hub cluster, a new resource group is created to group all the resources created for the cluster. This resource group is not created if you chose to use a single existing resource group. | <dh-name><numeric-id> |
Virtual Machines (VMs) | A VM is created for each cluster node. The VM type varies depending on what you selected during Cloudera Data Hub cluster creation. For a list of supported VM types, refer to Cloudera Public Cloud service rates. | <dh-name><numeric-id><hostgroup abbr.><node index> |
OS disk | An OS disk is provisioned for each VM. | <dh-name><numeric-id>-osDisk<hostgroup abbr.><node index> |
Attached Disks | An attached disk is provisioned for each VM, as specified during Cloudera Data Hub cluster creation. The disk size is selected during cluster creation. | <dh-name><numeric-id>-<hostgroup abbr.><node index>-<disk counter>-<timestamp> |
Network interface | One network interface card (NIC) is provisioned for each VM. | <dh-name><numeric-id><hostgroup abbr.><node index> |
Public IP address | If you choose to use public IPs, each of the VMs is assigned a public IP address. | <dh-name><numeric-id><hostgroup abbr.><node index> |
Network security group | Network security groups define inbound and outbound access to the instances. If during environment creation you choose to have new security groups created, then they are created on your Azure account. | <hostgroup nm><dh-name><numeric-id>sg |
Availability set | If the "Hardware and Storage" Advanced Options were used, one availability set is created for each host group. Otherwise, one availability set is created only for the host groups that contain KNOX and/or OOZIE service. | <dh-name>-<hostgroup>-as |
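
To make the naming conventions in the table above concrete, the following sketch expands them for a single node. The function name and all input values (`mydh`, `123`, `w`, `worker`) are hypothetical placeholders, and the sketch assumes the placeholders are joined exactly as written in the table. The attached disk name is left out because it also contains a disk counter and a timestamp.

```python
def data_hub_resource_names(
    dh_name: str,
    numeric_id: str,
    hostgroup_name: str,
    hostgroup_abbr: str,
    node_index: int,
) -> dict:
    """Expand the Data Hub naming conventions for one cluster node."""
    base = f"{dh_name}{numeric_id}"
    node = f"{base}{hostgroup_abbr}{node_index}"
    return {
        "resource_group": base,
        # VM, network interface, and public IP share the same convention.
        "vm": node,
        "network_interface": node,
        "public_ip": node,
        "os_disk": f"{base}-osDisk{hostgroup_abbr}{node_index}",
        "network_security_group": f"{hostgroup_name}{base}sg",
        "availability_set": f"{dh_name}-{hostgroup_name}-as",
    }


# Hypothetical cluster "mydh" with numeric ID 123, first node of the worker group.
names = data_hub_resource_names("mydh", "123", "worker", "w", 0)
for resource, name in names.items():
    print(f"{resource}: {name}")
```

Expanding the patterns this way can help when matching entries in the Azure portal or a cost report to a specific Cloudera Data Hub cluster.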
Azure resources created for Cloudera Data Warehouse
The following Azure resources are created for the Cloudera Data Warehouse service:
Resource | Description |
---|---|
Resource Group | If you chose for Cloudera to create multiple resource groups, one resource group is created with the naming convention “<environment-id>-dwx-rg”. This resource group is not created if you chose to use a single existing resource group. |
Azure Kubernetes Service (AKS) | Cloudera creates an AKS cluster for each activated Cloudera Data Warehouse environment to host Kubernetes-based resources. The underlying compute and network resources are managed by Azure. For a list of supported VM types, refer to Cloudera Public Cloud service rates. |
Azure Database for PostgreSQL server | PostgreSQL database (General Purpose, Gen5, 4 vCore) is created for Cloudera Data Warehouse to store configuration data. |
Azure resources created for Cloudera AI
The following Azure resources are created for the Cloudera AI service:
Resource | Description |
---|---|
Resource groups | If you chose for Cloudera to create multiple resource groups, one resource group is created with the naming convention “liftie-<unique string>” (which has an AKS cluster of the same name). This resource group is not created if you chose to use a single existing resource group. |
Azure Kubernetes Service (AKS) | Cloudera creates an AKS cluster for each Cloudera AI workbench to host Kubernetes-based resources. The underlying compute and network resources are managed by Azure. For a list of supported VM types, refer to Cloudera Public Cloud service rates. |
Log Analytics workspace | A Log Analytics workspace is created for storing log data. |
Azure Files Storage account | If you choose Azure Files NFS, you will need an existing Azure Files Storage account. |
Azure resources created for Cloudera DataFlow
The following Azure resources are created for the Cloudera DataFlow service:
Resource | Description |
---|---|
Resource groups | If you chose for Cloudera to create multiple resource groups, one resource group is created with the naming convention “liftie-<unique string>” (which has an AKS cluster of the same name). This resource group is not created if you chose to use a single existing resource group. |
Azure Kubernetes Service (AKS) | Cloudera creates an AKS cluster for the Cloudera DataFlow service. The underlying compute and network resources are managed by Azure. For a list of supported VM types, refer to Cloudera Public Cloud service rates. |
Log Analytics workspace | A Log Analytics workspace is created for storing log data. |
Azure Database for PostgreSQL | Azure Database for PostgreSQL is used for storing job-related metadata and histories. |
Azure resources created for Cloudera Data Engineering
The following Azure resources are created for the Cloudera Data Engineering service:
Resource | Description |
---|---|
Resource groups | If you chose for Cloudera to create multiple resource groups, one resource group is created with the naming convention “liftie-<unique string>” (which has an AKS cluster of the same name). This resource group is not created if you chose to use a single existing resource group. |
Azure Kubernetes Service (AKS) | Cloudera creates an AKS cluster for each Cloudera Data Engineering service. The underlying compute and network resources are managed by Azure. For a list of supported VM types, refer to Cloudera Public Cloud service rates. |
Log Analytics workspace | A Log Analytics workspace is created for storing log data. |
Azure Files | Azure Files contains job resources, application code, Apache Airflow DAG files, and any other uploaded files. |
Azure Database for MySQL Server | Azure Database for MySQL is used for storing job-related metadata and histories. |
Azure resources created for Cloudera Operational Database
The following Azure resources are created for the Cloudera Operational Database service:
Resource | Description |
---|---|
Resource Group | If you chose for Cloudera to create multiple resource groups, a resource group is created which contains all of the nodes that comprise the Cloudera Operational Database database. This resource group is not created if you chose to use a single existing resource group. |
Virtual Machines (VMs) | A compute VM is created for each node in a Cloudera Operational Database database. The instance type and managed storage are automatically determined by Cloudera Operational Database. Azure network security groups are automatically configured as a part of environment creation to define inbound and outbound network access to the created instances. |
ADLS Gen2 storage | The existing storage account that you provided for the Data Lake to use for workload data storage is automatically used by the Cloudera Operational Database database for data storage. |