Hardware and storage

The "Hardware and storage" options allow you to customize the cluster hardware and storage settings that are specific to your cloud provider.

The Hardware and Storage options can be selected for each host group. To edit this section for a specific host group, click the edit icon. When done editing, click the save icon to save the changes. Repeat these steps for all host groups that you would like to edit.

The following hardware and storage settings are available:

| Parameter | Description |
|---|---|
| Cloudera Manager Server | You must select one node for Cloudera Manager Server by clicking the button. The “Instance Count” for that host group must be set to “1”. If you are using one of the default cluster templates, this is set by default. |
| Instance Type | Select an instance type. For information about instance types on Azure, refer to General Purpose Virtual Machine Sizes in Azure documentation. |
| Instance Count | Enter the number of instances of a given type. Default is 1. |
| Storage Type | Select the volume type. The options are: (1) Locally-redundant storage, (2) Geo-redundant storage, (3) Premium locally-redundant storage. For more information about these options, refer to Introduction to Azure Storage in Azure documentation. |
| Attached Volumes Per Instance | Enter the number of volumes attached per instance. Default is 1. |
| Volume Size | Enter the size in GB for each volume. Default is 100. |
| Root Volume Size | Use this option to increase or decrease the root volume size. Default is 200 GB. This option is useful if your custom image requires more space than the default 200 GB. If you change the root volume size, an osDisk with the given rootVolumeSize is created for the instance automatically; however, you must manually resize the osDisk partition by using the steps provided in How to Resize Linux osDisk Partition on Azure in Azure documentation. If you use a custom Data Hub template that specifies a root volume size smaller than 200 GB, you may encounter an error. |
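
The parameters listed above can be pictured as one small record per host group. The following Python sketch is illustrative only: the class `HostGroupHardware`, its field names, and its methods are hypothetical and are not part of any CDP API or template format. Only the defaults (instance count 1, one attached volume, 100 GB per volume, 200 GB root volume) and the two rules it checks come from the parameter descriptions.

```python
from dataclasses import dataclass
from typing import List

DEFAULT_ROOT_VOLUME_GB = 200  # default root volume size from the parameter list


@dataclass
class HostGroupHardware:
    """Hypothetical model of one host group's hardware and storage settings."""
    name: str
    instance_type: str
    instance_count: int = 1        # documented default
    volumes_per_instance: int = 1  # documented default
    volume_size_gb: int = 100      # documented default
    root_volume_size_gb: int = DEFAULT_ROOT_VOLUME_GB
    cloudera_manager_server: bool = False

    def total_attached_storage_gb(self) -> int:
        """Attached (non-root) storage summed over all instances in the group."""
        return self.instance_count * self.volumes_per_instance * self.volume_size_gb

    def check(self) -> List[str]:
        """Return warnings for the two documented rules; empty list if none apply."""
        warnings = []
        if self.cloudera_manager_server and self.instance_count != 1:
            warnings.append(
                f"{self.name}: the Cloudera Manager Server host group "
                "must have Instance Count 1"
            )
        if self.root_volume_size_gb < DEFAULT_ROOT_VOLUME_GB:
            warnings.append(
                f"{self.name}: a root volume smaller than "
                f"{DEFAULT_ROOT_VOLUME_GB} GB may cause an error"
            )
        return warnings


if __name__ == "__main__":
    worker = HostGroupHardware("worker", "Standard_D8s_v3",
                               instance_count=3, volumes_per_instance=2)
    print(worker.total_attached_storage_gb())  # 3 * 2 * 100 = 600
    print(worker.check())                      # []
```

The instance type `Standard_D8s_v3` is used here only as an example value; choose the size that fits your workload.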
Availability Sets

To support fault tolerance for VMs, Azure uses availability sets. An availability set allows two or more VMs to be mapped to multiple fault domains, each of which defines a group of virtual machines that share a common power source and a network switch. When adding VMs to an availability set, Azure automatically assigns each VM a fault domain. The SLA includes guarantees that during OS patching in Azure or during maintenance operations, at least one VM belonging to a given fault domain will be available.
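
To illustrate why spreading VMs over fault domains helps, the sketch below assigns VMs to fault domains and then lists the VMs that remain when one domain fails. The round-robin assignment is an assumption made purely for illustration; Azure performs the real assignment, and its placement logic is not described in this document. Both function names are hypothetical.

```python
from typing import Dict, List


def assign_fault_domains(vm_names: List[str], fault_domain_count: int) -> Dict[str, int]:
    """Map each VM to a fault domain index, round-robin (illustrative assumption)."""
    if fault_domain_count < 1:
        raise ValueError("fault_domain_count must be at least 1")
    return {vm: i % fault_domain_count for i, vm in enumerate(vm_names)}


def survivors(assignment: Dict[str, int], failed_domain: int) -> List[str]:
    """VMs still available when one fault domain (shared power source and switch) fails."""
    return [vm for vm, domain in assignment.items() if domain != failed_domain]


if __name__ == "__main__":
    placement = assign_fault_domains(["vm0", "vm1", "vm2"], fault_domain_count=2)
    print(placement)                # {'vm0': 0, 'vm1': 1, 'vm2': 0}
    print(survivors(placement, 0))  # ['vm1']
```

With two or more VMs spread over at least two fault domains, the loss of any single domain leaves at least one VM running, which is the property an availability set is meant to provide.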

In CDP, no availability sets are configured by default. To have availability sets configured, navigate to the "Hardware and Storage" tab in the Advanced Options during Data Hub creation. In this case, one availability set is configured during cluster creation for each host group whose “Instance Count” is set to 1 or larger. The assignment of fault domains is automated by Azure, so there is no option for this in the CDP web UI.

After the deployment is finished, you can check the layout of the VMs inside an availability set in the Azure portal. The “Availability set” resources corresponding to the host groups are located inside the deployment’s resource group.

| Parameter | Description |
|---|---|
| Availability Set Name | Choose a name for the availability set that will be created for the selected host group. The following convention is used: “clustername-hostgroupname-as”. |
| Fault Domain Count | The number of fault domains. Default is 2 or 3, depending on the setting supported by Azure. |
| Update Domain Count | The number of update domains. This can be set to a number in the range 2-20. Default is 20. |
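
The naming convention and the documented update domain range can be captured in a few lines. This is a minimal sketch, not CDP code: `availability_set_name`, `validate_update_domain_count`, and `availability_sets_for` are hypothetical helpers, and host groups are represented as a plain mapping from host group name to instance count for illustration. The fault domain count is not validated here because its allowed values depend on what Azure supports.

```python
from typing import Dict

UPDATE_DOMAIN_MIN = 2   # documented lower bound
UPDATE_DOMAIN_MAX = 20  # documented upper bound, also the default


def availability_set_name(cluster_name: str, host_group_name: str) -> str:
    """Apply the documented convention: clustername-hostgroupname-as."""
    return f"{cluster_name}-{host_group_name}-as"


def validate_update_domain_count(count: int = UPDATE_DOMAIN_MAX) -> int:
    """Return count if it lies in the documented 2-20 range, otherwise raise."""
    if not UPDATE_DOMAIN_MIN <= count <= UPDATE_DOMAIN_MAX:
        raise ValueError(
            f"Update Domain Count must be in the range "
            f"{UPDATE_DOMAIN_MIN}-{UPDATE_DOMAIN_MAX}, got {count}"
        )
    return count


def availability_sets_for(cluster_name: str, instance_counts: Dict[str, int]) -> Dict[str, str]:
    """One availability set per host group whose Instance Count is 1 or larger."""
    return {
        group: availability_set_name(cluster_name, group)
        for group, count in instance_counts.items()
        if count >= 1
    }


if __name__ == "__main__":
    groups = {"master": 1, "worker": 3, "compute": 0}
    print(availability_sets_for("mycluster", groups))
    # {'master': 'mycluster-master-as', 'worker': 'mycluster-worker-as'}
```

The cluster and host group names in the usage example are placeholders.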