Azure Prerequisites

Microsoft Azure prerequisites for Cloudera Data Engineering (CDE).

1. Review the Azure account prerequisites for CDP

Refer to the Azure subscription requirements and make sure that the Azure account you are using for CDP has the required resources, and that you have the permissions required to manage these resources.

2. Review the CDE-specific Azure Resource Requirements

The following Azure services are required to provision a CDE service and virtual clusters:

Azure Services used by Cloudera Data Engineering (CDE)

  • Network – VNet and subnets (see below for requirements)
  • Database – Azure Database for MySQL server
  • Compute – Azure Kubernetes Service (AKS)
  • Load Balancing – Azure Load Balancer
  • Virtual machine scale set
  • Storage account – CDE stores workload data and logs in the environment's Azure Data Lake Storage Gen2 (ADLS Gen2) storage account. The AKS service also generates a separate storage account for use with Azure Files.
  • Azure Files – Contains job resources, application code, Apache Airflow DAG files, and any other uploaded files. The AKS service generates an ADLS Gen2 storage account for these files.
  • Log Analytics workspace

For the full list of Azure resources that CDP uses, refer to Azure resources used by CDP.

VNet and Subnet Requirements

When registering an Azure environment in CDP, you will be asked to select a VNet and one or more subnets. Cloudera Data Engineering runs in the VNet registered in CDP as part of your Azure environment.

You have two options:

  • Have CDP create a new VNet and subnets
  • Use your existing VNet and subnets for provisioning CDP resources

Option 1: New VNet and subnets

If you would like CDP to create a new VNet, specify a valid IPv4 CIDR that defines the range of private IP addresses for VM instances provisioned into these subnets. The range must be a /16 CIDR, but you can choose which /16 range to use. The default is 10.10.0.0/16.

CDP will divide this address range as follows:

  • 32 x /24 private subnets for ML and CDE
  • 3 x /19 private subnets for DW
  • 3 x /19 private subnets for Data Lake and Data Hub
  • 3 x /24 public subnets
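
As a rough sanity check of the split listed above, the following shell sketch counts the addresses in each block and confirms that the subnets fit inside a /16 VNet. The helper name addresses_in is hypothetical and is not part of any CDP or Azure tooling; the sketch only does arithmetic and says nothing about how CDP actually places the subnets.

```shell
# A /N IPv4 block holds 2^(32-N) addresses.
# addresses_in is a hypothetical helper used only for this illustration.
addresses_in() { echo $(( 1 << (32 - $1) )); }

vnet=$(addresses_in 16)

# 32 x /24 (ML and CDE) + 3 x /19 (DW) + 3 x /19 (Data Lake and Data Hub) + 3 x /24 (public)
used=$(( 32 * $(addresses_in 24) + 3 * $(addresses_in 19) + 3 * $(addresses_in 19) + 3 * $(addresses_in 24) ))

echo "Addresses in the /16 VNet:      $vnet"
echo "Addresses allocated to subnets: $used"
echo "Addresses left unallocated:     $(( vnet - used ))"
```

The subnets listed above account for 58112 of the 65536 addresses in a /16 range, so the split fits with room to spare.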

Option 2: Existing VNet and subnets

VNet Requirements

If you would like to use your own VNet, it needs to fulfill the following requirements:

  • The VNet has at least one subnet
  • The VNet can make outbound connections to the internet, or to the set of CIDRs and ports provided by Cloudera

Subnet Requirements

Each CDE service requires its own subnet. CDE on AKS uses the Kubenet CNI plugin provided by Azure. To use Kubenet CNI, you need to create multiple smaller subnets when creating an Azure environment. Cloudera recommends partitioning the VNet into subnets that are sized to fit the expected maximum number of nodes in the cluster.

Cloudera recommends a /24 CIDR for these subnets. If you would like to provide a custom range, calculate the number of IP addresses required per CDE service as follows:

  • Each CDE service can scale up to 100 compute nodes; each node consumes one IP address.
  • In addition, you need to allocate 3 IP addresses for the base infrastructure nodes and 2 IP addresses per virtual cluster for the virtual cluster service nodes.
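
The formula above can be sketched as a small shell calculation. The function names cde_required_ips and usable_ips are hypothetical. The figure of 5 reserved addresses per subnet is an assumption taken from Azure's general virtual network documentation, not from this page, so verify it for your environment.

```shell
# Hypothetical helper: IP addresses needed by one CDE service.
# $1 = number of virtual clusters you expect to create in the service.
cde_required_ips() {
  # 100 = maximum compute nodes, 3 = base infrastructure nodes,
  # 2 = service nodes per virtual cluster
  echo $(( 100 + 3 + 2 * $1 ))
}

# Hypothetical helper: usable addresses in a /PREFIX subnet.
# Assumption: Azure reserves 5 addresses in every subnet.
usable_ips() {
  echo $(( (1 << (32 - $1)) - 5 ))
}

echo "IPs required for 10 virtual clusters: $(cde_required_ips 10)"
echo "Usable IPs in a /24 subnet:           $(usable_ips 24)"
```

Under these assumptions, a service with 10 virtual clusters needs 123 addresses, which fits comfortably in the recommended /24 subnet (251 usable addresses).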

Related documents: VNet and subnets, VNet and subnet planning

3. Review the default Azure service limits and your current Azure account limits

Azure imposes default limits on the resources available to each subscription, and these limits may vary by region. Review your Azure subscription’s current usage and resource limits before you start provisioning additional resources for CDP and CDE.

If you require more resources than the limit set by Azure, you can create a support request in the Azure portal.

For example, to register an Azure environment in CDP, you may need to increase some of these limits for the regions that you are planning to use. CDP creates resources such as VMs in your Azure subscription. Depending on the number of clusters that CDP creates, you might need to raise the limits for certain resources such as VMs and vCPUs.

Related Azure quotas documentation: Azure subscription and service limits, quotas, and constraints.
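
One way to review current usage against limits is with the Azure CLI. The sketch below assumes that the Azure CLI is installed and that you have already signed in with az login. The region name eastus is only a placeholder and does not come from this document.

```shell
# Placeholder region; replace with the region you plan to register in CDP.
REGION="eastus"

if command -v az >/dev/null 2>&1; then
  # Current usage and limits for compute resources (VMs, vCPU families).
  az vm list-usage --location "$REGION" --output table

  # Current usage and limits for network resources.
  az network list-usages --location "$REGION" --output table
else
  echo "Azure CLI not found. Install it and run 'az login' first."
fi
```

Compare the reported current values with the limits before provisioning, and open a support request for any quota that is close to its limit.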

4. Review supported Azure regions

For the list of Azure regions that CDP supports, refer to Supported Azure regions.

  • A single Azure environment registered in CDP corresponds to a single VNet located in a specific region, and all the resources deployed by CDP on Azure are deployed into that VNet.
  • Deploying clusters into the region that contains the ADLS Gen2 containers you want to access for input and output data speeds up data access. Therefore, when selecting the region to use, consider where your data is located.
  • CDP requires that the ADLS Gen2 storage location provided during environment registration is in the same region as the region selected for the environment.

If you need to use multiple regions, you need to register multiple environments, one per region.

Related Azure documentation: Azure geographies.

5. Set up an Azure Cloud Credential

You must create the Azure provisioning credential for CDP prior to registering an environment. The credential allows CDP to access and provision a set of resources in your Azure account.

When working with an Azure environment, you can use the app-based credential to authenticate your Azure account and obtain authorization to create resources on your behalf. The app-based credential allows you to manually configure the service principal created within the Azure Active Directory.

Instructions: Azure Credentials

6. Register an Azure Environment in CDP

Once you have met cloud provider requirements and have created the Azure provisioning credential for CDP, you may proceed to register an Azure Environment.

Instructions: Register an Azure environment

7. CDE Role Requirements

There are two CDP user roles associated with the CDE service: DEAdmin and DEUser. Any CDP user with the EnvironmentAdmin (or higher) access level must assign these roles to users who require access to the Cloudera Data Engineering console within their environment.

Furthermore, if you want to allow users to log in to provisioned workspaces and run workloads on them, this will need to be configured separately.

8. Set up to run kubectl commands

  1. Click the three dots at the top right of the CDE UI to open a dropdown menu.
  2. Click Download the Kube Config and save the file (for example, ~/.kube/cde-env1-kube-config).
  3. Run the following shell command:
    $ export KUBECONFIG=~/.kube/cde-env1-kube-config
  4. You should now be able to run kubectl commands.
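
To confirm that the setup above works, you can run a couple of read-only kubectl commands. This is a minimal sketch that assumes kubectl is installed locally and that the kubeconfig was saved to the example path from step 2; adjust the path if you saved the file elsewhere.

```shell
# Example path from step 2; adjust if you saved the file elsewhere.
export KUBECONFIG="$HOME/.kube/cde-env1-kube-config"

if [ ! -f "$KUBECONFIG" ]; then
  echo "kubeconfig not found at $KUBECONFIG"
elif ! command -v kubectl >/dev/null 2>&1; then
  echo "kubectl is not installed or is not on PATH"
else
  # Show which cluster context the kubeconfig points to.
  kubectl config current-context
  # List the cluster nodes to confirm that the connection works.
  kubectl get nodes
fi
```

If both commands return without errors, the kubeconfig is valid and your machine can reach the cluster.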

9. Browser Requirements

Supported browsers:

  • Chrome
  • Safari