Register an Azure environment

Once you’ve met the cloud provider requirements, register your Azure environment.

Before you begin

This procedure assumes that you have already fulfilled the environment prerequisites described in Azure environment prerequisites.

Steps

  1. Navigate to the Management Console > Environments > Register environment.
  2. On the Register Environment page, provide the following information:
    Parameter Description

    General Information

    Environment Name

    Enter a name for your environment. This name will be used to refer to this environment in CDP.

    Description

    (Optional) Enter a description for your environment.

    Select Cloud Provider

    Select Azure.

    Microsoft Azure Credential

    Select Credential

    Select an existing credential or select Create new credential.

    To review the credentials options and requirements, refer to Credential options on Azure. For instructions on how to create a credential, refer to Create an interactive credential or Create an Azure credential documentation.

  3. Click Next.
  4. On the Data Lake Settings page, provide the following information:
    Parameter Description

    Data Lake Cluster Name

    Enter a name for the Data Lake cluster that will be created for this environment. This name will be used to refer to this Data Lake in CDP and on Azure Portal.

    Data Lake Version

    Select the Cloudera Runtime version that should be deployed for your Data Lake. The latest stable version is used by default. All Data Hub clusters provisioned within this Data Lake will use the same Runtime version.

    Scale

    Select Data Lake scale. By default, “Light Duty” is used.

    For more information on data lake scale, refer to Data Lake scale.

  5. Click Next.
  6. On the Region, Networking and Storage page, provide the following information:
    Parameter Description

    Region

    Select Region

    Select the region that you would like to use for accessing and provisioning resources from CDP.

    If you would like to use a specific existing virtual network, the virtual network must be located in the selected region.

    Network

    Select VNet

    You have two options:

    • Select the existing virtual network where you would like to provision clusters and/or other resources. Refer to VNet and subnets for requirements.
    • Select Create new network to have a new network with three subnets created.

    Select Subnet

    This option is only available if you choose to use an existing network. Multiple subnets must be selected, as described in VNet and subnets, and CDP distributes resources evenly within the subnets.

    Network CIDR

    This option is only available if you choose to create a new network.

    Provide the network CIDR that determines the range of private IPs that VMs will use. This must be a valid private CIDR range in IPv4 format.

    For example, 10.10.0.0/16 is a valid range. A /16 prefix is required to allow for enough IP addresses.

    Create Private Subnets

    Select if you want communication to take place over private IPs.

    Enable Cluster Connectivity Manager

    Select to enable Cluster Connectivity Manager (CCM). You can use CCM to communicate with Data Lake and Data Hub workload clusters that are on private subnets. For more information about the required setup, refer to the Cluster Connectivity Manager documentation.

    Don't Create Public Ip

    Enable this option to use private IPs instead of public IPs.

    Proxies

    Select Proxy Configuration

    Select a proxy configuration if one was previously registered. For more information, refer to Setting up a proxy server.

    Security Access Settings

    Select Security Access Type

    This determines inbound security group settings that allow connections to the Data Lake and Data Hub clusters from your organization’s computers. You have two options:

    • Create new security groups - Allows you to provide custom CIDR IP range for all new security groups that will be created for the Data Lake and Data Hub clusters so that users from your organization can access cluster UIs and SSH to the nodes.

      This must be a valid CIDR IP in IPv4 range. For example: 192.168.27.0/24 allows access from 192.168.27.0 through 192.168.27.255. You can specify multiple CIDR IP ranges separated with a comma. For example: 192.168.27.0/24,192.168.28.0/24.

      If you use this setting, several security groups will be created: one for each Data Lake host group, one for each FreeIPA host group, and one for RDS. Furthermore, the security group settings specified will be automatically used for Data Hub, Data Warehouse, and Machine Learning clusters created as part of the environment.

    • Provide existing security groups (only available for an existing VNet) - Allows you to select two existing security groups, one for Knox-installed nodes and another for all other nodes. If you select this option, refer to Security groups to ensure that you open all ports required for your users to access environment resources.

    SSH Settings

    New or existing SSH public key

    Upload a public key directly from your computer.

    Note: CDP does not use this SSH key. The matching private key can be used by your CDP administrator for root-level access to the instances provisioned for the Data Lake and Data Hub.

    Add tags

    You can optionally add tags to be created for your resources on Azure. Refer to Tags.

    Logs Storage and Audits

    Logger Identity

    Refer to ADLS Gen2 and managed identities.

    Logs Location Base

    Refer to ADLS Gen2 and managed identities.

    Ranger Audit Identity

    Refer to ADLS Gen2 and managed identities.

    Enable Workload Analytics

    Enables Workload Manager support for workload clusters created within this environment. When this setting is enabled, diagnostic information about job and query execution is sent to Workload Manager. This setting can be updated once the environment is running by navigating to environment details > Actions > Enable/Disable Workload Analytics.

    Telemetry

    The telemetry options allow diagnostic information and logs to be collected for troubleshooting purposes.

    Data Access

    Select the ADLS Gen2 location and managed identities created in ADLS Gen2 and managed identities.

    Assumer Identity

    Refer to ADLS Gen2 and managed identities.

    Storage Location Base

    Refer to ADLS Gen2 and managed identities.

    Data Access Identity

    Refer to ADLS Gen2 and managed identities.

    IDBroker Mappings

    We recommend that you skip this setting and configure it after registering your environment, as part of Onboarding CDP users and groups for cloud storage.

    Add Tags

    Refer to Tags.
  7. Click Register Environment to trigger environment registration.
  8. Registration triggers the creation of the FreeIPA server and the Data Lake cluster. Environment creation takes about 60 minutes, and you can monitor the progress from the web UI. Once environment creation has completed, the environment status changes to “Running”.
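
The CIDR values entered in step 6 (Network CIDR and the CIDR ranges for new security groups) can be sanity-checked locally before you submit the form. The following is a minimal Python sketch that uses only the standard ipaddress module. It is not part of CDP; the function names check_network_cidr and parse_access_cidrs are hypothetical, and treating "private" as the RFC 1918 ranges is an assumption about what the form accepts.

```python
import ipaddress
from typing import List

# Assumption: "private" means the RFC 1918 address blocks.
RFC1918_BLOCKS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]


def check_network_cidr(cidr: str) -> ipaddress.IPv4Network:
    """Return the parsed network if it is a private IPv4 /16 range.

    Raises ValueError for malformed input, host bits set, IPv6,
    non-private ranges, or a prefix other than /16.
    """
    net = ipaddress.ip_network(cidr.strip())  # strict parsing by default
    if net.version != 4:
        raise ValueError(f"{cidr} is not an IPv4 range")
    if not any(net.subnet_of(block) for block in RFC1918_BLOCKS):
        raise ValueError(f"{cidr} is not inside a private (RFC 1918) block")
    if net.prefixlen != 16:
        raise ValueError(f"{cidr} must use a /16 prefix")
    return net


def parse_access_cidrs(value: str) -> List[ipaddress.IPv4Network]:
    """Parse a comma-separated list of IPv4 CIDR ranges."""
    networks = []
    for part in value.split(","):
        net = ipaddress.ip_network(part.strip())
        if net.version != 4:
            raise ValueError(f"{part.strip()} is not an IPv4 range")
        networks.append(net)
    return networks


if __name__ == "__main__":
    print(check_network_cidr("10.10.0.0/16"))
    for net in parse_access_cidrs("192.168.27.0/24,192.168.28.0/24"):
        # First and last address covered by each range
        print(net, net.network_address, net.broadcast_address)
```

Each parsed range exposes network_address and broadcast_address, which makes it easy to confirm that a range such as 192.168.27.0/24 covers the addresses you expect before you enter it.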

After you finish

After your environment is running, perform the following steps: