Register an Azure environment from CDP UI
Once you’ve met the Azure cloud provider requirements, register your Azure environment.
Before you begin
This assumes that you have already fulfilled the environment prerequisites described in Azure requirements.
Required role: EnvironmentCreator
- Navigate to the Management Console > Environments > Register environment:
- On the Register Environment page, provide the following information:
Parameter Description General Information Environment Name (Required) Enter a name for your environment. The name:
- Must be between 5 and 28 characters long.
- Can only include lowercase letters, numbers, and hyphens.
- Must start with a lowercase letter.
Description Enter a description for your environment. Select Cloud Provider (Required) Select Azure. Microsoft Azure Credential (Required) Select Credential Select an existing credential or select Create new credential.
For instructions on how to create a credential, refer to Create an app-based credential.
- Click Next.
- On the Data Access and Data Lake Scaling page, provide the following information:
Parameter Description Data Lake Settings Data Lake Name (Required) Enter a name for the Data Lake cluster that will be created for this environment. The name:
- Must be between 5 and 100 characters long
- Must contain lowercase letters
- Cannot contain uppercase letters
- Must start with a letter
- Can only include the following accepted characters are: a-z, 0-9, -.
Data Lake Version (Required) Select Cloudera Runtime version that should be deployed for your Data Lake. The latest stable version is used by default. All Data Hub clusters provisioned within this Data Lake will be using the same Runtime version. Fine-grained access control on ADLS Gen2 Enable Ranger authorization for ADLS Gen2 Identity If you would like to use Fine-grained access control, enable this option and then select the Ranger RAZ managed identity created in the Minimal setup for cloud storage. Data Access and Audit Assumer Identity (Required) Select the Assumer managed identity created in Minimal setup for cloud storage. Storage Location Base (Required) Provide the ADLS Gen2 location created for data storage in Minimal setup for cloud storage. Data Access Identity (Required) Select the Data Lake Admin managed identity created in Minimal setup for cloud storage. Ranger Audit Identity (Required) Select the Ranger Audit managed identity created in Minimal setup for cloud storage. IDBroker Mappings We recommend that you leave this out and set it up after registering your environment as part of Onboarding CDP users and groups for cloud storage. Scale (Required) Select Data Lake scale. By default, “Light Duty” is used.
For more information on data lake scale, refer to Data Lake scale.
- Click on Advanced Options to make additional configurations for your Data Lake. The
following options are available:
Parameter Description Cluster Extensions Recipes You can optionally select and attach previously registered recipes to run on a specific Data Lake host group. For more information, see Recipes.
- Click Next.
- On the Region, Networking and Security page, provide the following information:
Parameter Description Region Select Region (Required) Select the region that you would like to use for accessing and provisioning resources from CDP.
If you would like to use a specific existing virtual network, the virtual network must be located in the selected region.
Resource Group Select Resource Group (Required) You have two options:
- Select one existing resource group. If you select this, all CDP resources will be provisioned into that resource group.
- Select Create new resource groups to have CDP create multiple resource groups.
Customer Managed Encryption Keys Enable Customer-Managed Keys Enable this if you would like to provide a Customer-Managed Key (CMK) to encrypt environment's disks and databases. For more information, refer to Customer managed encryption keys. Select Encryption Key Resource Group Select the resource group where the CMK is located. Encryption key URL Provide the URL of the key value where the CMK resides. This is the same as the key identifier that you can copy directly from Azure Portal. Network Select Network (Required) You have two options:
- Select the existing virtual network where you would like to provision all CDP resources. Refer to VNet and subnets.
- Select Create new network to have a new network with three subnets created.
Select Subnets (Required) This option is only available if you choose to use an existing network. Multiple subnets must be selected and CDP distributes resources evenly within the subnets. Network CIDR (Required) This option is only available if you select to create a new network.
If you selected to create a new network, provide Network CIDR that determines the range of private IPs that VMs will use. This must be a valid private IP CIDR IP in IPv4 range.
For example 10.10.0.0/16 are valid IPs. /16 is required to allow for enough IP addresses.
Create Private Subnets This option is only available if you select to have a new network and subnets created. Is is turned on by default so that private subnets are created in addition to public subnets. If you disable it, only public subnets will be created. Enable CCM (Cluster Connectivity Manager) This option is enabled by default. You can disable it if you do not want to use CCM. You can use Cluster Connectivity Manager (CCM) for communication with Data Lake and Data Hub workload clusters that are on private subnets. For more information about the required setup, refer to Cluster Connectivity Manager documentation. Enable Public Endpoint Access Gateway When CCM is enabled, you can optionally enable Public Endpoint Access Gateway to provide secure connectivity to UIs and APIs in Data Lake and Data Hub clusters deployed using private networking.
If you are using your existing VPC, under Select Endpoint Access Gateway Subnets, select the public subnets for which you would like to use the gateway. The number of subnets must be the same as under Select Subnets and the availability zones must match. For more information, refer to Public Endpoint Access Gateway documentation.
Create Private Endpoints By default, the PostgreSQL Azure database provisioned for your Data Lake is reachable via a service endpoint (public IP address). To increase security, you can optionally select to have it reachable via a private endpoint instead of a service endpoint.
If you select to create a private endpoint and you are using your own VNet, you have two options:
Select “Create new private DNS zone” and CDP creates and manages a private DNS zone for you in the provided existing resource group.
Select your existing private DNS zone.
If you select to create a private endpoint and you would like for CDP to create a new VNet, CDP creates a private DNS zone for you.For more information, refer to Private endpoint for Azure Postgres.
Create Public IPs This option is disabled by default when CCM is enabled and enabled by default when CCM is disabled. Proxies Select Proxy Configuration Select a proxy configuration if previously registered. For more information refer to Setting up a proxy server. Security Access Settings Select Security Access Type (Required) This determines inbound security group settings that allow connections to the Data Lake and Data Hub clusters from your organization’s computers. You have two options:
- Create new security groups - Allows you to provide custom CIDR IP
range for all new security groups that will be created for the Data Lake and
Data Hub clusters so that users from your organization can access cluster UIs
and SSH to the nodes.
This must be a valid CIDR IP in IPv4 range. For example: 192.168.27.0/24 allows access from 192.168.27.0 through 192.168.27.255. You can specify multiple CIDR IP ranges separated with a comma. For example: 192.168.27.0/24,192.168.28.0/24.
If you use this setting, several security groups will get created: one for each Data Lake host group the Data Lake and one for each host group), one for each FreeIPA host group, and one for RDS; Furthermore, the security group settings specified will be automatically used for Data Hub, Data Warehouse, and Machine Learning clusters created as part of the environment.
- Provide existing security groups (Only available for an existing VPC) - Allows you to select two existing security groups, one for Knox-installed nodes and another for all other nodes. If you select this option, refer to Security groups to ensure that you open all ports required for your users to access environment resources.
SSH Settings New SSH public key (Required) Upload a public key directly from your computer. Add tags You can optionally add tags to be created for your resources on Azure. Refer to Defining custom tags.
- Click Next.
- On the Storage page, provide the following information:
Parameter Description Logs Logger Identity (Required) Select the Logger managed identity created in Minimal setup for cloud storage. Logs Location Base (Required) Provide the ADLS Gen2 location created for log storage in Minimal setup for cloud storage. Backup Location Base Provide the ADLS Gen2 location created for FreeIPA and Data Lake backups in Minimal setup for cloud storage. If not provided, the default Backup Location Base uses the Logs Location Base. Telemetry Enable Workload Analytics Enables Workload Manager support for workload clusters created within this environment. When this setting is enabled, diagnostic information about job and query execution is sent to Workload Manager. For more information, refer to Enabling workload analytics and logs collection. Enable Deployment Cluster Logs Collection When this option is enabled. the logs generated during deployments will be automatically sent to Cloudera. For more information, refer to Enabling workload analytics and logs collection.
- Click on Register Environment to trigger environment registration.
- The environment creation takes about 60 minutes. The creation of the FreeIPA server and Data Lake cluster is triggered. You can monitor the progress from the web UI. Once the environment creation has been completed, its status will change to “Running”.
After you finish
After your environment is running, perform the following steps:
- You must assign roles to specific users and groups for the environment so that selected users or user groups can access the environment. Next, you need to perform user sync. For steps, refer to Enabling admin and user access to environments.
- You must onboard your users and/or groups for cloud storage. For steps, refer to Onboarding CDP users and groups for cloud storage.
- You must create Ranger policies for your users. For instructions on how to access your Data Lake, refer to Accessing Data Lake services. Once you've accessed Ranger, create Ranger policies to determine which users have access to which databases and tables.