Registering Cloudera Hybrid Environments
Learn how to register a hybrid environment.
You must fulfill the following AWS, Azure, or GCP requirements:
- Your AWS account must have the right permissions as described in AWS account permissions.
- You must create the Cross-account Role ARN corresponding to the Cross Account Role as described in Creating cross-account access IAM role.
- You must create your credential to provision the environment as described in Creating a provisioning credential for AWS.
- The VPC and subnet requirements must be met as described in VPC and subnets.
- You must create or use your existing security groups as described in Security groups.
- If you plan to create new Security Groups, you must have the required IP address range information.
- If you plan to use existing Security Groups, you must open all the required ports.
- If you plan to use Customer Managed Encryption Keys (CMEK), you must configure them as described in Customer managed encryption keys.
- You must create or use your existing SSH public key as described in SSH key pair.
- If you plan to create a new SSH key, you must use an RSA or ED25519 public key. This will create a new EC2 key pair on the AWS side, and all cloud resources will use it for SSH authentication.
- If you plan to use an existing SSH key, you must refer to an existing AWS EC2 key pair. The Cloudera Control Plane validates that the key pair exists.
- You must create an S3 bucket and set up the Logs location as described in AWS cloud storage prerequisites.
- Your Azure account must have the right permissions as described in Azure subscription requirements.
- You must create a custom role with the required set of permissions as described in Azure credential prerequisites.
- You must create your credential to provision the environment as described in Creating a provisioning credential for Azure.
- You must create or let Cloudera create resource groups for the environment as described in Resource groups.
- The VNET and subnet requirements must be met as described in VNet and subnets.
- You must configure Azure Flexible Server as described in Private setup for Azure Flexible Server.
- If you plan to use Customer Managed Encryption Keys (CMEK), you must configure them as described in Encrypting Azure resources with customer managed keys.
- You must create or use your existing security groups as described in Network security groups.
- If you are planning to create new Security Groups, you must have the required IP address range information.
- If you are planning to use existing Security Groups, you must open all the required ports.
- You must create or use your existing SSH public key as described in SSH key pair.
- You must create an ADLS Gen2 storage and set up the Logs location as described in Azure cloud storage prerequisites.
- Your Google account must have the right permissions as described in GCP permissions.
- You must create a Google project as described in GCP project.
- You must create a service account with the required set of permissions as described in Service account for credential.
- You must create your credential to provision the environment as described in Creating a GCP credential.
- The VPC and subnet requirements must be met as described in VPC network and subnet.
- If you plan to use Customer Managed Encryption Keys (CMEK), you must configure them as described in Customer managed encryption keys.
- You must create or use your existing SSH public key as described in SSH key pair.
- You must create a Google storage bucket and set up the Logs location as described in GCP cloud storage prerequisites.
- Required role: EnvironmentCreator
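The AWS prerequisites above require a new SSH key to be an RSA or ED25519 public key. The following is a minimal sketch of a local pre-check you could run before pasting a key into the wizard. It is not part of Cloudera tooling; the function names are hypothetical, and it only inspects the standard OpenSSH public key line format (`<type> <base64-blob> [comment]`).

```python
import base64
import struct

# Key types named in the AWS prerequisite (RSA or ED25519).
ALLOWED_KEY_TYPES = {"ssh-rsa", "ssh-ed25519"}


def ssh_public_key_type(public_key_line: str) -> str:
    """Return the key type of an OpenSSH public key line.

    Raises ValueError if the line is malformed or if the declared type
    does not match the type name embedded in the key blob.
    """
    parts = public_key_line.strip().split()
    if len(parts) < 2:
        raise ValueError("expected '<type> <base64-blob> [comment]'")
    declared_type, blob_b64 = parts[0], parts[1]

    # binascii.Error (raised on bad base64) is a subclass of ValueError.
    blob = base64.b64decode(blob_b64, validate=True)
    if len(blob) < 4:
        raise ValueError("key blob is too short")

    # The blob starts with a 4-byte big-endian length followed by the type name.
    (name_len,) = struct.unpack(">I", blob[:4])
    embedded_type = blob[4:4 + name_len].decode("ascii", errors="replace")
    if embedded_type != declared_type:
        raise ValueError("declared key type does not match the key blob")
    return declared_type


def is_accepted_key(public_key_line: str) -> bool:
    """Return True if the line is a well-formed RSA or ED25519 public key."""
    try:
        return ssh_public_key_type(public_key_line) in ALLOWED_KEY_TYPES
    except ValueError:
        return False
```

A check like this only catches formatting and key-type problems early; the Cloudera Control Plane still performs its own validation during registration.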
- Go to Cloudera Management Console.
- Select Environments.
- Click Register environment.
- In the Purpose section, select the Hybrid Cloud Environment option.
- Enter the following general information for the new hybrid environment:
Environments page
General Information section
Environment Name: Enter a name for the new hybrid environment.
Description (optional): Enter a short description for the new hybrid environment.
Select Cloud Provider: Select the cloud provider of your choice.
- If you already have a credential set up, select it from the drop-down list.
- If you must create new credentials, enter or select the following information:
Environments page
Amazon Web Services Credential section
Name: Enter a name for the new credential.
Description (optional): Enter a short description for the new credential.
Enable Permission Verification: Click this toggle to have Cloudera verify that your credential has the permissions required for your environment.
Default | Minimal: Select whether to use the Default or Minimal role. Use the provided JSON to create the AWS IAM policy. Use the Minimal role for a general hybrid environment. Use Default if you plan to use Data Services.
Service Manager Account ID and External ID: Use the provided IDs to create the AWS IAM role.
Cross-account Role ARN: Enter the ARN of the cross-account role.
SHOW CLI COMMAND (optional): Click this button to display the command required to create the credential from the CLI.
Create Credential: Click this button to create the new credential.
Click Next to proceed to the Region, Networking and Security step.
Region, Networking and Security page
Region, Location section
Select Region: Select the region for the new environment.
Network section
Select Network: Select the existing virtual network where you want to provision all Cloudera resources. For more information, see VPC and subnet.
Select Subnets: Select existing subnets within the selected VPC. For more information, see VPC and subnet.
Enable Cluster Connectivity Manager (CCM): Enable or disable Cluster Connectivity Manager. Cluster Connectivity Manager allows Cloudera to communicate with Cloudera Data Hub clusters and on-premises classic clusters that are on private subnets. For more information, see the Cluster Connectivity Manager documentation.
Enable Endpoint Access Gateway: When Cluster Connectivity Manager is enabled, you can optionally enable Endpoint Access Gateway to provide secure connectivity to UIs and APIs in Cloudera Data Hub clusters deployed using private networking.
From the Select Subnets drop-down list for the Endpoint Access Gateway, select the public subnets for which you want to use the gateway. The number of subnets must be the same as the number selected under Select Subnets, and the availability zones must match.
For more information, see the Public Endpoint Access Gateway section.
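The Endpoint Access Gateway rule above (same number of subnets as under Select Subnets, with matching availability zones) can be checked before you open the wizard. The sketch below is illustrative only: the function name and the subnet-to-zone mappings are hypothetical, and it assumes that "matching" means both selections cover the same set of availability zones.

```python
from typing import List, Mapping


def gateway_subnet_problems(
    env_subnets: Mapping[str, str],
    gateway_subnets: Mapping[str, str],
) -> List[str]:
    """Compare environment subnets with Endpoint Access Gateway subnets.

    Both arguments map a subnet ID to its availability zone.
    Returns a list of human-readable problems; an empty list means
    the two selections satisfy the rule as interpreted here.
    """
    problems = []

    # Rule 1: the number of subnets must be the same.
    if len(env_subnets) != len(gateway_subnets):
        problems.append(
            f"subnet count differs: {len(env_subnets)} selected, "
            f"{len(gateway_subnets)} for the gateway"
        )

    # Rule 2 (assumed interpretation): the sets of availability zones must be equal.
    env_zones = set(env_subnets.values())
    gateway_zones = set(gateway_subnets.values())
    if env_zones != gateway_zones:
        problems.append(
            f"availability zones differ: {sorted(env_zones)} vs {sorted(gateway_zones)}"
        )

    return problems


if __name__ == "__main__":
    # Hypothetical subnet IDs and zones, for illustration only.
    private = {"subnet-private-1": "us-east-1a", "subnet-private-2": "us-east-1b"}
    public = {"subnet-public-1": "us-east-1a", "subnet-public-2": "us-east-1b"}
    print(gateway_subnet_problems(private, public))
```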
Encryption section
Enable Customer Managed Keys: Enable this if you want to provide a Customer-Managed Key (CMK) to encrypt the environment disks and databases. For more information, see Customer managed encryption keys.
Proxies section
Select Proxy Configuration: Select one of the following options:
- Do Not Use Proxy Configuration
- Create New Proxy Configuration
Select this option if you want Cloudera to automatically create security groups for you and open them to the CIDR range specified.
Enter the following information for the new proxy configuration:
- Name
- Description (optional)
- Protocol
- Server Host
- Server Port
- No Proxy Hosts
- Inbound Proxy CIDR
- Username
- Password
- Existing proxy configuration
You must open all the required ports if you want to use your existing security groups.
For more information, see Setting up a proxy server.
Security Access Settings: Select one of the following options to determine inbound security group settings that allow connections to the Cloudera Data Hub clusters from your organization’s computers:
- Create New Security Groups
Select this option if you want Cloudera to automatically create security groups for you and open them to the specified CIDR range.
- Access CIDR
Enter a custom CIDR IP range for all new security groups that will be created for the Cloudera Data Hub clusters.
- Select Existing Security Groups
If you want to use your existing security groups, you must open all the required ports. Refer to Security groups to ensure that you open all ports required for your users to access environment resources.
- Select Existing Security Group for Gateway Nodes.
- Select Existing Security Group as default.
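The Access CIDR field above expects a CIDR IP range. As a minimal sketch, the value can be sanity-checked locally with Python's standard `ipaddress` module before it is entered. The function name is hypothetical, and the requirement for an explicit prefix length is an assumption of this sketch, not a documented Cloudera rule.

```python
import ipaddress


def parse_access_cidr(value: str) -> str:
    """Validate a CIDR range and return it in canonical form.

    Raises ValueError if the prefix length is missing, the address is
    malformed, or host bits are set (for example 10.0.0.5/24).
    """
    text = value.strip()
    if "/" not in text:
        # ip_network() would silently treat a bare address as a single host.
        raise ValueError("CIDR range must include a prefix length, for example 10.0.0.0/16")

    # strict=True rejects ranges whose host bits are not zero.
    network = ipaddress.ip_network(text, strict=True)
    return str(network)


if __name__ == "__main__":
    print(parse_access_cidr("10.0.0.0/16"))
```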
SSH Settings section
New SSH public key: Enter a new SSH public key.
Existing SSH public key: Enter the name of an existing EC2 key pair that you want to use for SSH access.
Add tags section
Add tags (optional): Add tags to be created for your resources on AWS. For more information, see Defining custom tags.
Advanced Options section
Network And Availability: Click the Enable Multiple Availability Zones for FreeIPA toggle to enable multiple availability zones for FreeIPA. For more information, see Deploying Cloudera in multiple AWS availability zones.
Hardware And Storage: Enter FreeIPA nodes instance types. Click the edit icon in the top right corner and select the instance type from the drop-down list. For more information on instance types, see Amazon EC2 instance types.
Cluster Extensions: You can optionally select and attach previously registered recipes to run on a specific FreeIPA host group. For more information, see Recipes.
Security: Select one of the following SELinux modes based on your requirements:
- Permissive
- Enforcing
Click Next to proceed to the Storage step.
Storage page
Logs section
Logger Instance Profile: Select the IAM instance profile (or IAM role) that provides Cloudera with write access to the S3 logs data location.
Logs Location Base: Provide a path to an existing S3 bucket or a directory within an existing S3 bucket where log data will be stored.
Backup Location Base (optional): Provide a path to an existing S3 bucket or a directory within an existing S3 bucket where IPA backups will be stored. If none is provided, the log location will be used.
Telemetry section
Enable Cloudera Observability (optional): When this is enabled, diagnostic information about job and query execution is sent to Cloudera Observability for Cloudera Data Hub clusters. For more information, see Enabling workload analytics and logs collection.
Register Environment page
Microsoft Azure Credential section
Name: Enter a name for the new credential.
Description (optional): Enter a short description for the new credential.
Default | Minimal: Select whether to use the Default or Minimal role.
Use the provided JSON to create the Azure custom role.
Use Minimal role for a general hybrid environment. Use Default if you plan to use Data Services.
Command 1: Use the provided command in the Azure Shell to associate the new certificate with the service principal.
Command 2: Use the provided command in the Azure Shell to identify your Subscription ID and Tenant ID.
Show CLI Command (optional): Click this button to display the command required to create the credential from the CLI.
Create Credential: Click this button to create the credential.
Click Next to proceed to the Region, Networking and Security step.
Region, Networking and Security page
Region, Location section
Select Region: Select the region for the new environment.
Resource Group section
Select Resource Group: Select one of the following options:
- Select an existing resource group to have all Cloudera resources provisioned into that resource group.
- Select the Create new resource groups option to have Cloudera create multiple resource groups.
Network section
Select Network: Select the existing virtual network where you want to provision all Cloudera resources. For more information, see VPC and subnet.
Select Subnets: This option is only available if you choose to use an existing network. Multiple subnets must be selected and Cloudera distributes resources evenly within the subnets.
Enable Cluster Connectivity Manager (CCM): Enable or disable Cluster Connectivity Manager. Cluster Connectivity Manager allows Cloudera to communicate with Cloudera Data Hub clusters and on-premises classic clusters that are on private subnets. For more information, see the Cluster Connectivity Manager documentation.
Enable Endpoint Access Gateway: When Cluster Connectivity Manager is enabled, you can optionally enable Public Endpoint Access Gateway to provide secure connectivity to UIs and APIs in Cloudera Data Hub clusters deployed using private networking. If you are using your existing VNET, from the Select Endpoint Access Gateway Subnets drop-down list, select the public subnets for which you want to use the gateway. The number of subnets must match that set under Select Subnets, and the availability zones must match. For more information, see Public Endpoint Access Gateway.
Create Public IPs: This option is disabled by default when Cluster Connectivity Manager is enabled. It is enabled by default when Cluster Connectivity Manager is disabled.
Database section
Database: Select one of the following options:
- Flexible Server
- Flexible Server with Private Link
You must select the Private DNS Zone for the database from the drop-down menu.
- Flexible Server with Delegated Subnet
For more information on Flexible Servers, see Using Azure Database for PostgreSQL Flexible Server.
Encryption section
Enable Encryption at Host: Azure Encryption at Host is a security feature that provides end-to-end encryption for your Virtual Machine (VM) data. Unlike standard encryption that happens at the storage layer, this feature ensures that data is encrypted the moment it is processed by the physical server (the host) where your VM is running.
Enable Customer Managed Keys: Enable this option if you want to provide a Customer-Managed Key (CMK) to encrypt the environment's disks and databases. For more information, see Customer managed encryption keys.
Proxies section
Select Proxy Configuration: Select one of the following options:
- Do Not Use Proxy Configuration
- Create New Proxy Configuration
Select this option if you want Cloudera to automatically create security groups for you and open them to the specified CIDR range.
Enter the following information for the new proxy configuration:
- Name
- Description (optional)
- Protocol
- Server Host
- Server Port
- No Proxy Hosts
- Inbound Proxy CIDR
- Username
- Password
- Select existing proxy configuration
You must open all the required ports if you want to use your existing security groups.
For more information, see Setting up a proxy server.
Security Access Settings: Select one of the following options to determine inbound security group settings that allow connections to the Cloudera Data Hub clusters from your organization's computers:
- Create New Security Groups
Select this option if you want Cloudera to automatically create security groups for you and open them to the specified CIDR range.
- Access CIDR
Enter a custom CIDR IP range for all new security groups that will be created for the Cloudera Data Hub clusters.
- Select Existing Security Groups
If you want to use your existing security groups, you must open all the required ports. Refer to Security groups to ensure that you open all ports required for your users to access environment resources.
- Select Existing Security Group for Gateway Nodes.
- Select Existing Security Group as default.
SSH Settings section
New SSH public key: Enter a new SSH public key.
Existing SSH public key: Enter the name of an existing SSH key pair.
Add tags section
Add tags (optional): Add tags to be created for your resources on Azure. For more information, see Defining custom tags.
Advanced Options section
Network And Availability: Click the Enable Multiple Availability Zones for FreeIPA toggle to enable multiple availability zones for FreeIPA. For more information, see Deploying Cloudera in multiple Azure availability zones.
Hardware And Storage: You can specify an instance type for each host group. For more information on instance types, see Sizes for virtual machines in Azure.
Cluster Extensions: You can optionally select and attach previously registered recipes to run on FreeIPA nodes. For more information, see Recipes.
Security: Select one of the following SELinux modes based on your requirements:
- Permissive
- Enforcing
Click Next to proceed to the Storage step.
Storage page
Logs section
Logger Instance Profile: The logger requires the Storage Blob Data Contributor role on the provided storage account.
Logs Location Base: Provide your filesystem and storage account name in a filesystem@storageaccountname.dfs.core.windows.net[/subfolders] format where data will be stored.
- Filesystem must already exist.
- The storage account must be Storage V2.
- Subfolders are optional.
Backup Location Base (optional): Provide your filesystem and storage account name in a filesystem@storageaccountname.dfs.core.windows.net[/subfolders] format where IPA backups will be stored.
- Filesystem must already exist.
- The storage account must be Storage V2.
- Subfolders are optional.
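The filesystem@storageaccountname.dfs.core.windows.net[/subfolders] format used for the Logs and Backup locations above can be checked with a short script before you enter the value. This is a sketch under stated assumptions: the function and type names are hypothetical, and the character rules in the pattern follow the general Azure naming rules for containers and storage accounts rather than anything stated on this page. It checks the format only; it cannot confirm that the filesystem exists or that the account is Storage V2.

```python
import re
from typing import NamedTuple, Optional


class AdlsLocation(NamedTuple):
    filesystem: str
    storage_account: str
    subfolders: Optional[str]  # None when no subfolders are given


# Assumed naming rules: filesystem is 3-63 lowercase letters, digits, or hyphens,
# starting and ending with a letter or digit; account is 3-24 lowercase letters or digits.
_LOCATION_PATTERN = re.compile(
    r"^(?P<fs>[a-z0-9][a-z0-9-]{1,61}[a-z0-9])"
    r"@(?P<account>[a-z0-9]{3,24})\.dfs\.core\.windows\.net"
    r"(?:/(?P<sub>.+))?$"
)


def parse_adls_location(value: str) -> AdlsLocation:
    """Split a location string into filesystem, storage account, and subfolders.

    Raises ValueError if the string does not follow the expected format.
    """
    match = _LOCATION_PATTERN.match(value.strip())
    # Consecutive hyphens are not allowed in container (filesystem) names.
    if match is None or "--" in match.group("fs"):
        raise ValueError(
            "expected filesystem@storageaccountname.dfs.core.windows.net[/subfolders]"
        )
    return AdlsLocation(match.group("fs"), match.group("account"), match.group("sub"))


if __name__ == "__main__":
    # Hypothetical filesystem and storage account names.
    print(parse_adls_location("logs@mystorageacct.dfs.core.windows.net/env1/logs"))
```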
Telemetry section
Enable Cloudera Observability (optional): When this is enabled, diagnostic information about job and query execution is sent to Cloudera Observability for Data Hub clusters. For more information, see Enabling workload analytics and logs collection.
Register Environment page
Google Cloud Platform Credential section
Name: Enter a name for the new credential.
Description (optional): Enter a short description for the new credential.
Default | Minimal: Select whether to use the Default or Minimal role.
Use the provided commands to create a service account through the Google Cloud SDK or Google Cloud Shell.
Use Minimal role for a general hybrid environment. Use Default if you plan to use Data Services.
Upload file: Use the Upload file button to upload a service account private key in JSON format.
Show CLI Command (optional): Click this button to display the command required to create the credential from the CLI.
Create Credential: Click this button to create the credential.
Click Next to proceed to the Region, Networking and Security step.
Region, Networking and Security page
Region, Location section
Select Region: Select the region for the new environment.
Select Zone: Select the zone within the selected region.
Network section
Use Shared VPC: Shared VPC allows an organization to connect resources from multiple projects to a common Virtual Private Cloud (VPC) network, so that they can communicate with each other securely and efficiently using internal IPs from that network. When you use Shared VPC, you designate a project as a host project and attach one or more service projects to it. The VPC networks in the host project are called Shared VPC networks. Eligible resources from service projects can use subnets in the Shared VPC network. For more information, see https://cloud.google.com/vpc/docs/shared-vpc.
Select Network: Select the existing VPC where you want to provision all Cloudera resources. For more information, see VPC and subnet.
Select Subnets: Select at least one subnet within the selected VPC. For more information, see VPC and subnet.
Create Private Subnets: This option is only available if you select to have a new network and subnets created. It is turned on by default so that private subnets are created in addition to public subnets. If you disable it, only public subnets will be created.
For production deployments, Cloudera recommends using private subnets. Work with your internal IT teams to ensure that users can access the browser interfaces for cluster services.
Enable Cluster Connectivity Manager (CCM): Enable or disable Cluster Connectivity Manager. Cluster Connectivity Manager allows Cloudera to communicate with Cloudera Data Hub clusters and on-premises classic clusters that are on private subnets. For more information, see the Cluster Connectivity Manager documentation.
Enable Endpoint Access Gateway: When Cluster Connectivity Manager is enabled, you can optionally enable Public Endpoint Access Gateway to provide secure connectivity to UIs and APIs in Cloudera Data Hub clusters deployed using private networking. If you are using your existing VPC, under Select Endpoint Access Gateway Subnets, select the public subnets for which you want to use the gateway. The number of subnets must match that set under Select Subnets, and the availability zones must match. For more information, see Public Endpoint Access Gateway.
Create Public IPs: This option is disabled by default when Cluster Connectivity Manager is enabled. It is enabled by default when Cluster Connectivity Manager is disabled.
Encryption section
Enable Customer Managed Keys: Enable this if you want to provide a Customer-Managed Key (CMK) to encrypt the environment's disks and databases. For more information, see Customer managed encryption keys.
Proxies section
Select Proxy Configuration: Select one of the following options:
- Do Not Use Proxy Configuration
- Create New Proxy Configuration
Select this option if you want Cloudera to automatically create security groups for you and open them to the specified CIDR range.
Enter the following information for the new proxy configuration:
- Name
- Description (optional)
- Protocol
- Server Host
- Server Port
- No Proxy Hosts
- Inbound Proxy CIDR
- Username
- Password
- Existing proxy configuration
You must open all the required ports if you want to use your existing security groups.
For more information, see Setting up a proxy server.
Security Access Settings: Select one of the following options:
- Do not create firewall rule
Select this option if you are using a shared VPC and have already set the firewall rules directly on the VPC.
- Provide existing firewall rules
If not all of your firewall rules are set directly on the VPC, provide the previously created firewall rules for SSH and UI access. You must select two existing firewall rules, one for Knox gateway-installed nodes and another for all other nodes. You might select the same firewall rule in both places if needed.
For information on required ports, see Firewall rules.
SSH Settings section
New SSH public key: Enter a new SSH public key.
Existing SSH public key: Enter the name of an existing SSH key pair.
Add tags section
Add tags (optional): Add tags to be created for your resources on GCP. For more information, see Defining custom tags.
Advanced Options section
Network And Availability: Click the Enable Multiple Availability Zones for FreeIPA toggle to enable multiple availability zones for FreeIPA. For more information, see Deploying Cloudera In Multiple GCP Availability Zones.
Hardware And Storage: You can specify an instance type for each host group. For more information on instance types, see Sizes for virtual machines in Azure.
Cluster Extensions: You can optionally select and attach previously registered recipes to run on FreeIPA nodes.
Security: Select one of the following SELinux modes based on your requirements:
- Permissive
- Enforcing
Click Next to proceed to the Storage step.
Storage page
Logs section
Logger Service Profile: Select the service account that provides Cloudera with write access to the Google Cloud Storage (GCS) location where logs will be stored.
Logs Location Base: Provide a path to an existing GCS bucket or a directory within an existing GCS bucket where data will be stored. For more information, see Minimum setup for cloud storage.
Backup Location Base (optional): Provide a path to an existing GCS bucket or a directory within an existing GCS bucket where FreeIPA backups will be stored. For more information, see Minimum setup for cloud storage.
Telemetry section
Enable Cloudera Observability (optional): When this is enabled, diagnostic information about job and query execution is sent to Cloudera Observability for Cloudera Data Hub clusters. For more information, see Enabling workload analytics and logs collection.
- Click Register Environment to finish the hybrid environment registration process.
After your environment is running, perform the following steps:
- Assign roles to users and groups to grant them access to the environment, and perform user synchronization. For instructions, see Enabling admin and user access to environments.
- Onboard your users and groups for cloud storage. For instructions, see Onboarding Cloudera users and groups for cloud storage.
