Prerequisites
Before deploying Cloudera, make sure that your cloud account meets the basic requirements and that you have installed the required prerequisites, including Git and Terraform, which the steps below use.
Then follow the instructions below to deploy Cloudera.
This guide demonstrates how to deploy Cloudera on AWS, Azure, or GCP by using one of the Cloudera deployment templates.
The templates use Terraform, an open source Infrastructure as Code (IaC) software tool for defining and managing cloud or data center infrastructure. You interact with the templates through a simple configuration file in a GitHub repository.
For an overview of best practices for deploying Cloudera, refer to Creating and managing Cloudera deployments.
Setting up a Cloudera deployment involves cloning a GitHub repository, editing the configuration, and running Terraform commands to launch the deployment.
The cdp-tf-quickstarts repository contains Terraform resource files for quickly deploying Cloudera on cloud, together with the prerequisite cloud resources. It does this by using the Cloudera Terraform Modules provided by Cloudera.
Clone this repository and navigate to the local directory with the cloned repository:
git clone https://github.com/cloudera-labs/cdp-tf-quickstarts.git
cd cdp-tf-quickstarts
In the cloned repository, change to the directory for your cloud provider: AWS, Azure, or GCP.
Edit the input variables in the configuration file as required:
The following sample configuration file indicates the values that you need to change. The variables are explained after the sample. Review and update all of them.
# ------- Global settings -------
env_prefix = "<ENTER_VALUE>" # Required name prefix for cloud and Cloudera resources, e.g. cldr1
# ------- Cloud Settings -------
aws_region = "<ENTER_VALUE>" # Change this to specify Cloud Provider region, e.g. eu-west-1
# ------- Cloudera Environment Deployment -------
deployment_template = "<ENTER_VALUE>" # Specify the deployment pattern below. Options are public, semi-private or private
As a result of this step, your configuration file should look similar to the following:
# ------- Global settings -------
env_prefix = "test-env" # Required name prefix for cloud and Cloudera resources, e.g. cldr1
# ------- Cloud Settings -------
aws_region = "eu-west-1" # Change this to specify Cloud Provider region, e.g. eu-west-1
# ------- Cloudera Environment Deployment -------
deployment_template = "public" # Specify the deployment pattern below. Options are public, semi-private or private
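If you would rather generate this file from a script than edit it by hand, the following shell sketch writes the same three mandatory inputs with a heredoc. It assumes the configuration file is terraform.tfvars in the provider directory (the file name Terraform loads automatically); QS_ENV_PREFIX, QS_AWS_REGION, and QS_DEPLOYMENT_TEMPLATE are hypothetical variable names used only for this illustration, and their defaults are the sample values shown above.

```shell
# Sketch: write the mandatory inputs to terraform.tfvars non-interactively.
# QS_* are hypothetical helper variables (assumption, not part of the repo);
# the defaults are the sample values from this guide.
QS_ENV_PREFIX="${QS_ENV_PREFIX:-test-env}"
QS_AWS_REGION="${QS_AWS_REGION:-eu-west-1}"
QS_DEPLOYMENT_TEMPLATE="${QS_DEPLOYMENT_TEMPLATE:-public}"

# Warning: '>' overwrites any existing terraform.tfvars in the current directory.
cat > terraform.tfvars <<EOF
# ------- Global settings -------
env_prefix = "${QS_ENV_PREFIX}"
# ------- Cloud Settings -------
aws_region = "${QS_AWS_REGION}"
# ------- Cloudera Environment Deployment -------
deployment_template = "${QS_DEPLOYMENT_TEMPLATE}"
EOF
```

Because the redirection replaces the whole file, run this only before you have added optional inputs, or append them afterwards.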
The following table explains the mandatory inputs that must be provided in the configuration file.
Table 1: Mandatory inputs
Input | Description | Default value |
env_prefix | A string prefix that is used to name the cloud provider and Cloudera resources that are created. | Not set |
aws_region | The AWS region in which the cloud prerequisites and Cloudera will be deployed, for example, eu-west-1. For a list of supported AWS regions, see Supported AWS regions. | Not set |
deployment_template | The selected deployment pattern. Allowed values: public, semi-private, private. | public |
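Before running Terraform, a quick pre-flight check can catch unedited placeholders and typos in deployment_template. The following check_tfvars function is a hypothetical helper, not part of the cdp-tf-quickstarts repository. It only looks for leftover <ENTER_VALUE> placeholders and for the three deployment_template values listed in the table, and it assumes the value is written as a quoted string on a single line.

```shell
# Hypothetical pre-flight check for a tfvars file (not part of cdp-tf-quickstarts).
# Returns non-zero if a <ENTER_VALUE> placeholder is left or if
# deployment_template is not public, semi-private or private.
check_tfvars() {
  file="${1:-terraform.tfvars}"

  if [ ! -f "$file" ]; then
    echo "error: $file not found" >&2
    return 1
  fi

  # Any remaining placeholder means an input was not filled in;
  # the offending lines are printed to stderr with line numbers.
  if grep -n '<ENTER_VALUE>' "$file" >&2; then
    echo "error: replace every <ENTER_VALUE> placeholder in $file" >&2
    return 1
  fi

  # Extract the quoted value of the first deployment_template assignment.
  template=$(sed -n 's/^[[:space:]]*deployment_template[[:space:]]*=[[:space:]]*"\([^"]*\)".*/\1/p' "$file" | head -n 1)

  case "$template" in
    public|semi-private|private) return 0 ;;
    *)
      echo "error: deployment_template must be public, semi-private or private (got '$template')" >&2
      return 1
      ;;
  esac
}
```

For example, run check_tfvars terraform.tfvars in the provider directory; a non-zero exit status means the file still needs editing.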
The following table explains the optional inputs that you can add to the configuration file. The mandatory attributes are already included in the configuration file and only need values, but you must add optional attributes and their values manually.
Table 2: Optional inputs
Input | Description | Default value |
aws_key_pair | The name of an AWS key pair that exists in your account in the selected region. | Not set |
ingress_extra_cidrs_and_ports | Inbound access to the UI and API endpoints of your deployment will be allowed from the CIDRs specified here (for example, your machine's public IP address) on ports 443 and 22. If you are unsure of your public IP address, check it before setting this input. | CIDRs are not set. Ports are set to 443 and 22 by default. |
create_vpc | Flag that specifies whether the VPC should be created. | true |
cdp_vpc_id | VPC ID for the Cloudera environment. Required if create_vpc is false. | Empty string |
cdp_public_subnet_ids | List of public subnet IDs. Required if create_vpc is false. Can be an empty list, depending on deployment_template. | Empty list |
cdp_private_subnet_ids | List of private subnet IDs. Required if create_vpc is false. | Empty list |
private_network_extensions | Enables creation of resources for connectivity to the Cloudera Control Plane (public subnet and NAT gateway) in a private deployment. Relevant only for the private deployment template. | true |
env_tags | Environment-level tags for your resources, such as owner, project, and end date. For more information about custom tags, see the Defining custom tags documentation. Using the owner, project, and end date example, define the environment-level tags as follows: env_tags = { owner = "<ENTER_VALUE>", project = "<ENTER_VALUE>", enddate = "<ENTER_VALUE>" } | Not set |
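Because optional attributes must be added by hand, the following sketch shows one way to do it: it appends the inputs needed to deploy into an existing VPC, together with ingress rules and environment tags. The <ENTER_VALUE> placeholders follow the convention of the sample file and must be replaced. The object shape used for ingress_extra_cidrs_and_ports (a cidrs list and a ports list) is an assumption inferred from the default-value column, so confirm it against the variable definitions in the provider directory.

```shell
# Sketch: append optional inputs to terraform.tfvars (quoted 'EOF' = no expansion).
# Every <ENTER_VALUE> is a placeholder that you must replace with your own values.
cat >> terraform.tfvars <<'EOF'
# ------- Optional inputs (added manually) -------
aws_key_pair           = "<ENTER_VALUE>"
create_vpc             = false
cdp_vpc_id             = "<ENTER_VALUE>"
cdp_public_subnet_ids  = ["<ENTER_VALUE>", "<ENTER_VALUE>"]
cdp_private_subnet_ids = ["<ENTER_VALUE>", "<ENTER_VALUE>"]

# Assumed shape (cidrs + ports); check variables.tf. Use e.g. your public IP as a /32.
ingress_extra_cidrs_and_ports = {
  cidrs = ["<ENTER_VALUE>"]
  ports = [443, 22]
}

env_tags = {
  owner   = "<ENTER_VALUE>"
  project = "<ENTER_VALUE>"
  enddate = "<ENTER_VALUE>"
}
EOF
```

Set create_vpc = false only when you supply an existing VPC and subnets; otherwise omit the VPC and subnet lines and keep the default.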
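From the cloud provider directory, run the following commands to initialize Terraform and start the deployment:
terraform init
terraform apply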
Terraform displays a plan with the list of cloud provider and Cloudera resources that will be created.
When prompted, type yes to instruct Terraform to perform the deployment. The deployment typically takes about 60 minutes. When it is complete, Terraform prints output similar to the following:
Apply complete! Resources: 46 added, 0 changed, 0 destroyed.
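If you want to keep a record of exactly what will be created, Terraform can also write the plan to a file and apply that file later. The following sketch wraps the first half of that workflow in a hypothetical plan_quickstart function. The -input=false and -out options and applying a saved plan file are standard Terraform CLI features, but this variant is not part of the Cloudera instructions above.

```shell
# Hypothetical helper: initialize and save a plan for later review.
# It deliberately does NOT apply, so you can inspect the plan first.
plan_quickstart() {
  planfile="${1:-cloudera.tfplan}"

  if ! command -v terraform >/dev/null 2>&1; then
    echo "error: terraform is not installed or not on PATH" >&2
    return 1
  fi

  terraform init -input=false || return 1
  # Write the plan to a file instead of prompting for approval.
  terraform plan -input=false -out="$planfile" || return 1
  echo "Review the plan above, then run: terraform apply $planfile"
}
```

After reviewing the plan (you can print it again with terraform show cloudera.tfplan), run terraform apply cloudera.tfplan from the same directory. Terraform does not ask for confirmation when it applies a saved plan file.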
You can navigate to the Cloudera web interface at https://cdp.cloudera.com/ and see your deployment progress. Once the deployment completes, you can create Cloudera Data Hub clusters and data services.
If you no longer need the infrastructure provisioned by Terraform, run the following command to remove the deployment infrastructure and terminate all resources:
terraform destroy