Deploy Cloudera using Terraform
This guide demonstrates how to deploy Cloudera on AWS or Azure by using one of the Cloudera deployment templates.
The templates use Terraform, an open source Infrastructure as Code (IaC) tool for defining and managing cloud or data center infrastructure. You interact with the templates through a simple configuration file in a GitHub repository.
For an overview of best practices for deploying Cloudera, refer to Creating and managing Cloudera deployments.
Prerequisites
Before deploying Cloudera, make sure that your cloud account meets the basic requirements and that you have installed the required prerequisites.
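This guide runs git and terraform from a shell, so it can help to confirm that both are installed before you start. The following is a minimal POSIX shell sketch; check_tools is a hypothetical helper name, and it only checks that the commands are on your PATH, not their versions or your cloud account requirements.

```shell
# Hypothetical helper (not part of cdp-tf-quickstarts): report any of the
# named commands that cannot be found on PATH.
# Returns 0 when every command is found, 1 otherwise.
check_tools() {
  missing=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For example, run check_tools git terraform; it prints each missing command and returns a non-zero status if any are missing.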
Then follow the instructions below to deploy Cloudera.
Deploy Cloudera
Setting up a Cloudera deployment involves cloning a GitHub repository, editing the configuration, and running Terraform commands.
Step 1: Clone the repository
The cdp-tf-quickstarts repository contains Terraform resource files to quickly deploy Cloudera Public Cloud and the associated prerequisite cloud resources. It does this by using the Cloudera Terraform Modules provided by Cloudera.
Clone the repository and change to the cloned directory:
git clone https://github.com/cloudera-labs/cdp-tf-quickstarts.git
cd cdp-tf-quickstarts
Step 2: Edit the configuration file for the required cloud provider
In the cloned repository, change to the directory for your cloud provider. Currently, AWS and Azure are available.
Next, rename the configuration template to terraform.tfvars and edit the input variables as required:
For AWS:
cd aws
mv terraform.tfvars.template terraform.tfvars
vi terraform.tfvars
For Azure:
cd azure
mv terraform.tfvars.template terraform.tfvars
vi terraform.tfvars
Sample content of this file is shown below, with <ENTER_VALUE> marking each value to change. The variables are explained in the tables that follow the sample. Review and update all the variables.
For AWS:
# ------- Global settings -------
env_prefix = "<ENTER_VALUE>" # Required name prefix for cloud and Cloudera resources, e.g. cldr1
# ------- Cloud Settings -------
aws_region = "<ENTER_VALUE>" # Change this to specify Cloud Provider region, e.g. eu-west-1
# ------- Cloudera Environment Deployment -------
deployment_template = "<ENTER_VALUE>" # Specify the deployment pattern below. Options are public, semi-private or private
For Azure:
# ------- Global settings -------
env_prefix = "<ENTER_VALUE>" # Required name prefix for cloud and Cloudera resources, e.g. cldr1
# ------- Cloud Settings -------
azure_region = "<ENTER_VALUE>" # Change this to specify Cloud Provider region, e.g. eastus
# ------- Cloudera Environment Deployment -------
deployment_template = "<ENTER_VALUE>" # Specify the deployment pattern below. Options are public, semi-private or private
After completing this step, your configuration file should look similar to the following:
For AWS:
# ------- Global settings -------
env_prefix = "test-env" # Required name prefix for cloud and Cloudera resources, e.g. cldr1
# ------- Cloud Settings -------
aws_region = "eu-west-1" # Change this to specify Cloud Provider region, e.g. eu-west-1
# ------- Cloudera Environment Deployment -------
deployment_template = "public" # Specify the deployment pattern below. Options are public, semi-private or private
For Azure:
# ------- Global settings -------
env_prefix = "test-env" # Required name prefix for cloud and Cloudera resources, e.g. cldr1
# ------- Cloud Settings -------
azure_region = "westeurope" # Change this to specify Cloud Provider region, e.g. eastus
# ------- Cloudera Environment Deployment -------
deployment_template = "public" # Specify the deployment pattern below. Options are public, semi-private or private
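Because the template marks every value you must change with <ENTER_VALUE>, one quick way to catch a missed value is to search the file for that marker. The following is a minimal POSIX shell sketch; check_tfvars is a hypothetical helper name, and it only detects the literal placeholder, not values that are present but wrong.

```shell
# Hypothetical helper: fail if any "<ENTER_VALUE>" placeholder remains.
# Usage: check_tfvars [file]   (defaults to terraform.tfvars)
# Returns 0 if none remain, 1 if placeholders remain, 2 if the file is missing.
check_tfvars() {
  file="${1:-terraform.tfvars}"
  if [ ! -f "$file" ]; then
    echo "not found: $file" >&2
    return 2
  fi
  # grep -n prints each offending line with its line number (to stderr here).
  if grep -n '<ENTER_VALUE>' "$file" >&2; then
    echo "replace the placeholders above before running terraform" >&2
    return 1
  fi
  return 0
}
```

For example, run check_tfvars terraform.tfvars in the aws or azure directory after editing the file.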
The following tables explain the mandatory inputs that need to be provided in the configuration file.
Table 1: Mandatory inputs
For AWS:
Input | Description | Default value |
env_prefix | A string prefix used to name the cloud provider and Cloudera resources that are created. | Not set |
aws_region | The AWS region in which the cloud prerequisites and Cloudera will be deployed, for example eu-west-1. For a list of supported AWS regions, see Supported AWS regions. | Not set |
deployment_template | The selected deployment pattern. Allowed values: public, semi-private, private. | public |
For Azure:
Input | Description | Default value |
azure_region | The Azure region in which the cloud prerequisites and Cloudera will be deployed, for example eastus. For a list of supported Azure regions, see Supported Azure regions. | Not set |
env_prefix | A string prefix used to name the cloud provider and Cloudera resources that are created. | Not set |
deployment_template | The selected deployment pattern. Allowed values: public, semi-private, private. | public |
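To catch a missing or mistyped mandatory input before running Terraform, you can check the file against the mandatory inputs listed above. The following is a minimal POSIX shell sketch, assuming each input sits on its own line in the simple name = "value" form used by the template; check_mandatory is a hypothetical helper name. It treats a value as set when it is non-empty and is not the <ENTER_VALUE> placeholder, and it accepts only public, semi-private or private for deployment_template.

```shell
# Hypothetical helper: verify the mandatory inputs for one cloud provider.
# Usage: check_mandatory aws|azure [file]
# Returns 0 if all checks pass, 1 if any fail, 2 on bad arguments or missing file.
check_mandatory() {
  provider="$1"
  file="${2:-terraform.tfvars}"
  case "$provider" in
    aws)   region_var="aws_region" ;;
    azure) region_var="azure_region" ;;
    *)     echo "unknown provider: $provider" >&2; return 2 ;;
  esac
  [ -f "$file" ] || { echo "not found: $file" >&2; return 2; }

  status=0
  for var in env_prefix "$region_var" deployment_template; do
    # Set means: name = "value" where value is non-empty and does not
    # start with "<" (so the <ENTER_VALUE> placeholder does not count).
    if ! grep -Eq "^[[:space:]]*$var[[:space:]]*=[[:space:]]*\"[^\"<]+\"" "$file"; then
      echo "mandatory input not set: $var" >&2
      status=1
    fi
  done

  # Extract the first quoted deployment_template value and check it.
  template=$(sed -n 's/^[[:space:]]*deployment_template[[:space:]]*=[[:space:]]*"\([^"]*\)".*/\1/p' "$file" | head -n 1)
  case "$template" in
    public|semi-private|private) ;;
    *) echo "deployment_template must be public, semi-private or private" >&2; status=1 ;;
  esac
  return "$status"
}
```

For example, run check_mandatory aws terraform.tfvars in the aws directory, or check_mandatory azure terraform.tfvars in the azure directory.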
The following tables explain the optional inputs. Unlike the mandatory inputs, which are already present in the configuration file and only need values, the optional inputs are not included in the template; add any that you need to the configuration file manually.
Table 2: Optional inputs
For AWS:
Input | Description | Default value |
aws_key_pair | The name of an AWS key pair that exists in your account in the selected region. | Not set |
ingress_extra_cidrs_and_ports | The CIDRs and ports from which inbound access to the UI and API endpoints of your deployment is allowed. Typically, set this to your machine's public IP address with ports 443 and 22. If unsure, check your public IP address first. | CIDRs are not set. Ports are set to 443, 22 by default. |
create_vpc | Flag to specify whether the VPC should be created. | true |
cdp_vpc_id | VPC ID for the Cloudera environment. Required if create_vpc is false. | Empty string |
cdp_public_subnet_ids | List of public subnet IDs. Required if create_vpc is false. | Empty list |
cdp_private_subnet_ids | List of private subnet IDs. Required if create_vpc is false. | Empty list |
private_network_extensions | Enable creation of resources for connectivity to the Cloudera Control Plane (public subnet and NAT Gateway) for a private deployment. Only relevant for the private deployment template. | true |
For Azure:
Input | Description | Default value |
public_key_text | An SSH public key string to be used for the nodes of the Cloudera environment. | Not set |
ingress_extra_cidrs_and_ports | The CIDRs and ports from which inbound access to the UI and API endpoints of your deployment is allowed. Typically, set this to your machine's public IP address with ports 443 and 22. If unsure, check your public IP address first. | CIDRs are not set. Ports are set to 443, 22 by default. |
create_vnet | Flag to specify whether the VNet should be created. | true |
cdp_resourcegroup_name | Preexisting Azure resource group for the Cloudera environment. Required if create_vnet is false. | Empty string |
cdp_vnet_name | VNet name for the Cloudera environment. Required if create_vnet is false. | Empty string |
cdp_subnet_names | List of subnet names for Cloudera resources. Required if create_vnet is false. | Empty list |
cdp_gw_subnet_ids | List of subnet names for the Cloudera Gateway. Required if create_vnet is false. | Empty list |
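If you set create_vpc (AWS) or create_vnet (Azure) to false, the related inputs from the optional-inputs table must be added to the file by hand. The following POSIX shell sketch appends them as HCL assignments; add_existing_vpc, add_existing_vnet and hcl_list are hypothetical helper names, and every ID and name in the usage example is a placeholder. It does not cover ingress_extra_cidrs_and_ports, because the expected format of that value is not described here.

```shell
# Hypothetical helper: format its arguments as an HCL list of strings,
# e.g. hcl_list a b  prints  ["a", "b"]  (no arguments prints [])
hcl_list() {
  out=""
  for item in "$@"; do
    out="${out:+$out, }\"$item\""
  done
  printf '[%s]' "$out"
}

# AWS: append the optional inputs for reusing an existing VPC.
# $1 = tfvars file, $2 = VPC ID,
# $3 = public subnet IDs, $4 = private subnet IDs (space-separated;
# leaving $3 and $4 unquoted below is intentional so each ID becomes one element).
add_existing_vpc() {
  {
    echo 'create_vpc = false'
    echo "cdp_vpc_id = \"$2\""
    echo "cdp_public_subnet_ids = $(hcl_list $3)"
    echo "cdp_private_subnet_ids = $(hcl_list $4)"
  } >> "$1"
}

# Azure: append the optional inputs for reusing an existing VNet.
# $1 = tfvars file, $2 = resource group, $3 = VNet name,
# $4 = subnet names, $5 = gateway subnets (space-separated).
add_existing_vnet() {
  {
    echo 'create_vnet = false'
    echo "cdp_resourcegroup_name = \"$2\""
    echo "cdp_vnet_name = \"$3\""
    echo "cdp_subnet_names = $(hcl_list $4)"
    echo "cdp_gw_subnet_ids = $(hcl_list $5)"
  } >> "$1"
}
```

For example: add_existing_vpc terraform.tfvars vpc-0123456789abcdef0 "subnet-aaa subnet-bbb" "subnet-ccc". Run it only once per file, because Terraform is expected to report an error when the same variable is set twice in a .tfvars file.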
Step 3: Launch the deployment
From the cloud provider directory, initialize Terraform and apply the configuration:
terraform init
terraform apply
Terraform shows a plan listing the cloud provider and Cloudera resources that will be created.
When prompted, type yes to tell Terraform to perform the deployment. The deployment typically takes about 60 minutes. When it is complete, Terraform prints output similar to the following:
Apply complete! Resources: 46 added, 0 changed, 0 destroyed.
While the deployment is in progress, you can follow it in the Cloudera web interface at https://cdp.cloudera.com/. Once the deployment completes, you can create Cloudera Data Hub clusters and data services.
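As an alternative to the interactive terraform apply above, you can save the plan to a file, review it, and then apply exactly that plan. terraform plan -out=FILE and terraform apply FILE are standard Terraform CLI commands; the deploy_with_plan function and the default file name tfplan are hypothetical. Because terraform apply does not ask for confirmation again when given a saved plan file, this sketch asks once itself.

```shell
# Hypothetical helper: init, save the plan to a file, ask for confirmation,
# then apply exactly the saved plan.
# Usage: deploy_with_plan [planfile]   (defaults to tfplan)
deploy_with_plan() {
  plan="${1:-tfplan}"
  terraform init || return
  terraform plan -out="$plan" || return
  printf 'Apply this plan? Type yes to continue: '
  read -r answer
  if [ "$answer" = "yes" ]; then
    # A saved plan file is applied without a second approval prompt.
    terraform apply "$plan"
  else
    echo "Not applied; plan kept in $plan" >&2
    return 1
  fi
}
```

Run deploy_with_plan from the aws or azure directory; the deployment time of roughly 60 minutes still applies after you confirm.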
Clean up the Cloudera environment and infrastructure
If you no longer need the infrastructure provisioned by Terraform, run the following command from the same cloud provider directory to remove the deployment infrastructure and terminate all resources:
terraform destroy