Deploy Cloudera using Terraform

This guide demonstrates how to deploy Cloudera on AWS or Azure by using one of the Cloudera deployment templates.

The templates use Terraform, an open source Infrastructure as Code (IaC) tool for defining and managing cloud or data center infrastructure. You interact with the templates through a simple configuration file that resides in a GitHub repository.

For an overview of best practices for deploying Cloudera, refer to Creating and managing Cloudera deployments.

Prerequisites

Prior to deploying Cloudera, you should make sure that your cloud account meets the basic requirements and that you've installed a few prerequisites.

To meet these requirements and install the prerequisites, refer to the following documentation:

You should also familiarize yourself with the background information about CDP deployment patterns and deployment pattern definitions described in Creating and managing Cloudera deployments.
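As a quick sanity check before you start, the following sketch verifies that the two command-line tools this guide invokes directly, git and terraform, are available on your PATH. The check_tools function name is hypothetical, and the check does not cover the cloud account requirements or any other prerequisite listed in the referenced documentation.

```shell
#!/bin/sh
# check_tools is a hypothetical helper, not part of the Cloudera templates.
# Reports each given command that is missing from PATH.
# Returns 0 when all commands are found, 1 otherwise.
check_tools() {
  missing=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool" >&2
      missing=1
    fi
  done
  return "$missing"
}

# git and terraform are the commands used in the steps of this guide.
if check_tools git terraform; then
  echo "all required tools found"
else
  echo "install the missing tools before continuing"
fi
```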

Next, follow the instructions below to deploy Cloudera.

Deploy Cloudera

Setting up a Cloudera deployment involves cloning a GitHub repository, editing the configuration, and running Terraform commands.

Step 1: Clone the repository

The cdp-tf-quickstarts repository contains Terraform resource files to quickly deploy Cloudera Public Cloud and the associated prerequisite cloud resources. It uses the Cloudera Terraform Modules to do this.

Clone this repository and navigate to the directory with the cloned repository:

git clone https://github.com/cloudera-labs/cdp-tf-quickstarts.git
cd cdp-tf-quickstarts

Step 2: Edit the configuration file for the required cloud provider

In the cloned repository, change to the required cloud provider directory. Currently AWS and Azure are available.

Next, edit the input variables in the configuration file as required:

For AWS:

cd aws
mv terraform.tfvars.template terraform.tfvars
vi terraform.tfvars

For Azure:

cd azure
mv terraform.tfvars.template terraform.tfvars
vi terraform.tfvars
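If you prefer to keep the original template file in place, one possible variation is to copy it rather than rename it. The sketch below assumes it is run from the root of the cloned repository; the prepare_tfvars function name is hypothetical, and the only provider directories it accepts are the two named in this guide.

```shell
#!/bin/sh
# prepare_tfvars is a hypothetical helper, not part of the repository.
# Copies the template to terraform.tfvars inside the chosen provider
# directory, leaving the template itself and any existing
# terraform.tfvars untouched.
prepare_tfvars() {
  provider="$1"
  case "$provider" in
    aws|azure) ;;
    *) echo "unsupported provider: $provider" >&2; return 1 ;;
  esac
  if [ ! -f "$provider/terraform.tfvars" ]; then
    cp "$provider/terraform.tfvars.template" "$provider/terraform.tfvars"
  fi
}

# Example usage from the repository root:
#   prepare_tfvars aws && vi aws/terraform.tfvars
```

Copying instead of renaming means a later edit cannot overwrite the template, at the cost of having two similar files in the directory.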

Sample content of this file, with indicators of the values to change, is shown below for each cloud provider. The variables are explained after the samples. Review and update all of the variables.

AWS:

# ------- Global settings -------
env_prefix = "<ENTER_VALUE>" # Required name prefix for cloud and Cloudera resources, e.g. cldr1

# ------- Cloud Settings -------
aws_region = "<ENTER_VALUE>" # Change this to specify Cloud Provider region, e.g. eu-west-1

# ------- Cloudera Environment Deployment -------
deployment_template = "<ENTER_VALUE>"  # Specify the deployment pattern below. Options are public, semi-private or private

Azure:

# ------- Global settings -------
env_prefix = "<ENTER_VALUE>" # Required name prefix for cloud and Cloudera resources, e.g. cldr1

# ------- Cloud Settings -------
azure_region = "<ENTER_VALUE>" # Change this to specify Cloud Provider region, e.g. eastus

# ------- Cloudera Environment Deployment -------
deployment_template = "<ENTER_VALUE>"  # Specify the deployment pattern below. Options are public, semi-private or private

As an outcome of this step, your configuration file should look similar to the following:

AWS:

# ------- Global settings -------
env_prefix = "test-env" # Required name prefix for cloud and CDP resources, e.g. cldr1

# ------- Cloud Settings -------
aws_region = "eu-west-1" # Change this to specify Cloud Provider region, e.g. eu-west-1

# ------- Cloudera Environment Deployment -------
deployment_template = "public"  # Specify the deployment pattern below. Options are public, semi-private or private

Azure:

# ------- Global settings -------
env_prefix = "test-env" # Required name prefix for cloud and CDP resources, e.g. cldr1

# ------- Cloud Settings -------
azure_region = "westeurope" # Change this to specify Cloud Provider region, e.g. eastus

# ------- Cloudera Environment Deployment -------
deployment_template = "public"  # Specify the deployment pattern below. Options are public, semi-private or private
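Because every value that must be changed is marked with the literal string <ENTER_VALUE>, a simple search can confirm that none were missed. The following is a minimal sketch, assuming it is run from the provider directory that contains terraform.tfvars; the find_placeholders function name is hypothetical.

```shell
#!/bin/sh
# find_placeholders is a hypothetical helper name.
# Prints every line (with its line number) that still contains the literal
# <ENTER_VALUE> marker, and returns 0 if at least one such line exists.
find_placeholders() {
  grep -n '<ENTER_VALUE>' "$1"
}

# Only check when the configuration file is present in this directory.
if [ -f terraform.tfvars ]; then
  if find_placeholders terraform.tfvars; then
    echo "terraform.tfvars still has values to fill in" >&2
  else
    echo "no placeholders left in terraform.tfvars"
  fi
fi
```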

The following tables explain the mandatory inputs that need to be provided in the configuration file.

Table 1: Mandatory inputs

AWS:

| Input | Description | Default value |
|---|---|---|
| env_prefix | A string prefix that will be used to name the cloud provider and Cloudera resources created. | Not set |
| aws_region | The AWS region in which the cloud prerequisites and Cloudera will be deployed. For example, eu-west-1. For a list of supported AWS regions, see Supported AWS regions. | Not set |
| deployment_template | The selected deployment pattern. Values allowed: private, semi-private and public. | public |

Azure:

| Input | Description | Default value |
|---|---|---|
| azure_region | The Azure region in which the cloud prerequisites and Cloudera will be deployed. For example, eastus. For a list of supported Azure regions, see Supported Azure regions. | Not set |
| env_prefix | A string prefix that will be used to name the cloud provider and Cloudera resources created. | Not set |
| deployment_template | The selected deployment pattern. Values allowed: private, semi-private and public. | public |

The following tables explain the optional inputs. Unlike the mandatory inputs, which are already present in the configuration file and only need their values filled in, the optional inputs must be added to the file manually.

Table 2: Optional inputs

AWS:

| Input | Description | Default value |
|---|---|---|
| aws_key_pair | The name of an AWS keypair that exists in your account in the selected region. | Not set |
| ingress_extra_cidrs_and_ports | Inbound access to the UI and API endpoints of your deployment will be allowed from the CIDRs (IP ranges) and ports specified here. Enter your machine’s public IP here, with ports 443 and 22. If unsure, look up your public IP address first. | CIDRs are not set. Ports are set to 443, 22 by default. |
| create_vpc | Flag to specify if the VPC should be created. | true |
| cdp_vpc_id | VPC ID for the Cloudera environment. Required if create_vpc is false. | Empty string |
| cdp_public_subnet_ids | List of public subnet IDs. Required if create_vpc is false. | Empty list |
| cdp_private_subnet_ids | List of private subnet IDs. Required if create_vpc is false. | Empty list |
| private_network_extensions | Enable creation of resources for connectivity to the Cloudera Control Plane (public subnet and NAT Gateway) for a private deployment. Only relevant for the private deployment template. | true |

Azure:

| Input | Description | Default value |
|---|---|---|
| public_key_text | An SSH public key string to be used for the nodes of the Cloudera environment. | Not set |
| ingress_extra_cidrs_and_ports | Inbound access to the UI and API endpoints of your deployment will be allowed from the CIDRs (IP ranges) and ports specified here. Enter your machine’s public IP here, with ports 443 and 22. If unsure, look up your public IP address first. | CIDRs are not set. Ports are set to 443, 22 by default. |
| create_vnet | Flag to specify if the VNet should be created. | true |
| cdp_resourcegroup_name | Preexisting Azure resource group for the Cloudera environment. Required if create_vnet is false. | Empty string |
| cdp_vnet_name | VNet name for the Cloudera environment. Required if create_vnet is false. | Empty string |
| cdp_subnet_names | List of subnet names for Cloudera resources. Required if create_vnet is false. | Empty list |
| cdp_gw_subnet_ids | List of subnet names for the Cloudera Gateway. Required if create_vnet is false. | Empty list |
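To illustrate how optional inputs might be added by hand, the sketch below appends a few of the AWS optional inputs to the configuration file. Several details are assumptions rather than facts taken from this guide: the object shape of ingress_extra_cidrs_and_ports (a cidrs list and a ports list), the list syntax for the subnet inputs, and every value shown. The address 203.0.113.10 comes from a range reserved for documentation, and the VPC and subnet IDs are placeholders. Check the variable definitions in the repository before relying on this layout.

```shell
#!/bin/sh
# Target file; defaults to terraform.tfvars in the current directory.
TFVARS="${TFVARS:-terraform.tfvars}"

# Append optional AWS inputs. All values are placeholders, and the
# cidrs/ports object shape is an assumption to verify against the repository.
cat >> "$TFVARS" <<'EOF'

# ------- Optional inputs (illustrative values) -------
ingress_extra_cidrs_and_ports = {
  cidrs = ["203.0.113.10/32"] # replace with your machine's public IP
  ports = [443, 22]
}

# Reuse an existing network instead of creating a new VPC
create_vpc             = false
cdp_vpc_id             = "vpc-EXAMPLE"
cdp_public_subnet_ids  = ["subnet-EXAMPLE-PUBLIC-1"]
cdp_private_subnet_ids = ["subnet-EXAMPLE-PRIVATE-1"]
EOF
```

Setting create_vpc to false is what makes the three network inputs required, as the table above describes.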

Step 3: Launch the deployment

Run the following Terraform commands to validate the configuration and launch the deployment:

terraform init
terraform apply

Terraform will show a plan with the list of cloud provider and Cloudera resources that will be created.

When prompted, type yes to tell Terraform to perform the deployment. This typically takes about 60 minutes. Once the deployment is complete, Terraform prints output similar to the following:

Apply complete! Resources: 46 added, 0 changed, 0 destroyed.

You can navigate to the Cloudera web interface at https://cdp.cloudera.com/ and see your deployment progressing. Once the deployment completes, you can create Cloudera Data Hub clusters and data services.
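If you would rather review the plan as a separate step before anything is created, the launch can be split into explicit stages. This is a sketch of one possible sequence, not the procedure this guide prescribes: it adds terraform validate and a saved plan file, and both the deploy function name and the tfplan file name are arbitrary choices.

```shell
#!/bin/sh
# deploy is a hypothetical wrapper; each stage runs only if the previous
# one succeeded.
deploy() {
  terraform init &&             # download providers and modules
  terraform validate &&         # check the configuration for errors
  terraform plan -out=tfplan && # write the proposed changes to a plan file
  terraform apply tfplan        # apply exactly the saved plan
}

# Example usage from the provider directory:
#   deploy
```

Note that applying a saved plan file does not ask for the yes confirmation, so review the output of the plan stage before letting the sequence continue unattended.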

Clean up the Cloudera environment and infrastructure

If you no longer need the infrastructure that’s provisioned by Terraform, run the following command to remove the deployment infrastructure and terminate all resources:

terraform destroy
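Because the destroy operation removes every resource that Terraform manages in this configuration, it can be useful to list what would be removed first. The sketch below uses the destroy mode of terraform plan for that purpose; the preview_destroy function name is hypothetical.

```shell
#!/bin/sh
# preview_destroy is a hypothetical wrapper.
# Shows the resources that terraform destroy would remove,
# without changing any infrastructure.
preview_destroy() {
  terraform plan -destroy
}

# Example usage from the provider directory:
#   preview_destroy
```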