Prerequisites for deploying CDP

To set up CDP via deployment automation using this guide, the following dependencies must be installed in your local environment:

  • AWS CLI version 1 or 2
  • Terraform versions 1.3.x or 1.4.x
  • jq
  • git
  • Python 3.8
  • Ansible core and jmespath
  • Ansible collections
  • cdpy
  • CDP CLI

The instructions below show you how to install the AWS CLI and then configure the additional local prerequisites. We have automated most of the manual steps in the cloud prerequisite and CDP deployment process, but the remaining dependencies must be installed in the same local environment as the AWS CLI. The code examples below are provided for macOS and Linux (Debian/Ubuntu, plus CentOS/RHEL where noted). The Linux examples should also work on Windows using the Windows Subsystem for Linux (WSL).
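Before working through the sections below, you can check which command-line prerequisites are already on your PATH. The following is a minimal sketch; the function name `missing_prereqs` is hypothetical, and the command names passed to it (`aws`, `terraform`, `jq`, `git`, `python3`, `ansible`, `ansible-galaxy`, `cdp`) are assumed to be the executables that the tools listed above provide. It checks presence only, not versions.

```shell
# Hypothetical helper: print each command name that is NOT found on the PATH,
# one per line. Prints nothing when every command is present.
missing_prereqs() {
  local cmd
  for cmd in "$@"; do
    # 'command -v' exits non-zero when the name cannot be resolved
    command -v "$cmd" >/dev/null 2>&1 || printf '%s\n' "$cmd"
  done
}

# Assumed executable names for the prerequisites listed above.
missing_prereqs aws terraform jq git python3 ansible ansible-galaxy cdp
```

Any names printed are tools you still need to install. Python libraries such as jmespath and cdpy are not covered by this check.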

AWS CLI

You need the AWS CLI installed and pre-configured.

  1. Use the following commands to install version 2 of the AWS CLI on macOS from the Terminal app:

    curl "https://awscli.amazonaws.com/AWSCLIV2.pkg" -o "AWSCLIV2.pkg"
    sudo installer -pkg AWSCLIV2.pkg -target /
  2. Run the following command to confirm that the AWS CLI has been installed correctly:
    aws --version
  3. Before proceeding to the next step, create an AWS access key ID and secret access key.

  4. Run the aws configure command to quickly configure the AWS CLI to run commands on behalf of your AWS user or the AWS role that you are planning to use to deploy the cloud prerequisites:
    aws configure
    AWS Access Key ID [None]: Enter the key ID
    AWS Secret Access Key [None]: Enter the secret access key
    Default region name [None]: Enter the short name of an AWS region supported by CDP
    Default output format [None]: Enter json or leave empty
  5. To verify that the CLI has been set up correctly, run the following command:
    aws sts get-caller-identity
    This is the identity that you will use to set up the CDP prerequisites in your AWS account.
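Steps 2, 4, and 5 can be combined into a single check that you can re-run later. This is a minimal sketch; the function name `check_aws_cli` is hypothetical, and it assumes the default region was stored through `aws configure` as in step 4 (it does not look at region environment variables).

```shell
# Hypothetical helper: confirm the AWS CLI is installed, has a default region,
# and can authenticate. Prints the ARN of the caller identity on success.
check_aws_cli() {
  local region
  command -v aws >/dev/null 2>&1 || { echo "aws CLI not found on PATH" >&2; return 1; }
  # 'aws configure get' exits non-zero when the setting is absent, so ignore its status
  region="$(aws configure get region || true)"
  if [ -z "$region" ]; then
    echo "no default region configured; run 'aws configure'" >&2
    return 1
  fi
  # --query with --output text returns just the ARN instead of the full JSON document
  aws sts get-caller-identity --query Arn --output text
}
```

For example, `check_aws_cli && echo "AWS CLI ready"` prints the ARN of the user or role that will deploy the cloud prerequisites.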

For a full reference of AWS CLI installation and configuration options, refer to Getting Started with the AWS CLI and Configuring the AWS CLI.

Terraform

The deployment automation framework is implemented as a Terraform module, so you need to download and install Terraform to use it.

Below is one way to install it on macOS from the terminal, followed by instructions for Debian/Ubuntu and CentOS/RHEL. For a full reference, follow the official Terraform installation guide. The module has been tested with Terraform versions 1.3.x and 1.4.x. Note that the package manager commands below install the latest Terraform release, which may be newer than the tested versions.

The easiest way to set up Terraform on macOS is with the open source Homebrew package manager (brew). If you don't have brew installed yet, run:

/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"

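Next, add the brew command to your PATH. On Apple silicon Macs, Homebrew installs into /opt/homebrew (on Intel Macs it uses /usr/local/bin, which is usually already on your PATH):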

export PATH=$PATH:/opt/homebrew/bin
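The export above only affects the current terminal session. If you add it to a shell profile such as ~/.zshrc, a guarded version avoids appending the same directory again every time the profile is sourced. A minimal sketch; the function name `path_append_once` is hypothetical, and /opt/homebrew/bin assumes an Apple silicon Mac.

```shell
# Hypothetical helper: append a directory to PATH only if it is not already there.
path_append_once() {
  case ":$PATH:" in
    *":$1:"*) ;;                       # already on PATH: leave it unchanged
    *) PATH="${PATH:+$PATH:}$1" ;;     # append, without a leading ':' if PATH was empty
  esac
  export PATH
}

path_append_once /opt/homebrew/bin
```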

With brew installed you can use it to install terraform and its required dependencies:

brew tap hashicorp/tap
brew install hashicorp/tap/terraform

Debian/Ubuntu

To install on Debian/Ubuntu based Linux distributions, first add the HashiCorp package key and repository:

wget -O- https://apt.releases.hashicorp.com/gpg | sudo gpg --dearmor -o /usr/share/keyrings/hashicorp-archive-keyring.gpg
echo "deb [signed-by=/usr/share/keyrings/hashicorp-archive-keyring.gpg] https://apt.releases.hashicorp.com $(lsb_release -cs) main" | sudo tee /etc/apt/sources.list.d/hashicorp.list

The Terraform package can then be installed using the following command:

sudo apt update && sudo apt install terraform

CentOS/RHEL

To install on CentOS/RHEL based Linux distributions, first add the HashiCorp repository using the following commands:

sudo yum install -y yum-utils
sudo yum-config-manager --add-repo https://rpm.releases.hashicorp.com/RHEL/hashicorp.repo

Then install the Terraform package using the following command:

sudo yum -y install terraform
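Because the package managers install the latest release, it is worth confirming that the installed Terraform falls within the tested 1.3.x/1.4.x range. A minimal sketch; the function names are hypothetical, and it assumes the first line of `terraform version` output has the form "Terraform v1.4.6".

```shell
# Hypothetical helper: succeed only for Terraform versions in the tested 1.3.x/1.4.x range.
tf_version_supported() {
  case "$1" in
    1.3.*|1.4.*) return 0 ;;
    *) return 1 ;;
  esac
}

# Hypothetical helper: check the terraform binary found on the PATH.
check_terraform() {
  local v
  command -v terraform >/dev/null 2>&1 || { echo "terraform not found on PATH" >&2; return 1; }
  # Strip the "Terraform v" prefix from the first line of the version output
  v="$(terraform version | head -n 1 | sed 's/^Terraform v//')"
  if tf_version_supported "$v"; then
    echo "Terraform $v: tested version"
  else
    echo "Terraform $v: outside the tested 1.3.x/1.4.x range" >&2
    return 1
  fi
}
```

Run `check_terraform` after installing. If it reports a newer version, follow the official Terraform installation guide to install a specific 1.3.x or 1.4.x release instead.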

jq

The jq command-line utility is used to parse JSON-formatted output from the CDP CLI.

On macOS, you can install jq using Homebrew:
brew install jq

On Linux, jq is available in the official Debian/Ubuntu and RHEL/CentOS repositories.

On Debian and Ubuntu, the package can be installed by using the following command:

sudo apt-get install jq

On RHEL/CentOS, the package can be installed by using the following command:

sudo dnf install jq

For more detailed instructions, see https://stedolan.github.io/jq/download/.
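As an illustration of the kind of parsing jq is used for, the sketch below extracts a single field from CDP CLI JSON output. It assumes that `cdp iam get-user` returns a top-level `user` object with a `crn` field; treat that output shape, and the helper name `cdp_user_crn`, as assumptions to verify against your CLI version.

```shell
# Hypothetical helper: read CDP CLI JSON on stdin and print the user's CRN.
# -r prints the raw string without JSON quotes; -e makes jq exit non-zero
# when the result is null (for example, if the field is missing).
cdp_user_crn() {
  jq -r -e '.user.crn'
}
```

Once the CDP CLI is configured, you would use it as `cdp iam get-user | cdp_user_crn`.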

git

Run the following command to confirm that you have the git client installed:
git --version

If you need to install git, we recommend following the git installation instructions from GitHub.

Python and pip package installer

Python version 3.8 is required to run the Ansible playbooks for the CDP deployment. Before deploying CDP using the CDP deployment templates, install the following required packages and configure the additional prerequisites. We recommend performing these steps within a Python virtual environment. See the Python documentation on virtual environments for details on how to set up and use them.

  1. Check whether you have the right version installed and the python command is configured on your $PATH:
    python --version
    Depending on your system configuration, you may need to run:
    python3 --version
  2. Once you have python3 installed, make sure that the directory where pip places user-installed command-line tools is on your $PATH. On macOS with Python 3.8 this is typically ~/Library/Python/3.8/bin. To add it for the current session, run:
    export PATH=$PATH:~/Library/Python/3.8/bin
    Add this command to your shell profile (for example ~/.zshrc for zsh or ~/.bashrc for bash) so that it is applied each time you launch a new terminal. We also recommend updating the pip package installer for Python 3 before continuing:
    pip3 install --upgrade pip
    If you find that pip is not yet installed on your machine, you can install it by using:
    curl https://bootstrap.pypa.io/get-pip.py -o get-pip.py
    python3 get-pip.py --user
    To avoid conflicts with other packages installed on your machine, we recommend that you install the dependencies mentioned below in a Python virtual environment. For reference, here are the steps to create and activate a virtual environment using venv:
    1. Create the Python virtual environment:
      python3.8 -m venv ~/.venv/testing_zero_virtualenv
    2. Activate the virtual environment:
      source ~/.venv/testing_zero_virtualenv/bin/activate
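After activating the environment, you can confirm that the interpreter is Python 3.8 and that it is really running inside a virtual environment. A minimal sketch; the function name `check_python_env` is hypothetical, and the virtual-environment test relies on `sys.prefix` differing from `sys.base_prefix` inside a venv.

```shell
# Hypothetical helper: exit non-zero unless the given interpreter (default: python3)
# is Python 3.8 running inside a virtual environment.
check_python_env() {
  "${1:-python3}" - <<'EOF'
import sys
major, minor = sys.version_info[:2]
if (major, minor) != (3, 8):
    sys.exit("expected Python 3.8, found %d.%d" % (major, minor))
# In a venv, sys.prefix points at the environment while sys.base_prefix
# still points at the base installation.
if sys.prefix == sys.base_prefix:
    sys.exit("not running inside a virtual environment")
print("OK: Python %d.%d at %s" % (major, minor, sys.prefix))
EOF
}
```

With the environment activated, `check_python_env python` should print the path of ~/.venv/testing_zero_virtualenv.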

Ansible core and jmespath

JMESPath is a query language for JSON; the jmespath Python package implements it and is used to extract values from Python and JSON data structures. Install the Ansible core and jmespath Python packages using the following command:
pip3 install ansible-core==2.12.10 jmespath==1.0.1
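To confirm that the pinned versions actually ended up in the active environment, you can query the installed distribution metadata. A minimal sketch; the helper names are hypothetical, and it assumes `python3` resolves to the virtual environment's interpreter (true once the environment is activated) and uses `importlib.metadata`, which is in the standard library from Python 3.8.

```shell
# Hypothetical helper: print the installed version of a Python distribution.
# importlib.metadata.version raises PackageNotFoundError (non-zero exit) if it is absent.
pkg_version() {
  python3 -c 'import sys; from importlib.metadata import version; print(version(sys.argv[1]))' "$1"
}

# Hypothetical helper: compare an installed distribution with a pinned version.
check_pinned() {
  local name="$1" want="$2" have
  have="$(pkg_version "$name" 2>/dev/null)" || { echo "$name: not installed" >&2; return 1; }
  if [ "$have" != "$want" ]; then
    echo "$name: found $have, expected $want" >&2
    return 1
  fi
  echo "$name $have OK"
}
```

For example: `check_pinned ansible-core 2.12.10 && check_pinned jmespath 1.0.1`.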

Ansible collections

The deployment of the CDP environment is performed using an Ansible playbook. This playbook uses the cloudera.cloud Ansible collection, which in turn interacts with the CDP API. It also uses the community.general Ansible collection, primarily to extract values from the results of Ansible tasks.

  1. Install the cloudera.cloud Ansible collection by using the following command:
    ansible-galaxy collection install git+https://github.com/cloudera-labs/cloudera.cloud.git,devel
  2. Install the community.general Ansible collection by using the following command:
    ansible-galaxy collection install community.general:==5.5.0
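If you need to reinstall the same collections later, for example in a fresh virtual environment, the two commands above can be captured in a requirements file that `ansible-galaxy collection install -r` reads. A sketch, assuming an ansible-core version that supports git sources in requirements files (2.10 and later); the helper name and file name are illustrative.

```shell
# Hypothetical helper: write an Ansible Galaxy requirements file that pins the
# same collections as the two ansible-galaxy commands above.
write_collection_requirements() {
  local out="${1:-requirements.yml}"
  cat > "$out" <<'EOF'
---
collections:
  # cloudera.cloud from the devel branch of its git repository
  - name: https://github.com/cloudera-labs/cloudera.cloud.git
    type: git
    version: devel
  # community.general pinned to the tested release
  - name: community.general
    version: "5.5.0"
EOF
}
```

Then install both collections with `write_collection_requirements requirements.yml && ansible-galaxy collection install -r requirements.yml`.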

cdpy

cdpy is a Python wrapper for the Cloudera CDP CLI, designed for use with the Ansible framework.

Install cdpy using the following command:

pip3 install git+https://github.com/cloudera-labs/cdpy@main#egg=cdpy

Note that this in turn installs the CDP CLI.

Configure CDP CLI

The CDP Command Line Interface (CLI) is used by the cdpy Python package to create the CDP deployment and to automatically discover information, such as the cross-account IDs needed when creating the cloud IAM policies. The CLI requires a set of access credentials to reach these services; see the CDP documentation on generating an API access key for the steps.

Similarly to the way you set up the AWS CLI earlier, now configure the CDP CLI with a CDP access key ID and private key by using the following command:

cdp configure
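To verify the configuration, you can make a read-only call with the stored key pair. A minimal sketch; the function name `check_cdp_cli` is hypothetical, and it assumes that `cdp configure` writes the default profile to ~/.cdp/credentials with a `cdp_access_key_id` entry, which you should confirm against your CLI version.

```shell
# Hypothetical helper: confirm that 'cdp configure' stored credentials and that
# the CDP control plane accepts them.
check_cdp_cli() {
  local creds="${HOME}/.cdp/credentials"
  if ! grep -q '^cdp_access_key_id' "$creds" 2>/dev/null; then
    echo "no CDP access key found in $creds; run 'cdp configure'" >&2
    return 1
  fi
  # A read-only call; it fails if the key pair is wrong or has been revoked.
  if ! cdp iam get-user >/dev/null; then
    echo "'cdp iam get-user' failed; check the access key ID and private key" >&2
    return 1
  fi
  echo "CDP CLI credentials OK"
}
```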