Prerequisites for deploying CDP
To set up CDP via deployment automation using this guide, the following dependencies must be installed in your local environment:
- AWS CLI version 1 or 2
- Terraform versions 1.3.x or 1.4.x
- jq
- git
- Python 3.8
- Ansible core and jmespath
- Ansible collections
- cdpy
- CDP CLI
The instructions below show you how to install the AWS CLI and then configure the additional local prerequisites. To simplify the cloud prerequisite and CDP deployment process, we have automated most manual steps, but the additional dependencies must be installed in the same local environment where you have installed the AWS CLI. The code examples below are provided for macOS and Linux (Ubuntu). The Linux examples should also work on Windows using the Windows Subsystem for Linux (WSL).
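Once you have worked through the steps below, it can help to confirm in one pass that the command-line tools from the list above are reachable on your PATH. The following is a minimal sketch: `command_exists` and `check_prereqs` are hypothetical helper names, and the executable names in the usage line (for example `python3.8`, `ansible-galaxy`, `cdp`) are assumptions about how these tools are typically invoked. cdpy and the Ansible collections are not executables, so they are not checked here.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for checking local prerequisites.

# Succeeds if the given command can be found on PATH.
command_exists() {
  command -v "$1" >/dev/null 2>&1
}

# Prints "missing: <name>" for each command not on PATH and
# returns non-zero if at least one is missing.
check_prereqs() {
  local missing=0 cmd
  for cmd in "$@"; do
    if ! command_exists "$cmd"; then
      echo "missing: $cmd"
      missing=1
    fi
  done
  return "$missing"
}

# Usage (executable names are assumptions; adjust to your installation):
#   check_prereqs aws terraform jq git python3.8 ansible ansible-galaxy cdp
```

Run it again after each section to see which tools are still outstanding.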
AWS CLI
You need the AWS CLI installed and pre-configured.
- Use the following commands to install the AWS CLI version 2 on macOS via the Terminal app:
curl "https://awscli.amazonaws.com/AWSCLIV2.pkg" -o "AWSCLIV2.pkg"
sudo installer -pkg AWSCLIV2.pkg -target /
- Run the following command to confirm that the AWS CLI has been installed correctly:
aws --version
- Before proceeding to the next step, create an AWS access key ID and secret access key.
- Run the aws configure command to configure the AWS CLI to run commands on behalf of the AWS user or AWS role that you are planning to use to deploy the cloud prerequisites:
aws configure
AWS Access Key ID [None]: Enter the key ID
AWS Secret Access Key [None]: Enter the secret access key
Default region name [None]: Enter the short name of an AWS region supported by CDP
Default output format [None]: Enter json or leave empty
- To verify that the CLI has been set up correctly, run the following command:
aws sts get-caller-identity
The returned identity is the one that you will use to set up the CDP prerequisites in your AWS account.
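If you work with several AWS accounts, a quick guard against deploying into the wrong one can be useful. This is a sketch under stated assumptions: `check_account` is a hypothetical helper, and you supply the expected 12-digit account ID yourself. The `--query Account --output text` options used in the usage line are standard AWS CLI global options that print only the account ID from `aws sts get-caller-identity`.

```shell
# Hypothetical guard: compare the caller's AWS account with the one you expect.
# Returns non-zero (and prints a message to stderr) on mismatch.
check_account() {
  local expected="$1" actual="$2"
  if [ "$actual" != "$expected" ]; then
    echo "Caller is in account $actual, expected $expected" >&2
    return 1
  fi
  echo "OK: using account $actual"
}

# Usage (requires a configured AWS CLI; the account ID is a placeholder):
#   check_account 123456789012 "$(aws sts get-caller-identity --query Account --output text)"
```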
For a full reference of AWS CLI installation and configuration options, refer to Getting Started with the AWS CLI and Configuring the AWS CLI.
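For scripted or repeatable setups, `aws configure set` writes the same settings as the interactive prompts without asking for input. In the sketch below, the function name `configure_aws_profile` and the variables `MY_AWS_KEY_ID` and `MY_AWS_SECRET_KEY` are hypothetical; passing a named profile avoids overwriting your `default` profile.

```shell
# Hypothetical non-interactive equivalent of the `aws configure` prompts.
# Reads the key material from MY_AWS_KEY_ID and MY_AWS_SECRET_KEY (assumed names).
configure_aws_profile() {
  local profile="$1" region="$2"
  aws configure set aws_access_key_id "$MY_AWS_KEY_ID" --profile "$profile"
  aws configure set aws_secret_access_key "$MY_AWS_SECRET_KEY" --profile "$profile"
  aws configure set region "$region" --profile "$profile"
  aws configure set output json --profile "$profile"
}

# Usage:
#   configure_aws_profile cdp-deploy eu-west-1
#   aws sts get-caller-identity --profile cdp-deploy
```

If you use a named profile, remember to select it (for example with `--profile` or the `AWS_PROFILE` environment variable) when running later AWS commands.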
Terraform
The deployment automation framework is implemented as a Terraform module. To use it, you need to download and install Terraform.
One way to install it on macOS from the terminal is shown below. For a full reference, follow the official Terraform installation guide. The module has been tested with Terraform versions 1.3.x and 1.4.x.
The easiest way to set up Terraform on macOS is via the open source Homebrew package manager (brew). If you don't yet have brew installed, run:
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
Next, add the brew command to your PATH:
export PATH=$PATH:/opt/homebrew/bin
With brew installed, you can use it to install Terraform and its required dependencies:
brew tap hashicorp/tap
brew install hashicorp/tap/terraform
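The `export PATH` command above only changes PATH for the current terminal session; the same applies to the Python directory added to PATH later in this guide. A small helper can persist such a line in your shell profile without duplicating it when setup is re-run. `append_once` is a hypothetical name, and the usage assumes zsh (the default shell on recent macOS versions); use `~/.bashrc` or similar for bash.

```shell
# Hypothetical helper: append a line to a file only if that exact line
# is not already present, so repeated runs do not duplicate it.
append_once() {
  local line="$1" file="$2"
  touch "$file"
  # -q quiet, -x whole-line match, -F fixed string (no regex)
  grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# Usage (single quotes keep $PATH unexpanded until the profile is read):
#   append_once 'export PATH=$PATH:/opt/homebrew/bin' ~/.zshrc
```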
Debian/Ubuntu
To install on Debian/Ubuntu based Linux distributions, first add the HashiCorp package key and repository:
wget -O- https://apt.releases.hashicorp.com/gpg | sudo gpg --dearmor -o /usr/share/keyrings/hashicorp-archive-keyring.gpg
echo "deb [signed-by=/usr/share/keyrings/hashicorp-archive-keyring.gpg] https://apt.releases.hashicorp.com $(lsb_release -cs) main" | sudo tee /etc/apt/sources.list.d/hashicorp.list
The Terraform package can then be installed using the following command:
sudo apt update && sudo apt install terraform
CentOS/RHEL
To install on CentOS/RHEL-based Linux distributions, first add the HashiCorp repository using the following commands:
sudo yum install -y yum-utils
sudo yum-config-manager --add-repo https://rpm.releases.hashicorp.com/RHEL/hashicorp.repo
Then install the Terraform package using the following command:
sudo yum -y install terraform
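Because the module has only been tested with Terraform 1.3.x and 1.4.x, you may want to check the installed version after any of the installation methods above. The sketch below parses the first line of `terraform version` output, which has the form `Terraform v1.4.6`; the helper names `tf_version_from_output` and `tf_version_supported` are hypothetical.

```shell
# Hypothetical helper: read `terraform version` output on stdin and
# print the bare version number from its first line ("Terraform v1.4.6").
tf_version_from_output() {
  sed -n '1s/^Terraform v\([0-9][0-9.]*\).*/\1/p'
}

# Succeeds only for the 1.3.x and 1.4.x releases the module was tested with.
tf_version_supported() {
  case "$1" in
    1.3.*|1.4.*) return 0 ;;
    *) return 1 ;;
  esac
}

# Usage (requires terraform on PATH):
#   v="$(terraform version | tf_version_from_output)"
#   tf_version_supported "$v" || echo "Terraform $v is outside the tested 1.3.x/1.4.x range"
```

A newer version may still work, but it has not been tested with the module.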
jq
The jq command-line utility is used to parse JSON-formatted output from the CDP CLI. On macOS, install it with brew:
brew install jq
On Linux, jq is available in the official Debian/Ubuntu and RHEL/CentOS repositories.
On Debian and Ubuntu, the package can be installed by using the following command:
sudo apt-get install jq
On RHEL/CentOS, the package can be installed by using the following command:
sudo dnf install jq
For more detailed instructions, see https://stedolan.github.io/jq/download/.
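As an illustration of how jq fits into this workflow, the sketch below extracts a single field from CDP CLI JSON output. The `-r` flag makes jq print the raw string without surrounding quotes, which is what you want when assigning the value to a shell variable. The helper name `cdp_user_crn` is hypothetical, and the assumption that `cdp iam get-user` nests the user under a `user` key with a `crn` field should be checked against the output of your CDP CLI version.

```shell
# Hypothetical helper: print the CRN of the current CDP user from
# `cdp iam get-user` JSON on stdin.
# Assumption: the JSON looks like {"user": {"crn": "...", ...}}.
cdp_user_crn() {
  jq -r '.user.crn'
}

# Usage (requires a configured CDP CLI, covered at the end of this guide):
#   user_crn="$(cdp iam get-user | cdp_user_crn)"
```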
git
Check whether git is already installed:
git --version
If you need to install git, we recommend following the git installation instructions from GitHub.
Python and pip package installer
Python version 3.8 is required to run the Ansible playbooks for the CDP deployment. Before deploying CDP using the CDP deployment templates, install the following required packages and configure the additional prerequisites. We recommend performing these steps within a Python virtual environment. See the Python documentation on virtual environments for details on how to set up and use them.
- Check whether you have the right version installed and the python command is configured on your $PATH:
python --version
Depending on your system configuration, you may need to run:
python3 --version
- Once you have python3 installed, make sure that the required directory is in your $PATH. To do this once, you can run:
export PATH=$PATH:~/Library/Python/3.8/bin
Make sure to include this command in your local zsh/bash profile (for example, ~/.zshrc) so that it is applied each time you launch a new terminal. We also recommend updating the pip package installer for Python 3 before continuing:
pip3 install --upgrade pip
If you find that pip is not yet installed on your machine, you can install it by using:
curl https://bootstrap.pypa.io/get-pip.py -o get-pip.py
python3 get-pip.py --user
To avoid conflicts with other packages installed on your machine, we recommend that you install the dependencies mentioned below in a Python virtual environment. For reference, here are the steps to create and activate the virtual environment using venv:
- Create the Python virtual environment:
python3.8 -m venv ~/.venv/testing_zero_virtualenv
- Activate the virtual environment:
source ~/.venv/testing_zero_virtualenv/bin/activate
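Before installing packages, you can confirm that the virtual environment is active and that it uses Python 3.8. The sketch relies on the fact that the venv `activate` script exports `VIRTUAL_ENV` with the environment's path; the helper names `venv_active` and `python_is_38` are hypothetical. `2>&1` is used because some older Python versions print `--version` output to stderr.

```shell
# Hypothetical helper: succeeds only for a "Python 3.8" or "Python 3.8.x" string.
python_is_38() {
  case "$1" in
    "Python 3.8"|"Python 3.8."*) return 0 ;;
    *) return 1 ;;
  esac
}

# Succeeds if a virtual environment is active (VIRTUAL_ENV is set and non-empty).
venv_active() {
  [ -n "${VIRTUAL_ENV:-}" ]
}

# Usage, after running the activate command above:
#   venv_active || echo "No virtual environment is active"
#   python_is_38 "$(python --version 2>&1)" || echo "The active python is not 3.8"
```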
Ansible core and jmespath
Install the pinned versions of ansible-core and jmespath in the activated virtual environment:
pip3 install ansible-core==2.12.10 jmespath==1.0.1
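To confirm that the pinned ansible-core version is the one on your PATH, you can parse the first line of `ansible --version`, which for ansible-core releases has the form `ansible [core 2.12.10]`. The helper name `ansible_core_version` is hypothetical; for jmespath, `pip3 show jmespath` prints a `Version:` line you can inspect directly.

```shell
# Hypothetical helper: read `ansible --version` output on stdin and print
# the ansible-core version from a first line like "ansible [core 2.12.10]".
# Prints nothing if the first line does not have that form.
ansible_core_version() {
  sed -n '1s/^ansible \[core \([0-9][0-9.]*\)].*/\1/p'
}

# Usage, inside the activated virtual environment:
#   [ "$(ansible --version | ansible_core_version)" = "2.12.10" ] || echo "unexpected ansible-core version"
#   pip3 show jmespath    # the Version: line should read 1.0.1
```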
Ansible collections
The deployment of the CDP environment is performed using an Ansible playbook. This playbook uses the cloudera.cloud Ansible collection, which in turn interacts with the CDP API. It also uses the community.general Ansible collection, primarily to extract values from the results of Ansible tasks.
- Install the cloudera.cloud Ansible collection by using the following command:
ansible-galaxy collection install git+https://github.com/cloudera-labs/cloudera.cloud.git,devel
- Install the community.general Ansible collection by using the following command:
ansible-galaxy collection install community.general:==5.5.0
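`ansible-galaxy collection list` prints a table of installed collections with `Collection` and `Version` columns, grouped under each collection path. The sketch below pulls out the version of a single collection from that table, so you can confirm that community.general 5.5.0 and cloudera.cloud were installed. The helper name `collection_version` is hypothetical; if a collection is installed in more than one path, it reports the first match.

```shell
# Hypothetical helper: read `ansible-galaxy collection list` output on stdin
# and print the version listed for the named collection (first match only).
collection_version() {
  awk -v name="$1" '$1 == name { print $2; exit }'
}

# Usage:
#   ansible-galaxy collection list | collection_version community.general   # expect 5.5.0
#   ansible-galaxy collection list | collection_version cloudera.cloud
```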
cdpy
cdpy is a Python wrapper for the Cloudera CDP CLI, designed for use with the Ansible framework.
Install cdpy using the following command:
pip3 install git+https://github.com/cloudera-labs/cdpy@main#egg=cdpy
Note that this in turn installs the CDP CLI.
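Because cdpy installs the CDP CLI as a dependency, a quick check covers both: the Python package should be importable from the virtual environment, and the `cdp` executable should be on your PATH. The helper name `has_python_module` is hypothetical; it uses the standard library's `importlib.util.find_spec`, which returns `None` for a top-level module that cannot be found.

```shell
# Hypothetical helper: succeeds if the active python3 can locate the named
# top-level module, without actually importing it.
has_python_module() {
  python3 -c 'import importlib.util, sys; sys.exit(0 if importlib.util.find_spec(sys.argv[1]) else 1)' "$1"
}

# Usage, inside the activated virtual environment:
#   has_python_module cdpy || echo "cdpy is not installed"
#   command -v cdp >/dev/null || echo "the cdp executable is not on PATH"
```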
Configure CDP CLI
The Cloudera CDP Command Line Interface (CLI) is used by the cdpy Python package to create the CDP deployment and also to automatically discover information such as the cross-account IDs when creating the cloud IAM policies. A set of access credentials is required to access these services through the CLI, as described in the CDP documentation topic Generate the API access key.
Similarly to the way you set up the AWS CLI earlier, now configure the CDP CLI with a CDP access key ID and private key by using the following command:
cdp configure
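For unattended setups, you may prefer to write the credentials file directly instead of answering the `cdp configure` prompts. The sketch below assumes the layout that `cdp configure` uses for the default profile: a `[default]` section in `~/.cdp/credentials` with `cdp_access_key_id` and `cdp_private_key` keys. Verify this against your CDP CLI version before relying on it. The helper `write_cdp_credentials` and the variables `CDP_KEY_ID` and `CDP_PRIVATE_KEY` are hypothetical, and the function overwrites any existing file, including other profiles.

```shell
# Hypothetical helper: write a default-profile CDP credentials file.
# Assumed layout: [default] section with cdp_access_key_id and cdp_private_key.
# WARNING: overwrites the target file, including any other profiles in it.
write_cdp_credentials() {
  local file="$1" key_id="$2" private_key="$3"
  mkdir -p "$(dirname "$file")"
  # umask 077 in a subshell: the new file is readable only by you,
  # since it contains a private key.
  (umask 077 && printf '[default]\ncdp_access_key_id = %s\ncdp_private_key = %s\n' \
    "$key_id" "$private_key" > "$file")
}

# Usage:
#   write_cdp_credentials ~/.cdp/credentials "$CDP_KEY_ID" "$CDP_PRIVATE_KEY"
#   cdp iam get-user    # verify that the credentials work
```

Whichever method you use, running `cdp iam get-user` afterwards is a simple way to confirm that the CLI can authenticate.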