AWS requirements for Cloudera DataFlow

As the administrator for your AWS environment, ensure that the environment meets the requirements for Cloudera Public Cloud and Cloudera DataFlow. Then set up your AWS cloud credential and register the environment.

Follow these steps to ensure that your AWS environment meets the Cloudera and Cloudera DataFlow requirements:

Understand your AWS account requirements for Cloudera

  • Review the Cloudera AWS account requirements. The link is in the Related information section below.

  • Verify that your AWS account for Cloudera has the required resources.

  • Verify that you have the permissions to manage these resources.

Understand the Cloudera DataFlow requirements

  • Verify that the following services are available in your environment for Cloudera DataFlow to use:

    • Network – Amazon VPC
    • Compute – Amazon Elastic Kubernetes Service (EKS)
    • Load Balancing – Amazon ELB Classic Load Balancer
    • Persistent Instance Storage – Amazon Elastic Block Store (EBS)
    • Database – Amazon Relational Database Service (RDS)
  • Determine your networking option:

    • Use your own VPC
    • Allow Cloudera to create a VPC

    To understand each option, see: Cloudera DataFlow Networking. The link is in the Related information section below.

  • Regions:

    • Select a region that Cloudera Public Cloud supports and in which Amazon Elastic Kubernetes Service (EKS) is also available.

      For more information, see: Cloudera Supported AWS regions and the Region Table in AWS Regional Services. The links are in the Related information section below.

  • Ports and outbound network access:

    • Review the port requirements for the Cloudera default security group. See: Cloudera Management Console - Security groups. The link is in the Related information section below.
    • Configure ports for NiFi to access your source and destination systems in the data flow.
    • If you are using a firewall or a security group setting to prevent egress from the workspace, you must ensure that the outbound destinations required by Cloudera DataFlow are reachable. For more information, see Outbound network access destinations for AWS. The link is in the Related information section below.
    • If egress to these destinations is blocked, autoscaling cannot pull new images, and the affected instances run with broken pods.

      Follow the minimum required and recommended security group settings defined by AWS. For more information, see Amazon EKS security group considerations. The link is in the Related information section below.
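The outbound access requirement above can be spot-checked before you register the environment. The following is a minimal sketch, not an official Cloudera tool: it attempts a TCP connection to each destination and reports the ones that fail. The DESTINATIONS list holds a hypothetical placeholder; replace it with the hosts and ports from Outbound network access destinations for AWS. The function names tcp_connect and unreachable are also illustrative.

```python
import socket
from typing import Callable, Iterable, List, Tuple

Destination = Tuple[str, int]

# Hypothetical placeholder. Replace with the hosts and ports listed in
# "Outbound network access destinations for AWS".
DESTINATIONS: List[Destination] = [("example.com", 443)]


def tcp_connect(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return False


def unreachable(
    destinations: Iterable[Destination],
    connect: Callable[[str, int], bool] = tcp_connect,
) -> List[Destination]:
    """Return the destinations for which connect() reports failure."""
    return [(host, port) for host, port in destinations if not connect(host, port)]
```

Calling unreachable(DESTINATIONS) returns the list of destinations that could not be reached; an empty list means every connection succeeded. The result reflects only the network path of the machine that runs the script, so run it from a host inside the subnet that the workspace will use. A successful TCP connection does not prove that an HTTP proxy or TLS inspection device will allow the actual traffic.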

Set up an AWS Cloud credential

Create a role-based AWS credential that allows Cloudera Public Cloud to authenticate with your AWS account and authorizes it to provision AWS resources on your behalf. Role-based authentication uses an IAM role with an attached IAM policy that has the minimum permissions required to use Cloudera.

To set up an AWS Cloud credential, see Creating a role based provisioning credential for AWS. The link is in the Related information section below.

After you have created the IAM role and its policy, register the role in Cloudera as a cloud credential. Reference this credential when you register an AWS environment in Cloudera, as described in the next step.
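To illustrate what a role-based credential typically involves on the AWS side, the sketch below builds a cross-account trust policy document of the standard AWS form: it lets a named account assume the role only when the caller supplies a matching external ID. That Cloudera uses this exact shape is an assumption here; the account ID and external ID shown are placeholders, and build_trust_policy is a hypothetical helper. Use the values and policy text given in Creating a role based provisioning credential for AWS.

```python
import json


def build_trust_policy(trusted_account_id: str, external_id: str) -> dict:
    """Build a standard AWS cross-account trust policy document.

    Both arguments are placeholders in this sketch; take the real values
    from the Cloudera credential creation procedure.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                # The account that is allowed to assume this role.
                "Principal": {"AWS": f"arn:aws:iam::{trusted_account_id}:root"},
                "Action": "sts:AssumeRole",
                # Require the caller to present the agreed external ID.
                "Condition": {"StringEquals": {"sts:ExternalId": external_id}},
            }
        ],
    }


# Placeholder values for illustration only.
policy = build_trust_policy("123456789012", "example-external-id")
print(json.dumps(policy, indent=2))
```

The trust policy controls who may assume the role. The permissions that the role grants come from the separate IAM policy attached to it, which is the policy with the minimum permissions described above.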

Register an AWS environment in Cloudera Public Cloud

A Cloudera user must have the PowerUser role to register an environment. An environment determines the cloud provider region and virtual network in which resources can be provisioned, and it includes the credential used to access the cloud provider account.

To register an AWS environment in Cloudera Public Cloud, see Cloudera AWS Environments.