VPC and subnets

When registering an AWS environment in CDP, you will be asked to select a VPC and two or more subnets.

You have two options:

  • Use your existing VPC and subnets for provisioning CDP resources.
  • Have CDP create a new VPC and subnets. All CDP resources will be provisioned into this new VPC and subnets.

Existing VPC and subnets

Verify the limits of the VPC and subnets available in your AWS account to ensure that you have enough resources to create clusters in CDP.

If you would like to use your own AWS VPC, it must meet the following requirements.
  • The VPC must have at least three subnets, each in a different availability zone (AZ). If a region has only two AZs, three subnets are still needed, two of them in the same AZ.
  • The VPC subnets must be connected to an Internet Gateway or a NAT Gateway. The VPC should be able to make outbound connections to the internet or to the set of CIDRs and ports provided by Cloudera.
  • CDP supports public subnets and private subnets. For private subnets, you must enable CCM.
  • Ensure the CIDR block for the subnets is sized appropriately. In general, there is no way to increase the subnet size without recreating the environment and VPC, although the Data Warehouse service allows you to use overlay networks.
  • If you are planning to use DataFlow, Data Engineering, Data Warehouse, or Machine Learning, you must enable Amazon DNS with the VPC. Corporate DNS is not supported. For guidelines on how to verify your DNS settings, refer to sections 1-3 in AWS environment requirements checklist for the Data Warehouse service.
  • If you are planning to use Data Engineering or Machine Learning, you must tag the VPC and the subnets as shared so that Kubernetes can find them. For load balancers to be able to choose the subnets correctly, you are also required to tag private subnets with the kubernetes.io/role/internal-elb:1 tag, and public subnets with the kubernetes.io/role/elb:1 tag.
  • If you are planning to use the Data Warehouse service, you must:
  • If you are planning to use Data Engineering, Data Warehouse, or Machine Learning, you may also want to review the following AWS documentation:
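
The load balancer tags described above can be checked before registering the environment. The following is a minimal sketch, not a CDP tool: the subnet dictionary shape (id, public, tags) and the function names are hypothetical, and only the two tag names come from the requirements above. The Key/Value form is chosen because it is the shape the EC2 tagging API accepts.

```python
from typing import Dict, List

# Tag names taken from the requirements list; the values are the string "1".
PRIVATE_ELB_TAG = {"Key": "kubernetes.io/role/internal-elb", "Value": "1"}
PUBLIC_ELB_TAG = {"Key": "kubernetes.io/role/elb", "Value": "1"}


def elb_role_tag(is_public: bool) -> Dict[str, str]:
    """Return the load balancer role tag expected on a subnet."""
    return dict(PUBLIC_ELB_TAG if is_public else PRIVATE_ELB_TAG)


def check_subnets(subnets: List[dict]) -> List[str]:
    """Return a list of problems found; an empty list means none were found.

    Each subnet is a hypothetical dict such as
    {"id": "subnet-a", "public": False, "tags": {"name": "value"}}.
    """
    problems = []
    if len(subnets) < 3:
        problems.append("fewer than three subnets")
    for subnet in subnets:
        expected = elb_role_tag(subnet["public"])
        actual = subnet.get("tags", {}).get(expected["Key"])
        if actual != expected["Value"]:
            problems.append(
                f"{subnet['id']}: missing tag "
                f"{expected['Key']}:{expected['Value']}"
            )
    return problems
```

This sketch covers only the subnet count and the two role tags. It does not check the "shared" tag, DNS settings, or AZ placement.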

New VPC and subnets

If you choose to allow CDP to create a new VPC, three subnets will be created automatically. One subnet is created for each availability zone, assuming three AZs per region. If a region has only two AZs, three subnets are still created, two of them in the same AZ.

You will need to specify a valid IPv4 CIDR that defines the range of private IPs for EC2 instances provisioned into these subnets. The default is 10.10.0.0/16. Consider changing the IP range to match your corporate policies for standardized IP address ranges. The CIDR must match the <network mask>/16 pattern.
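
A candidate CIDR can be checked against the /16 pattern locally before it is entered. This is a minimal sketch using Python's standard ipaddress module; the function name validate_vpc_cidr is hypothetical, and CDP may apply additional checks that are not shown here.

```python
import ipaddress


def validate_vpc_cidr(cidr: str) -> ipaddress.IPv4Network:
    """Parse an IPv4 CIDR and require a /16 prefix.

    strict=True rejects values with host bits set, such as 10.10.1.0/16.
    Raises ValueError on any invalid input.
    """
    network = ipaddress.IPv4Network(cidr, strict=True)
    if network.prefixlen != 16:
        raise ValueError(f"{cidr} must use a /16 prefix, got /{network.prefixlen}")
    return network


# The documented default passes the check.
default_vpc = validate_vpc_cidr("10.10.0.0/16")
```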

By default CDP creates 6 subnets (3 private and 3 public) and divides the address space as follows:
  • 3 x /19 private subnets for FreeIPA, Data Lake, Data Hub, Data Warehouse, Machine Learning
  • 3 x /24 public subnets

You can disable creating private subnets, in which case only 3 public subnets will be created.
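
To get a feel for how a /16 divides into three /19 and three /24 subnets, the split can be worked through with Python's ipaddress module. This is an illustrative sketch only: the placement of each subnet within the VPC range is an assumption made for the example, and the actual layout CDP chooses may differ. The function names are hypothetical.

```python
import ipaddress
from itertools import islice


def plan_subnets(vpc_cidr: str = "10.10.0.0/16", private: bool = True) -> dict:
    """Sketch one possible split of a /16 into 3 x /19 and 3 x /24 subnets."""
    vpc = ipaddress.IPv4Network(vpc_cidr)
    blocks = list(vpc.subnets(new_prefix=19))  # a /16 holds eight /19 blocks
    # Assumption: private subnets take the first three /19 blocks.
    private_subnets = blocks[:3] if private else []
    # Assumption: public /24 subnets are carved from the fourth /19 block.
    public_subnets = list(islice(blocks[3].subnets(new_prefix=24), 3))
    return {"private": private_subnets, "public": public_subnets}


def usable_addresses(subnet: ipaddress.IPv4Network) -> int:
    """AWS reserves five addresses in every subnet."""
    return subnet.num_addresses - 5


plan = plan_subnets()
for kind, subnets in plan.items():
    for subnet in subnets:
        print(kind, subnet, usable_addresses(subnet))
```

Each /19 holds 8192 addresses and each /24 holds 256, before the addresses that AWS reserves in every subnet are subtracted. This is worth keeping in mind when sizing for services that run many instances in the private subnets.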

Private endpoints

By default, when creating a new network, CDP uses public endpoints. During environment registration, you can optionally select the “Create Private Endpoints” option to use private endpoints instead.

If you choose to use private endpoints, make sure to review Outbound network access destinations.