Overlay networks for AWS environments in Cloudera Data Warehouse service

If you have an existing Virtual Private Cloud (VPC) where you want to deploy a Virtual Warehouse, you must make sure that there is an adequate number of available IP addresses. Otherwise, the Cloudera Data Warehouse (CDW) service cannot be used efficiently and some features like auto-scaling do not work because the number of IP addresses might become exhausted. For example, each compute node in a Virtual Warehouse uses 8 IP addresses. If you create a SMALL-sized Virtual Warehouse with 2 executor nodes, it uses 16 IP addresses. If you create a LARGE-sized Virtual Warehouse with 40 executor nodes, it uses 320 IP addresses. If your VPC is small, the available IP addresses can be consumed quickly. Using overlay networks solve this issue.

About overlay networks

An overlay network is a software-defined layer of network abstraction that is used to run multiple separate, discrete virtualized network layers over the VPC network. In the case of the CDW service, a custom CNI (Container Network Interface) plugin is used to enable the overlay network. It creates two network spaces:

  • A node network space, which derives per-node IP addresses from the VPC.
  • A Kubernetes pod network space, which derives per-pod IP addresses from the CNI plugin's own network space.

The overlay network is bridged into the node network. As a result, one IP address is required per node instead of one IP address needed per pod. Consequently, there are more available IP addresses and you can use the CDW service efficiently, auto-scaling Virtual Warehouses as needed to meet the demands of your workloads.

When to use overlay networks for CDW service environments

By default, when you create a new network during environment registration with the Management Console, CDP creates 3 subnets with 8,192 IP addresses per subnet, which means there are 24,576 available IP addresses:

Figure 1. Registering an environment with a new network in Management Console


When CDP creates your environment, you most likely will have enough IP addresses to use and grow your Virtual Warehouses. However, if you decide to use an existing VPC network on your cloud provider, which might have a limited number of available IP addresses, configuring your environment to use overlay networks for the CDW service avoids IP address exhaustion. Using overlay networks, which can be configured for an environment in the CDW service, uses fewer IP addresses because it uses an "IP per host" model, so your Virtual Warehouse can be used efficiently.