Management Console to customer cloud network

Explains the possible ways in which CDP Control Plane can communicate with the compute infrastructure in the customer network, in the context of the Management Console.

As described previously, the CDP Admin typically uses the CDP Management Console that runs in the ‘CDP Control Plane’ to launch Data Lakes and experiences into their cloud accounts. To accomplish this, the CDP Control Plane and the compute infrastructure in the customer network (EC2 instances or EKS clusters) must be able to communicate with each other. This communication can occur in the following ways:

Publicly accessible networks

In this model, the compute infrastructure must be reachable over the public internet from the Management Console. While this is fairly easy to set up, it is usually not preferred by enterprise customers because it requires the EC2 or EKS nodes to be assigned public IP addresses. Although the access control rules for these nodes can be restricted to the IP addresses of the Cloudera Management Console components, this model is still considered insecure.
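For illustration, the following sketch shows how such an IP-based restriction might be expressed with boto3. The security group ID and CIDR range are placeholders, not actual Cloudera Control Plane addresses; the real address ranges should be taken from Cloudera's documentation.

```python
# Hedged sketch: restrict inbound HTTPS on a node's security group to a
# placeholder CIDR standing in for the CDP Control Plane address range.
import boto3

ec2 = boto3.client("ec2", region_name="us-west-2")

ec2.authorize_security_group_ingress(
    GroupId="sg-0123456789abcdef0",  # security group of the cluster nodes (placeholder)
    IpPermissions=[
        {
            "IpProtocol": "tcp",
            "FromPort": 443,
            "ToPort": 443,
            "IpRanges": [
                {
                    "CidrIp": "203.0.113.0/24",  # placeholder CIDR for Control Plane traffic
                    "Description": "Allow HTTPS only from the CDP Control Plane",
                }
            ],
        }
    ],
)
```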

Semi-private networks

Publicly accessible networks are easy to set up for connectivity, both from the CDP Control Plane and from the customer on-prem network, but they have a large surface area of exposure because all compute infrastructure has public IP addresses. In contrast, fully private networks expose none of the compute infrastructure, but they need special configuration to enable connectivity from the customer on-prem network. While very secure, they are more complex to establish.

There is a third configuration supported by CDP that provides a trade-off between these two options. In this configuration, the user deploys the worker nodes of the compute infrastructure on fully private networks, as described below. However, the user chooses to expose the UIs or APIs of the services fronting these worker nodes through a public network load balancer. Data consumers can then access the UIs or APIs of the compute infrastructure through these load balancers. It is also possible to restrict the IP ranges from which such access is allowed using security groups.
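CDP provisions these load balancers itself; the following sketch only illustrates, with placeholder names and subnet IDs, what an internet-facing network load balancer fronting privately addressed workers looks like at the AWS API level.

```python
# Hedged sketch: an internet-facing network load balancer (NLB) in public subnets,
# fronting services whose worker nodes keep private IP addresses. All names and
# subnet IDs are placeholders; CDP creates the real load balancer for you.
import boto3

elbv2 = boto3.client("elbv2", region_name="us-west-2")

response = elbv2.create_load_balancer(
    Name="cdp-ui-nlb-example",        # placeholder name
    Type="network",
    Scheme="internet-facing",         # public entry point in front of private workers
    Subnets=["subnet-0aaa11112222bbbb3", "subnet-0ccc44445555dddd6"],  # public subnets (placeholders)
)
print(response["LoadBalancers"][0]["DNSName"])
```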

While this option provides a trade-off between ease of setup and exposure levels, it may not satisfy all use cases related to communication between the various endpoints. For example, some compute workloads involving Kafka or NiFi would not benefit from a simple publicly exposed NLB. It is recommended that customers evaluate their use cases against this trade-off and choose a setup model that balances convenience and security appropriately.

Fully private networks

In this model, the compute infrastructure is not assigned any public IP addresses. Instead, communication is established over a ‘reverse tunnel’ that originates from the customer network and terminates at the CDP Control Plane. All communication from the Control Plane to the compute nodes is then passed through this tunnel. Based on experience, Cloudera has found this to be the model of communication that customers prefer.

To elaborate on the tunneling approach, Cloudera uses a solution called Cluster Connectivity Manager (CCM). At a high level, the solution uses two components: an agent (the CCM Agent) that runs on a VM provisioned in the customer network, and a service (the CCM Service) that runs on the CDP Control Plane. At start-up time, the CCM Agent establishes a connection with the CCM Service; this connection forms the ‘reverse tunnel’. The tunnel is secured by a keypair that is initialized between the two endpoints when the workloads are set up. The private key is shared with the agent over cloud-specific initialization mechanisms, such as a user-data script in AWS.

When any service on the CDP Control Plane wants to send a request to a service deployed in the customer environment [depicted in this diagram as the logical flow], it physically sends the request to the CCM Service running in the Control Plane. The CCM Agent and CCM Service collaborate over the established ‘reverse tunnel’ to accept the request, forward it to the appropriate service, and return the response over the tunnel to the calling service on the Control Plane.
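Conceptually, the flow resembles a generic reverse TCP tunnel, which the following minimal Python sketch illustrates. It is not Cloudera's CCM protocol: the key exchange and request multiplexing are omitted, and the host names and ports are placeholders.

```python
# Minimal reverse-tunnel sketch (NOT the actual CCM implementation): the agent
# dials OUT to the control-plane service, so no inbound access to the customer
# network is required, then relays bytes between that tunnel and a local service.
import socket
import threading

CONTROL_PLANE = ("ccm.example-control-plane.internal", 6000)  # hypothetical CCM Service endpoint
LOCAL_SERVICE = ("127.0.0.1", 8443)                           # workload service in the customer network

def pump(src: socket.socket, dst: socket.socket) -> None:
    """Copy bytes from src to dst until the connection closes."""
    while True:
        data = src.recv(4096)
        if not data:
            break
        dst.sendall(data)

def run_agent() -> None:
    # The outbound connection from the customer network forms the 'reverse tunnel'.
    with socket.create_connection(CONTROL_PLANE) as tunnel:
        # Requests arriving over the tunnel are forwarded to the local service,
        # and responses flow back over the same tunnel.
        with socket.create_connection(LOCAL_SERVICE) as local:
            threading.Thread(target=pump, args=(tunnel, local), daemon=True).start()
            pump(local, tunnel)

if __name__ == "__main__":
    run_agent()
```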



EKS

All EKS clusters provisioned by the various experiences are enabled with both public and private cluster endpoints (see Amazon EKS cluster endpoint access control), even under the Fully Private Network setup. The public EKS endpoint is needed to facilitate the interactions between the CDP Control Plane and the EKS cluster, while the worker nodes and the Kubernetes Control Plane interact over the private endpoint. There are plans to support private-only EKS endpoints in the future; when this happens, the documentation will be updated to reflect it.
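To verify how a given cluster's endpoints are configured, the endpoint access settings can be read back from the EKS API, as in the sketch below; the cluster name and region are placeholders.

```python
# Hedged sketch: read the endpoint access configuration of an EKS cluster with boto3.
# The cluster name and region are placeholders; use the cluster created for your environment.
import boto3

eks = boto3.client("eks", region_name="us-west-2")
vpc_config = eks.describe_cluster(name="cdp-example-eks-cluster")["cluster"]["resourcesVpcConfig"]

print("Public endpoint enabled: ", vpc_config["endpointPublicAccess"])
print("Private endpoint enabled:", vpc_config["endpointPrivateAccess"])
print("Public access CIDRs:     ", vpc_config.get("publicAccessCidrs", []))
```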

Fully private outbound restricted networks

A variant of the Fully Private Network is one in which customers pass outbound traffic originating from their cloud account through a proxy or firewall and explicitly allow-list the URLs that may pass through it. This is what Cloudera refers to as the ‘Outbound Restricted’ configuration, and CDP Public Cloud supports it as well. In such cases, the customer must ensure the following:

  • Users configure a proxy for the environment via CDP, as documented in Use a non-transparent proxy with Cloudera Data Warehouse on AWS environments (for Cloudera Data Warehouse) and Using a non-transparent proxy (for all other compute workloads and the Data Lake itself).

  • Compute resources (VMs and experiences) can connect to the proxy or firewall via appropriate routing rules.

  • The proxy or firewall is set up to allow connections to all hosts, IP ranges, ports, and protocol types that are documented in Outbound network access destinations for AWS. (A quick connectivity check is sketched below.)
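One way to sanity-check the last two requirements from a host in the private subnet is to make a request to an allow-listed destination through the proxy. The proxy address below is a placeholder, and the destination should be confirmed against the documented outbound access list.

```python
# Hedged sketch: verify that outbound traffic reaches an allow-listed destination
# through the non-transparent proxy. The proxy endpoint is a placeholder; pick a
# destination from the documented outbound network access destinations.
import requests

PROXY = "http://proxy.internal.example.com:3128"    # hypothetical non-transparent proxy
TEST_URL = "https://archive.cloudera.com"           # example destination; confirm it is on your allow-list

response = requests.get(TEST_URL, proxies={"http": PROXY, "https": PROXY}, timeout=10)
print(TEST_URL, "->", response.status_code)
```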