Architecture diagrams

This topic includes diagrams illustrating the various elements of the network architecture in the customer’s cloud account into which CDP environments with Data Lakes, Data Hubs, and data services will be launched.

Cloudera recommends that customers configure their cloud networks as described in this chapter and illustrated in the following diagrams. This will help onboarding Data Lakes, Data Hubs, and data services smoothly. The following diagram illustrates the configuration for a fully private network that can be configured by the customer. This configuration can be provided by the CDP admins when they are setting up CDP environments and workloads which will get launched into this configuration.

Note the following points about the architecture:

  • The configuration is a fully private network configuration - that is, the workloads are launched on nodes that do not have public IP addresses into a private subnet.
  • They connect outbound to the CDP Control Plane over a fixed IP and port range.
  • For users to be able to connect from the customer on-prem network to the CDP workloads in the private subnet, some network connectivity setup is required. In this case, a customer’s VPN server peered to an AWS virtual private gateway is shown.

Some of the CDP data services are based on AWS EKS clusters. Amazon EKS manages the Kubernetes Control Plane while the worker nodes that make up the cluster get provisioned in the customer’s VPC. The EKS Control Plane has an API endpoint for administrative purposes which is commonly referred to as "cluster endpoint". The CDP data service itself is accessible through a service endpoint ELB.

This is illustrated in the following diagram:

As can be seen in the above diagram, CDP workloads have dependencies on some AWS cloud services such as RDS, EFS and so on. A full list of these services, described in the context of each workload is specified in AWS outbound network access destinations.

In the chapters that follow, we detail the elements of this architecture, including specifying the configuration and options in each of the components.