CDP Public Cloud reference network architecture for AWS

A conceptual overview of the CDP Public Cloud architecture for AWS.


CDP Public Cloud allows customers to set up cloud Data Lakes and compute workloads in their cloud accounts on AWS, Azure, and Google Cloud. It maps a cloud account to a concept called the Environment, into which all workloads are launched. For these Data Lakes and workloads to function correctly, several elements of the cloud architecture must be configured appropriately, including access permissions, networking, and cloud storage. Broadly, these elements can be configured in one of two ways:

  • CDP can set up these elements for the customer. In this model, the customer grants CDP access to the cloud account through a cross-account role, which CDP uses to create and manage these elements. This model usually helps to set up a working environment quickly and try out CDP. However, many enterprise customers prefer or even mandate specific configurations of a cloud environment for Infosec or compliance reasons. Setting up elements like networking and cloud storage typically requires prior approvals, and these customers are generally reluctant to allow a third-party vendor like Cloudera to set up these elements automatically, or actively prevent it.
  • CDP can work with pre-created elements provided by the customer. In this model, the flow for creating the cloud Data Lakes or workloads accepts pre-created configurations of the cloud environment and launches workloads within those boundaries. This model is clearly more aligned with enterprise requirements. However, it carries the risk that the configuration might not meet CDP requirements. As a result, customers might face issues launching CDP workloads, and reaching a working environment might take much longer and involve many tedious interactions between Cloudera and the customer's cloud teams.
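
The two models above differ mainly in which inputs the customer has to supply up front. The following is a minimal sketch of that difference, not a CDP API: the names SetupModel, EnvironmentPlan, and missing_inputs, and all field names, are hypothetical and chosen only for illustration. The exact set of pre-created elements CDP expects is an assumption here, limited to the networking and cloud storage elements mentioned above.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class SetupModel(Enum):
    CDP_MANAGED = "cdp_managed"              # CDP creates the cloud elements
    CUSTOMER_PROVIDED = "customer_provided"  # customer pre-creates them


@dataclass
class EnvironmentPlan:
    # Hypothetical record; field names are illustrative, not CDP API fields.
    name: str
    model: SetupModel
    cross_account_role: Optional[str] = None
    network_id: Optional[str] = None
    subnet_ids: List[str] = field(default_factory=list)
    storage_location: Optional[str] = None


def missing_inputs(plan: EnvironmentPlan) -> List[str]:
    """Return the inputs the chosen model still needs before launch."""
    missing = []
    if plan.model is SetupModel.CDP_MANAGED:
        # CDP creates everything, so it only needs account access.
        if not plan.cross_account_role:
            missing.append("cross_account_role")
    else:
        # Assumed set of pre-created elements: network, subnets, storage.
        if not plan.network_id:
            missing.append("network_id")
        if not plan.subnet_ids:
            missing.append("subnet_ids")
        if not plan.storage_location:
            missing.append("storage_location")
    return missing
```

A check of this kind, run before requesting an environment, is one way to surface gaps in a pre-created configuration early instead of discovering them when a workload fails to launch.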

In our experience working with several enterprise customers, the most complicated of these elements is the cloud network configuration. The purpose of this document is to clearly articulate the networking requirements for setting up a functional CDP Public Cloud environment into which Data Lakes and compute workloads of different types can be launched and used. It identifies the different points of access to these workloads and explains how the given architecture enables that access.

You can use the cloudera-deploy tool to automatically set up a model of this reference architecture, which can then be reviewed for security and compliance purposes.