Use cases

Use cases for CDP Public Cloud for AWS.

CDP Public Cloud allows customers to process data in cloud storage under a secure and governed Data Lake using different types of compute workloads, called Experiences. Typically, the lifecycle of these workloads is as follows:

  • A CDP environment is set up by a CDP administrator using their cloud account. This sets up a cloud Data Lake cluster with security and governance services and an identity provider for this environment.
  • Then one or more compute experiences can be launched, linked to this Data Lake. Each of these experiences would typically serve a specific purpose such as data ingestion, analytics, machine learning and so on.
  • These compute experiences would be accessed by data consumers like data engineers, analysts or scientists. This is the core purpose of using CDP on the public cloud.
  • These compute experiences can be long-running or ephemeral, depending on customer needs.
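The lifecycle above can be sketched as a small data model. This is only an illustrative sketch: the class and method names (`Environment`, `Experience`, `launch`, `terminate`) are hypothetical and are not part of any CDP API or CLI. It assumes nothing beyond what the list states: an environment carries one Data Lake, and experiences are attached to it and may be removed when they are ephemeral.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Experience:
    """A compute workload linked to the environment's Data Lake (hypothetical model)."""
    name: str
    purpose: str             # e.g. "data ingestion", "analytics", "machine learning"
    ephemeral: bool = False  # False means long-running


@dataclass
class Environment:
    """An environment set up by an administrator (hypothetical model)."""
    name: str
    data_lake: str = ""
    experiences: List[Experience] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Setting up the environment also sets up its Data Lake.
        if not self.data_lake:
            self.data_lake = f"{self.name}-datalake"

    def launch(self, name: str, purpose: str, ephemeral: bool = False) -> Experience:
        # Each launched experience is linked to this environment's Data Lake.
        exp = Experience(name, purpose, ephemeral)
        self.experiences.append(exp)
        return exp

    def terminate(self, name: str) -> None:
        # Removing an experience leaves the environment and Data Lake in place.
        self.experiences = [e for e in self.experiences if e.name != name]


env = Environment(name="demo-env")
env.launch("ingest", "data ingestion")
env.launch("adhoc-ml", "machine learning", ephemeral=True)
env.terminate("adhoc-ml")
print([e.name for e in env.experiences])  # ['ingest']
```

The point of the model is the ownership relationship: the Data Lake belongs to the environment and outlives any individual experience, which is what lets ephemeral workloads come and go.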

As can be seen above, two types of users interact with CDP for different purposes:

  • CDP Admins - These users are usually concerned with launching and maintaining the cloud environment and the Data Lake and CDP experiences running inside it. They use a Management Console running in the Cloudera AWS account to manage the environment.
  • Data Consumers - These are the data scientists, analysts, and engineers who use the experiences to process data. They mostly interact directly with the experiences running in their cloud account. They can access these either from their corporate networks (typically through a VPN) or from other cloud networks their company owns.

Based on this, it is clear that the following kinds of access to the workloads set up by CDP are needed.

The above is represented in this diagram: