Cloudera DataFlow Architecture

Cloudera DataFlow for the Public Cloud (CDF-PC) follows a two-tier architecture where product capabilities like the Dashboard, Catalog and Environment management are hosted on the CDP Control Plane while the flow deployments processing your data are provisioned in a CDP environment which represents infrastructure in your cloud provider account. Learn more about the service architecture, and how CDF-PC enables the various service users to achieve their goals.

When you enable DataFlow for one of your registered CDP environments, DataFlow creates and configures the required infrastructure including a Kubernetes cluster, Kubernetes Operators and the DataFlow workload application in your cloud account. After DataFlow has been successfully enabled for an environment, users can deploy Flow Definitions into this environment. Deploying a Flow Definition creates a dedicated NiFi cluster on Kubernetes allowing you to treat NiFi flows as isolated flow deployments.

Flow deployments run the NiFi flow logic and process data in your cloud account. Therefore data that is being processed by a flow deployment does not traverse the CDP Control Plane. Flow deployments send heartbeats containing health and performance information to the Control Plane where this data is visualized and presented in the Dashboard.

The DataFlow Functions feature allows you to deploy NiFi flows stored in the DataFlow Catalog as functions executed within AWS Lambda, Azure Functions and/or Google Cloud Functions. Leveraging DataFlow Functions does not require to have DataFlow enabled in a CDP environment. When a DataFlow function is executed, the function will interact with the Control Plane to retrieve the flow definition and to send monitoring information.

Learn more about the specific details of CDF architecture from the diagram below.