Cloudera DataFlow Architecture

Learn about Cloudera DataFlow architecture.

Cloudera DataFlow follows a two-tier architecture. Product capabilities such as the Dashboard, the Catalog, and Environment management are hosted on the CDP Control Plane, while the flow deployments that process your data are provisioned in a CDP environment, which represents infrastructure in your cloud provider account.

When you enable DataFlow for one of your registered CDP environments, DataFlow creates and configures the required infrastructure in your cloud account, including a Kubernetes cluster, Kubernetes Operators, and the DataFlow workload application. After DataFlow has been successfully enabled for an environment, users can deploy Flow Definitions into it. Deploying a Flow Definition creates a dedicated NiFi cluster on Kubernetes, which allows you to treat NiFi flows as isolated flow deployments.
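
The lifecycle described above (enable DataFlow for an environment, then deploy Flow Definitions, each into its own NiFi cluster) can be pictured with a small toy model. This is only an illustrative sketch and not the DataFlow API: the class names, the method names, and the "nifi-<name>" cluster identifier are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class FlowDeployment:
    """One deployed Flow Definition, backed by its own NiFi cluster."""
    name: str
    flow_definition: str
    nifi_cluster: str  # hypothetical identifier for the dedicated cluster


@dataclass
class Environment:
    """Toy stand-in for a CDP environment in your cloud account."""
    name: str
    dataflow_enabled: bool = False
    deployments: dict = field(default_factory=dict)

    def enable_dataflow(self) -> None:
        # Stands in for provisioning the Kubernetes cluster, the
        # Kubernetes Operators, and the DataFlow workload application.
        self.dataflow_enabled = True

    def deploy(self, deployment_name: str, flow_definition: str) -> FlowDeployment:
        # Flow Definitions can only be deployed once DataFlow is enabled.
        if not self.dataflow_enabled:
            raise RuntimeError(f"DataFlow is not enabled for {self.name!r}")
        if deployment_name in self.deployments:
            raise ValueError(f"deployment {deployment_name!r} already exists")
        # Each deployment gets its own cluster, so flows stay isolated
        # even when they come from the same Flow Definition.
        deployment = FlowDeployment(
            name=deployment_name,
            flow_definition=flow_definition,
            nifi_cluster=f"nifi-{deployment_name}",
        )
        self.deployments[deployment_name] = deployment
        return deployment


if __name__ == "__main__":
    env = Environment("example-env")
    env.enable_dataflow()
    first = env.deploy("ingest-logs", "kafka-to-object-store")
    second = env.deploy("ingest-metrics", "kafka-to-object-store")
    print(first.nifi_cluster, second.nifi_cluster)
```

In this sketch, two deployments of the same Flow Definition end up on different clusters, which is the isolation property the paragraph above describes.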

Flow deployments execute the NiFi flow logic and process data in your cloud account. Therefore, data that is being processed by a flow deployment does not traverse the CDP Control Plane.

Flow deployments send heartbeats containing health and performance information to the Control Plane, where this data is visualized in the Dashboard.
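
To make the separation between processed data and heartbeats concrete, the following sketch shows what a heartbeat message could look like. The actual heartbeat format is not documented here, so every field name (deployment_name, timestamp, healthy, bytes_processed) and both helper functions are assumptions made purely for illustration. The point is that the message carries only health and performance metadata, never the records that the flow processes.

```python
import json
import time
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class Heartbeat:
    """Hypothetical heartbeat: metadata only, no flow data."""
    deployment_name: str
    timestamp: float
    healthy: bool
    bytes_processed: int  # a performance counter, not the data itself


def build_heartbeat(deployment_name: str, healthy: bool,
                    bytes_processed: int,
                    now: Optional[float] = None) -> Heartbeat:
    # 'now' can be injected to make the result deterministic in tests.
    stamp = time.time() if now is None else now
    return Heartbeat(deployment_name, stamp, healthy, bytes_processed)


def serialize(heartbeat: Heartbeat) -> str:
    # JSON is an assumption; the real wire format may differ.
    return json.dumps(asdict(heartbeat), sort_keys=True)


if __name__ == "__main__":
    print(serialize(build_heartbeat("ingest-logs", True, 1024)))
```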