Cloudera DataFlow Architecture

Cloudera DataFlow follows a two-tier architecture. Product capabilities such as the Dashboard, Catalog, and Environment management are hosted on the Cloudera Control Plane, while the flow deployments that process your data are provisioned in a Cloudera environment, which represents infrastructure in your cloud provider account. Learn more about the service architecture and how Cloudera DataFlow enables its various users to achieve their goals.

When you enable Cloudera DataFlow for one of your registered Cloudera environments, it creates and configures the required infrastructure in your cloud account, including a Kubernetes cluster, Kubernetes Operators, and the Cloudera DataFlow workload application. After Cloudera DataFlow has been successfully enabled for an environment, users can deploy Flow Definitions into it. Deploying a Flow Definition creates a dedicated NiFi cluster on Kubernetes, which allows you to treat NiFi flows as isolated flow deployments.

Flow deployments run the NiFi flow logic and process data in your cloud account. Therefore, data processed by a flow deployment does not traverse the Cloudera Control Plane. Flow deployments send heartbeats containing health and performance information to the Control Plane, where this information is visualized in the Dashboard.
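
The separation described above can be illustrated with a small conceptual model. This is a sketch only, not Cloudera code: the class names, field names, and the deployment ID are hypothetical, and the actual heartbeat format is not described here. It shows the one property the text states, namely that records are processed in the workload tier and only health and performance metadata reach the Control Plane.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Heartbeat:
    """Hypothetical heartbeat: health and performance metadata only, no record payloads."""
    deployment_id: str
    healthy: bool
    records_processed: int
    bytes_processed: int


@dataclass
class ControlPlane:
    """Hypothetical stand-in for the Control Plane side that feeds the Dashboard."""
    latest: Dict[str, Heartbeat] = field(default_factory=dict)

    def receive(self, hb: Heartbeat) -> None:
        # Keep only the most recent heartbeat per deployment.
        self.latest[hb.deployment_id] = hb


@dataclass
class FlowDeployment:
    """Hypothetical stand-in for a flow deployment running in your cloud account."""
    deployment_id: str
    records_processed: int = 0
    bytes_processed: int = 0

    def process(self, records: List[bytes]) -> None:
        # Records are handled here, in the workload tier; they are never handed to ControlPlane.
        for record in records:
            self.records_processed += 1
            self.bytes_processed += len(record)

    def heartbeat(self) -> Heartbeat:
        # Only counters and a health flag leave the deployment.
        return Heartbeat(self.deployment_id, True, self.records_processed, self.bytes_processed)


control_plane = ControlPlane()
deployment = FlowDeployment("example-deployment")
deployment.process([b"order-1", b"order-22"])
control_plane.receive(deployment.heartbeat())
print(control_plane.latest["example-deployment"])
```

In this model, the ControlPlane object never receives the record bytes, only the counters derived from them, which mirrors the statement that processed data does not traverse the Control Plane.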

The Cloudera DataFlow Functions feature allows you to deploy NiFi flows stored in the Cloudera DataFlow Catalog as functions executed within AWS Lambda, Azure Functions, or Google Cloud Functions. Using Cloudera DataFlow Functions does not require Cloudera DataFlow to be enabled in a Cloudera environment. When a Cloudera DataFlow function is executed, it interacts with the Control Plane to retrieve the flow definition and to send monitoring information.
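
The function lifecycle described above can be sketched as a toy model. Everything in it is an assumption made for illustration: the in-memory catalog, the helper names, and the flow name are hypothetical and do not correspond to Cloudera, AWS, Azure, or Google APIs. The sketch only shows the order of interactions the text describes: retrieve the flow definition, run it inside the function runtime, then send monitoring information.

```python
from typing import Callable, Dict, List

# Hypothetical in-memory stand-ins; none of these names are Cloudera APIs.
CATALOG: Dict[str, Callable[[str], str]] = {
    "uppercase-flow": lambda event: event.upper(),
}
MONITORING_LOG: List[dict] = []


def fetch_flow_definition(name: str) -> Callable[[str], str]:
    # Stands in for the function retrieving its flow definition from the Control Plane.
    return CATALOG[name]


def send_monitoring(info: dict) -> None:
    # Stands in for the function sending monitoring information to the Control Plane.
    MONITORING_LOG.append(info)


def handle_invocation(flow_name: str, event: str) -> str:
    flow = fetch_flow_definition(flow_name)
    result = flow(event)  # The event itself is processed inside the function runtime.
    send_monitoring({"flow": flow_name, "status": "success"})
    return result


output = handle_invocation("uppercase-flow", "hello")
print(output)
```

Note that the monitoring record in this model carries only the flow name and a status, not the event being processed.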

Learn more about the specific details of Cloudera DataFlow architecture from the diagram below.