Cloudera DataFlow key concepts

Learn about the key concepts and terms used in Cloudera DataFlow (CDF).

Flow definition

A flow definition represents the data flow logic developed in Apache NiFi and exported by using the Download Flow Definition action on a NiFi process group or the root canvas. Flow definitions typically leverage parameterization to make the flows portable between for example development and production NiFi environments.

To run an existing NiFi data flow in CDF, you have to export it as a flow definition and upload it to the CDF Catalog.

Catalog

The CDF Catalog is where your flow definitions are stored and where you manage the CDF flow definition lifecycle from import through versioning to deletion. The Catalog is also the place from where you can initiate new deployments from.

Flow deployment

A flow deployment represents a NiFi cluster running on Kubernetes and executing a specific flow definition. When you initiate the flow deployment process from the CDF Catalog, a deployment wizard helps you turn a flow definition into a flow deployment. When using the wizard, specify your environment, provide configuration parameters, auto-scaling settings and KPI definitions for your flow deployment.

Deployment Manager

The Deployment Manager allows you to review and modify flow deployment parameters, settings for size and scaling, and KPI and alert definitions. It also allows you to initiate NiFi version upgrades, access the NiFi canvas of your flow deployments as well as terminate them. Click the Manage Deployment link in the Deployment Details pane to access the Deployment Manager.

Function

A function is a flow that is uploaded into the DataFlow Catalog and that can be run in serverless mode by serverless cloud provider services.

Environment

CDF works in the context of CDP environments. You can enable the DataFlow service for any supported environment you have registered with CDP. The enablement process creates the Kubernetes infrastructure required by CDF and each environment maps to one Kubernetes cluster.

Once DataFlow has been enabled for an environment, you can start deploying flow definitions to it.

ReadyFlow

A ReadyFlow is a predefined, out-of-the-box data flow which can be immediately deployed by providing a small set of required parameters.

ReadyFlow Gallery

The ReadyFlow Gallery is where you can find all available ReadyFlows. To use a ReadyFlow, you need to add it from the ReadyFlow Gallery to the Catalog and then use it to create a Flow Deployment.

KPI

Apache NiFi has multiple metrics to monitor the different statistics of the system such as memory usage, CPU usage, data flow statistics, and so on. Key Performance Indicators (KPIs) are representations of those metrics for a NiFi component in Cloudera DataFlow. They provide a critical monitoring tool for a real-time view into your data flow performance.

Dashboard

The Dashboard is the central monitoring component within CDF showing all flow deployments across environments at a glance. For each flow deployment, you can open the Deployment Details pane, which shows you the KPIs you have defined, system metrics, as well as system events and alerts.