Cloudera DataFlow key concepts

Learn about the key concepts and terms used in Cloudera DataFlow (CDF).

Catalog

The CDF Catalog is where your flow definitions are stored and where you manage the CDF flow definition lifecycle, from import through versioning to deletion. The Catalog is also where you initiate new deployments.

Dashboard

The Dashboard is the central monitoring component within CDF, showing all flow deployments across environments at a glance. For each flow deployment, you can open the Deployment Details pane, which shows the KPIs you have defined, system metrics, and system events and alerts.

Deployment Manager

The Deployment Manager allows you to review and modify flow deployment parameters, size and scaling settings, and KPI and alert definitions. It also allows you to initiate NiFi version upgrades, access the NiFi canvas of a flow deployment, and terminate a flow deployment. To access the Deployment Manager, click the Manage Deployment link in the Deployment Details pane.

Environment

CDF works in the context of CDP environments. You can enable the DataFlow service for any supported environment you have registered with CDP. The enablement process creates the Kubernetes infrastructure required by CDF. Each environment maps to one Kubernetes cluster.

Once DataFlow has been enabled for an environment, you can start deploying flow definitions to it.

Flow definition

A flow definition represents the data flow logic developed in Apache NiFi and exported by using the Download Flow Definition action on a NiFi process group or the root canvas. Flow definitions typically use parameterization to make flows portable between, for example, development and production NiFi environments.

To run an existing NiFi data flow in CDF, you have to export it as a flow definition and upload it to the CDF Catalog.
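
The idea behind parameterization can be illustrated with a minimal Python sketch. It is not NiFi's or CDF's implementation. It only assumes that property values reference parameters with NiFi's `#{name}` syntax and shows how the same flow logic resolves to different values per environment. The property value, parameter names, and host names are hypothetical.

```python
import re

# Matches NiFi-style parameter references such as #{db.host}.
PARAM_REF = re.compile(r"#\{([A-Za-z0-9_.\- ]+)\}")


def resolve(value: str, parameters: dict) -> str:
    """Replace every #{name} reference in value with its parameter value."""
    def lookup(match):
        name = match.group(1)
        if name not in parameters:
            raise KeyError(f"missing parameter: {name}")
        return parameters[name]

    return PARAM_REF.sub(lookup, value)


# Hypothetical processor property stored once in the flow definition.
property_value = "jdbc:postgresql://#{db.host}:5432/#{db.name}"

# Hypothetical parameter values supplied for each environment.
dev_parameters = {"db.host": "dev-db.example.com", "db.name": "orders_dev"}
prod_parameters = {"db.host": "prod-db.example.com", "db.name": "orders"}

dev_url = resolve(property_value, dev_parameters)
prod_url = resolve(property_value, prod_parameters)
```

Because the environment-specific values live outside the flow logic, the same flow definition can be deployed unchanged to both environments.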

Flow deployment

A flow deployment represents a NiFi cluster running on Kubernetes and executing a specific flow definition. When you initiate the flow deployment process from the CDF Catalog, a deployment wizard helps you turn a flow definition into a flow deployment. In the wizard, you specify your environment and provide configuration parameters, auto-scaling settings, and KPI definitions for your flow deployment.

Function

A function is a flow that is uploaded into the DataFlow Catalog and that can be run in serverless mode by serverless cloud provider services.

KPI

Apache NiFi provides multiple metrics for monitoring the system, such as memory usage, CPU usage, and data flow statistics. Key Performance Indicators (KPIs) are representations of those metrics for a NiFi component in Cloudera DataFlow. They provide a critical monitoring tool for a real-time view into your data flow performance.
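
As a rough illustration of how a KPI with an alert can be reasoned about, the following Python sketch evaluates a metric against a threshold and raises an alert only when the breach is sustained. It is a conceptual model, not the CDF API. The `KpiAlert` class, the metric name, and the rule that a fixed number of consecutive samples must breach the threshold are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class KpiAlert:
    """Hypothetical model of a KPI alert; not a CDF data structure."""
    metric: str            # name of the monitored metric (assumed)
    threshold: float       # boundary value for the metric
    breach_when: str       # "above" or "below"
    min_consecutive: int   # samples that must breach before alerting

    def breached(self, value: float) -> bool:
        """Return True if a single sample is outside the boundary."""
        if self.breach_when == "above":
            return value > self.threshold
        return value < self.threshold

    def should_alert(self, samples: list) -> bool:
        """Return True if the most recent min_consecutive samples all breach."""
        if len(samples) < self.min_consecutive:
            return False
        recent = samples[-self.min_consecutive:]
        return all(self.breached(v) for v in recent)


alert = KpiAlert("flowfiles_queued", 1000, "above", 3)

# A short spike does not alert; a sustained breach does.
spike = alert.should_alert([200, 1500, 300, 1200])
sustained = alert.should_alert([200, 1100, 1300, 1250])
```

Requiring several consecutive breaching samples is one way to avoid alerting on brief spikes; the actual alert semantics are defined by the KPI and alert settings of the flow deployment.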

Project

A Project is a container for a set of DataFlow Resources (Deployments, Drafts, Inbound Connections, and custom NARs) that restricts the visibility of the Resources associated with it.

ReadyFlow

A ReadyFlow is a predefined, out-of-the-box data flow which can be immediately deployed by providing a small set of required parameters.

ReadyFlow Gallery

The ReadyFlow Gallery is where you find all available ReadyFlows. To use a ReadyFlow, you need to add it from the ReadyFlow Gallery to the Catalog and then use it to create a Flow Deployment.

Workspace

The Workspace view displays all resources (flow deployments, flow drafts, inbound connections, custom NARs) within an Environment, making it easier to switch between them and manage them.