Cloudera DataFlow key features

Cloudera DataFlow is a cloud-native universal data distribution service powered by Apache NiFi that enables you to connect to any data source, process data, and deliver it to any destination. The sections below describe its key features.

Flow and resource isolation

Cloudera DataFlow allows you to easily isolate data flows from each other and guarantee a set of resources to each data flow without requiring administrators to create additional NiFi clusters. For each flow deployment, Cloudera DataFlow creates a dedicated, auto-scaling NiFi cluster on the shared Kubernetes resources in an environment. This way flow deployments can scale independently from each other, allowing you to isolate flow deployments and assign resources to deployments as needed.

Flow isolation can be useful when you want to guarantee a set of resources for a specific data flow or when you want to isolate failure domains.

Auto-scaling flow deployments

Cloudera DataFlow offers two types of auto-scaling for Apache NiFi data flows: scaling based on CPU utilization and Flow Metrics Scaling. Flow deployments can automatically scale up and down based on CPU utilization, within the boundaries that are set in the deployment wizard. In this case, Cloudera DataFlow scales flow deployments by adding or removing NiFi pods on the Kubernetes cluster as needed, and by scaling the Kubernetes cluster itself up or down within the boundaries specified during Cloudera DataFlow enablement.
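As a rough illustration of CPU-based scaling within fixed boundaries, the following sketch applies the replica formula documented for the Kubernetes Horizontal Pod Autoscaler and clamps the result to a minimum and maximum pod count. Whether Cloudera DataFlow uses exactly this formula internally is an assumption; the function name, the target utilization, and the boundary values are hypothetical and chosen only for illustration.

```python
import math


def desired_pods(current_pods: int,
                 cpu_utilization: float,
                 target_utilization: float,
                 min_pods: int,
                 max_pods: int) -> int:
    """Return a pod count for the observed CPU load, kept within [min_pods, max_pods].

    Assumption: uses the Kubernetes Horizontal Pod Autoscaler rule
    desired = ceil(current * observed / target). This is a sketch, not
    Cloudera DataFlow's actual implementation.
    """
    if current_pods < 1:
        raise ValueError("current_pods must be at least 1")
    if target_utilization <= 0:
        raise ValueError("target_utilization must be positive")
    if min_pods > max_pods:
        raise ValueError("min_pods must not exceed max_pods")

    # Scale the current pod count by the ratio of observed to target load.
    raw = math.ceil(current_pods * cpu_utilization / target_utilization)

    # Never leave the boundaries chosen for the deployment.
    return max(min_pods, min(max_pods, raw))


if __name__ == "__main__":
    # Hypothetical values: 3 pods at 90% CPU with a 60% target, bounded to 1..10 pods.
    print(desired_pods(3, 90.0, 60.0, min_pods=1, max_pods=10))
```

The clamp at the end is the part that corresponds to the boundaries set in the deployment wizard: however high or low the load, the result never leaves the configured range.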

On top of scaling based on CPU utilization, you can also enable Flow Metrics Scaling in the deployment wizard. This feature adds NiFi pods to, or removes them from, the Kubernetes cluster based on anticipated traffic on the connection(s) where data first enters the flow. Scaling happens automatically, driven by a prediction algorithm that uses a backpressure prediction metric to forecast how full a queue will be within a predefined period of time. The metric targets only connections that are attached to source processors. You do not need to configure anything during flow deployment. When Flow Metrics Scaling is enabled, both this metric and CPU utilization are considered, and whichever metric calls for the higher scale takes precedence.
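The interaction between the two metrics can be sketched as follows. The forecast here is a simple linear extrapolation of queue fill level, used purely as a stand-in because the actual prediction algorithm is not described in this document. All function names, the sample format, and the numeric values are hypothetical.

```python
from typing import Sequence, Tuple


def predicted_fill(samples: Sequence[Tuple[float, float]], horizon_s: float) -> float:
    """Forecast the queue fill fraction (0.0 to 1.0) horizon_s seconds ahead.

    samples holds (timestamp_seconds, fill_fraction) pairs, oldest first.
    Assumption: linear extrapolation from the first to the last sample,
    which is a stand-in for the real backpressure prediction metric.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples to extrapolate")
    (t0, f0), (t1, f1) = samples[0], samples[-1]
    if t1 <= t0:
        raise ValueError("timestamps must be increasing")
    slope = (f1 - f0) / (t1 - t0)
    # A queue cannot be less than empty or more than full.
    return min(1.0, max(0.0, f1 + slope * horizon_s))


def combined_target(cpu_pods: int, flow_metric_pods: int,
                    min_pods: int, max_pods: int) -> int:
    """Pick the higher of the two recommendations, kept within the boundaries."""
    return max(min_pods, min(max_pods, max(cpu_pods, flow_metric_pods)))


if __name__ == "__main__":
    # Hypothetical readings: the source queue went from 20% to 40% full in 60 seconds.
    forecast = predicted_fill([(0.0, 0.2), (60.0, 0.4)], horizon_s=120.0)
    print(f"forecast fill in 120 s: {forecast:.2f}")
    # CPU asks for 3 pods, the flow metric asks for 5: the higher value is used.
    print(combined_target(cpu_pods=3, flow_metric_pods=5, min_pods=1, max_pods=10))
```

Taking the maximum of the two recommendations means a flow scales up as soon as either signal calls for it, and scales down only when both agree that fewer pods are enough.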

Fault tolerant flow deployments

Flow deployments use persistent volumes to store NiFi repositories in a durable way. In case of an instance or pod failure, Cloudera DataFlow automatically spins up new pods and re-attaches the persistent volumes to ensure data processing continues from where it was interrupted.

Quick flow deployment with predefined ReadyFlows

ReadyFlows are predefined data flows that you can deploy quickly with minimal configuration. They provide an easy way to implement the most common data flow use cases.

Serverless NiFi Flows with Cloudera DataFlow Functions

Cloudera DataFlow Functions allows you to deploy NiFi flows not only as long-running, auto-scaling Kubernetes clusters but also as functions on cloud providers' serverless compute services, including AWS Lambda, Azure Functions, and Google Cloud Functions. Cloudera DataFlow Functions targets use cases that do not require always-running NiFi flows, enables developers to focus more on business logic and less on operational management, and establishes a true pay-for-value model with a serverless architecture.

Central monitoring dashboard and KPIs

You can monitor your flow deployments across environments and cloud providers on a single dashboard. You can track important flow performance metrics by defining KPI alerts for your flow deployments.

Universal connectivity

You can connect to any data source or target using NiFi's rich processor library, including on-premises data sources, cloud data storage, cloud data warehouses, log data sources, cloud data analytics services, and cloud business process services.

Role-based access control

You can control which users are entitled to perform actions such as enabling the data service, creating new flow deployments, or creating new drafts by assigning predefined roles such as Flow Administrator, Flow Developer, or Flow User to individual Cloudera users or groups. By creating Projects and assigning resources to them, you can further limit which Cloudera users or groups can access a given subset of resources.

Secure inbound connections

You can provision secure, stable, and scalable endpoints that make it easy for any application to send data to flow deployments.

Parameter groups

You can create groups of parameters and share them between data flows. Parameter groups allow you to centrally manage, share, and reuse common parameters that your data flows depend on. When developing new data flows or deploying existing data flows to production, developers and administrators alike can reuse these common parameters for a simplified development and deployment experience.

Continuous integration (CI) / Continuous deployment (CD)

The Cloudera DataFlow service is built with automation in mind. Any action that is performed on the UI can be turned into a CLI statement for automation. Deploying a new NiFi flow is as easy as executing a single CLI command.