Flow management terminology

Learn the most essential flow management terms and concepts used in connection with Cloudera Flow Management (CFM) components and operations.

There are a few key terms and concepts that you should be familiar with to understand the details of Cloudera Flow Management. For more terms and detailed definitions, see the Apache NiFi User Guide and the Apache NiFi Registry User Guide.

Processor

Apache NiFi processors are the basic building blocks of a data flow. Each processor performs a specific function that contributes to producing the output FlowFile. NiFi processors create, send, receive, transform, route, split, merge, and otherwise process FlowFiles. NiFi provides a large number of processors by default, and you can also write your own custom processors. You place processors on the NiFi UI canvas and connect them to create a data flow graph.

FlowFile

Each piece of user data (data that you bring to NiFi for processing and transfer) is wrapped in an entity called a FlowFile. A FlowFile holds the original data as its content, together with a set of attributes. Attributes are key-value pairs associated with the user data. NiFi sets some attributes by default, and processors can add, remove, or edit them.
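To make the split between content and attributes concrete, here is a minimal Python sketch that models a FlowFile as a plain data object. It is not NiFi's API: the `FlowFile` class and the `create_flowfile` and `update_attributes` helpers are hypothetical, and only the attribute keys `uuid` and `filename` are taken from NiFi's core attributes. Real processors change attributes through NiFi's `ProcessSession` rather than through helpers like these.

```python
import uuid
from dataclasses import dataclass, field
from typing import Dict, Iterable, Optional


@dataclass
class FlowFile:
    """Hypothetical model of a FlowFile: content bytes plus string attributes."""
    content: bytes
    attributes: Dict[str, str] = field(default_factory=dict)


def create_flowfile(content: bytes, filename: str) -> FlowFile:
    # Sets a couple of default attributes on creation. 'uuid' and 'filename'
    # are NiFi core attribute names; this helper itself is hypothetical.
    return FlowFile(content, {"uuid": str(uuid.uuid4()), "filename": filename})


def update_attributes(
    ff: FlowFile,
    put: Optional[Dict[str, str]] = None,
    remove: Iterable[str] = (),
) -> FlowFile:
    # Processor-style step: add, edit, or remove attributes; content is untouched.
    # Returns a new FlowFile instead of mutating the original one.
    attrs = dict(ff.attributes)
    attrs.update(put or {})
    for key in remove:
        attrs.pop(key, None)
    return FlowFile(ff.content, attrs)


original = create_flowfile(b"id,name\n1,Ada\n", "users.csv")
renamed = update_attributes(
    original, put={"filename": "users-clean.csv", "mime.type": "text/csv"}
)
stripped = update_attributes(renamed, remove=["mime.type"])
```

Returning an updated copy keeps the original FlowFile unchanged, which makes it easy to compare attributes before and after a processing step.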

Connection

In a NiFi data flow, FlowFiles move from one processor to another through a connection. A connection is a queue where FlowFiles are temporarily held between two connected NiFi processors. When you create a connection, you must select one or more relationships of the source processor; only FlowFiles routed to a selected relationship are transferred through that connection. You control the overall amount of data held in a connection with two per-connection settings: Back Pressure Object Threshold (the number of queued FlowFiles) and Back Pressure Data Size Threshold (the total size of their content). When either threshold is reached, NiFi applies back pressure and stops scheduling the source processor until the queue drops below the thresholds.
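The following Python sketch models how the two back pressure thresholds interact. The `Connection` class and the `can_schedule_source` function are hypothetical, the queue tracks only FlowFile sizes, and the threshold values in the usage lines are arbitrary illustration values, not NiFi defaults.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque


@dataclass
class Connection:
    """Hypothetical connection queue with two back pressure thresholds."""
    object_threshold: int        # plays the role of Back Pressure Object Threshold
    size_threshold_bytes: int    # plays the role of Back Pressure Data Size Threshold
    queue: Deque[int] = field(default_factory=deque)  # sizes of queued FlowFiles
    queued_bytes: int = 0

    def enqueue(self, flowfile_size: int) -> None:
        # Thresholds are not checked here, so in this sketch a single large
        # FlowFile can push the queue past the data size threshold.
        self.queue.append(flowfile_size)
        self.queued_bytes += flowfile_size

    def dequeue(self) -> int:
        size = self.queue.popleft()
        self.queued_bytes -= size
        return size

    def back_pressure_applied(self) -> bool:
        # Back pressure applies as soon as EITHER threshold is reached.
        return (
            len(self.queue) >= self.object_threshold
            or self.queued_bytes >= self.size_threshold_bytes
        )


def can_schedule_source(conn: Connection) -> bool:
    # The upstream processor is only scheduled while there is no back pressure.
    return not conn.back_pressure_applied()


conn = Connection(object_threshold=3, size_threshold_bytes=1_000)
for size in (100, 100, 100):
    conn.enqueue(size)
print(can_schedule_source(conn))  # False: the object threshold (3) is reached
conn.dequeue()
print(can_schedule_source(conn))  # True: 2 FlowFiles, 200 bytes queued
```

Because the two checks are combined with `or`, whichever limit is hit first pauses the source; a queue of a few very large FlowFiles triggers back pressure on size long before the object count matters.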

Data flow

CFM data flows are built from processors. Each processor has output relationships, such as success or failure. You connect each relationship to the appropriate downstream component to build your custom data flow.
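A minimal Python sketch of relationship-based routing follows. Processors are modeled as plain functions that return a relationship name, and connections are a lookup table from (processor, relationship) to the next processor. The processor names and the `run_flow` helper are hypothetical, and the sketch handles one record synchronously, whereas NiFi queues FlowFiles in connections and runs processors on scheduled threads.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical processor: takes a record, returns (relationship, record).
Processor = Callable[[dict], Tuple[str, dict]]


def validate(record: dict) -> Tuple[str, dict]:
    # Route to "failure" when the required "id" field is missing.
    return ("success" if "id" in record else "failure"), record


def enrich(record: dict) -> Tuple[str, dict]:
    return "success", {**record, "enriched": True}


def log_failure(record: dict) -> Tuple[str, dict]:
    return "success", record


processors: Dict[str, Processor] = {
    "validate": validate,
    "enrich": enrich,
    "log_failure": log_failure,
}

# Each connection maps (source processor, relationship) -> destination processor.
connections: Dict[Tuple[str, str], str] = {
    ("validate", "success"): "enrich",
    ("validate", "failure"): "log_failure",
}


def run_flow(start: str, record: dict) -> Tuple[List[str], dict]:
    """Follow relationships from `start` until a relationship has no connection.

    An unconnected relationship ends the path, loosely like an
    auto-terminated relationship in NiFi.
    """
    path: List[str] = []
    current: Optional[str] = start
    while current is not None:
        relationship, record = processors[current](record)
        path.append(f"{current}:{relationship}")
        current = connections.get((current, relationship))
    return path, record


path, out = run_flow("validate", {"id": 1})
```

In NiFi itself, every relationship of a processor must be either connected or auto-terminated before the processor can run; the sketch simply stops when no connection matches.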

Data flow template

A template is a NiFi component that lets you combine the basic NiFi data flow building blocks (processor, funnel, input/output port, process group, and remote process group) into larger, reusable building blocks. You can drag a template onto the canvas when creating a new data flow, or export it as an XML file to share with others. You can also import templates that you receive from other users.

Process group

A process group is a NiFi component that lets you group data flows into a logical construct. Process groups make data flows easier to understand at a higher level. You can organize process groups by project, team, or organization.

Remote process group

A remote process group is a NiFi component similar to a process group. However, a remote process group references a remote instance of NiFi, so you can use it to transfer data between NiFi instances.

Input port

An input port is a NiFi component that brings data into a process group from a processor that is not in that process group.

Output port

An output port is a NiFi component that sends data from a process group to a processor that is not in that process group.

Funnel

A funnel is a NiFi component that combines the data from many connections into a single connection.

Label

A label is a NiFi component that adds descriptive text about components on the NiFi canvas. You can choose from a range of colors to make labels visually distinct.

Flow controller

The flow controller maintains the knowledge of how processes connect and manages the threads and allocations that all processes use. It is the brains of the operation: it acts as the broker that facilitates the exchange of FlowFiles between processors, provides threads for extensions to run on, and manages the schedule of when extensions receive resources to execute.

Bucket

A bucket is a logical container in NiFi Registry that stores and organizes versioned items or resources (for example, flows).

Versioned flow

A versioned flow is a process group placed under version control with NiFi Registry. Each versioned flow has a name, a description, and one or more snapshots (versions).

Versioned flow snapshot

A versioned flow snapshot is a single version of a versioned flow. Each snapshot (or version) has metadata and content. The metadata contains a version number, a commit message, an author name, and a commit date. The content is the representation of the flow at the time it was committed.
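The relationship between buckets, versioned flows, and snapshots can be sketched as nested Python data classes. This is a conceptual model only: all class, field, and method names are hypothetical and do not correspond to NiFi Registry's REST API or client libraries, and real snapshots store the full process group definition rather than a small dictionary.

```python
import copy
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Dict, List


@dataclass(frozen=True)
class SnapshotMetadata:
    version: int
    commit_message: str
    author: str
    commit_date: datetime


@dataclass(frozen=True)
class Snapshot:
    metadata: SnapshotMetadata
    content: dict  # stand-in for the committed process group definition


@dataclass
class VersionedFlow:
    name: str
    description: str
    snapshots: List[Snapshot] = field(default_factory=list)

    def commit(self, content: dict, commit_message: str, author: str) -> Snapshot:
        # In this sketch, versions are numbered 1, 2, 3, ... in commit order.
        metadata = SnapshotMetadata(
            version=len(self.snapshots) + 1,
            commit_message=commit_message,
            author=author,
            commit_date=datetime.now(timezone.utc),
        )
        # Deep copy so later edits to `content` do not alter the snapshot.
        snapshot = Snapshot(metadata, copy.deepcopy(content))
        self.snapshots.append(snapshot)
        return snapshot

    def latest(self) -> Snapshot:
        return self.snapshots[-1]


@dataclass
class Bucket:
    """Logical container for versioned flows."""
    name: str
    flows: Dict[str, VersionedFlow] = field(default_factory=dict)

    def create_flow(self, name: str, description: str) -> VersionedFlow:
        if name in self.flows:
            raise ValueError(f"flow {name!r} already exists in bucket {self.name!r}")
        flow = VersionedFlow(name, description)
        self.flows[name] = flow
        return flow


bucket = Bucket("team-a")
flow = bucket.create_flow("ingest", "Loads CSV files")
flow.commit({"processors": ["GetFile"]}, "Initial version", "alice")
flow.commit({"processors": ["GetFile", "PutFile"]}, "Add a sink", "bob")
```

Making snapshots immutable (`frozen=True`) reflects the idea that a committed version is a fixed record; changes always produce a new snapshot with the next version number.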

Event

An event represents a change to a FlowFile as it traverses a NiFi data flow. NiFi tracks these events in data provenance.

Data provenance

NiFi keeps granular information about each piece of data it handles. Each point in a data flow where a FlowFile is processed in some way is recorded as a provenance event. Various types of provenance events can occur, depending on the data flow design. As data is processed through the system, NiFi stores all historical information on what happened to each data object (FlowFile) in its Provenance Repository.
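To illustrate how the history of a single FlowFile can be reconstructed from recorded events, here is a toy Python sketch. The event type strings are a subset of NiFi's provenance event type names; everything else (`ProvenanceEvent`, `ProvenanceLog`, the component names, and the in-memory storage) is hypothetical. NiFi's actual Provenance Repository persists and indexes events, which this sketch does not attempt.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import DefaultDict, List


@dataclass(frozen=True)
class ProvenanceEvent:
    flowfile_uuid: str
    event_type: str  # e.g. "RECEIVE", "ATTRIBUTES_MODIFIED", "ROUTE", "SEND", "DROP"
    component: str   # name of the component that emitted the event


class ProvenanceLog:
    """Toy in-memory stand-in for the Provenance Repository."""

    def __init__(self) -> None:
        self._by_flowfile: DefaultDict[str, List[ProvenanceEvent]] = defaultdict(list)

    def record(self, event: ProvenanceEvent) -> None:
        # Events are kept per FlowFile in the order they are recorded.
        self._by_flowfile[event.flowfile_uuid].append(event)

    def history(self, flowfile_uuid: str) -> List[ProvenanceEvent]:
        # .get avoids creating an empty entry for unknown FlowFiles.
        return list(self._by_flowfile.get(flowfile_uuid, []))


log = ProvenanceLog()
log.record(ProvenanceEvent("ff-1", "RECEIVE", "ingest"))
log.record(ProvenanceEvent("ff-2", "RECEIVE", "ingest"))
log.record(ProvenanceEvent("ff-1", "ATTRIBUTES_MODIFIED", "tagger"))
log.record(ProvenanceEvent("ff-1", "SEND", "writer"))
print([e.event_type for e in log.history("ff-1")])
# ['RECEIVE', 'ATTRIBUTES_MODIFIED', 'SEND']
```

Indexing events by FlowFile identifier is what makes it cheap to answer "what happened to this piece of data?", which is the core question data provenance is meant to answer.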