Data flow design

When planning how to size and configure Flow Management clusters, it is important to keep in mind the flow design factors that might impact your cluster sizing needs and the performance of your data flow.

You can use NiFi for a wide array of use cases, and the resource requirements are largely determined by the data flow design. Depending on the action it performs in a data flow, a processor may or may not need to read the processed data from disk or write it to disk.

For example, a flow whose first processor ingests 100 MB of data per second may need to read and write this data on disk multiple times before the result is sent to the final destination. If four processors in the data flow each write the content to disk before it is sent to the final destination, the disks used for the content repositories in the NiFi cluster should be able to handle 400 MB per second (4 × 100 MB per second) at the cluster level.
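
The arithmetic in this example can be sketched as a small Python helper. This is only a rough estimate under stated assumptions, not a sizing tool: the function names are hypothetical, the calculation counts writes only, and the per-node figure assumes that the load is spread evenly across the nodes, which this section does not state.

```python
def cluster_write_throughput(ingest_mb_per_s: float, disk_writing_processors: int) -> float:
    """Estimate the content repository write rate needed at the cluster level.

    Assumption: each disk-writing processor rewrites the full ingested volume once.
    """
    if ingest_mb_per_s < 0 or disk_writing_processors < 0:
        raise ValueError("inputs must be non-negative")
    return ingest_mb_per_s * disk_writing_processors


def per_node_write_throughput(cluster_mb_per_s: float, node_count: int) -> float:
    """Split the cluster-level rate across nodes.

    Assumption (not from the source): data is distributed evenly across nodes.
    """
    if node_count <= 0:
        raise ValueError("node_count must be positive")
    return cluster_mb_per_s / node_count


if __name__ == "__main__":
    # Values from the example above: 100 MB/s ingested, four processors writing to disk.
    cluster_rate = cluster_write_throughput(100, 4)
    print(f"Cluster level: {cluster_rate} MB/s")
    # Hypothetical three-node cluster, used only to illustrate the per-node split.
    print(f"Per node (3 nodes): {per_node_write_throughput(cluster_rate, 3):.1f} MB/s")
```

With the values from the example, the cluster-level estimate is 400 MB per second. A real flow may also add read traffic and uneven load, so treat the result as a lower bound for planning.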

NiFi design principles

NiFi is designed to use all the available resources of the nodes where it is running. It takes advantage of:

  • All available cores
  • All network capacity
  • All disk speed and capacity

For more information about the principles behind the NiFi design, see the Apache NiFi Overview.