Tuning your data flow
Several concepts and parameters affect the performance of NiFi data flows. Understanding the options and recommendations for tuning your NiFi environment, configuring parameters, and allocating resources enables you to optimize the performance for your use case and type of data flow.
Fine-tuning a NiFi data flow depends on many parameters, and no single answer fits every use case and type of data flow. A data flow can process small or large FlowFiles, can handle a small or large number of events per second, and can rely on various processors, each with its own specific characteristics.
In addition, NiFi does not isolate resources between data flows running in the same environment. When you run multiple data flows in the same NiFi environment, you must therefore perform fine-tuning tasks globally to ensure that the data flows do not have side effects on one another.
You can process significant volumes of data in a default NiFi environment without changing its configuration. You may need to change some parameters as your use cases become larger or more complex.
The first bottleneck in NiFi is usually disk I/O, so it is highly recommended that you dedicate one or more disks to each NiFi repository. With the right hardware configuration, for example, processing millions of events per second on a 6-node NiFi cluster is relatively easy, even without fine-tuning.
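To check whether repositories actually sit on separate disks, you could inspect the repository directories configured in nifi.properties. The following is a minimal sketch, not a NiFi or Cloudera tool: it assumes the standard property names nifi.flowfile.repository.directory, nifi.content.repository.directory.<name>, and nifi.provenance.repository.directory.<name>, and simple key=value lines. The helpers parse_properties, mount_point, and shared_mounts are hypothetical names. Matching mount points are only a heuristic, because separate mounts can still share one physical device (for example, partitions of the same disk).

```python
import os
from collections import defaultdict

# Property-name prefixes for NiFi repository directories in nifi.properties
# (assumed standard names; multiple content/provenance repositories use a suffix).
REPO_PREFIXES = (
    "nifi.flowfile.repository.directory",
    "nifi.content.repository.directory.",
    "nifi.provenance.repository.directory.",
)


def parse_properties(text):
    """Parse simple key=value lines, skipping blank lines and comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def mount_point(path):
    """Walk up from path until reaching a directory that is a mount point."""
    path = os.path.abspath(path)
    while not os.path.ismount(path):
        parent = os.path.dirname(path)
        if parent == path:  # reached the filesystem root
            break
        path = parent
    return path


def shared_mounts(props, resolve=mount_point):
    """Return {mount: [property names]} for mounts used by more than one repository."""
    by_mount = defaultdict(list)
    for key, value in props.items():
        if key.startswith(REPO_PREFIXES) and value:
            by_mount[resolve(value)].append(key)
    return {m: sorted(keys) for m, keys in by_mount.items() if len(keys) > 1}


if __name__ == "__main__":
    # Hypothetical path; point this at your own nifi.properties file.
    with open("conf/nifi.properties", encoding="utf-8") as f:
        for mount, keys in shared_mounts(parse_properties(f.read())).items():
            print(f"{mount} is shared by: {', '.join(keys)}")
```

The resolve parameter lets you substitute your own mapping from paths to disks, for example one derived from your hosts' actual disk layout, instead of relying on mount points alone.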
See Cloudera's Sizing recommendations and the Processing one billion events per second blog post for more information.