Recommendations
Learn how to configure your Flow Management cluster with sizing considerations in mind.
Cloudera recommends the following setup for on-premises, bare metal installations:
- 1 RAID 1 or 10 array for the OS
- 1 RAID 1 or 10 array for the FlowFile repository
- 1 or many RAID 1 or 10 array(s) for the content repository
- 1 or many RAID 1 or 10 array(s) for the provenance repository
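One way to sanity-check this layout on a running node is to confirm that the repository directories do not share a filesystem. The following is a minimal Python sketch, not a Cloudera tool: the function name `devices_shared_by` and the labels passed to it are hypothetical. It compares the `st_dev` value reported by `os.stat`, which identifies a mounted filesystem, so it cannot tell whether two separate filesystems sit on the same physical array.

```python
import os
from collections import defaultdict


def devices_shared_by(paths):
    """Return groups of labels whose directories live on the same filesystem.

    `paths` maps a label (for example "content") to a directory path.
    An empty result means every directory is on its own filesystem.
    """
    by_device = defaultdict(list)
    for label, path in paths.items():
        # st_dev identifies the filesystem (device) that contains the path.
        by_device[os.stat(path).st_dev].append(label)
    # Keep only the devices that host more than one repository.
    return [sorted(labels) for labels in by_device.values() if len(labels) > 1]
```

To use it, pass the directories configured for the FlowFile, content, and provenance repositories on the node. Any group in the result lists repositories that compete for the same filesystem.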
For a high-performance setup, Cloudera recommends SSDs over spinning disks.
For cloud environments, larger disks usually provide better throughput. Review your cloud provider's documentation for more information.
In terms of memory, NiFi is optimized to support FlowFiles of any size. This is achieved by never materializing the whole file in memory. Instead, NiFi uses input and output streams to process events (with a few exceptions for some specific processors). This means that NiFi does not require significant memory even when it processes very large files. Most of the memory on the system should be left available for the OS cache. With a large enough OS cache, many disk reads are skipped completely. Consequently, unless NiFi is used for very specific memory-oriented data flows, setting the Java heap to 8 GB or 16 GB is usually sufficient.
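The effect of stream-based processing on memory can be illustrated outside NiFi. The Python sketch below is not NiFi code and makes no claim about NiFi internals. It only shows the general technique of compressing a file of any size while holding one fixed-size chunk in memory at a time. The names `compress_stream` and `CHUNK_SIZE`, and the 64 KB chunk size, are assumptions chosen for illustration.

```python
import gzip
import shutil

# Upper bound on the bytes read per copy step (illustrative value).
CHUNK_SIZE = 64 * 1024


def compress_stream(src_path, dst_path, chunk_size=CHUNK_SIZE):
    """Gzip src_path into dst_path without loading the whole file into memory."""
    with open(src_path, "rb") as src, gzip.open(dst_path, "wb") as dst:
        # copyfileobj reads and writes chunk_size bytes at a time, so the
        # memory used does not grow with the size of the input file.
        shutil.copyfileobj(src, dst, chunk_size)
```

Because memory use depends on the chunk size rather than the file size, a large heap brings little benefit for this kind of work, which is why the remaining RAM is better left to the OS cache.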
The performance you can expect depends directly on the hardware and the flow design. For example, when reading compressed data from a cloud object store, decompressing the data, filtering it based on specific values, compressing the filtered data, and sending it to a cloud object store, you can achieve the following results:
Nodes | Data rate per second | Events per second | Data rate per day | Events per day |
---|---|---|---|---|
1 | 192.5 MB | 946,000 | 16.6 TB | 81.7 billion |
5 | 881 MB | 4.97 million | 76 TB | 429.4 billion |
25 | 5.8 GB | 26 million | 501 TB | 2.25 trillion |
100 | 22 GB | 90 million | 1.9 PB | 7.8 trillion |
150 | 32.6 GB | 141.3 million | 2.75 PB | 12.2 trillion |
Data rates and event rates were captured by running the flow described above on Google Kubernetes Engine. Each node has 32 cores, 15 GB RAM, and a 2 GB heap. The Content Repository is a 1 TB Persistent SSD (400 MB per second write, 1200 MB per second read).
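The per-day columns in the table follow from the per-second columns multiplied by 86,400 seconds. The Python sketch below reproduces that arithmetic for the single-node row using decimal units (1 TB = 10^12 bytes). It also includes a rough node-count estimate, `nodes_for`, which assumes linear scaling from a measured per-node rate. That linearity is a simplifying assumption for a first estimate, not a guarantee. All function names are hypothetical.

```python
import math

SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def per_day(rate_per_second):
    """Convert a per-second rate (bytes or events) to a per-day total."""
    return rate_per_second * SECONDS_PER_DAY


def nodes_for(target_per_second, per_node_per_second):
    """Estimate the node count for a target rate, ASSUMING linear scaling."""
    return math.ceil(target_per_second / per_node_per_second)


# Single-node row from the table: 192.5 MB per second, 946,000 events per second.
print(f"{per_day(192.5e6) / 1e12:.1f} TB per day")        # 16.6 TB per day
print(f"{per_day(946_000) / 1e9:.1f} billion events per day")  # 81.7 billion events per day
```

For example, `nodes_for(1e9, 192.5e6)` estimates how many nodes of the tested type would be needed to sustain 1 GB per second. Treat the result as a starting point and validate it against your own flow and hardware.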
NiFi scales well, both vertically and horizontally. Depending on the number of data flows running in the NiFi cluster and your operational requirements, you can add nodes to the NiFi cluster over time to meet your needs.
With this information in mind, Cloudera recommends:
- At least 4 cores per NiFi node (more is better; 8 cores is usually the best starting point for the most common use cases)
- At least 6 disks per NiFi node to ensure dedicated disks for repositories
- At least 4 GB of RAM for the NiFi heap
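The minimums above can be captured in a small pre-flight check. This is a minimal Python sketch with hypothetical names (`check_node` and the `MIN_*` constants). The thresholds are copied from the list above, and the caller is expected to supply the node's actual core count, disk count, and configured heap size.

```python
# Minimums taken from the recommendations above.
MIN_CORES = 4
MIN_DISKS = 6
MIN_HEAP_GB = 4


def check_node(cores, disks, heap_gb):
    """Return a list of human-readable problems; empty if the node meets the minimums."""
    problems = []
    if cores < MIN_CORES:
        problems.append(f"{cores} cores is below the minimum of {MIN_CORES}")
    if disks < MIN_DISKS:
        problems.append(f"{disks} disks is below the minimum of {MIN_DISKS}")
    if heap_gb < MIN_HEAP_GB:
        problems.append(f"{heap_gb} GB heap is below the minimum of {MIN_HEAP_GB} GB")
    return problems
```

A node with 8 cores, 6 disks, and an 8 GB heap returns an empty list, while an undersized node returns one message per unmet minimum.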
Now that you have finished reviewing the Flow Management cluster sizing considerations, see Processing one billion events per second with NiFi for additional information and a use case walkthrough.