Recommendations

Learn about the considerations involved in tuning your Flow Management cluster for optimum performance.

NiFi heap memory configuration guidance

Apache NiFi is designed to be memory efficient, and a correctly sized heap is crucial for its performance. The right heap size depends largely on the flow design of your deployed use cases.

Here are some guidelines for configuring the heap memory:
  • Maximum heap size: Unless specifically recommended by Cloudera's support team, the heap size allocated to NiFi should not exceed 16 GB.
  • Memory allocation ratio: The heap size allocated to NiFi should not exceed 50% of the total memory available on the NiFi host.
If you encounter an OutOfMemory (OOM) exception, contact Cloudera Support to review your flow design rather than increasing the heap size beyond 16 GB. Adjusting the heap size without proper analysis can lead to suboptimal performance and other issues.
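
The two limits above combine into a single upper bound: the smaller of 16 GB and half of the host memory. The following is a minimal sketch of that arithmetic. The function name and parameters are hypothetical and are not part of NiFi or any Cloudera tooling.

```python
def max_recommended_heap_gb(host_memory_gb: float,
                            cap_gb: float = 16.0,
                            ratio: float = 0.5) -> float:
    """Return the upper bound for the NiFi heap size, in GB.

    Applies both guidelines: the heap should not exceed cap_gb (16 GB)
    and should not exceed ratio (50%) of the memory on the host.
    """
    if host_memory_gb <= 0:
        raise ValueError("host_memory_gb must be positive")
    return min(cap_gb, host_memory_gb * ratio)


if __name__ == "__main__":
    for memory in (16, 32, 64):
        limit = max_recommended_heap_gb(memory)
        print(f"{memory} GB host: heap at most {limit:g} GB")
```

On a 16 GB host the 50% ratio is the binding limit (8 GB). On hosts with 32 GB or more, the 16 GB cap applies.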

Increasing thread pool size

The first tuning recommendation for a Flow Management cluster is to set the thread pool size based on the number of cores. Once you have done this, you can make further adjustments.

Do not increase the thread pool size unless the active thread count is consistently equal to the maximum number of available threads. For example, in a 3-node cluster with eight cores per node and a thread pool size of 24 per node, do not raise the thread pool size above 24 unless the active thread count displayed in the UI is consistently equal to 72 (3 nodes x 24 available threads per node = 72).
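
The check in this example can be sketched as follows. This is an illustration only: the function names are hypothetical, the active thread counts are assumed to be values you noted from the UI over time, and the 0.9 threshold for "consistently at the maximum" is an assumption, not a documented value.

```python
from typing import Sequence


def cluster_thread_capacity(nodes: int, pool_size_per_node: int) -> int:
    """Maximum active thread count the UI can show for the whole cluster."""
    return nodes * pool_size_per_node


def pool_is_saturated(samples: Sequence[int],
                      nodes: int,
                      pool_size_per_node: int,
                      min_fraction: float = 0.9) -> bool:
    """Return True if most observed active thread counts are at capacity.

    samples: active thread counts read from the UI at different times.
    min_fraction: assumed share of samples that must be at capacity.
    """
    if not samples:
        return False
    capacity = cluster_thread_capacity(nodes, pool_size_per_node)
    at_capacity = sum(1 for count in samples if count >= capacity)
    return at_capacity / len(samples) >= min_fraction


if __name__ == "__main__":
    print(cluster_thread_capacity(3, 24))
    print(pool_is_saturated([72, 40, 30, 72], nodes=3, pool_size_per_node=24))
```

Only when a check like this reports saturation does it make sense to move on to the load average review.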

You can use the NiFi Summary UI to identify the number of threads used per processor.

Core load average of NiFi nodes

If the active thread count is equal to the maximum number of available threads, review the core load average on your NiFi nodes.

If the core load average is below 80% of the number of cores (below 6.4 in the example, because 0.8 x 8 cores = 6.4) and the active thread count is at its maximum, you can slightly increase the thread pool size. Start by raising the thread pool size to (n+1) times the number of cores, where n is the current multiplier (for example, from 3 x 8 = 24 to 4 x 8 = 32 threads per node). Keep the load average around 80% of the number of cores so that the remaining nodes have resources available to absorb the additional work if one node is lost.
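
The rule above can be expressed as a small calculation. This sketch assumes that n is the current thread pool size divided by the number of cores, so that one step moves the pool from n to n+1 times the number of cores. The function names are hypothetical.

```python
def load_threshold(cores: int, target_ratio: float = 0.8) -> float:
    """Load average below which a node still has headroom (80% of cores)."""
    if cores <= 0:
        raise ValueError("cores must be positive")
    return cores * target_ratio


def next_pool_size(current_pool_size: int,
                   cores: int,
                   load_average: float,
                   pool_saturated: bool) -> int:
    """Suggest the next thread pool size for one node.

    The size is raised only when the pool is saturated and the load
    average is below the threshold. Otherwise it is left unchanged.
    """
    if not pool_saturated or load_average >= load_threshold(cores):
        return current_pool_size
    multiplier = current_pool_size // cores  # assumed meaning of n
    return (multiplier + 1) * cores


if __name__ == "__main__":
    # Eight cores, pool of 24, saturated pool, load average of 5.0.
    print(next_pool_size(24, cores=8, load_average=5.0, pool_saturated=True))
```

With eight cores and a pool of 24, a saturated pool and a load average below 6.4 give a suggested size of 32. In every other case the size stays at 24.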

Number of concurrent tasks

If all of the following are true:

  • You have backpressure somewhere in your workflow
  • Your load average is low
  • The active thread count is not at the maximum

consider increasing the number of concurrent tasks on the processors that are not processing data fast enough and are causing the backpressure.

Increase concurrent tasks iteratively. Begin by increasing the number of concurrent tasks by 1.

After each increase, check:

  • How the environment behaves as a whole (the thread pool is shared across all the workflows running in the same NiFi environment)
  • Load average
  • Active thread count

Based on these observations, decide whether to increase the number of concurrent tasks again.
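
One iteration of this procedure can be sketched as a simple decision function. The names below are hypothetical, the observations are assumed to be values you read from the UI and the host, and "low load average" is assumed to mean below 80% of the number of cores, as in the thread pool guidance.

```python
from dataclasses import dataclass


@dataclass
class Observation:
    """Values noted after the previous change (hypothetical structure)."""
    backpressure: bool      # backpressure is present in the workflow
    load_average: float     # core load average on the node
    active_threads: int     # active thread count shown in the UI


def may_increase_concurrent_tasks(obs: Observation,
                                  cores: int,
                                  max_threads: int,
                                  target_ratio: float = 0.8) -> bool:
    """Return True only when all three conditions hold."""
    load_is_low = obs.load_average < cores * target_ratio
    threads_available = obs.active_threads < max_threads
    return obs.backpressure and load_is_low and threads_available


def next_concurrent_tasks(current_tasks: int,
                          obs: Observation,
                          cores: int,
                          max_threads: int) -> int:
    """Raise concurrent tasks by 1 per iteration, or keep the value."""
    if may_increase_concurrent_tasks(obs, cores, max_threads):
        return current_tasks + 1
    return current_tasks


if __name__ == "__main__":
    obs = Observation(backpressure=True, load_average=3.0, active_threads=40)
    print(next_concurrent_tasks(1, obs, cores=8, max_threads=72))
```

Take a new observation after each step before calling the function again, because the thread pool is shared with every other workflow in the environment.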

The active thread count displayed on a processor is the total across the cluster.

If a processor has active threads but is not processing data as fast as expected while the load average on the server is low, the issue may be related to disk I/O. In this case, check the I/O statistics on the disks used for the NiFi repositories.