This section provides a 20,000 foot view of NiFi’s cornerstone fundamentals, so that
you can understand the Apache NiFi big picture and some of its most interesting features. The
key feature categories include flow management, ease of use, security, an extensible
architecture, and a flexible scaling model.
- Flow Management
-
- Guaranteed Delivery
-
A core philosophy of NiFi has been that even at very high scale, guaranteed delivery is
a must. This is achieved through effective use of a purpose-built persistent write-ahead
log and content repository. Together they are designed to allow for very high transaction
rates, effective load-spreading, and copy-on-write, and to play to the strengths of
traditional disk reads and writes.
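The write-ahead idea can be pictured with a minimal sketch. It is not NiFi's repository format; the `WriteAheadLog` class, the record fields (`op`, `id`, `queue`), and the file name are all hypothetical. The point it illustrates is only that each state change is persisted to disk before it is acted on, so the set of undelivered objects can be rebuilt after a restart.

```python
import json
import os


class WriteAheadLog:
    """Toy append-only log (hypothetical, not NiFi's implementation)."""

    def __init__(self, path):
        self.path = path

    def append(self, record):
        # Persist the state change before acting on it.
        with open(self.path, "a", encoding="utf-8") as f:
            f.write(json.dumps(record) + "\n")
            f.flush()
            os.fsync(f.fileno())  # force the bytes to disk

    def replay(self):
        """Rebuild the map of undelivered object ids to queue names."""
        in_flight = {}
        if not os.path.exists(self.path):
            return in_flight
        with open(self.path, encoding="utf-8") as f:
            for line in f:
                record = json.loads(line)
                if record["op"] == "enqueue":
                    in_flight[record["id"]] = record["queue"]
                elif record["op"] == "delivered":
                    in_flight.pop(record["id"], None)
        return in_flight
```

After a simulated crash, creating a new `WriteAheadLog` on the same path and calling `replay()` returns only the objects that were enqueued but never marked delivered.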
- Data Buffering w/ Back Pressure and Pressure Release
-
NiFi supports buffering of all queued data as well as the ability to provide back
pressure as those queues reach specified limits or to age off data as it reaches a
specified age (its value has perished).
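As a rough illustration of the two behaviours described above, the sketch below models a queue with an object-count threshold (back pressure) and a maximum age (pressure release). The class name `BoundedQueue` and its parameters are assumptions made for this example and do not correspond to NiFi configuration names.

```python
import time
from collections import deque


class BoundedQueue:
    """Toy queue with back pressure and age-off (hypothetical names)."""

    def __init__(self, backpressure_threshold, expiration_seconds):
        self.backpressure_threshold = backpressure_threshold
        self.expiration_seconds = expiration_seconds
        self._items = deque()  # (enqueue_time, payload), oldest first

    def is_full(self):
        # While this is True, upstream producers should stop sending.
        return len(self._items) >= self.backpressure_threshold

    def expire(self, now=None):
        """Drop items older than expiration_seconds; return the count dropped."""
        now = time.monotonic() if now is None else now
        dropped = 0
        while self._items and now - self._items[0][0] > self.expiration_seconds:
            self._items.popleft()
            dropped += 1
        return dropped

    def offer(self, payload, now=None):
        """Enqueue payload unless back pressure is engaged; return success."""
        now = time.monotonic() if now is None else now
        self.expire(now)
        if self.is_full():
            return False
        self._items.append((now, payload))
        return True

    def poll(self, now=None):
        """Return the oldest unexpired payload, or None if the queue is empty."""
        now = time.monotonic() if now is None else now
        self.expire(now)
        return self._items.popleft()[1] if self._items else None
```

A producer that receives `False` from `offer` is expected to wait and retry, which is how the pressure propagates upstream in this sketch.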
- Prioritized Queuing
-
NiFi allows the setting of one or more prioritization schemes for how data is retrieved
from a queue. The default is oldest first, but there are times when data should be pulled
newest first, largest first, or some other custom scheme.
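One way to picture combining several prioritization schemes is a heap keyed by a tuple of sort keys, one per scheme. This is a sketch under assumptions: `FlowItem`, `PrioritizedQueue`, and the three key functions are hypothetical names for illustration, not NiFi's Prioritizer interface.

```python
import heapq
import itertools
from dataclasses import dataclass


@dataclass(frozen=True)
class FlowItem:
    name: str
    enqueued_at: float
    size_bytes: int


# Each prioritizer maps an item to a sort key; smaller keys are dequeued first.
def oldest_first(item):
    return item.enqueued_at


def newest_first(item):
    return -item.enqueued_at


def largest_first(item):
    return -item.size_bytes


class PrioritizedQueue:
    """Toy queue ordered by one or more prioritizers (hypothetical)."""

    def __init__(self, *prioritizers):
        # Default to oldest first when no scheme is given.
        self._prioritizers = prioritizers or (oldest_first,)
        self._heap = []
        self._counter = itertools.count()  # tie-breaker: insertion order

    def put(self, item):
        key = tuple(p(item) for p in self._prioritizers)
        heapq.heappush(self._heap, (key, next(self._counter), item))

    def get(self):
        return heapq.heappop(self._heap)[2]

    def __len__(self):
        return len(self._heap)
```

With `PrioritizedQueue(largest_first, newest_first)`, items are ordered by size first, and ties between equally large items are broken in favour of the newest one.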
- Flow-Specific QoS (latency vs. throughput, loss tolerance, etc.)
-
There are points in a dataflow where the data is absolutely critical and loss
intolerant. There are also times when it must be processed and delivered within seconds to
be of any value. NiFi enables fine-grained, flow-specific configuration of these
concerns.
- Ease of Use
-
- Visual Command and Control
-
Dataflows can become quite complex. Being able to visualize those flows and express them
visually can help greatly to reduce that complexity and to identify areas that need to be
simplified. NiFi not only lets you build dataflows visually, it lets you do so in
real time. Rather than being 'design and deploy', it is much more like molding clay. If you
make a change to the dataflow, that change takes effect immediately. Changes are
fine-grained and isolated to the affected components. You don’t need to stop an entire
flow or set of flows just to make some specific modification.
- Flow Templates
-
Dataflows tend to be highly pattern-oriented, and while there are often many different
ways to solve a problem, it helps greatly to be able to share those best practices.
Templates allow subject matter experts to build and publish their flow designs, and allow
others to benefit from and collaborate on them.
- Data Provenance
-
NiFi automatically records, indexes, and makes available provenance data as objects flow
through the system, even across fan-in, fan-out, transformations, and more. This
information becomes extremely critical in supporting compliance, troubleshooting,
optimization, and other scenarios.
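To make "provenance across fan-in and fan-out" concrete, here is a minimal sketch of an append-only event log that tracks parent links and can answer lineage queries. The `ProvenanceLog` class and the event type strings used here are illustrative labels chosen for this example, not NiFi's provenance API.

```python
from collections import defaultdict


class ProvenanceLog:
    """Toy provenance recorder with a per-object index (hypothetical)."""

    def __init__(self):
        self.events = []                      # append-only event list
        self._by_object = defaultdict(list)   # object id -> its events
        self._parents = defaultdict(set)      # object id -> parent ids

    def record(self, event_type, object_id, parents=(), details=""):
        event = {
            "type": event_type,
            "object": object_id,
            "parents": tuple(parents),
            "details": details,
        }
        self.events.append(event)
        self._by_object[object_id].append(event)
        self._parents[object_id].update(parents)
        return event

    def events_for(self, object_id):
        return list(self._by_object.get(object_id, ()))

    def lineage(self, object_id):
        """Return the ids of all ancestors of object_id."""
        seen = set()
        stack = [object_id]
        while stack:
            current = stack.pop()
            for parent in self._parents.get(current, ()):
                if parent not in seen:
                    seen.add(parent)
                    stack.append(parent)
        return seen
```

If object `A` is split into `A1` and `A2` (fan-out) and `A1` is later merged with `B` into `M` (fan-in), then `lineage("M")` contains `A1`, `B`, and `A`, which is the kind of question a troubleshooting or compliance query would ask.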
- Recovery / Recording a rolling buffer of fine-grained history
-
NiFi’s content repository is designed to act as a rolling buffer of history. Data is
removed only as it ages off the content repository or as space is needed. Combined
with the data provenance capability, this makes for an incredibly useful basis to enable
click-to-content, download of content, and replay, all at a specific point in an object’s
lifecycle, which can even span generations.
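The rolling-buffer behaviour can be sketched as a store that evicts its oldest entries only when they exceed a maximum age or when space runs out. Everything here is assumed for illustration: the `RollingContentStore` class, the `claim_id` naming, and the byte and age limits are not NiFi settings.

```python
from collections import OrderedDict


class RollingContentStore:
    """Toy content store that keeps history until age or space forces removal."""

    def __init__(self, max_bytes, max_age_seconds):
        self.max_bytes = max_bytes
        self.max_age_seconds = max_age_seconds
        self._entries = OrderedDict()  # claim_id -> (stored_at, content), oldest first
        self._used = 0

    def store(self, claim_id, content, now):
        if claim_id in self._entries:
            raise ValueError("claim ids are assumed to be unique")
        self._entries[claim_id] = (now, content)
        self._used += len(content)
        self._evict(now)

    def _evict(self, now):
        # Remove oldest entries while they are too old or the store is over budget.
        while self._entries:
            oldest_id, (stored_at, content) = next(iter(self._entries.items()))
            too_old = now - stored_at > self.max_age_seconds
            over_budget = self._used > self.max_bytes
            if not (too_old or over_budget):
                break
            del self._entries[oldest_id]
            self._used -= len(content)

    def replay(self, claim_id):
        """Return the stored content, or None if it has already aged off."""
        entry = self._entries.get(claim_id)
        return None if entry is None else entry[1]
```

As long as a claim is still in the buffer, `replay` can return its bytes for inspection or reprocessing; once it has been evicted, only the provenance record of it would remain.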
- Security
-
- System to System
-
A dataflow is only as good as it is secure. NiFi, at every point in a dataflow, offers
secure exchange through the use of protocols with encryption such as 2-way SSL. In
addition, NiFi enables the flow to encrypt and decrypt content and use shared keys or other
mechanisms on either side of the sender/recipient equation.
- User to System
-
NiFi enables 2-Way SSL authentication and provides pluggable authorization so that it
can properly control a user’s access at particular levels (such as read-only, dataflow
manager, and admin). If a user enters a sensitive property like a password into the flow,
it is immediately encrypted server side and never again exposed on the client side, even in
its encrypted form.
- Multi-tenant Authorization
-
The authority level of a given dataflow applies to each component, allowing the admin
user to have a fine-grained level of access control. This means each NiFi cluster is capable
of handling the requirements of one or more organizations. Compared to isolated
topologies, multi-tenant authorization enables a self-service model for dataflow
management, allowing each team or organization to manage its own flows while remaining
fully aware of the rest of the flow, to which it does not have access.
- Extensible Architecture
-
- Extension
-
NiFi is, at its core, built for extension, and as such it is a platform on which dataflow
processes can execute and interact in a predictable and repeatable manner. Points of
extension include: Processors, Controller Services, Reporting Tasks, Prioritizers, and
Custom User Interfaces.
- Classloader Isolation
-
For any component-based system, dependency problems can quickly occur. NiFi addresses
this by providing a custom class loader model, ensuring that each extension bundle is
exposed to a very limited set of dependencies. As a result, extensions can be built with
little concern for whether they might conflict with another extension. The concept of
these extension bundles is called 'NiFi Archives' and is discussed in greater detail in
the Developer’s Guide.
- Site-to-Site Communication Protocol
-
The preferred communication protocol between NiFi instances is the NiFi Site-to-Site
(S2S) Protocol. S2S transfers data from one NiFi instance to another easily, efficiently,
and securely. NiFi client libraries can be built and bundled into other applications or
devices to communicate back to NiFi via S2S. Both the socket-based protocol and the
HTTP(S) protocol are supported in S2S as the underlying transport protocol, making it
possible to embed a proxy server into the S2S communication.
- Flexible Scaling Model
-
- Scale-out (Clustering)
-
NiFi is designed to scale out by clustering many nodes together, as described above. If a
single node is provisioned and configured to handle hundreds of MB per second, then a
modest cluster could be configured to handle GB per second. This then brings about
interesting challenges of load balancing and fail-over between NiFi and the systems from
which it gets data. Asynchronous, queuing-based protocols such as messaging services,
Kafka, etc., can help. NiFi’s 'site-to-site' feature is also very effective, as it is a
protocol that allows NiFi and a client (including another NiFi cluster) to talk to each
other, share information about loading, and exchange data on specific authorized ports.
- Scale-up & down
-
NiFi is also designed to scale up and down in a very flexible manner. To increase
throughput from the standpoint of the NiFi framework, you can increase the number of
concurrent tasks on a processor under the Scheduling tab when configuring it. This allows
more tasks to execute simultaneously, providing greater throughput. At the other end of
the spectrum, NiFi can be scaled down to run on edge devices where a small footprint is
desired due to limited hardware resources. To address the first-mile data collection
challenge and edge use cases specifically, see MiNiFi (pronounced "minify",
[min-uh-fahy]), a child project of Apache NiFi:
https://cwiki.apache.org/confluence/display/NIFI/MiNiFi
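The effect of raising the number of concurrent tasks, as described under Scale-up & down, can be approximated with a thread pool. This is only an analogy under assumptions: `process`, `run`, and the `concurrent_tasks` parameter are hypothetical names for this example, and the sleep stands in for I/O-bound work.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def process(item):
    """Stand-in for one unit of I/O-bound processor work (hypothetical)."""
    time.sleep(0.02)
    return item.upper()


def run(items, concurrent_tasks):
    """Process all items with the given number of workers; return results and elapsed seconds."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrent_tasks) as pool:
        results = list(pool.map(process, items))  # map preserves input order
    return results, time.perf_counter() - start
```

Running the same batch with `concurrent_tasks=1` and then `concurrent_tasks=4` yields identical results, while the elapsed time typically drops for I/O-bound work like this, as long as the host has resources to spare.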