What is Apache NiFi?
Put simply, NiFi was built to automate the flow of data between systems. While the term 'dataflow' is used in a variety of contexts, we use it here to mean the automated and managed flow of information between systems. This problem space has been around ever since enterprises had more than one system, where some of the systems created data and some of the systems consumed data. The problems and solution patterns that emerged have been discussed and articulated extensively. A comprehensive and readily consumed form is found in the Enterprise Integration Patterns.
Some of the high-level challenges of dataflow include:
- Systems fail: Networks fail, disks fail, software crashes, people make mistakes.
- Data access exceeds capacity to consume: Sometimes a given data source can outpace some part of the processing or delivery chain; it only takes one weak link to cause a problem.
- Boundary conditions are mere suggestions: You will invariably get data that is too big, too small, too fast, too slow, corrupt, wrong, or in the wrong format.
- What is noise one day becomes signal the next: The priorities of an organization change rapidly. Enabling new flows and changing existing ones must be fast.
- Systems evolve at different rates: The protocols and formats used by a given system can change at any time, often irrespective of the systems around them. Dataflow exists to connect what is essentially a massively distributed system of components that are loosely or not at all designed to work together.
- Compliance and security: Laws, regulations, and policies change. Business-to-business agreements change. System-to-system and system-to-user interactions must be secure, trusted, and accountable.
- Continuous improvement occurs in production: It is often not possible to come even close to replicating production environments in the lab.
Over the years, dataflow has been one of those necessary evils in an architecture. Now, though, a number of active and rapidly evolving movements are making dataflow a lot more interesting and a lot more vital to the success of a given enterprise. These include Service Oriented Architecture [soa], the rise of the API [api][api2], the Internet of Things [iot], and Big Data [bigdata]. In addition, the level of rigor necessary for compliance, privacy, and security is constantly on the rise. Even with all of these new concepts emerging, the patterns and needs of dataflow remain largely the same. The primary differences are the scope of complexity, the rate of change necessary to adapt, and the fact that at scale the edge case becomes a common occurrence. NiFi is built to help tackle these modern dataflow challenges.