Overview

Apache NiFi offers a very robust set of Processors that are capable of ingesting, processing, routing, transforming, and delivering data of any format. This is possible because the NiFi framework itself is data-agnostic. It doesn't care whether your data is a 100-byte JSON message or a 100-gigabyte video. This is an incredibly powerful feature. However, there are many patterns that have quickly developed to handle data of differing types.

One class of data that is often processed by NiFi is record-oriented data. When we say record-oriented data, we are often (but not always) talking about structured data such as JSON, CSV, and Avro. There are many other types of data that can also be represented as "records" or "messages," though. As a result, a set of Controller Services have been developed for parsing these different data formats and representing the data in a consistent way by using the RecordReader API. This allows data that has been written in any data format to be treated the same, so long as there is a RecordReader that is capable of producing a Record object that represents the data.

When we talk about a Record, this is an abstraction that allows us to treat data in the same way, regardless of the format that it is in. A Record is made up of one or more Fields. Each Field has a name and a Type associated with it. The Fields of a Record are described using a Record Schema. The Schema indicates which fields make up a specific type of Record. The Type of a Field will be one of the following:

  • String

  • Boolean

  • Byte

  • Character

  • Short

  • Integer

  • Long

  • BigInt

  • Float

  • Double

  • Date - Represents a Date without a Time component

  • Time - Represents a Time of Day without a Date component

  • Timestamp - Represents a Date and Time

  • Embedded Record - Hierarchical data, such as JSON, can be represented by allowing a field to be of Type Record itself.

  • Choice - A field may be any one of several types.

  • Array - All elements of an array have the same type.

  • Map - All Map Keys are of type String. The Values are of the same type.

Once a stream of data has been converted into Records, the RecordWriter API allows us to then serialize those Records back into streams of bytes so that they can be passed onto other systems.

Of course, there's not much point in reading and writing this data if we aren't going to do something with the data in between. There are several processors that have already been developed for NiFi that provide some very powerful capabilities for routing, querying, and transforming Record-oriented data. Often times, in order to perform the desired function, a processor will need input from the user in order to determine which fields in a Record or which values in a Record should be operated on.

Enter the NiFi RecordPath language. RecordPath is intended to be a simple, easy-to-use Domain-Specific Language (DSL) to specify which fields in a Record we care about or want to access when configuring a processor.