Configuring MiNiFi C++ repositories

Learn how to configure and encrypt MiNiFi C++ repositories for efficient data storage.

Repository types

Like NiFi, the MiNiFi C++ agent uses repositories to store different types of data. There are three primary repository types:
Content repository
Stores the actual content of each flow file.
Flow File repository
Stores flow file metadata: the current state and attributes of every flow file, and the identifier of its content in the content repository.
Provenance repository
Tracks changes and stores all events related to flow files. This repository is disabled by default and is not actively maintained.

Configuring repository storage locations

Persistent repositories store data at configurable paths. The default locations, listed below, are relative to the MINIFI_HOME environment variable; if this variable is not set, the root installation folder is used. To store data elsewhere, replace the default values in the minifi.properties file.
nifi.database.content.repository.directory.default=${MINIFI_HOME}/content_repository
nifi.flowfile.repository.directory.default=${MINIFI_HOME}/flowfile_repository
nifi.provenance.repository.directory.default=${MINIFI_HOME}/provenance_repository
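
For example, to keep all three repositories under a dedicated data directory instead of ${MINIFI_HOME}, you could override the defaults in minifi.properties as in the sketch below. The /data/minifi path is a hypothetical placeholder; substitute a location that exists on your system and that the agent can write to.

```properties
# minifi.properties
# Hypothetical example: store repositories under /data/minifi instead of ${MINIFI_HOME}
nifi.database.content.repository.directory.default=/data/minifi/content_repository
nifi.flowfile.repository.directory.default=/data/minifi/flowfile_repository
nifi.provenance.repository.directory.default=/data/minifi/provenance_repository
```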

You can also store multiple repositories in a single database by using the minifidb:// scheme. This can simplify migration and centralizes the persistence of agent state. In a minifidb:// path, the final path segment names the column family within the database, and the preceding path is the directory in which the RocksDB database is created.

For example, for minifidb:///home/user/minifi/agent_state/flowfile, a directory is created at /home/user/minifi/agent_state and populated with RocksDB-specific files. Inside that database, a logically separate subdatabase (column family) named flowfile is created.
The following configuration stores the Flow File repository, the Content repository, and processor state in one database:
nifi.flowfile.repository.directory.default=minifidb://${MINIFI_HOME}/agent_state/flowfile
nifi.database.content.repository.directory.default=minifidb://${MINIFI_HOME}/agent_state/content
nifi.state.management.provider.local.path=minifidb://${MINIFI_HOME}/agent_state/processor_states
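
Applied to the absolute path from the earlier example, the flow file setting would look like the following sketch. The comments restate the on-disk layout described in this section; the /home/user/minifi path is only illustrative.

```properties
# minifi.properties
# Uses a RocksDB database in /home/user/minifi/agent_state
# and stores the flow file repository in its "flowfile" column family
nifi.flowfile.repository.directory.default=minifidb:///home/user/minifi/agent_state/flowfile
```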

Configuring repository backends

You can configure each of the three repositories (Content, Flow File, and Provenance) to be either volatile or persistent. A volatile repository keeps its state in memory, and that state is discarded when the agent restarts. This can improve performance, but any data the agent is processing at the time of a restart is lost.

Content repository

By default, the content repository uses a RocksDB backend, a persistent, in-process key-value store. A filesystem-based backend is also available. In most use cases, the RocksDB backend offers better performance and stronger transactional guarantees than the filesystem backend, but it cannot store flow file contents larger than 4 GB.

To select the backend, set the following property in the minifi.properties file:

nifi.content.repository.class.name

Possible values:
  • DatabaseContentRepository: RocksDB backend - This is the default value.
  • FilesystemRepository: file system backend - Use it if you work with flow files larger than 4 GB.
  • VolatileContentRepository: in-memory, non-persistent storage backend - You should only use it for testing.
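
As a sketch, switching the content repository to the filesystem backend (for example, because your flows handle flow files larger than 4 GB) takes a single line in minifi.properties:

```properties
# minifi.properties
# Use the filesystem backend instead of the default RocksDB backend
nifi.content.repository.class.name=FilesystemRepository
```
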
Flow File repository

By default, the flow file repository uses a RocksDB backend, which is a persistent, in-process key-value store.

To select the backend, set the following property in the minifi.properties file:

nifi.flowfile.repository.class.name

Possible values:

  • FlowFileRepository: RocksDB backend - This is the default value.
  • VolatileFlowFileRepository: in-memory, non-persistent storage backend - You should only use it for testing.
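
For a test environment where losing flow file metadata on restart is acceptable, the in-memory backend could be selected as in this sketch; for production, keep the default FlowFileRepository value.

```properties
# minifi.properties
# Testing only: flow file metadata is kept in memory and lost on restart
nifi.flowfile.repository.class.name=VolatileFlowFileRepository
```
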
Provenance repository

Cloudera does not recommend changing the provenance repository backend. If you need NiFi-style provenance data, contact your Cloudera representative with the feature request.

To select the backend, set the nifi.provenance.repository.class.name property in the minifi.properties file.

Possible values:

  • NoOpRepository: disable provenance - This is the default value.
  • ProvenanceRepository: RocksDB backend - This backend is not thoroughly tested, so Cloudera does not recommend using it. If you do use it, proceed with caution.
  • VolatileProvenanceFileRepository: in-memory, non-persistent storage backend - You should only use it for testing.
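
Because Cloudera recommends keeping the default, the provenance setting normally needs no change. If you prefer to state the default explicitly, for example to make the configuration self-documenting, the line would be:

```properties
# minifi.properties
# Keep provenance disabled (the default value)
nifi.provenance.repository.class.name=NoOpRepository
```
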
Configuration options for volatile repositories:
# Volatile flow file repository
# maximum number of entries to keep in memory
nifi.volatile.repository.options.flowfile.max.count=10000
# maximum number of bytes to keep in memory (the entry limit above also applies)
nifi.volatile.repository.options.flowfile.max.bytes=1M

# Volatile provenance repository
# maximum number of entries to keep in memory
nifi.volatile.repository.options.provenance.max.count=10000
# maximum number of bytes to keep in memory (the entry limit above also applies)
nifi.volatile.repository.options.provenance.max.bytes=1M

# Volatile content repository
# maximum number of entries to keep in memory
nifi.volatile.repository.options.content.max.count=100000
# maximum number of bytes to keep in memory (the entry limit above also applies)
nifi.volatile.repository.options.content.max.bytes=1M
# limits locking for the content repository
nifi.volatile.repository.options.content.minimal.locking=true
On systems with limited memory, set these options with care. Lowering the maximum entry count limits memory consumption, but it also limits the number of events that can be stored. If you set a low limit on volatile content, you may see excessive session rollbacks caused by invalid stream errors, which occur when a claim cannot be found.
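
The following sketch combines a volatile content repository with a lower entry limit for a memory-constrained test system. The limit values are illustrative assumptions, not tuned recommendations; size them for your own workload, keeping in mind the rollback risk described above.

```properties
# minifi.properties
# Testing only: keep flow file content in memory (lost on restart)
nifi.content.repository.class.name=VolatileContentRepository
# Illustrative limits: at most 1000 content entries and 1M of content in memory
nifi.volatile.repository.options.content.max.count=1000
nifi.volatile.repository.options.content.max.bytes=1M
```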

Repository encryption

Repository encryption is available starting with the CEM Agents 1.21.06 release.

To encrypt a RocksDB-backed repository (using AES-256-CTR), provide a key for it in the conf/bootstrap.conf file:
nifi.flowfile.repository.encryption.key=805D7B95EF44DC27C87FFBC4DFDE376DAE604D55DB2C5496DEEF5236362DE62E
nifi.database.content.repository.encryption.key=
# nifi.state.management.provider.local.encryption.key=

In this configuration, the first line makes the Flow File repository use the specified 256-bit key (64 hexadecimal digits). The second line, which has an empty value, triggers the generation of a random 256-bit key that is persisted back into conf/bootstrap.conf and then used to encrypt the Content repository. This way, you can request encryption without having to choose a key yourself. Because the last line is commented out, the state manager uses plaintext storage and encryption is not triggered.

When multiple repositories share the same directory (as with the minifidb:// scheme), they should either all be plaintext or all be encrypted with the same key.
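
For example, if the Flow File repository, the Content repository, and the local state storage all share one minifidb:// database, a bootstrap.conf sketch that satisfies this rule sets the same key on every line. The <64-hex-digit-key> value is a placeholder: replace it with your own 256-bit key rather than reusing the sample key shown earlier.

```properties
# conf/bootstrap.conf
# All repositories in the shared minifidb:// database use one key
nifi.flowfile.repository.encryption.key=<64-hex-digit-key>
nifi.database.content.repository.encryption.key=<64-hex-digit-key>
nifi.state.management.provider.local.encryption.key=<64-hex-digit-key>
```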