Configuring MiNiFi C++ repositories

Learn how to configure and encrypt MiNiFi C++ repositories.

Like NiFi, the MiNiFi C++ agent uses repositories to store state. There are three kinds of repositories:
  • The Content repository is used to store the contents of each flow file.

  • The Flow File repository contains flow file metadata, such as the current state and attributes of every flow file, and a content repository identifier.

  • The Provenance repository can track changes and store all the events of all flow files. It is not well maintained, and it is disabled by default.

Configuring repository storage locations

Persistent repositories store their data under a configurable path. The repository location properties and their defaults are listed below. The defaults are based on the MINIFI_HOME environment variable; if you do not set this variable, the root installation folder is used. To override the defaults, specify your own paths in the minifi.properties file.
nifi.database.content.repository.directory.default=${MINIFI_HOME}/content_repository
nifi.flowfile.repository.directory.default=${MINIFI_HOME}/flowfile_repository
nifi.provenance.repository.directory.default=${MINIFI_HOME}/provenance_repository
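
The lookup order described above can be illustrated with a short Python sketch. This is not the agent's implementation: the function name resolve_repository_path, the env and install_root parameters, and the example paths are all hypothetical, and the sketch only assumes that ${MINIFI_HOME} is replaced by the environment variable when it is set and by the installation folder otherwise.

```python
from string import Template
from typing import Mapping


def resolve_repository_path(value: str, env: Mapping[str, str], install_root: str) -> str:
    """Expand ${MINIFI_HOME} in a property value (illustrative only)."""
    # Assumption: fall back to the installation root when MINIFI_HOME
    # is unset or empty, as the paragraph above describes.
    home = env.get("MINIFI_HOME") or install_root
    return Template(value).substitute(MINIFI_HOME=home)


default_value = "${MINIFI_HOME}/content_repository"

# MINIFI_HOME is set: the variable wins.
with_home = resolve_repository_path(
    default_value, {"MINIFI_HOME": "/opt/minifi"}, "/usr/local/minifi"
)
# MINIFI_HOME is not set: the (hypothetical) installation root is used.
without_home = resolve_repository_path(default_value, {}, "/usr/local/minifi")

print(with_home)
print(without_home)
```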

You can also use a single database to store multiple repositories with the minifidb:// scheme. This can simplify migration and centralizes agent state persistence. In the scheme, the final path segment designates the column family in the database, while the preceding path indicates the directory where the RocksDB database is created.

For example, in minifidb:///home/user/minifi/agent_state/flowfile, a directory is created at /home/user/minifi/agent_state and populated with RocksDB-specific content. In that database, a logically separate subdatabase is created under the name flowfile.
nifi.flowfile.repository.directory.default=minifidb://${MINIFI_HOME}/agent_state/flowfile
nifi.database.content.repository.directory.default=minifidb://${MINIFI_HOME}/agent_state/content
nifi.state.management.provider.local.path=minifidb://${MINIFI_HOME}/agent_state/processor_states
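
To make the split between database directory and column family concrete, the following Python sketch parses a minifidb:// location the way the text describes it. The helper name split_minifidb_uri is hypothetical and the agent's own parsing may differ in edge cases; the sketch assumes only that the final path segment is the column family and everything before it is the directory.

```python
def split_minifidb_uri(uri: str) -> tuple[str, str]:
    """Split a minifidb:// location into (database directory, column family)."""
    scheme = "minifidb://"
    if not uri.startswith(scheme):
        raise ValueError(f"not a minifidb:// location: {uri!r}")
    # Drop the scheme and any trailing slash, then cut at the last "/".
    path = uri[len(scheme):].rstrip("/")
    directory, separator, column_family = path.rpartition("/")
    if not separator or not directory or not column_family:
        raise ValueError(f"expected <directory>/<column family> in {uri!r}")
    return directory, column_family


print(split_minifidb_uri("minifidb:///home/user/minifi/agent_state/flowfile"))
# → ('/home/user/minifi/agent_state', 'flowfile')
```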

Configuring repository backends

You can configure all three repositories to be volatile or persistent. A volatile repository keeps its state in memory, so that state is discarded when the agent restarts. This can improve performance, but it can also cause data loss if the agent restarts while it is processing data.

Content repository

By default, the content repository uses a RocksDB backend, which is a persistent, in-process key-value store. An alternative, filesystem-based backend is also available for this repository. In most use cases, the RocksDB backend offers better performance and stronger transactional guarantees than the filesystem backend, but it has a limitation: it cannot store flow file contents larger than 4 GB.

You can configure the repository backends in the minifi.properties file as follows:

nifi.content.repository.class.name

Possible values:
  • DatabaseContentRepository: RocksDB backend - This is the default value.
  • FilesystemRepository: file system backend - Use it if you work with flow files larger than 4 GB.
  • VolatileContentRepository: in-memory, non-persistent storage backend - You should only use it for testing.

Flow File repository

By default, the flow file repository uses a RocksDB backend, which is a persistent, in-process key-value store.

You can configure the repository backends in the minifi.properties file as follows:

nifi.flowfile.repository.class.name

Possible values:

  • FlowFileRepository: RocksDB backend - This is the default value.
  • VolatileFlowFileRepository: in-memory, non-persistent storage backend - You should only use it for testing.

Provenance repository

Cloudera does not recommend changing the provenance repository backend. If you need NiFi-style provenance data, contact your Cloudera representative with the feature request.

You can configure the repository backends in the minifi.properties file as follows:

nifi.provenance.repository.class.name

Possible values:

  • NoOpRepository: disables provenance - This is the default value.
  • ProvenanceRepository: RocksDB backend - Avoid this backend, as it is not thoroughly tested. Cloudera advises caution if you decide to use it.
  • VolatileProvenanceRepository: in-memory, non-persistent storage backend - You should only use it for testing.

Other configuration options for volatile repositories:
# configuration options
# maximum number of entries to keep in memory
nifi.volatile.repository.options.flowfile.max.count=10000


# maximum number of bytes to keep in memory
nifi.volatile.repository.options.flowfile.max.bytes=1M

# maximum number of entries to keep in memory
nifi.volatile.repository.options.provenance.max.count=10000

# maximum number of bytes to keep in memory
nifi.volatile.repository.options.provenance.max.bytes=1M


# maximum number of entries to keep in memory
nifi.volatile.repository.options.content.max.count=100000

# maximum number of bytes to keep in memory
nifi.volatile.repository.options.content.max.bytes=1M

# limits locking for the content repository
nifi.volatile.repository.options.content.minimal.locking=true

On systems with limited memory, choose the above values carefully. Lowering the maximum number of entries reduces memory consumption, but it also limits the number of events that can be stored. If you set the limits for volatile content too low, you may see excessive session rollbacks caused by invalid stream errors, which occur when a claim cannot be found.

Repository encryption

Repository encryption is available starting with the CEM Agents 1.21.06 release.

To request encryption (using AES-256-CTR) of the RocksDB-backed repositories, provide a key for each of them in the conf/bootstrap.conf file:
nifi.flowfile.repository.encryption.key=805D7B95EF44DC27C87FFBC4DFDE376DAE604D55DB2C5496DEEF5236362DE62E
nifi.database.content.repository.encryption.key=
# nifi.state.management.provider.local.encryption.key=

In the above configuration, the first line makes the Flow File repository use the specified 256-bit key. The second line triggers the generation of a random 256-bit key, which is persisted back into conf/bootstrap.conf and then used by the Content repository for encryption. This way, you can request encryption without being concerned about what key to use. Finally, because the last line is commented out, the state manager uses plaintext storage and encryption is not triggered.
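
If you prefer to supply your own key instead of leaving the value empty, a key in the same shape as the example (256 bits written as 64 hexadecimal characters) can be produced with a cryptographically secure random source. The Python sketch below is one way to do that; the function name generate_repository_key is hypothetical, and the uppercase hex format is an assumption inferred from the example key above.

```python
import secrets


def generate_repository_key() -> str:
    """Return a random 256-bit key as 64 uppercase hexadecimal characters."""
    # token_hex(32) draws 32 random bytes (256 bits) from the OS CSPRNG.
    return secrets.token_hex(32).upper()


key = generate_repository_key()
# Print a line in the same form as the bootstrap.conf example above.
print(f"nifi.flowfile.repository.encryption.key={key}")
```

Treat the generated value like any other secret: anyone who can read conf/bootstrap.conf can read the key.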

When multiple repositories use the same directory (as with the minifidb:// scheme), the repositories should either all be plaintext or all be encrypted with the same key.
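
The rule above can be checked before deploying a configuration. The following Python sketch groups repositories by their minifidb:// directory and reports any directory whose repositories do not share one key. It is an illustration only: the function find_key_conflicts, the dictionary layout, and the example paths are hypothetical, and the keys mapping is assumed to hold the effective key of each repository, with None meaning plaintext.

```python
from collections import defaultdict
from typing import Optional

SCHEME = "minifidb://"


def find_key_conflicts(
    locations: dict[str, str], keys: dict[str, Optional[str]]
) -> dict[str, list[str]]:
    """Map each shared directory with mismatched keys to its repositories."""
    by_directory: dict[str, list[str]] = defaultdict(list)
    for repository, location in locations.items():
        if location.startswith(SCHEME):
            # Everything before the last path segment is the database directory.
            directory = location[len(SCHEME):].rstrip("/").rpartition("/")[0]
            by_directory[directory].append(repository)
    conflicts = {}
    for directory, repositories in by_directory.items():
        # More than one distinct value means mixed keys, or plaintext mixed with encryption.
        if len({keys.get(name) for name in repositories}) > 1:
            conflicts[directory] = sorted(repositories)
    return conflicts


locations = {
    "flowfile": "minifidb:///opt/minifi/agent_state/flowfile",
    "content": "minifidb:///opt/minifi/agent_state/content",
}
keys = {"flowfile": "AB" * 32, "content": None}  # content is plaintext here
print(find_key_conflicts(locations, keys))
# → {'/opt/minifi/agent_state': ['content', 'flowfile']}
```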