Configure MiNiFi C++ repositories

Learn how to configure and encrypt MiNiFi C++ repositories.

Persistent repositories, such as the Flow File repository, store their data at a configurable path. The repository locations and their defaults are listed below. By default, the paths are based on the MINIFI_HOME environment variable. If this variable is not set, the agent extrapolates the path and uses the root installation folder. You can specify your own paths in place of these defaults in the configuration file.

${MINIFI_HOME}/provenance_repository
${MINIFI_HOME}/flowfile_repository
${MINIFI_HOME}/content_repository

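As a sketch, the default locations above would be set with entries along the following lines. The property names, and the assumption that they belong in minifi.properties, are taken from the upstream Apache NiFi MiNiFi C++ documentation rather than from this page, so verify them against your agent version.

```properties
# Assumed property names; the values are the default paths listed above.
nifi.provenance.repository.directory.default=${MINIFI_HOME}/provenance_repository
nifi.flowfile.repository.directory.default=${MINIFI_HOME}/flowfile_repository
nifi.database.content.repository.directory.default=${MINIFI_HOME}/content_repository
```
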
You can also store multiple repositories in a single database by using the minifidb:// scheme. This can help with migration and centralizes the persistence of agent state. In this scheme, the final path segment designates the column family in the repository, and the preceding path is the directory where the RocksDB database is created. For example, with minifidb:///home/user/minifi/agent_state/flowfile, a directory populated with RocksDB-specific content is created at /home/user/minifi/agent_state, and within it a logically separate sub-database named flowfile is created.

${MINIFI_HOME}/agent_state/flowfile
${MINIFI_HOME}/agent_state/content
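
A minimal sketch of a shared-database setup using the minifidb:// scheme described above. The property names are assumptions based on the upstream Apache NiFi MiNiFi C++ documentation; check them against your agent version before use.

```properties
# Assumed property names. Both repositories share the RocksDB database in
# ${MINIFI_HOME}/agent_state and differ only in the column family
# (the final path segment: flowfile or content).
nifi.flowfile.repository.directory.default=minifidb://${MINIFI_HOME}/agent_state/flowfile
nifi.database.content.repository.directory.default=minifidb://${MINIFI_HOME}/agent_state/content
```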

Repository encryption

Repository encryption is available starting with the CEM Agents 1.21.06 release.

You can provide the RocksDB-backed repositories with a key to request their encryption (using AES-256-CTR). In the conf/bootstrap.conf file:
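
A hedged sketch of a three-line configuration that matches the description that follows. The property names are assumptions based on the upstream Apache NiFi MiNiFi C++ documentation, the hex encoding of the key is assumed, and the key itself is a made-up placeholder. The empty value on the second line is likewise assumed to be what triggers key generation.

```properties
# Sketch only: assumed property names and a placeholder key
# (64 hex characters = 256 bits) that must not be used in a real deployment.
nifi.flowfile.repository.encryption.key=0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF
nifi.database.content.repository.encryption.key=
# nifi.state.management.provider.local.encryption.key=
```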

In the above configuration, the first line causes the Flow File repository to use the specified 256-bit key. The second line triggers the generation of a random 256-bit key, which is persisted back into conf/bootstrap.conf and then used by the Database Content repository for encryption. This way, you can request encryption without having to choose a key yourself. Finally, because the last line is commented out, the state manager uses plaintext storage and encryption is not triggered.

When multiple repositories use the same directory (as with the minifidb:// scheme), the repositories should be either all plaintext or all encrypted with the same key.

In-memory repositories

Each of the repositories can be configured to be volatile (state is kept in memory and discarded on restart). This can improve performance, but it can also cause data loss if the agent restarts while it is processing data.

To configure the repositories, set the following in the configuration file:
# For Volatile Repositories:

# configuration options
# maximum number of entries to keep in memory

# maximum number of bytes to keep in memory

# maximum number of entries to keep in memory

# maximum number of bytes to keep in memory

# maximum number of entries to keep in memory

# maximum number of bytes to keep in memory

# limits locking for the content repository

# For NO-OP Repositories:
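
The comments above outline the layout of the volatile and NO-OP settings. A sketch of what the complete block might look like follows. All property names and class names are assumptions based on the upstream Apache NiFi MiNiFi C++ documentation, and the numeric values are illustrative only, so verify both against your agent version.

```properties
# For Volatile Repositories (assumed class names):
nifi.flowfile.repository.class.name=VolatileFlowFileRepository
nifi.provenance.repository.class.name=VolatileProvenanceRepository
nifi.content.repository.class.name=VolatileContentRepository

# configuration options (assumed names, illustrative values)
# maximum number of entries to keep in memory
nifi.volatile.repository.options.flowfile.max.count=10000
# maximum number of bytes to keep in memory
nifi.volatile.repository.options.flowfile.max.bytes=1M

# maximum number of entries to keep in memory
nifi.volatile.repository.options.provenance.max.count=10000
# maximum number of bytes to keep in memory
nifi.volatile.repository.options.provenance.max.bytes=1M

# maximum number of entries to keep in memory
nifi.volatile.repository.options.content.max.count=100000
# maximum number of bytes to keep in memory
nifi.volatile.repository.options.content.max.bytes=1M
# limits locking for the content repository
nifi.volatile.repository.options.content.minimal.locking=true

# For NO-OP Repositories (alternative to the volatile classes above,
# left commented out here so the two settings do not conflict):
# nifi.flowfile.repository.class.name=NoOpRepository
# nifi.provenance.repository.class.name=NoOpRepository
```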

On systems with limited memory, take the above options into account. Limiting the maximum number of entries limits memory consumption, but it also limits the number of events that can be stored. If you limit the amount of volatile content, you may see excessive session rollbacks caused by invalid stream errors, which occur when a claim cannot be found.

For the content repository, the minimal.locking option defaults to true. With this setting the repository attempts to use lock-free structures. This may or may not be optimal, because it requires additional searching of the underlying vector. It tends to work well when max.count is not excessively high. When object permanence in the repositories is low, minimal locking results in better performance. If there are so many processors that the content repository fills up quickly, performance may be reduced. In all cases a locking cache is used to avoid the worst-case complexity of O(n) for the content repository; this cache is used more heavily when minimal.locking is set to false.
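
For a flow with many processors, where the content repository fills up quickly, a sketch of turning minimal locking off could look like the following. The property names are assumptions based on the upstream Apache NiFi MiNiFi C++ documentation and the count is illustrative. Whether this actually helps depends on the workload, so measure before and after.

```properties
# Assumed property names; illustrative value for max.count.
nifi.volatile.repository.options.content.max.count=100000
# Rely on the locking cache instead of lock-free structures
nifi.volatile.repository.options.content.minimal.locking=false
```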