Configure MiNiFi C++ repositories
Learn how to configure and encrypt MiNiFi C++ repositories.
By default, repository paths are based on the MINIFI_HOME environment variable. If this variable is not specified, the agent extrapolates the path and uses the root installation folder. You may specify your own paths in place of these defaults in the minifi.properties file:
nifi.provenance.repository.directory.default=${MINIFI_HOME}/provenance_repository
nifi.flowfile.repository.directory.default=${MINIFI_HOME}/flowfile_repository
nifi.database.content.repository.directory.default=${MINIFI_HOME}/content_repository
You can also store multiple repositories in a single database by using the minifidb:// scheme. This can help with migration and centralizes agent state persistence. In the scheme, the final path segment designates the column family in the repository, while the preceding path indicates the directory where the RocksDB database is created. For example, minifidb:///home/user/minifi/agent_state/flowfile creates a directory at /home/user/minifi/agent_state populated with RocksDB-specific content, and in that repository a logically separate subdatabase is created under the name flowfile.
nifi.flowfile.repository.directory.default=minifidb://${MINIFI_HOME}/agent_state/flowfile
nifi.database.content.repository.directory.default=minifidb://${MINIFI_HOME}/agent_state/content
nifi.state.management.provider.local.path=minifidb://${MINIFI_HOME}/agent_state/processor_states
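The way a minifidb:// value breaks down into a database directory and a column family can be sketched in a few lines of Python. This is only an illustration of the rule described above, not the agent's own parsing code; the function name split_minifidb_uri is hypothetical.

```python
from typing import Tuple

PREFIX = "minifidb://"


def split_minifidb_uri(uri: str) -> Tuple[str, str]:
    """Split a minifidb:// value into (database directory, column family).

    Hypothetical helper: the final path segment is taken as the column
    family and everything before it as the RocksDB directory.
    """
    if not uri.startswith(PREFIX):
        raise ValueError(f"not a minifidb URI: {uri!r}")
    path = uri[len(PREFIX):]
    directory, separator, column_family = path.rpartition("/")
    if not separator or not directory or not column_family:
        raise ValueError(f"expected <directory>/<column family> in {uri!r}")
    return directory, column_family


if __name__ == "__main__":
    print(split_minifidb_uri("minifidb:///home/user/minifi/agent_state/flowfile"))
```

Applied to the three values above, the directory part is the same for all of them and only the column family (flowfile, content, processor_states) differs.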
Repository encryption
Starting with the CEM Agents 1.21.06 release, you can encrypt repositories. Specify the encryption keys in the conf/bootstrap.conf file:
nifi.flowfile.repository.encryption.key=805D7B95EF44DC27C87FFBC4DFDE376DAE604D55DB2C5496DEEF5236362DE62E
nifi.database.content.repository.encryption.key=
# nifi.state.management.provider.local.encryption.key=
In the above configuration, the first line causes the Flow File repository to use the specified 256-bit key. The second line triggers the generation of a random 256-bit key, which is persisted back into conf/bootstrap.conf and which the Database Content repository then uses for encryption. This way, you can request encryption without having to choose a key yourself. Finally, because the last line is commented out, the state manager uses plaintext storage and encryption is not triggered.
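If you would rather supply your own key than leave the value empty, one possible way to produce a random 256-bit key is sketched below in Python. The function name is hypothetical, and the uppercase hexadecimal output is an assumption chosen only to match the format of the sample key shown above.

```python
import secrets


def generate_repository_key() -> str:
    """Return a random 256-bit key encoded as 64 hexadecimal characters.

    Assumption: uppercase hex is used only to mirror the sample key above.
    """
    return secrets.token_hex(32).upper()  # 32 random bytes = 256 bits


if __name__ == "__main__":
    print("nifi.flowfile.repository.encryption.key=" + generate_repository_key())
```

The secrets module draws from the operating system's cryptographically secure random source, which makes it a better fit for key material than the general-purpose random module.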
When multiple repositories use the same directory (as with the minifidb:// scheme), the repositories should either all be plaintext or all be encrypted with the same key.
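A quick consistency check for this rule can be sketched in Python. It is a hypothetical helper, not part of the agent: it assumes you have already collected, for each repository, its minifidb:// location and its key (None meaning plaintext), and it does not handle empty key values that the agent would fill in with a generated key.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

PREFIX = "minifidb://"


def find_mixed_encryption(repos: Dict[str, Tuple[str, Optional[str]]]) -> List[str]:
    """Return shared database directories whose repositories disagree on the key.

    repos maps a repository label to (location, key); a key of None means
    plaintext. Locations that do not use minifidb:// are ignored.
    """
    keys_by_directory = defaultdict(set)
    for location, key in repos.values():
        if not location.startswith(PREFIX):
            continue  # not a shared database, so the rule does not apply
        directory = location[len(PREFIX):].rpartition("/")[0]
        keys_by_directory[directory].add(key)
    # More than one distinct key (or a mix of key and plaintext) is a conflict.
    return sorted(d for d, keys in keys_by_directory.items() if len(keys) > 1)
```

An empty result means that every shared directory is either entirely plaintext or entirely encrypted with a single key.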
In-memory repositories
Each of the repositories can be configured to be volatile, meaning that its state is kept in memory and discarded on restart. This can increase performance, but it can also cause data loss if the agent restarts while it is processing data.
Configure volatile repositories in the minifi.properties file:
# For Volatile Repositories:
nifi.flowfile.repository.class.name=VolatileFlowFileRepository
nifi.provenance.repository.class.name=VolatileProvenanceRepository
nifi.content.repository.class.name=VolatileContentRepository
# configuration options
# maximum number of entries to keep in memory
nifi.volatile.repository.options.flowfile.max.count=10000
# maximum number of bytes to keep in memory
nifi.volatile.repository.options.flowfile.max.bytes=1M
# maximum number of entries to keep in memory
nifi.volatile.repository.options.provenance.max.count=10000
# maximum number of bytes to keep in memory
nifi.volatile.repository.options.provenance.max.bytes=1M
# maximum number of entries to keep in memory
nifi.volatile.repository.options.content.max.count=100000
# maximum number of bytes to keep in memory
nifi.volatile.repository.options.content.max.bytes=1M
# limits locking for the content repository
nifi.volatile.repository.options.content.minimal.locking=true
# For NO-OP Repositories:
nifi.provenance.repository.class.name=NoOpRepository
Systems with limited memory must take the above options into account. Limiting the maximum number of entries limits memory consumption but also limits the number of events that can be stored. If you limit the amount of volatile content, you may see excessive session rollbacks due to invalid stream errors, which occur when a claim cannot be found.
The minimal.locking option of the content repository defaults to true. With this setting, the repository attempts to use lock-free structures. This may or may not be optimal, because it requires additional searching of the underlying vector. It can be optimal when max.count is not excessively high. When object permanence within the repositories is low, minimal locking results in better performance. If there are many processors, so that the content repository fills up quickly, performance may be reduced. In all cases, a locking cache is used to avoid the worst-case complexity of O(n) for the content repository; however, this cache is used more heavily when minimal.locking is set to false.