Configuring MiNiFi C++ repositories
Learn how to configure and encrypt MiNiFi C++ repositories for efficient data storage.
Repository types
- Content repository
- It stores the actual contents of each flowfile.
- Flow File repository
- It contains flowfile metadata, including the current state and attributes of every flowfile, and a content repository identifier.
- Provenance repository
- It tracks changes and stores all events related to flowfiles. Note that this repository is disabled by default and is not well-maintained.
Configuring repository storage locations
By default, repository locations are derived from the MINIFI_HOME environment variable. If you do not specify this variable, the root installation folder is used.
You can specify your own paths by replacing the default values in the minifi.properties file.
nifi.database.content.repository.directory.default=${MINIFI_HOME}/content_repository
nifi.flowfile.repository.directory.default=${MINIFI_HOME}/flowfile_repository
nifi.provenance.repository.directory.default=${MINIFI_HOME}/provenance_repository
You can also use a single database to store multiple repositories with the minifidb:// scheme. This can help with migration and with centralizing agent state persistence. In this scheme, the final path segment designates the column family within the database, while the preceding path is the directory where the RocksDB database is created.
nifi.flowfile.repository.directory.default=minifidb://${MINIFI_HOME}/agent_state/flowfile
nifi.database.content.repository.directory.default=minifidb://${MINIFI_HOME}/agent_state/content
nifi.state.management.provider.local.path=minifidb://${MINIFI_HOME}/agent_state/processor_states
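The split between database directory and column family described above can be illustrated with a short Python sketch. The helper name split_minifidb_uri is hypothetical and is not part of MiNiFi; the sketch only mirrors the rule stated in this section (the last path segment is the column family, everything before it is the database directory) and does not expand ${MINIFI_HOME}.

```python
from typing import NamedTuple

SCHEME = "minifidb://"


class MinifiDbLocation(NamedTuple):
    database_dir: str   # directory where the RocksDB database is created
    column_family: str  # final path segment of the value


def split_minifidb_uri(value: str) -> MinifiDbLocation:
    """Split a minifidb:// value into database directory and column family.

    Hypothetical helper for illustration only; it is not MiNiFi's own parser.
    """
    if not value.startswith(SCHEME):
        raise ValueError(f"not a minifidb:// value: {value!r}")
    path = value[len(SCHEME):].rstrip("/")
    database_dir, sep, column_family = path.rpartition("/")
    if not sep or not database_dir or not column_family:
        raise ValueError(f"expected <directory>/<column family>: {value!r}")
    return MinifiDbLocation(database_dir, column_family)


if __name__ == "__main__":
    for uri in (
        "minifidb://${MINIFI_HOME}/agent_state/flowfile",
        "minifidb://${MINIFI_HOME}/agent_state/content",
        "minifidb://${MINIFI_HOME}/agent_state/processor_states",
    ):
        print(split_minifidb_uri(uri))
```

All three example values resolve to the same database directory, ${MINIFI_HOME}/agent_state, and differ only in the column family, which is what lets a single database hold several repositories.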
Configuring repository backends
You can configure all three repositories (Content, Flow File, and Provenance) to be either volatile or persistent. A volatile repository keeps its state in memory only, so the state is discarded on restart. This can improve performance, but it can also cause data loss if the agent restarts while it is processing data.
- Content repository
- By default, the content repository uses a RocksDB backend, which is a persistent, in-process key-value store. A filesystem-based backend is available as an alternative. In most use cases, the RocksDB backend offers better performance and stronger transactional guarantees than the filesystem backend, but it has a limitation: it cannot store flow file contents larger than 4 GB.
- Flow File repository
- By default, the flow file repository uses a RocksDB backend, which is a persistent, in-process key-value store.
- Provenance repository
- Cloudera does not recommend changing the provenance repository backend. If you need NiFi-style provenance data, contact your Cloudera representative with the feature request.
# configuration options
# maximum number of entries to keep in memory
nifi.volatile.repository.options.flowfile.max.count=10000
# maximum number of bytes to keep in memory
nifi.volatile.repository.options.flowfile.max.bytes=1M
# maximum number of entries to keep in memory
nifi.volatile.repository.options.provenance.max.count=10000
# maximum number of bytes to keep in memory
nifi.volatile.repository.options.provenance.max.bytes=1M
# maximum number of entries to keep in memory
nifi.volatile.repository.options.content.max.count=100000
# maximum number of bytes to keep in memory
nifi.volatile.repository.options.content.max.bytes=1M
# limits locking for the content repository
nifi.volatile.repository.options.content.minimal.locking=true
Systems with limited memory must take the above options into account. Limiting the maximum number of entries limits memory consumption, but it also limits the number of events that can be stored. If you limit the amount of volatile content, you may see excessive session rollbacks due to invalid stream errors that occur when a claim cannot be found.
Repository encryption
Starting with the CEM Agents 1.21.06 release, you can encrypt repositories by specifying encryption keys, as in the following example:
nifi.flowfile.repository.encryption.key=805D7B95EF44DC27C87FFBC4DFDE376DAE604D55DB2C5496DEEF5236362DE62E
nifi.database.content.repository.encryption.key=
# nifi.state.management.provider.local.encryption.key=
In the above configuration, the first line makes the Flow File repository use the specified 256-bit key. The second line, which has an empty value, triggers the generation of a random 256-bit key that is persisted back into conf/bootstrap.conf and then used by the Content repository for encryption. This way, you can request encryption without having to choose a key yourself. Finally, because the last line is commented out, the state manager uses plaintext storage and encryption is not triggered for it.
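If you prefer to supply an explicit key rather than rely on the generated one, the following Python sketch shows one way to produce and sanity-check a value of the same shape as the example key above (64 hexadecimal characters, that is, 256 bits). The function names are hypothetical, and the assumption that any 64-character hexadecimal string is accepted is based only on the example key in this section.

```python
import secrets
import string

KEY_BYTES = 32  # 256 bits


def generate_hex_key() -> str:
    """Return a random 256-bit key as 64 uppercase hexadecimal characters."""
    return secrets.token_hex(KEY_BYTES).upper()


def is_valid_hex_key(value: str) -> bool:
    """Check that value is exactly 64 hexadecimal characters (256 bits)."""
    return len(value) == 2 * KEY_BYTES and all(
        c in string.hexdigits for c in value
    )


if __name__ == "__main__":
    key = generate_hex_key()
    assert is_valid_hex_key(key)
    # Printing a key is for illustration only; treat real keys as secrets.
    print(f"nifi.flowfile.repository.encryption.key={key}")
```

The secrets module draws from the operating system's cryptographically secure random source, which is why it is used here instead of the random module.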
When multiple repositories use the same directory (as with the minifidb:// scheme), the repositories must either all be plaintext or all be encrypted with the same key.
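The rule above can be checked before deployment with a small Python sketch. The function find_encryption_conflicts is hypothetical, and it takes the location and key settings as two plain dictionaries, so how you load them from your configuration files is up to you. It uses the property names shown in this document. As a conservative assumption, an empty key value (generated key) in a shared database is also reported, because the sketch cannot confirm that generated keys match.

```python
from collections import defaultdict

SCHEME = "minifidb://"

# Location property -> encryption key property, as named in this document.
REPOSITORIES = {
    "nifi.flowfile.repository.directory.default":
        "nifi.flowfile.repository.encryption.key",
    "nifi.database.content.repository.directory.default":
        "nifi.database.content.repository.encryption.key",
    "nifi.state.management.provider.local.path":
        "nifi.state.management.provider.local.encryption.key",
}


def find_encryption_conflicts(locations: dict, keys: dict) -> dict:
    """Return {database_dir: {location property: key setting}} for every
    shared minifidb:// database whose repositories disagree on encryption.

    A missing key property is treated as plaintext (None). An empty string
    stands for a generated key and is reported whenever the database is
    shared (assumption: generated keys cannot be verified to match).
    """
    by_database = defaultdict(dict)
    for location_prop, key_prop in REPOSITORIES.items():
        value = locations.get(location_prop, "")
        if not value.startswith(SCHEME):
            continue  # repository does not use a shared database
        # Everything before the last path segment is the database directory.
        database_dir = value[len(SCHEME):].rstrip("/").rpartition("/")[0]
        by_database[database_dir][location_prop] = keys.get(key_prop)

    conflicts = {}
    for database_dir, settings in by_database.items():
        distinct = set(settings.values())
        if len(distinct) > 1 or (len(settings) > 1 and "" in distinct):
            conflicts[database_dir] = settings
    return conflicts


if __name__ == "__main__":
    locations = {
        "nifi.flowfile.repository.directory.default":
            "minifidb://${MINIFI_HOME}/agent_state/flowfile",
        "nifi.database.content.repository.directory.default":
            "minifidb://${MINIFI_HOME}/agent_state/content",
    }
    keys = {"nifi.flowfile.repository.encryption.key": "AB" * 32}
    print(find_encryption_conflicts(locations, keys))
```

In the usage example, the Flow File repository has a key while the Content repository in the same database has none, so the shared directory is reported as a conflict.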