Embedded ZooKeeper Server
As mentioned above, the default State Provider for cluster-wide state is the ZooKeeperStateProvider. At the time of this writing, this is the only State Provider that exists for handling cluster-wide state, which means that NiFi depends on ZooKeeper in order to behave as a cluster. However, NiFi is deployed in many environments where no existing ZooKeeper ensemble is being maintained. To spare administrators the burden of also maintaining a separate ZooKeeper instance, NiFi provides the option of starting an embedded ZooKeeper server.
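Property                                              Description
nifi.state.management.embedded.zookeeper.start        Specifies whether or not this instance of NiFi should run an embedded ZooKeeper server.
nifi.state.management.embedded.zookeeper.properties   Properties file that provides the ZooKeeper properties to use if nifi.state.management.embedded.zookeeper.start is set to true.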
An embedded ZooKeeper server is started by setting the nifi.state.management.embedded.zookeeper.start property in nifi.properties to true on those nodes that should run it. Generally, it is advisable to run ZooKeeper on either 3 or 5 nodes. Running on fewer than 3 nodes provides less durability in the face of failure. Running on more than 5 nodes generally produces more network traffic than is necessary. Additionally, running ZooKeeper on 4 nodes provides no more benefit than running on 3 nodes: ZooKeeper requires a majority of nodes to be active in order to function, so both configurations can tolerate the failure of only one node. However, it is up to the administrator to determine the number of nodes most appropriate to the particular deployment of NiFi.
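The majority rule can be made concrete with a little arithmetic: an ensemble of n servers needs floor(n/2) + 1 of them running, so it can survive the loss of the remaining servers. The sketch below only prints those numbers for ensembles of 1 to 5 servers; it is purely illustrative and does not talk to ZooKeeper, and the function name quorum_table is hypothetical.

```shell
#!/bin/sh
# Illustrative only: print the quorum size and the number of server
# failures a ZooKeeper ensemble of n servers can survive.
# A quorum is a strict majority: floor(n/2) + 1 servers.
quorum_table() {
  n=1
  while [ "$n" -le 5 ]; do
    quorum=$(( n / 2 + 1 ))        # strict majority of n (integer division)
    tolerated=$(( n - quorum ))    # servers that may fail while a quorum remains
    echo "servers=$n quorum=$quorum tolerated_failures=$tolerated"
    n=$(( n + 1 ))
  done
}

quorum_table
```

Running it shows that 3 and 4 servers both tolerate a single failure, while 5 servers tolerate two.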
If the nifi.state.management.embedded.zookeeper.start property is set to true, the nifi.state.management.embedded.zookeeper.properties property in nifi.properties also becomes relevant. This property specifies the ZooKeeper properties file to use. At a minimum, this properties file needs to be populated with the list of ZooKeeper servers. The servers are specified as properties in the form of server.1, server.2, ..., server.n. As of NiFi 1.10.x, ZooKeeper has been upgraded to 3.5.5, and servers are now defined with the client port appended at the end, as described in the ZooKeeper documentation. As such, each of these servers is configured as <hostname>:<quorum port>[:<leader election port>][:role];[<client port address>:]<client port>. As a simple example, this would be:

server.1 = myhost:2888:3888;2181

The nodes in this list should be the same NiFi cluster nodes that have the nifi.state.management.embedded.zookeeper.start property set to true. Also note that because ZooKeeper will be listening on these ports, the firewall may need to be configured to open these ports for incoming traffic, at least between nodes in the cluster.
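The two settings described above can be applied with a small script. The following is a minimal sketch, assuming a POSIX shell with awk and grep, that it is run from $NIFI_HOME, and that nifi.properties already contains the key being set as a key=value line with no spaces around "=". The helper names set_property and write_server_list, the host names, and the reuse of ports 2888, 3888 and 2181 from the example above are all hypothetical choices for illustration, not part of NiFi.

```shell
#!/bin/sh
# Hypothetical helper: replace the value of an existing key=value line in
# a properties file. Keys that are not already present are left untouched
# (this sketch does not append them).
set_property() {
  file=$1 key=$2 value=$3
  tmp="$file.tmp.$$"
  awk -v k="$key" -v v="$value" '
    index($0, k "=") == 1 { print k "=" v; next }   # line starts with "key="
    { print }                                       # copy everything else
  ' "$file" > "$tmp" && mv "$tmp" "$file"
}

# Hypothetical helper: drop any existing server.N lines from the ZooKeeper
# properties file and append one line per host, numbered from 1, using the
# quorum (2888), leader election (3888) and client (2181) ports assumed here.
# The same host list should be written on every node that runs ZooKeeper.
write_server_list() {
  file=$1; shift
  tmp="$file.tmp.$$"
  grep -v '^server\.[0-9][0-9]*[[:space:]]*=' "$file" > "$tmp" || true
  i=1
  for host in "$@"; do
    echo "server.$i=$host:2888:3888;2181" >> "$tmp"
    i=$(( i + 1 ))
  done
  mv "$tmp" "$file"
}

# Example usage on each ZooKeeper node, from $NIFI_HOME (placeholder hosts):
#   set_property conf/nifi.properties nifi.state.management.embedded.zookeeper.start true
#   write_server_list conf/zookeeper.properties nifi1.example.com nifi2.example.com nifi3.example.com
```

Rewriting through a temporary file keeps the sketch portable, since `sed -i` behaves differently between GNU and BSD systems.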
When using an embedded ZooKeeper, the ./conf/zookeeper.properties file has a property named dataDir. By default, this value is set to ./state/zookeeper. If more than one NiFi node is running an embedded ZooKeeper, it is important to tell each server which one it is. This is accomplished by creating a file named myid and placing it in ZooKeeper's data directory. The contents of this file should be the index of the server as specified by its server.<number> entry. So, for one of the ZooKeeper servers (server.1), we accomplish this by running the following commands:
cd $NIFI_HOME
mkdir state
mkdir state/zookeeper
echo 1 > state/zookeeper/myid
For the next NiFi node that will run ZooKeeper (server.2), we accomplish this by running the following commands:
cd $NIFI_HOME
mkdir state
mkdir state/zookeeper
echo 2 > state/zookeeper/myid
And so on.
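Rather than typing the index by hand on each node, a node can look itself up in the server list. The following is a minimal sketch, assuming that each node appears in conf/zookeeper.properties under exactly the name its `hostname` command prints (a short name versus a fully qualified name will not match) and that dataDir is left at the default ./state/zookeeper. The helper write_myid is hypothetical and not part of NiFi or ZooKeeper.

```shell
#!/bin/sh
# Hypothetical helper: find the server.N entry whose host matches $2 in
# the ZooKeeper properties file $1, then write N to <data dir>/myid ($3).
write_myid() {
  props=$1 host=$2 datadir=$3
  # Each entry looks like: server.N = host:quorumPort:electionPort;clientPort
  id=$(awk -F'=' -v h="$host" '
    /^server\.[0-9]+[ \t]*=/ {
      n = $1; sub(/^server\./, "", n); sub(/[ \t]+$/, "", n)   # N
      addr = $2; sub(/^[ \t]+/, "", addr)                      # host:ports
      split(addr, parts, ":")
      if (parts[1] == h) { print n; exit }
    }' "$props")
  if [ -z "$id" ]; then
    echo "no server.N entry for host '$host' in $props" >&2
    return 1
  fi
  mkdir -p "$datadir"           # same effect as mkdir state; mkdir state/zookeeper
  echo "$id" > "$datadir/myid"
}

# Example usage, run from $NIFI_HOME on each node that runs ZooKeeper:
#   write_myid conf/zookeeper.properties "$(hostname)" state/zookeeper
```

Deriving the index from the shared server list keeps myid consistent with the server.<number> entries, at the cost of depending on host names resolving the same way everywhere.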
For more information on the properties used to administer ZooKeeper, see the ZooKeeper Admin Guide.
For information on securing the embedded ZooKeeper Server, see the Securing ZooKeeper with Kerberos section below.