Embedded ZooKeeper Server

As mentioned above, the default State Provider for cluster-wide state is the ZooKeeperStateProvider. At the time of this writing, this is the only State Provider that exists for handling cluster-wide state. What this means is that NiFi has dependencies on ZooKeeper in order to behave as a cluster. However, there are many environments in which NiFi is deployed where there is no existing ZooKeeper ensemble being maintained. In order to avoid the burden of forcing administrators to also maintain a separate ZooKeeper instance, NiFi provides the option of starting an embedded ZooKeeper server.

Property

Description

nifi.state.management.embedded.zookeeper.start

Specifies whether or not this instance of NiFi should run an embedded ZooKeeper server

nifi.state.management.embedded.zookeeper.properties

Properties file that provides the ZooKeeper properties to use if nifi.state.management.embedded.zookeeper.start is set to true

This can be accomplished by setting the nifi.state.management.embedded.zookeeper.start property in nifi.properties to true on those nodes that should run the embedded ZooKeeper server. Generally, it is advisable to run ZooKeeper on either 3 or 5 nodes. Running on fewer than 3 nodes provides less durability in the face of failure. Running on more than 5 nodes generally produces more network traffic than is necessary. Additionally, running ZooKeeper on 4 nodes provides no more benefit than running on 3 nodes, ZooKeeper requires a majority of nodes be active in order to function. However, it is up to the administrator to determine the number of nodes most appropriate to the particular deployment of NiFi.

If the nifi.state.management.embedded.zookeeper.start property is set to true, the nifi.state.management.embedded.zookeeper.properties property in nifi.properties also becomes relevant. This specifies the ZooKeeper properties file to use. At a minimum, this properties file needs to be populated with the list of ZooKeeper servers. The servers are specified as properties in the form of server.1, server.2, to server.n. As of NiFi 1.10.x, Zookeeper has been upgraded to 3.5.5 and servers are now defined with the client port appended at the end as per the Zookeeper Documentation. As such, each of these servers is configured as <hostname>:<quorum port>[:<leader election port>][:role];[<client port address>:]<client port>. As a simple example this would be server.1 = myhost:2888:3888;2181. This list of nodes should be the same nodes in the NiFi cluster that have the nifi.state.management.embedded.zookeeper.start property set to true. Also note that because ZooKeeper will be listening on these ports, the firewall may need to be configured to open these ports for incoming traffic, at least between nodes in the cluster.

When using an embedded ZooKeeper, the ./conf/zookeeper.properties file has a property named dataDir. By default, this value is set to ./state/zookeeper. If more than one NiFi node is running an embedded ZooKeeper, it is important to tell the server which one it is. This is accomplished by creating a file named myid and placing it in ZooKeeper's data directory. The contents of this file should be the index of the server as specific by the server.<number>. So for one of the ZooKeeper servers, we will accomplish this by performing the following commands:


         cd $NIFI_HOME
mkdir state
mkdir state/zookeeper
echo 1 > state/zookeeper/myid
      

For the next NiFi Node that will run ZooKeeper, we can accomplish this by performing the following commands:


         cd $NIFI_HOME
mkdir state
mkdir state/zookeeper
echo 2 > state/zookeeper/myid
      

And so on.

For more information on the properties used to administer ZooKeeper, see the ZooKeeper Admin Guide.

For information on securing the embedded ZooKeeper Server, see the Securing ZooKeeper with Kerberos section below.