Installing HDP Manually

Chapter 3. Installing Apache ZooKeeper

This section describes installing and testing Apache ZooKeeper, a centralized tool for providing services to highly distributed systems.

HDFS and YARN depend on ZooKeeper, so install ZooKeeper first.

Install the ZooKeeper Package

On all nodes of the cluster that you have identified as ZooKeeper servers, type:

  • For RHEL/CentOS/Oracle Linux:

    yum install zookeeper

  • For SLES:

    zypper install zookeeper

  • For Ubuntu and Debian:

    apt-get install zookeeper

[Note]Note

Grant the zookeeper user shell access on Ubuntu and Debian.

usermod -s /bin/bash zookeeper
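
The per-distribution commands above can be selected by a small helper when the same script has to run on mixed hosts. This is a minimal sketch, not part of HDP: the function name zk_install_cmd is hypothetical, and the distribution IDs are assumed to be the ID values found in /etc/os-release (which older releases may not ship). The function only prints the command, so it can be reviewed before it is run.

```shell
# Print the ZooKeeper install command for a distribution ID.
# zk_install_cmd is a hypothetical helper; the IDs are assumed to
# match the ID field of /etc/os-release on each platform.
zk_install_cmd() {
    case "$1" in
        rhel|centos|ol) echo "yum install zookeeper" ;;
        sles)           echo "zypper install zookeeper" ;;
        ubuntu|debian)  echo "apt-get install zookeeper" ;;
        *)              echo "unsupported distribution: $1" >&2
                        return 1 ;;
    esac
}

# Example (commented out so that nothing is installed by accident):
#   . /etc/os-release && zk_install_cmd "$ID"
```

On Ubuntu and Debian, the usermod command from the note above still has to be run after the package is installed.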

Securing ZooKeeper with Kerberos (optional)

[Note]Note

Before starting the following steps, refer to Setting up Security for Manual Installs.

To secure ZooKeeper with Kerberos, perform the following steps on the host that runs the Kerberos Key Distribution Center (KDC):

  1. Start the kadmin.local utility:

    /usr/sbin/kadmin.local

  2. Create a principal for ZooKeeper:

    sudo kadmin.local -q 'addprinc zookeeper/<ZOOKEEPER_HOSTNAME>@STORM.EXAMPLE.COM'

  3. Create a keytab for ZooKeeper:

    sudo kadmin.local -q "ktadd -k /tmp/zk.keytab zookeeper/<ZOOKEEPER_HOSTNAME>@STORM.EXAMPLE.COM"

  4. Copy the keytab to all ZooKeeper nodes in the cluster.

    [Note]Note

    Verify that only the ZooKeeper and Storm operating system users can access the ZooKeeper keytab.

  5. Add the following properties to the zoo.cfg configuration file located at /etc/zookeeper/conf:

    authProvider.1 = org.apache.zookeeper.server.auth.SASLAuthenticationProvider
    kerberos.removeHostFromPrincipal = true
    kerberos.removeRealmFromPrincipal = true
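
When step 5 is scripted across several ZooKeeper nodes, it helps to add each property only if it is not already present, so that re-running the script does not duplicate lines. The following is a minimal sketch under that assumption; set_prop_if_missing is a hypothetical helper, not an HDP or ZooKeeper tool.

```shell
# Append "key=value" to a properties file unless the key is already set.
# set_prop_if_missing is a hypothetical helper for illustration only.
set_prop_if_missing() (
    file=$1
    key=$2
    value=$3
    # awk exits 0 when some line has exactly this key before the "=".
    if ! awk -F= -v k="$key" '
            { gsub(/^[ \t]+|[ \t]+$/, "", $1); if ($1 == k) found = 1 }
            END { exit !found }' "$file" 2>/dev/null
    then
        printf '%s=%s\n' "$key" "$value" >> "$file"
    fi
)

# Example (commented out; the path is the one named in step 5):
#   cfg=/etc/zookeeper/conf/zoo.cfg
#   set_prop_if_missing "$cfg" authProvider.1 org.apache.zookeeper.server.auth.SASLAuthenticationProvider
#   set_prop_if_missing "$cfg" kerberos.removeHostFromPrincipal true
#   set_prop_if_missing "$cfg" kerberos.removeRealmFromPrincipal true
```

The helper never overwrites an existing value, so a key that is already set to something else is left for manual review.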

Set Directories and Permissions

Create directories and configure ownership and permissions on the appropriate hosts as described below.

If any of these directories already exist, we recommend deleting and recreating them. Use the following instructions to create appropriate directories:

  1. We strongly suggest that you edit and source the bash script files included with the HDP companion files.

    Alternatively, copy the contents to your ~/.bash_profile to set these environment variables in your environment.

  2. Execute the following commands on all nodes:

    mkdir -p $ZOOKEEPER_LOG_DIR; chown -R $ZOOKEEPER_USER:$HADOOP_GROUP $ZOOKEEPER_LOG_DIR; chmod -R 755 $ZOOKEEPER_LOG_DIR;
    mkdir -p $ZOOKEEPER_PID_DIR; chown -R $ZOOKEEPER_USER:$HADOOP_GROUP $ZOOKEEPER_PID_DIR; chmod -R 755 $ZOOKEEPER_PID_DIR;
    mkdir -p $ZOOKEEPER_DATA_DIR; chown -R $ZOOKEEPER_USER:$HADOOP_GROUP $ZOOKEEPER_DATA_DIR; chmod -R 755 $ZOOKEEPER_DATA_DIR;

    where:

    • $ZOOKEEPER_USER is the user owning the ZooKeeper services. For example, zookeeper.

    • $ZOOKEEPER_LOG_DIR is the directory to store the ZooKeeper logs. For example, /var/log/zookeeper.

    • $ZOOKEEPER_PID_DIR is the directory to store the ZooKeeper process ID. For example, /var/run/zookeeper.

    • $ZOOKEEPER_DATA_DIR is the directory where ZooKeeper will store data. For example, /grid/hadoop/zookeeper/data.

  3. Initialize the ZooKeeper data directories with the 'myid' file. Create one file per ZooKeeper server, and put the number of that server in each file:

    vi $ZOOKEEPER_DATA_DIR/myid

    • In the myid file on the first server, enter the corresponding number: 1

    • In the myid file on the second server, enter the corresponding number: 2

    • In the myid file on the third server, enter the corresponding number: 3
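
Instead of editing the myid file by hand on every server, step 3 can be scripted. The sketch below is an illustration, not part of the HDP companion files: write_myid is a hypothetical helper, and the 1 to 255 range check reflects the range the ZooKeeper documentation recommends for server IDs.

```shell
# Write a server number into <data_dir>/myid.
# write_myid is a hypothetical helper; the range check (1-255) is an
# assumption based on the ZooKeeper administrator documentation.
write_myid() (
    data_dir=$1
    id=$2
    case "$id" in
        ''|*[!0-9]*)
            echo "server id must be a number: '$id'" >&2
            exit 1 ;;
    esac
    if [ "$id" -lt 1 ] || [ "$id" -gt 255 ]; then
        echo "server id out of range (1-255): $id" >&2
        exit 1
    fi
    mkdir -p "$data_dir" && printf '%s\n' "$id" > "$data_dir/myid"
)

# Example (commented out), run on the second ZooKeeper server:
#   write_myid "$ZOOKEEPER_DATA_DIR" 2
```

The number written on each host must match the N of that host's server.N entry in zoo.cfg.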

Set Up the Configuration Files

There are several configuration files that need to be set up for ZooKeeper.

  1. Extract the ZooKeeper configuration files to a temporary directory.

    The files are located in the configuration_files/zookeeper directory under the location where you decompressed the companion files.

  2. Modify the configuration files.

    In the respective temporary directories, locate the following files and modify the properties based on your environment. Search for TODO variables in the files for the properties to replace.

    You must make changes to zookeeper-env.sh specific to your environment.

  3. Edit zoo.cfg and modify the following properties:

    dataDir=$zk.data.directory.path
    server.1=$zk.server1.full.hostname:2888:3888
    server.2=$zk.server2.full.hostname:2888:3888
    server.3=$zk.server3.full.hostname:2888:3888

  4. Edit hbase-site.xml and modify the following properties:

    <property>
      <name>hbase.zookeeper.quorum</name>
      <value>$zk.server1.full.hostname,$zk.server2.full.hostname,$zk.server3.full.hostname</value>
      <description>Comma-separated list of ZooKeeper servers (must match the servers specified in zoo.cfg, but without port numbers)</description>
    </property>

  5. Copy the configuration files:

    • On all hosts create the config directory:

      rm -r $ZOOKEEPER_CONF_DIR
      mkdir -p $ZOOKEEPER_CONF_DIR

    • Copy all the ZooKeeper configuration files to the $ZOOKEEPER_CONF_DIR directory.

    • Set appropriate permissions:

      chmod a+x $ZOOKEEPER_CONF_DIR/
      chown -R $ZOOKEEPER_USER:$HADOOP_GROUP $ZOOKEEPER_CONF_DIR/../
      chmod -R 755 $ZOOKEEPER_CONF_DIR/../

      where:

      • $ZOOKEEPER_CONF_DIR is the directory to store the ZooKeeper configuration files. For example, /etc/zookeeper/conf.

      • $ZOOKEEPER_USER is the user owning the ZooKeeper services. For example, zookeeper.
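
The server.N entries in zoo.cfg and the hbase.zookeeper.quorum value in hbase-site.xml are built from the same host list, so generating both from one place helps keep them in sync. The following is a rough sketch: the function names are hypothetical, the host names in the example are placeholders, and ports 2888 and 3888 are simply the ones shown in step 3.

```shell
# Print one "server.N=host:2888:3888" line per host, numbered from 1.
# Both functions are hypothetical helpers for illustration only.
zk_server_lines() (
    n=1
    for host in "$@"; do
        printf 'server.%d=%s:2888:3888\n' "$n" "$host"
        n=$((n + 1))
    done
)

# Print the same hosts as a comma-separated quorum string (no ports).
zk_quorum() (
    IFS=,
    printf '%s\n' "$*"
)

# Example (commented out; host names are placeholders):
#   zk_server_lines zk1.example.com zk2.example.com zk3.example.com
#   zk_quorum       zk1.example.com zk2.example.com zk3.example.com
```

The order of the hosts determines the server numbers, so it should agree with the myid value written on each host.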

Start ZooKeeper

To install and configure HBase and other Hadoop ecosystem components, you must start the ZooKeeper service and the ZKFC:

su - zookeeper -c "export ZOOCFGDIR=/usr/hdp/current/zookeeper-server/conf ; export ZOOCFG=zoo.cfg; source /usr/hdp/current/zookeeper-server/conf/zookeeper-env.sh ; /usr/hdp/current/zookeeper-server/bin/zkServer.sh start"

/usr/hdp/current/hadoop-client/sbin/hadoop-daemon.sh start zkfc

  • $ZOOCFGDIR is the directory where the ZooKeeper server configuration files are stored.
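
After starting the service, a quick way to test that a server is responding is ZooKeeper's ruok four-letter command, which a healthy server answers with imok. The sketch below rests on several assumptions that do not come from this guide: the client port is the ZooKeeper default 2181, the nc utility is installed, and the ruok command is enabled (newer ZooKeeper releases require it to be whitelisted in the server configuration). The function names are hypothetical.

```shell
# Return success only when the reply is the expected "imok".
zk_reply_ok() {
    [ "$1" = "imok" ]
}

# Send "ruok" to a ZooKeeper server and check the reply.
# Assumptions: nc is available; 2181 is the client port (the ZooKeeper
# default, not taken from this guide); "ruok" is enabled on the server.
zk_ruok() (
    host=${1:-localhost}
    port=${2:-2181}
    reply=$(printf 'ruok' | nc -w 2 "$host" "$port" 2>/dev/null)
    zk_reply_ok "$reply"
)

# Example (commented out):
#   zk_ruok localhost 2181 && echo "ZooKeeper is responding"
```

A successful reply only shows that the server process is running. The zkServer.sh script also accepts a status argument, which additionally reports whether the server is acting as leader or follower.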