This section describes installing and testing Apache ZooKeeper, a centralized service that provides configuration, naming, and synchronization services to highly distributed systems.
HDFS and YARN depend on ZooKeeper, so install ZooKeeper first.
On all nodes of the cluster that you have identified as ZooKeeper servers, type:
For RHEL/CentOS/Oracle Linux:
yum install zookeeper
For SLES:
zypper install zookeeper
For Ubuntu and Debian:
apt-get install zookeeper
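On a mixed cluster it can help to let a script pick the right command. The following is a minimal sketch, not part of the HDP tooling: the function name zk_install_cmd is hypothetical, and it only prints the command from the list above that matches the first package manager found on the PATH.

```shell
# Print the ZooKeeper install command for whichever package manager is on PATH.
# zk_install_cmd is a hypothetical helper name used only for this sketch.
zk_install_cmd() {
  if command -v yum >/dev/null 2>&1; then
    echo "yum install zookeeper"        # RHEL/CentOS/Oracle Linux
  elif command -v zypper >/dev/null 2>&1; then
    echo "zypper install zookeeper"     # SLES
  elif command -v apt-get >/dev/null 2>&1; then
    echo "apt-get install zookeeper"    # Ubuntu and Debian
  else
    echo "no supported package manager found" >&2
    return 1
  fi
}

# Show the command without running it; run it yourself as root once it looks right.
zk_install_cmd || true
```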
Note: Before starting the following steps, refer to Setting up Security for Manual Installs.
(Optional) To secure ZooKeeper with Kerberos, perform the following steps on the host that runs KDC (Kerberos Key Distribution Center):
Start the kadmin.local utility:
/usr/sbin/kadmin.local
Create a principal for ZooKeeper:
sudo kadmin.local -q 'addprinc zookeeper/<ZOOKEEPER_HOSTNAME>@STORM.EXAMPLE.COM'
Create a keytab for ZooKeeper:
sudo kadmin.local -q "ktadd -k /tmp/zk.keytab zookeeper/<ZOOKEEPER_HOSTNAME>@STORM.EXAMPLE.COM"
Copy the keytab to all ZooKeeper nodes in the cluster.
Note: Verify that only the ZooKeeper and Storm operating system users can access the ZooKeeper keytab.
Add the following properties to the zoo.cfg configuration file located at /etc/zookeeper/conf:
authProvider.1 = org.apache.zookeeper.server.auth.SASLAuthenticationProvider
kerberos.removeHostFromPrincipal = true
kerberos.removeRealmFromPrincipal = true
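If you script this step, it is worth making the edit safe to repeat. The sketch below appends each of the three properties only when it is not already set. The helper name add_prop is hypothetical, and the script writes to a scratch file; on a real node ZOO_CFG would point at /etc/zookeeper/conf/zoo.cfg.

```shell
# Scratch file standing in for /etc/zookeeper/conf/zoo.cfg (assumption for this sketch).
ZOO_CFG="$(mktemp)"

# add_prop NAME VALUE: append NAME=VALUE unless NAME is already set in the file.
# add_prop is a hypothetical helper, not a ZooKeeper or HDP command.
add_prop() {
  if ! grep -q "^$1[[:space:]]*=" "$ZOO_CFG"; then
    echo "$1=$2" >> "$ZOO_CFG"
  fi
}

add_prop authProvider.1 org.apache.zookeeper.server.auth.SASLAuthenticationProvider
add_prop kerberos.removeHostFromPrincipal true
add_prop kerberos.removeRealmFromPrincipal true
add_prop kerberos.removeRealmFromPrincipal true   # repeated call changes nothing

cat "$ZOO_CFG"
```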
Note: Grant the zookeeper user shell access on Ubuntu and Debian.
Create directories and configure ownership and permissions on the appropriate hosts as described below.
If any of these directories already exist, we recommend deleting and recreating them. Use the following instructions to create appropriate directories:
We strongly suggest that you edit and source the bash script files included with the HDP companion files.
Alternatively, copy the contents of those files to your ~/.bash_profile to set these environment variables in your environment.
Execute the following commands on all nodes:
mkdir -p $ZOOKEEPER_LOG_DIR;
chown -R $ZOOKEEPER_USER:$HADOOP_GROUP $ZOOKEEPER_LOG_DIR;
chmod -R 755 $ZOOKEEPER_LOG_DIR;
mkdir -p $ZOOKEEPER_PID_DIR;
chown -R $ZOOKEEPER_USER:$HADOOP_GROUP $ZOOKEEPER_PID_DIR;
chmod -R 755 $ZOOKEEPER_PID_DIR;
mkdir -p $ZOOKEEPER_DATA_DIR;
chmod -R 755 $ZOOKEEPER_DATA_DIR;
chown -R $ZOOKEEPER_USER:$HADOOP_GROUP $ZOOKEEPER_DATA_DIR
where:
$ZOOKEEPER_USER is the user owning the ZooKeeper services. For example, zookeeper.
$ZOOKEEPER_LOG_DIR is the directory to store the ZooKeeper logs. For example, /var/log/zookeeper.
$ZOOKEEPER_PID_DIR is the directory to store the ZooKeeper process ID. For example, /var/run/zookeeper.
$ZOOKEEPER_DATA_DIR is the directory where ZooKeeper will store data. For example, /grid/hadoop/zookeeper/data.
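The variables and commands above can be combined into one loop. This is a sketch under stated assumptions: the directory values are the examples listed above, the group name hadoop is an assumed value for $HADOOP_GROUP, and PREFIX is a hypothetical scratch prefix that lets the script run without root. Set PREFIX to an empty string on a real node.

```shell
# PREFIX is a hypothetical scratch prefix; export PREFIX="" to use the real paths.
PREFIX="${PREFIX-$(mktemp -d)}"

ZOOKEEPER_USER="${ZOOKEEPER_USER:-zookeeper}"
HADOOP_GROUP="${HADOOP_GROUP:-hadoop}"            # assumed group name
ZOOKEEPER_LOG_DIR="$PREFIX/var/log/zookeeper"
ZOOKEEPER_PID_DIR="$PREFIX/var/run/zookeeper"
ZOOKEEPER_DATA_DIR="$PREFIX/grid/hadoop/zookeeper/data"

for dir in "$ZOOKEEPER_LOG_DIR" "$ZOOKEEPER_PID_DIR" "$ZOOKEEPER_DATA_DIR"; do
  mkdir -p "$dir"
  chmod -R 755 "$dir"
  # chown needs root and an existing user, so it is skipped otherwise.
  if [ "$(id -u)" -eq 0 ] && id "$ZOOKEEPER_USER" >/dev/null 2>&1; then
    chown -R "$ZOOKEEPER_USER:$HADOOP_GROUP" "$dir" || echo "chown failed for $dir" >&2
  fi
  ls -ld "$dir"   # show the resulting owner and mode
done
```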
Initialize the ZooKeeper data directories with the 'myid' file. Create one file per ZooKeeper server, and put the number of that server in each file:
vi $ZOOKEEPER_DATA_DIR/myid
In the myid file on the first server, enter the corresponding number: 1
In the myid file on the second server, enter the corresponding number: 2
In the myid file on the third server, enter the corresponding number: 3
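When provisioning from a script, the myid file can be written without an editor. A minimal sketch, assuming MYID holds this server's number (a hypothetical variable you set per host) and falling back to a scratch directory when $ZOOKEEPER_DATA_DIR is not set:

```shell
# Scratch directory is used only when ZOOKEEPER_DATA_DIR is not already set.
ZOOKEEPER_DATA_DIR="${ZOOKEEPER_DATA_DIR:-$(mktemp -d)}"
# MYID is a hypothetical per-host variable: 1 on the first server, 2 on the second, and so on.
MYID="${MYID:-1}"

mkdir -p "$ZOOKEEPER_DATA_DIR"
echo "$MYID" > "$ZOOKEEPER_DATA_DIR/myid"

# Confirm what was written.
cat "$ZOOKEEPER_DATA_DIR/myid"
```

The number in each myid file must match the N in that host's server.N entry in zoo.cfg.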
There are several configuration files that need to be set up for ZooKeeper.
Extract the ZooKeeper configuration files to a temporary directory.
The files are located in the configuration_files/zookeeper directories where you decompressed the companion files.
Modify the configuration files.
In the respective temporary directories, locate the following files and modify the properties based on your environment. Search for TODO variables in the files for the properties to replace.
You must make changes to zookeeper-env.sh specific to your environment.
Edit zoo.cfg and modify the following properties:
dataDir=$zk.data.directory.path
server.1=$zk.server1.full.hostname:2888:3888
server.2=$zk.server2.full.hostname:2888:3888
server.3=$zk.server3.full.hostname:2888:3888
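For illustration, the same entries can be generated from a list of hosts. In this sketch the hostnames zk1.example.com through zk3.example.com are placeholders, the dataDir value is the example data directory used earlier, and the output goes to a scratch file; none of these names come from the companion files.

```shell
# Scratch file standing in for zoo.cfg; hostnames below are placeholders.
ZOO_CFG="$(mktemp)"
ZK_HOSTS="zk1.example.com zk2.example.com zk3.example.com"

echo "dataDir=/grid/hadoop/zookeeper/data" > "$ZOO_CFG"

# Number the servers in list order; each number must equal that host's myid.
id=1
for host in $ZK_HOSTS; do
  echo "server.$id=$host:2888:3888" >> "$ZOO_CFG"
  id=$((id + 1))
done

cat "$ZOO_CFG"
```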
Edit hbase-site.xml and modify the following properties:
<property>
  <name>hbase.zookeeper.quorum</name>
  <value>$zk.server1.full.hostname,$zk.server2.full.hostname,$zk.server3.full.hostname</value>
  <description>Comma-separated list of ZooKeeper servers (match what is specified in zoo.cfg, but without port numbers)</description>
</property>
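Because the quorum value must match the servers in zoo.cfg, one way to avoid typos is to derive it from that file. The sketch below does so with standard text tools; the sample zoo.cfg and its hostnames are placeholders written to a scratch file.

```shell
# Sample zoo.cfg with placeholder hostnames, written to a scratch file.
ZOO_CFG="$(mktemp)"
cat > "$ZOO_CFG" <<'EOF'
dataDir=/grid/hadoop/zookeeper/data
server.1=zk1.example.com:2888:3888
server.2=zk2.example.com:2888:3888
server.3=zk3.example.com:2888:3888
EOF

# Keep the server.N lines, take the part after "=", drop the ports, join with commas.
QUORUM="$(grep '^server\.' "$ZOO_CFG" | cut -d= -f2 | cut -d: -f1 | paste -sd, -)"
echo "$QUORUM"
```

The printed string can be pasted into the value element of hbase.zookeeper.quorum.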
Copy the configuration files
On all hosts create the config directory:
rm -r $ZOOKEEPER_CONF_DIR ; mkdir -p $ZOOKEEPER_CONF_DIR ;
Copy all the ZooKeeper configuration files to the $ZOOKEEPER_CONF_DIR directory.
Set appropriate permissions:
chmod a+x $ZOOKEEPER_CONF_DIR/;
chown -R $ZOOKEEPER_USER:$HADOOP_GROUP $ZOOKEEPER_CONF_DIR/../;
chmod -R 755 $ZOOKEEPER_CONF_DIR/../
Note:
$ZOOKEEPER_CONF_DIR is the directory to store the ZooKeeper configuration files. For example, /etc/zookeeper/conf.
$ZOOKEEPER_USER is the user owning the ZooKeeper services. For example, zookeeper.
To install and configure HBase and other Hadoop ecosystem components, you must start the ZooKeeper service and the ZKFC:
su - zookeeper -c "export ZOOCFGDIR=/usr/hdp/current/zookeeper-server/conf ; export ZOOCFG=zoo.cfg ; source /usr/hdp/current/zookeeper-server/conf/zookeeper-env.sh ; /usr/hdp/current/zookeeper-server/bin/zkServer.sh start"
/usr/hdp/current/hadoop-client/sbin/hadoop-daemon.sh start zkfc
$ZOOCFGDIR is the directory where ZooKeeper server configs are stored.
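After starting the service, you may want to confirm that the server is running. The following sketch wraps the status subcommand of zkServer.sh; the function name zk_status is hypothetical, and the default path assumes the HDP layout used in the start command above.

```shell
# zk_status [PATH]: run "zkServer.sh status" if the script exists at PATH.
# zk_status is a hypothetical helper; the default path assumes the HDP layout.
zk_status() {
  zkserver="${1:-/usr/hdp/current/zookeeper-server/bin/zkServer.sh}"
  if [ -x "$zkserver" ]; then
    "$zkserver" status
  else
    echo "zkServer.sh not found at $zkserver" >&2
    return 1
  fi
}
```

Call zk_status on each ZooKeeper host. A non-zero exit status means either that the script was not found or that zkServer.sh itself reported a problem.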