Configuring HBase in Pseudo-Distributed Mode

Pseudo-distributed mode differs from standalone mode in that each of the component processes runs in a separate JVM. It differs from distributed mode in that all of the processes run on the same server, rather than on multiple servers in a cluster. This section also assumes you want to store your HBase data in HDFS rather than on the local filesystem.

Modifying the HBase Configuration

To enable pseudo-distributed mode, you must first make some configuration changes. Open /etc/hbase/conf/hbase-site.xml in your editor of choice, and insert the following XML properties between the <configuration> and </configuration> tags. The hbase.cluster.distributed property directs HBase to start each process in a separate JVM. The hbase.rootdir property directs HBase to store its data in an HDFS filesystem, rather than the local filesystem. Be sure to replace myhost with the hostname of your HDFS NameNode (as specified by fs.default.name or fs.defaultFS in your conf/core-site.xml file); you may also need to change the port number from the default (8020).

<property>
  <name>hbase.cluster.distributed</name>
  <value>true</value>
</property>
<property>
  <name>hbase.rootdir</name>
  <value>hdfs://myhost:8020/hbase</value>
</property>
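
As a minimal sketch of how the hbase.rootdir value relates to the NameNode URI, the following shell function appends /hbase to a URI such as the one stored in fs.defaultFS. The function name rootdir_from_fs_uri is hypothetical and is not part of HBase or Hadoop.

```shell
# Hypothetical helper: derive the hbase.rootdir value from the NameNode URI.
# A single trailing slash is stripped so the result never contains "//hbase".
rootdir_from_fs_uri() {
  fs_uri="${1%/}"
  printf '%s/hbase\n' "$fs_uri"
}

# Example with the placeholder host and default port used in this section.
rootdir_from_fs_uri "hdfs://myhost:8020"
# prints: hdfs://myhost:8020/hbase
```

On a host where the hdfs command is on the PATH, the argument could be taken from the running configuration, for example rootdir_from_fs_uri "$(hdfs getconf -confKey fs.defaultFS)".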

Creating the /hbase Directory in HDFS

Before starting the HBase Master, you need to create the /hbase directory in HDFS. The HBase Master runs as hbase:hbase, so it does not have the permissions required to create a top-level directory.

To create the /hbase directory in HDFS:

$ sudo -u hdfs hadoop fs -mkdir /hbase
$ sudo -u hdfs hadoop fs -chown hbase /hbase
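
To confirm that the ownership change took effect, you can inspect the root directory listing. The sketch below assumes the usual column layout of hadoop fs -ls output, where the owner is the third column and the path is the last; the helper name hbase_dir_owner and the sample line are illustrative only.

```shell
# Hypothetical helper: read "hadoop fs -ls /" output on stdin and print
# the owner (third column) of the entry whose path is exactly /hbase.
hbase_dir_owner() {
  awk '$NF == "/hbase" { print $3 }'
}

# Illustrative listing line; real permissions, groups, and dates will differ.
sample='drwxr-xr-x   - hbase supergroup          0 2013-01-01 12:00 /hbase'
printf '%s\n' "$sample" | hbase_dir_owner
# prints: hbase
```

Against a live cluster, the same helper could be fed from sudo -u hdfs hadoop fs -ls / and should print hbase once the chown has been applied.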

Enabling Servers for Pseudo-Distributed Operation

After you have configured HBase, you must enable the various servers that make up a distributed HBase cluster. HBase requires three types of servers: ZooKeeper, the HBase Master, and the HBase RegionServer.

Installing and Starting ZooKeeper Server

HBase uses ZooKeeper Server as a highly available, central location for cluster management. For example, it allows clients to locate the servers, and ensures that only one master is active at a time. For a small cluster, running a ZooKeeper node collocated with the NameNode is recommended. For larger clusters, contact Cloudera Support for configuration help.

Install and start the ZooKeeper Server in standalone mode by running the commands shown in Installing the ZooKeeper Server Package and Starting ZooKeeper on a Single Server.

Starting the HBase Master

After ZooKeeper is running, you can start the HBase Master:

$ sudo service hbase-master start

Starting an HBase RegionServer

The RegionServer is the HBase process that actually hosts data and processes requests. The RegionServer typically runs on all HBase nodes except the node running the HBase Master.

To enable the HBase RegionServer on RHEL-compatible systems:

$ sudo yum install hbase-regionserver

To enable the HBase RegionServer on Ubuntu and Debian systems:

$ sudo apt-get install hbase-regionserver

To enable the HBase RegionServer on SLES systems:

$ sudo zypper install hbase-regionserver

To start the RegionServer:

$ sudo service hbase-regionserver start
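
The three install commands above differ only in the package manager. As a rough sketch for scripted setups, a small function can map a package manager name to the matching command; install_command is a hypothetical name, and the function only prints the command rather than running it.

```shell
# Hypothetical helper: print the install command for a given package manager.
# It prints the command instead of executing it, so it is safe to dry-run.
install_command() {
  pkg_manager="$1"
  package="$2"
  case "$pkg_manager" in
    yum)     echo "sudo yum install $package" ;;
    apt-get) echo "sudo apt-get install $package" ;;
    zypper)  echo "sudo zypper install $package" ;;
    *)       echo "unsupported package manager: $pkg_manager" >&2; return 1 ;;
  esac
}

install_command yum hbase-regionserver
# prints: sudo yum install hbase-regionserver
```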

Verifying the Pseudo-Distributed Operation

After you have started ZooKeeper, the Master, and a RegionServer, the pseudo-distributed cluster should be up and running. You can verify that each of the daemons is running by using the jps tool, which is included in the Oracle JDK. If you are running a pseudo-distributed HDFS installation and a pseudo-distributed HBase installation on one machine, jps shows output similar to the following (the process IDs will differ):

$ sudo jps
32694 Jps
30674 HRegionServer
29496 HMaster
28781 DataNode
28422 NameNode
30348 QuorumPeerMain
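
The check above can also be scripted. The following sketch reads jps output on stdin and reports any of the three HBase-related daemons (QuorumPeerMain, HMaster, HRegionServer) that does not appear in it. The function name missing_hbase_daemons is hypothetical, and the sample input simply reuses the listing above.

```shell
# Hypothetical helper: read jps output on stdin and print one line for each
# required daemon that is absent. Returns non-zero if any daemon is missing.
missing_hbase_daemons() {
  jps_output="$(cat)"
  rc=0
  for daemon in QuorumPeerMain HMaster HRegionServer; do
    # The second jps column holds the main class name of each JVM.
    if ! printf '%s\n' "$jps_output" |
         awk -v d="$daemon" '$2 == d { found = 1 } END { exit !found }'; then
      echo "missing: $daemon"
      rc=1
    fi
  done
  return "$rc"
}

# Sample input taken from the listing above; all three daemons are present.
printf '%s\n' '32694 Jps' '30674 HRegionServer' '29496 HMaster' \
  '28781 DataNode' '28422 NameNode' '30348 QuorumPeerMain' |
  missing_hbase_daemons && echo "all HBase daemons present"
```

On a live system, the input would come from sudo jps instead of the sample lines.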

You should also be able to go to http://localhost:60010 and verify that the local RegionServer has registered with the Master.

Installing and Starting the HBase Thrift Server

The HBase Thrift Server is an alternative gateway for accessing the HBase server. Thrift mirrors most of the HBase client APIs while enabling popular programming languages to interact with HBase. The Thrift Server is multiplatform and more performant than REST in many situations. Thrift can be collocated with the RegionServers, but should not be collocated with the NameNode or the JobTracker. For more information about Thrift, visit http://thrift.apache.org/.

To enable the HBase Thrift Server on RHEL-compatible systems:

$ sudo yum install hbase-thrift

To enable the HBase Thrift Server on Ubuntu and Debian systems:

$ sudo apt-get install hbase-thrift

To enable the HBase Thrift Server on SLES systems:

$ sudo zypper install hbase-thrift

To start the Thrift server:

$ sudo service hbase-thrift start

See also Accessing HBase by using the HBase Shell, Using MapReduce with HBase, and Troubleshooting HBase.