This is the documentation for CDH 5.1.x. Documentation for other versions is available at Cloudera Documentation.

Configuring HBase in Pseudo-Distributed Mode

Note: You can skip this section if you are already running HBase in distributed mode, or if you intend to use only standalone mode.

Pseudo-distributed mode differs from standalone mode in that each of the component processes runs in a separate JVM. It differs from distributed mode in that all of the separate processes run on the same server, rather than on multiple servers in a cluster. This section also assumes you want to store your HBase data in HDFS rather than on the local filesystem.

Note: Before you start

This section assumes you have already installed the HBase master and gone through the standalone configuration steps.
If the HBase master is already running in standalone mode, stop it as follows before continuing with pseudo-distributed configuration:
To stop the CDH 4 version: sudo service hadoop-hbase-master stop, or
To stop the CDH 5 version if that version is already running: sudo service hbase-master stop
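
If you are not sure which of the two services is installed, the choice can be scripted. The following is a minimal sketch, assuming the packages install their init scripts under /etc/init.d (an assumption, not stated above). The function name find_master_service and the INIT_DIR variable are hypothetical names used only for illustration.

```shell
# Directory holding the init scripts; /etc/init.d is an assumption.
INIT_DIR="${INIT_DIR:-/etc/init.d}"

# Print the name of the first HBase master service whose init script exists.
# The CDH 5 name is checked first, then the CDH 4 name.
find_master_service() {
  for svc in hbase-master hadoop-hbase-master; do
    if [ -x "$INIT_DIR/$svc" ]; then
      echo "$svc"
      return 0
    fi
  done
  return 1
}

# Print the stop command rather than running it, so the sketch has no side effects.
if svc=$(find_master_service); then
  echo "Run: sudo service $svc stop"
fi
```

If neither init script is found, the sketch prints nothing and there is no master service to stop.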

Modifying the HBase Configuration

To enable pseudo-distributed mode, you must first make some configuration changes. Open /etc/hbase/conf/hbase-site.xml in your editor of choice, and insert the following XML properties between the <configuration> and </configuration> tags. The hbase.cluster.distributed property directs HBase to start each process in a separate JVM. The hbase.rootdir property directs HBase to store its data in an HDFS filesystem, rather than the local filesystem. Be sure to replace myhost with the hostname of your HDFS NameNode (as specified by fs.default.name or fs.defaultFS in your conf/core-site.xml file); you may also need to change the port number from the default (8020).

<property>
  <name>hbase.cluster.distributed</name>
  <value>true</value>
</property>
<property>
  <name>hbase.rootdir</name>
  <value>hdfs://myhost:8020/hbase</value>
</property>
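
After editing the file, it can help to read the two values back and confirm they are what you intended. The following is a rough sketch, assuming each <name> and <value> element sits on a single line as in the example above; it is not a general XML parser. The function name get_hbase_prop is hypothetical.

```shell
# Print the value of one property from an HBase or Hadoop site file.
# $1 = path to the XML file, $2 = property name.
get_hbase_prop() {
  awk -v name="$2" '
    /<name>/ {
      n = $0
      sub(/.*<name>/, "", n)
      sub(/<\/name>.*/, "", n)
      found = (n == name)          # remember whether this is the wanted property
    }
    found && /<value>/ {
      v = $0
      sub(/.*<value>/, "", v)
      sub(/<\/value>.*/, "", v)
      print v
      exit
    }
  ' "$1"
}

# Show both properties if the configuration file is readable.
SITE_XML=/etc/hbase/conf/hbase-site.xml
if [ -r "$SITE_XML" ]; then
  echo "hbase.cluster.distributed = $(get_hbase_prop "$SITE_XML" hbase.cluster.distributed)"
  echo "hbase.rootdir = $(get_hbase_prop "$SITE_XML" hbase.rootdir)"
fi
```

To find the NameNode URI that should replace myhost and the port, running hdfs getconf -confKey fs.defaultFS on the same machine should print the value configured for HDFS.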

Creating the /hbase Directory in HDFS

Before starting the HBase Master, you need to create the /hbase directory in HDFS. The HBase Master runs as hbase:hbase, so it does not have the required permissions to create a top-level directory.

To create the /hbase directory in HDFS:

$ sudo -u hdfs hadoop fs -mkdir /hbase
$ sudo -u hdfs hadoop fs -chown hbase /hbase

Note: If Kerberos is enabled, do not use commands in the form sudo -u <user> hadoop <command>; they will fail with a security error. Instead, use the following commands: $ kinit <user> (if you are using a password) or $ kinit -kt <keytab> <principal> (if you are using a keytab) and then, for each command executed by this user, $ <command>
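
Once the directory has been created, you may want to confirm that it exists and is owned by the hbase user. The following is a minimal sketch for a cluster without Kerberos; it relies on the owner being the third column of hadoop fs -ls output. The function names ls_dir_owner and check_hbase_dir are hypothetical.

```shell
# Read `hadoop fs -ls` output on stdin and print the owner of each
# directory entry (lines starting with "d"). The owner is the third column.
ls_dir_owner() {
  awk '/^d/ { print $3 }'
}

# Check that /hbase exists in HDFS and is owned by the hbase user.
check_hbase_dir() {
  owner=$(sudo -u hdfs hadoop fs -ls -d /hbase | ls_dir_owner)
  if [ "$owner" = "hbase" ]; then
    echo "/hbase is owned by hbase"
  else
    echo "/hbase is missing or owned by '${owner}'" >&2
    return 1
  fi
}

# Usage, after running the mkdir and chown commands:
#   check_hbase_dir
```

If Kerberos is enabled, the sudo -u hdfs prefix inside check_hbase_dir would need to be dropped and the function run after kinit, as described in the note above.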

Enabling Servers for Pseudo-distributed Operation

After you have configured HBase, you must enable the various servers that make up a distributed HBase cluster. HBase uses three required types of servers: the ZooKeeper Server, the HBase Master, and the HBase RegionServer.

Installing and Starting ZooKeeper Server

HBase uses ZooKeeper Server as a highly available, central location for cluster management. For example, it allows clients to locate the servers, and ensures that only one master is active at a time. For a small cluster, running a ZooKeeper node collocated with the NameNode is recommended. For larger clusters, contact Cloudera Support for configuration help.

Install and start the ZooKeeper Server in standalone mode by running the commands shown in the sections Installing the ZooKeeper Server Package and Starting ZooKeeper on a Single Server.
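
Before moving on, you can check that ZooKeeper is answering requests. The following is a minimal sketch that sends ZooKeeper's ruok command and expects the reply imok. It assumes the nc (netcat) utility is installed and that ZooKeeper listens on its default client port, 2181, on localhost; neither is stated above. The function names zk_ruok and zk_is_ok are hypothetical.

```shell
# Send the four-letter command "ruok" to a ZooKeeper server and print the reply.
# $1 = host, $2 = client port.
zk_ruok() {
  printf 'ruok' | nc -w 2 "$1" "$2" 2>/dev/null
}

# Succeed only if the server answers "imok".
zk_is_ok() {
  [ "$(zk_ruok "$1" "$2")" = "imok" ]
}

# Usage (host and port are assumptions):
#   zk_is_ok localhost 2181 && echo "ZooKeeper is running"
```

A failed check means either that ZooKeeper is not running or that it is listening on a different host or port.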

Starting the HBase Master

After ZooKeeper is running, you can start the HBase Master:

$ sudo service hbase-master start

Starting an HBase RegionServer

The RegionServer is the part of HBase that actually hosts data and processes requests. The RegionServer typically runs on all of the slave nodes in a cluster, but not on the master node.

To enable the HBase RegionServer on Red Hat-compatible systems:

$ sudo yum install hbase-regionserver

To enable the HBase RegionServer on Ubuntu and Debian systems:

$ sudo apt-get install hbase-regionserver

To enable the HBase RegionServer on SLES systems:

$ sudo zypper install hbase-regionserver

To start the RegionServer:

$ sudo service hbase-regionserver start
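
If the same steps have to work across several distributions, the choice of install command can be scripted. The following is a minimal sketch that picks the command according to which package manager is found on the PATH. The function name pkg_install_cmd is hypothetical, and the function only prints the command instead of running it.

```shell
# Print the install command for a package, based on which package manager
# is available. Checks yum, then apt-get, then zypper.
# $1 = package name.
pkg_install_cmd() {
  pkg="$1"
  if command -v yum >/dev/null 2>&1; then
    echo "sudo yum install $pkg"
  elif command -v apt-get >/dev/null 2>&1; then
    echo "sudo apt-get install $pkg"
  elif command -v zypper >/dev/null 2>&1; then
    echo "sudo zypper install $pkg"
  else
    echo "no supported package manager found" >&2
    return 1
  fi
}

# Usage:
#   pkg_install_cmd hbase-regionserver
```

The helper takes the package name as its argument, so it is not specific to hbase-regionserver.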

Verifying the Pseudo-Distributed Operation

After you have started ZooKeeper, the Master, and a RegionServer, the pseudo-distributed cluster should be up and running. You can verify that each of the daemons is running using the jps tool from the Oracle JDK. If you are running a pseudo-distributed HDFS installation and a pseudo-distributed HBase installation on one machine, jps will show the following output:

$ sudo jps
32694 Jps
30674 HRegionServer
29496 HMaster
28781 DataNode
28422 NameNode
30348 QuorumPeerMain
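
Rather than reading the list by eye, the comparison can be scripted. The following is a minimal sketch that reads jps output and prints the name of every expected daemon that does not appear in it. The function name missing_daemons is hypothetical; the daemon names are the ones shown in the output above.

```shell
# Read jps output on stdin and print each expected daemon that is absent.
# The expected daemon names are passed as arguments.
missing_daemons() {
  running=$(awk '{ print $2 }')      # second column of jps output is the name
  for name in "$@"; do
    echo "$running" | grep -qxF "$name" || echo "$name"
  done
}

# Usage:
#   sudo jps | missing_daemons HMaster HRegionServer QuorumPeerMain NameNode DataNode
```

Empty output means that every listed daemon was found. Any name printed is a daemon that still needs to be started.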

You should also be able to navigate to http://localhost:60010 and verify that the local region server has registered with the master.
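
On a machine without a browser, a quick way to confirm that the Master web UI is responding is to request the page from the command line. The following is a minimal sketch, assuming the curl utility is installed (not stated above). The function name http_status is hypothetical.

```shell
# Print the HTTP status code returned by a URL, following any redirects.
# $1 = URL to request.
http_status() {
  curl -s -L -o /dev/null -w '%{http_code}' "$1"
}

# Usage:
#   [ "$(http_status http://localhost:60010)" = "200" ] && echo "Master web UI is reachable"
```

This confirms only that the web UI answers. Checking that the RegionServer has registered still requires looking at the page itself.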

Installing and Starting the HBase Thrift Server

The HBase Thrift Server is an alternative gateway for accessing the HBase server. Thrift mirrors most of the HBase client APIs while enabling popular programming languages to interact with HBase. The Thrift Server is multiplatform and more performant than REST in many situations. The Thrift Server can be collocated with the RegionServers, but should not be collocated with the NameNode or the JobTracker. For more information about Thrift, visit http://incubator.apache.org/thrift/.

To enable the HBase Thrift Server on Red Hat-compatible systems:

$ sudo yum install hbase-thrift

To enable the HBase Thrift Server on Ubuntu and Debian systems:

$ sudo apt-get install hbase-thrift

To enable the HBase Thrift Server on SLES systems:

$ sudo zypper install hbase-thrift

To start the Thrift server:

$ sudo service hbase-thrift start

Page generated September 3, 2015.