7.2. Writing Data to HBase with the Storm-HBase Connector

The storm-hbase connector supports the following key features:

  • Apache HBase 0.96 and above

  • Incrementing counter columns

  • Failing the tuple if an update to an HBase table fails

  • Ability to group puts in a single batch

  • Writing to Kerberized HBase clusters (for more information, see Configuring Connectors for a Secure Cluster)
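Grouping puts into a batch can be pictured with a small, self-contained model. This sketch is hypothetical: PutBufferDemo, Put, and PutBuffer are invented names, and the code does not use the storm-hbase connector or the HBase client. It assumes size-based flushing, similar in spirit to the HBase client's write buffer. Puts accumulate until their estimated size reaches a limit, and then they are sent together as one batch.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of size-based write batching; it does not use the HBase client API.
public class PutBufferDemo {

    // A simplified put: one cell, identified by row key and "family:qualifier".
    record Put(String row, String column, String value) {
        // Rough size estimate: UTF-8 byte length of row, column, and value combined.
        long estimatedSize() {
            return (row + column + value).getBytes(StandardCharsets.UTF_8).length;
        }
    }

    // Accumulates puts and flushes them as one batch once the buffered size reaches the limit.
    static class PutBuffer {
        private final long limitBytes;
        private final List<Put> pending = new ArrayList<>();
        private long pendingBytes = 0;
        final List<List<Put>> flushedBatches = new ArrayList<>();

        PutBuffer(long limitBytes) {
            this.limitBytes = limitBytes;
        }

        void add(Put put) {
            pending.add(put);
            pendingBytes += put.estimatedSize();
            if (pendingBytes >= limitBytes) {
                flush();
            }
        }

        void flush() {
            if (pending.isEmpty()) {
                return;
            }
            // A real client would send the whole batch to the servers here.
            flushedBatches.add(new ArrayList<>(pending));
            pending.clear();
            pendingBytes = 0;
        }
    }

    public static void main(String[] args) {
        PutBuffer buffer = new PutBuffer(30); // tiny limit so the demo flushes quickly
        buffer.add(new Put("storm", "cf:word", "storm")); // 17 bytes buffered
        buffer.add(new Put("hbase", "cf:word", "hbase")); // 34 bytes >= 30: flush a batch of 2
        buffer.add(new Put("kafka", "cf:word", "kafka")); // 17 bytes, still pending
        buffer.flush();                                   // explicit flush, e.g. at shutdown
        System.out.println(buffer.flushedBatches.size() + " batches"); // prints "2 batches"
    }
}
```

With the 30-byte limit, the first two puts are flushed together as soon as the limit is crossed, and the explicit flush() sends the remaining put. A real write buffer uses a much larger limit, so many puts can share one round trip to the cluster.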

The storm-hbase connector enables Storm developers to collect several PUTs in a single operation and to write to multiple HBase column families and counter columns. A PUT is an HBase operation that inserts data into a single HBase cell. To batch PUTs automatically, use the HBase client's write buffer, which is configured with the hbase.client.write.buffer property. The primary interface in the storm-hbase connector is org.apache.storm.hbase.bolt.mapper.HBaseMapper. Its default implementation, SimpleHBaseMapper, writes to a single column family. To change or override this behavior, Storm developers can implement the HBaseMapper interface themselves or extend SimpleHBaseMapper.
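As a rough illustration of implementing a mapper yourself, the following sketch writes to two column families from a single tuple. It is a self-contained model, not code against the real connector: TupleMapper, Cell, and WordStatsMapper are hypothetical names, a tuple is represented as a Map, and the two methods only loosely mirror the row-key and column methods of HBaseMapper. The column family names info and stats are also assumptions.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Self-contained sketch. TupleMapper, Cell, and the Map-based tuple are hypothetical stand-ins,
// loosely modeled on the row-key/columns shape of storm-hbase's HBaseMapper.
public class MultiFamilyMapperDemo {

    // One cell operation: either a standard column value or a counter increment.
    record Cell(String family, String qualifier, byte[] value, Long increment) {
        boolean isCounter() {
            return increment != null;
        }
    }

    interface TupleMapper {
        byte[] rowKey(Map<String, Object> tuple);
        List<Cell> columns(Map<String, Object> tuple);
    }

    // Writes the word to the "info" family and the count to a counter in the "stats" family,
    // which a single-family mapper cannot do.
    static class WordStatsMapper implements TupleMapper {
        @Override
        public byte[] rowKey(Map<String, Object> tuple) {
            return utf8(tuple.get("word"));
        }

        @Override
        public List<Cell> columns(Map<String, Object> tuple) {
            List<Cell> cells = new ArrayList<>();
            cells.add(new Cell("info", "word", utf8(tuple.get("word")), null));
            long count = ((Number) tuple.get("count")).longValue();
            cells.add(new Cell("stats", "count", null, count));
            return cells;
        }

        private static byte[] utf8(Object value) {
            return String.valueOf(value).getBytes(StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) {
        TupleMapper mapper = new WordStatsMapper();
        Map<String, Object> tuple = Map.of("word", "storm", "count", 3L);
        System.out.println(new String(mapper.rowKey(tuple), StandardCharsets.UTF_8)); // storm
        for (Cell c : mapper.columns(tuple)) {
            // Prints "info:word = storm", then "stats:count += 3"
            System.out.println(c.family() + ":" + c.qualifier()
                + (c.isCounter() ? " += " + c.increment()
                                 : " = " + new String(c.value(), StandardCharsets.UTF_8)));
        }
    }
}
```

Keeping the row key and the column list in separate methods lets the same tuple-to-row logic be reused regardless of how the resulting cells are batched and sent.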

 

Table 1.15. SimpleHBaseMapper Methods

SimpleHBaseMapper Method   Description
------------------------   -----------
withRowKeyField            Specifies the tuple field whose value is used as the row key of the
                           target HBase row. A row key uniquely identifies a row in HBase.
withColumnFields           Specifies the tuple fields to write as standard HBase columns.
withCounterFields          Specifies the tuple fields to write as HBase counter columns.
withColumnFamily           Specifies the target HBase column family.


Example

The following example uses the tuple's 'word' field as the row key, writes the 'word' field as a standard HBase column, writes the 'count' field as an HBase counter column, and writes the data to the 'cf' column family.

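import org.apache.storm.hbase.bolt.mapper.SimpleHBaseMapper;
import org.apache.storm.tuple.Fields; // backtype.storm.tuple.Fields in Storm releases before 1.0

SimpleHBaseMapper mapper = new SimpleHBaseMapper()
 .withRowKeyField("word")                // tuple field whose value becomes the row key
 .withColumnFields(new Fields("word"))   // written as a standard column
 .withCounterFields(new Fields("count")) // written as a counter column
 .withColumnFamily("cf");                // target column family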
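To make the effect of this mapping concrete, the following hypothetical in-memory model applies the same mapping to a few tuples. It uses no Storm or HBase classes, and WordCountTableModel is an invented name. It assumes that the standard column receives a put that replaces the current value, while the counter column is incremented by the value of the tuple's 'count' field.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory model of the example mapping; no HBase or Storm classes involved.
// Row key = "word" field; cf:word is a standard column (replaced on each put);
// cf:count is a counter column (incremented by the tuple's "count" value).
public class WordCountTableModel {

    // row key -> ("family:qualifier" -> stored value)
    final Map<String, Map<String, Object>> rows = new HashMap<>();

    void write(Map<String, Object> tuple) {
        String rowKey = String.valueOf(tuple.get("word"));
        Map<String, Object> row = rows.computeIfAbsent(rowKey, k -> new HashMap<>());
        // Standard column: a put replaces the current value.
        row.put("cf:word", tuple.get("word"));
        // Counter column: the increment is added to the stored value (a missing counter starts from the increment).
        long delta = ((Number) tuple.get("count")).longValue();
        row.merge("cf:count", delta, (old, inc) -> (Long) old + (Long) inc);
    }

    public static void main(String[] args) {
        WordCountTableModel table = new WordCountTableModel();
        List<Map<String, Object>> tuples = List.of(
            Map.of("word", "storm", "count", 1L),
            Map.of("word", "hbase", "count", 1L),
            Map.of("word", "storm", "count", 2L));
        tuples.forEach(table::write);
        System.out.println(table.rows.get("storm").get("cf:count")); // prints 3
    }
}
```

After the two 'storm' tuples, cf:count holds 3 because both increments accumulate, whereas cf:word simply holds the latest value written.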

The storm-hbase connector supports the following versions of HBase:

  • 0.96

  • 0.98

Limitations

The current version of the storm-hbase connector has the following limitations:

  • The HBase table must be created before the topology writes to it

  • Cannot dynamically add new HBase columns; can write to only one column family at a time

  • Assumes that the directory containing hbase-site.xml is on the classpath ($CLASSPATH)

  • Tuple field names must match HBase column names

  • Does not support the Trident API

  • Supports writes but not lookups