Creating and configuring the HBaseSinkFunction

An HBase sink is always created by subclassing HBaseSinkFunction.

When you create the subclass, you pass the required and optional parameters to the constructor of the superclass, HBaseSinkFunction.

Required parameters:
  • Table name (the table itself must be created before the streaming job starts)
Optional parameters:
  • Hadoop Configuration object for setting up the HBase client
  • HBaseOptions for minimal connection configuration

The Cloudera platform configures the optional parameters automatically. Set them yourself only when you need a custom HBase connection.

To configure the operation buffering parameters, you need to use the HBaseSinkFunction.setWriteOptions() method. You can set the following configuration parameters using the HBaseWriteOptions object:
  • setBufferFlushMaxSizeInBytes : Maximum byte size of the buffered operations before flushing
  • setBufferFlushMaxRows : Maximum number of operations buffered before flushing
  • setBufferFlushIntervalMillis : Maximum time interval before flushing
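The three thresholds above can be read as "flush as soon as any one of them is reached". The following self-contained sketch illustrates that reading with a hypothetical ThresholdBuffer class. It is not the connector's implementation: the class, its method names, and the explicit nowMillis argument are assumptions made for illustration. It also checks the time interval only when an operation is added, whereas a real sink would typically use a timer.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative buffer (hypothetical, not part of the connector) that flushes when any threshold is reached. */
final class ThresholdBuffer {
    private final long maxSizeInBytes;   // plays the role of setBufferFlushMaxSizeInBytes
    private final long maxRows;          // plays the role of setBufferFlushMaxRows
    private final long intervalMillis;   // plays the role of setBufferFlushIntervalMillis

    private final List<byte[]> pending = new ArrayList<>();
    private long pendingBytes = 0;
    private long lastFlushMillis;
    private int flushCount = 0;

    ThresholdBuffer(long maxSizeInBytes, long maxRows, long intervalMillis, long nowMillis) {
        this.maxSizeInBytes = maxSizeInBytes;
        this.maxRows = maxRows;
        this.intervalMillis = intervalMillis;
        this.lastFlushMillis = nowMillis;
    }

    /** Buffers one operation, then flushes if the size, row, or time threshold is reached. */
    void add(byte[] operation, long nowMillis) {
        pending.add(operation);
        pendingBytes += operation.length;
        boolean sizeReached = pendingBytes >= maxSizeInBytes;
        boolean rowsReached = pending.size() >= maxRows;
        boolean intervalReached = nowMillis - lastFlushMillis >= intervalMillis;
        if (sizeReached || rowsReached || intervalReached) {
            flush(nowMillis);
        }
    }

    /** Empties the buffer; a real sink would send the buffered operations to HBase here. */
    void flush(long nowMillis) {
        pending.clear();
        pendingBytes = 0;
        lastFlushMillis = nowMillis;
        flushCount++;
    }

    int flushCount() {
        return flushCount;
    }

    int pendingRows() {
        return pending.size();
    }
}
```

In this sketch, a lower interval reduces end-to-end latency at the cost of smaller batches, while higher size and row limits favor throughput.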
See the following example for setting up an HBase sink running on the Cloudera platform:
// Define a new HBase sink for writing to the ITEM_QUERIES table
HBaseSinkFunction<QueryResult> hbaseSink = new HBaseSinkFunction<QueryResult>("ITEM_QUERIES") {
  @Override
  public void executeMutations(QueryResult qresult, Context context, BufferedMutator mutator) throws Exception {
    // For each incoming query result we create a Put operation
    Put put = new Put(Bytes.toBytes(qresult.queryId));
    put.addColumn(Bytes.toBytes("itemId"), Bytes.toBytes("str"), Bytes.toBytes(qresult.itemInfo.itemId));
    put.addColumn(Bytes.toBytes("quantity"), Bytes.toBytes("int"), Bytes.toBytes(qresult.itemInfo.quantity));
    // Hand the Put to the buffered mutator so that it is written to HBase
    mutator.mutate(put);
  }
};
// Configure our sink to not buffer operations for more than a second (to reduce end-to-end latency)
hbaseSink.setWriteOptions(HBaseWriteOptions.builder()
  .setBufferFlushIntervalMillis(1000)
  .build());
// Add the sink to our query result stream
queryResultStream.addSink(hbaseSink);