Lily HBase batch indexing for Cloudera Search

You can batch index HBase tables using the Lily HBase batch indexer MapReduce job (HBaseMapReduceIndexerTool). Batch indexing does not require HBase replication or the Lily HBase Indexer Service. Consequently, you do not need to register a Lily HBase Indexer configuration with the Lily HBase Indexer Service.
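Because no indexer is registered with the service, the job is typically pointed at the indexer configuration file directly on its command line. The Python sketch below only assembles (and does not run) such a command so the shape of the invocation is visible. The flags `--hbase-indexer-file`, `--zk-host`, `--collection`, and `--go-live` are the ones used in Cloudera's HBaseMapReduceIndexerTool examples; the jar name, file name, ZooKeeper address, and collection name are hypothetical placeholders, and you should confirm the options for your release with the tool's `--help` output.

```python
import shlex


def build_indexer_command(job_jar, indexer_xml, zk_host, collection, go_live=True):
    """Assemble (but do not execute) an HBaseMapReduceIndexerTool command line.

    The indexer definition is passed as a local file via --hbase-indexer-file,
    so nothing has to be registered with the Lily HBase Indexer Service.
    """
    cmd = [
        "hadoop", "jar", job_jar,
        "--hbase-indexer-file", indexer_xml,
        "--zk-host", zk_host,
        "--collection", collection,
    ]
    if go_live:
        # Merge the index shards produced by the job into the live Solr collection.
        cmd.append("--go-live")
    return cmd


# All argument values below are hypothetical placeholders.
cmd = build_indexer_command(
    "hbase-indexer-mr-job.jar",      # hypothetical job jar path
    "morphline-hbase-mapper.xml",    # hypothetical indexer configuration file
    "127.0.0.1:2181/solr",           # hypothetical ZooKeeper ensemble and chroot
    "hbase-collection1",             # hypothetical Solr collection name
)
print(shlex.join(cmd))
```

Keeping the arguments in a list rather than a single string means they can be handed to `subprocess.run` unchanged, with no shell quoting issues.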

The indexer supports flexible, custom, application-specific rules to extract, transform, and load HBase data into Solr. Solr search results can contain columnFamily:qualifier links back to the data stored in HBase, so applications can use the search result set to access the matching raw HBase cells directly.
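The round trip described above can be illustrated with a small in-memory sketch. It is not the Lily indexer's configuration format or API: the table contents, the field-to-column mapping, and the helper names `rows_to_solr_docs` and `fetch_cell` are all hypothetical, and it assumes the HBase row key is stored as the Solr `id` field. It shows mapping rules turning cells into Solr fields, then a search hit's row key plus a `family:qualifier` reference being used to read the raw cell back.

```python
# Hypothetical stand-in for an HBase table: row key -> column family -> qualifier -> value.
HBASE_TABLE = {
    "row1": {"info": {"name": "alice", "city": "Oslo"}},
    "row2": {"info": {"name": "bob", "city": "Lima"}},
}

# Hypothetical mapping rules: Solr field name -> HBase column as "family:qualifier".
FIELD_MAPPING = {
    "name_s": "info:name",
    "city_s": "info:city",
}


def split_column(column):
    """Split a 'family:qualifier' reference into its two parts."""
    family, sep, qualifier = column.partition(":")
    if not sep or not family or not qualifier:
        raise ValueError(f"expected family:qualifier, got {column!r}")
    return family, qualifier


def rows_to_solr_docs(table, mapping):
    """Extract the mapped cells from every row and emit one Solr document per row."""
    docs = []
    for row_key, families in table.items():
        doc = {"id": row_key}  # assumption: the row key becomes the Solr unique key
        for field, column in mapping.items():
            family, qualifier = split_column(column)
            value = families.get(family, {}).get(qualifier)
            if value is not None:
                doc[field] = value
        docs.append(doc)
    return docs


def fetch_cell(table, row_key, column):
    """Follow a 'family:qualifier' link from a search hit back to the raw cell."""
    family, qualifier = split_column(column)
    return table[row_key][family][qualifier]


docs = rows_to_solr_docs(HBASE_TABLE, FIELD_MAPPING)
# Pretend a Solr query for city_s:Lima returned this document.
hit = next(d for d in docs if d.get("city_s") == "Lima")
print(fetch_cell(HBASE_TABLE, hit["id"], FIELD_MAPPING["city_s"]))  # prints Lima
```

In a real deployment the lookup step would be an HBase `Get` on the row key rather than a dictionary access; the sketch only shows why storing the row key and the column reference in Solr is enough to reach the source cell.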

The following procedures demonstrate creating a small HBase table and using the HBaseMapReduceIndexerTool to index the table into a collection: