Running the HBaseMapReduceIndexerTool

HBaseMapReduceIndexerTool is a MapReduce batch job driver that takes input data from an HBase table, creates Solr index shards, and writes the indexes to HDFS or local file system in a flexible, scalable, and fault-tolerant manner. It also supports merging the output shards into a set of live customer-facing Solr servers in SolrCloud.

important

You must run the indexer tool with the following command-line argument:

-D 'mapreduce.job.user.classpath.first=true'

Running the tool without this argument triggers the following error:

ERROR [main] org.apache.hadoop.mapred.YarnChild: Error running child : java.lang.NoSuchMethodError:
com.codahale.metrics.MetricRegistry.meter(Ljava/lang/String;Lcom/codahale/metrics/MetricRegistry$MetricSupplier;)Lcom/codahale/metrics/Meter;

Run the command as follows:

hadoop --config /etc/hadoop/conf \
jar /opt/cloudera/parcels/CDH/lib/hbase-solr/tools/hbase-indexer-mr-*-job.jar \
--conf /etc/hbase/conf/hbase-site.xml -D 'mapreduce.job.user.classpath.first=true' \
-Dmapreduce.map.java.opts="-Xmx512m" -Dmapreduce.reduce.java.opts="-Xmx512m" \
--hbase-indexer-file $HOME/morphline-hbase-mapper.xml \
--zk-host 127.0.0.1/solr --collection hbase-collection1 \
--go-live --log4j src/test/resources/log4j.properties

note

For development purposes, use the --dry-run option to run in local mode and print documents to stdout, instead of loading them to Solr. Using this option causes the morphline to run in the client process without submitting a job to MapReduce. Running in the client process provides quicker results during early trial and debug sessions.

To print diagnostic information, such as the content of records as they pass through morphline commands, enable TRACE log level diagnostics by adding the following to your log4j.properties file:

log4j.logger.org.kitesdk.morphline=TRACE
log4j.logger.com.ngdata=TRACE

The log4j.properties file can be passed using the --log4j command-line option.

To invoke the command-line help, use:

hadoop jar /opt/cloudera/parcels/CDH/jars/hbase-indexer-mr-*-job.jar --help