Running HBaseMapReduceIndexerTool

HBaseMapReduceIndexerTool is a MapReduce batch job driver that reads input data from an HBase table, creates Solr index shards, and writes the indexes to HDFS in a flexible, scalable, and fault-tolerant manner. It can also merge the output shards into a set of live, customer-facing Solr servers in SolrCloud; the --go-live option in the commands below enables this merge.
Run the tool, or display its command-line help, with the command that matches your deployment type:
  • For package-based deployments:
    hadoop --config /etc/hadoop/conf \
    jar /usr/lib/hbase-solr/tools/hbase-indexer-mr-*-job.jar \
    --conf /etc/hbase/conf/hbase-site.xml -D 'mapreduce.job.user.classpath.first=true' \
    -Dmapreduce.map.java.opts="-Xmx512m" -Dmapreduce.reduce.java.opts="-Xmx512m" \
    --hbase-indexer-file $HOME/morphline-hbase-mapper.xml \
    --zk-host 127.0.0.1/solr --collection hbase-collection1 \
    --go-live --log4j src/test/resources/log4j.properties
  • For parcel-based deployments:
    hadoop --config /etc/hadoop/conf \
    jar /opt/cloudera/parcels/CDH/lib/hbase-solr/tools/hbase-indexer-mr-*-job.jar \
    --conf /etc/hbase/conf/hbase-site.xml -D 'mapreduce.job.user.classpath.first=true' \
    -Dmapreduce.map.java.opts="-Xmx512m" -Dmapreduce.reduce.java.opts="-Xmx512m" \
    --hbase-indexer-file $HOME/morphline-hbase-mapper.xml \
    --zk-host 127.0.0.1/solr --collection hbase-collection1 \
    --go-live --log4j src/test/resources/log4j.properties
  • To display the command-line help in a default parcel-based installation, run:
    hadoop jar /opt/cloudera/parcels/CDH/jars/hbase-indexer-mr-*-job.jar --help
  • To display the command-line help in a default package-based installation, run:
    hadoop jar /usr/lib/hbase-solr/tools/hbase-indexer-mr-*-job.jar --help
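Because the JAR location differs between package-based and parcel-based deployments, a script that must run on either layout could resolve the path first. The following is a minimal sketch, not part of the tool itself: the function name find_indexer_jar is hypothetical, and the directories in the usage comment are the default locations from the commands above, which may differ on your cluster.

```shell
#!/bin/sh
# Hypothetical helper: print the first hbase-indexer-mr job JAR found in the
# directories given as arguments, searching them in the order supplied.
find_indexer_jar() {
  for dir in "$@"; do
    # The glob mirrors the hbase-indexer-mr-*-job.jar pattern used above.
    for jar in "$dir"/hbase-indexer-mr-*-job.jar; do
      # If the glob matched nothing, $jar is the literal pattern and -f fails.
      if [ -f "$jar" ]; then
        printf '%s\n' "$jar"
        return 0
      fi
    done
  done
  echo "hbase-indexer-mr job JAR not found" >&2
  return 1
}

# Example use (commented out so that sourcing this file has no side effects).
# The two directories are assumed defaults; adjust them for your installation.
# INDEXER_JAR=$(find_indexer_jar \
#   /opt/cloudera/parcels/CDH/lib/hbase-solr/tools \
#   /usr/lib/hbase-solr/tools) && hadoop jar "$INDEXER_JAR" --help
```

The function returns a non-zero status when no JAR is found, so the `&&` in the usage example prevents hadoop from being called with an empty path.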