Indexing data with MapReduceIndexerTool in Solr backup format
MapReduceIndexerTool (MRIT) can batch index a dataset using morphlines and write the output in the format of a Solr backup. This backup can then be ingested into Solr using a restore operation.
The MapReduceIndexerTool (MRIT) backup format feature addresses the dilemma of ingesting indexes produced by MRIT jobs into Solr:
- Near-real-time (NRT) ingestion using the --go-live option is resource-intensive and involves merging indexes.
- Batch indexing requires shutting down the Solr server.
With the backup format, the Solr backup produced by MRIT is instead restored into Solr using the solrctl command line utility. This method is significantly less resource-intensive for Solr than NRT ingestion with --go-live.
Restoring the backup creates a new collection, which can be queried directly or placed behind an alias.
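The restore itself is performed with solrctl. As a minimal sketch of what "restore into a new collection, then put it behind an alias" looks like at the HTTP level, the Python below builds Solr Collections API RESTORE and CREATEALIAS request URLs. The Solr base URL, backup name, backup location, collection and alias names are hypothetical, and whether solrctl issues exactly these requests is an assumption here. The code only constructs URLs; it does not send them.

```python
from typing import Optional
from urllib.parse import urlencode

# Hypothetical Solr node; replace with a real host and port.
SOLR_BASE = "http://solr-host.example.com:8983/solr"


def restore_url(backup_name: str, location: str, new_collection: str,
                request_id: Optional[str] = None) -> str:
    """Build a Collections API RESTORE request that creates `new_collection`
    from the backup `backup_name` stored under `location`."""
    params = {
        "action": "RESTORE",
        "name": backup_name,          # backup name (assumed from MRIT output)
        "location": location,         # directory that contains the backup
        "collection": new_collection, # collection to create; must not exist yet
    }
    if request_id is not None:
        # Run asynchronously; status can be polled with action=REQUESTSTATUS.
        params["async"] = request_id
    return f"{SOLR_BASE}/admin/collections?{urlencode(params)}"


def alias_url(alias: str, collection: str) -> str:
    """Build a CREATEALIAS request pointing `alias` at `collection`.
    Issuing it again with a newer collection repoints the alias."""
    params = {"action": "CREATEALIAS", "name": alias, "collections": collection}
    return f"{SOLR_BASE}/admin/collections?{urlencode(params)}"


if __name__ == "__main__":
    print(restore_url("docs-backup", "hdfs:///tmp/mrit", "docs_v2", request_id="r1"))
    print(alias_url("docs", "docs_v2"))
```

Restoring each MRIT run into a fresh, versioned collection and then repointing a stable alias lets clients keep querying the alias name while the index behind it is swapped.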