Tuning replication
If you have sufficient additional hardware, you may add more replicas for a linear boost of query throughput and to prevent data loss.
Note, that adding replicas may slow write performance on the first replica, but otherwise this should have minimal negative consequences.
Transaction log replication
Cloudera Search supports configurable transaction log replication levels for replication logs stored in HDFS. Cloudera recommends leaving the value unchanged at 3 or, barring that, setting it to at least 2.
tlogDfsReplication
setting in solrconfig.xml
. The
tlogDfsReplication
is a setting in the updateLog
settings area. An excerpt of the solrconfig.xml
file where the transaction
log replication factor is set is as follows:
<updateHandler class="solr.DirectUpdateHandler2">
<!-- Enables a transaction log, used for real-time get, durability, and
solr cloud replica recovery. The log can grow as big as
uncommitted changes to the index, so use of a hard autoCommit
is recommended (see below).
"dir" - the target directory for transaction logs, defaults to the
solr data directory. -->
<updateLog>
<str name="dir">${solr.ulog.dir:}</str>
<int name="tlogDfsReplication">${solr.ulog.tlogDfsReplication:3}</int>
<int name="numVersionBuckets">${solr.ulog.numVersionBuckets:65536}</int>
</updateLog>
The default replication level is 3. For clusters with fewer than three DataNodes (such as proof-of-concept clusters), reduce this number to the amount of DataNodes in the cluster. Changing the replication level only applies to new transaction logs.
Initial testing shows no significant performance regression for common use cases.