Configure Atlas to Use SolrCloud
Important: Currently, Atlas supports Solr in cloud mode only. HTTP mode is not supported. For more information, refer to the Solr documentation at https://cwiki.apache.org/confluence/display/solr/SolrCloud.
Prerequisites for Switching to SolrCloud as the Indexing Backend for Atlas
Memory: Solr is both memory and CPU intensive. Ensure that the server running Solr has adequate memory, CPU, and disk space. Solr works well with 32 GB of RAM. Provide as much memory as possible to the Solr process.
Disk: If you must store a large number of entities, make sure you have at least 500 GB of free space in the volume where Solr stores the index data.
SolrCloud support for replication and sharding: Hortonworks recommends that you use SolrCloud with at least 2 Solr nodes running on different servers with replication enabled. ZooKeeper must be installed and configured with 3 to 5 ZooKeeper nodes for SolrCloud.
Clear the storage backend data: When you switch the indexing backend, you must clear the storage backend data. Switching the indexing backend discards the existing index data, so any data left in the storage backend would no longer match the index. As a result, full-text queries might not work properly on the existing data.
To clear data for BerkeleyDB, delete the ATLAS_HOME/data/berkeley directory.
To clear data for HBase, in the HBase shell, run:
disable 'titan'
drop 'titan'
To configure the Titan graph database to use SolrCloud as the indexing backend on unsecured clusters:
Download and install Solr version 5.2.1. You can download Solr 5.2.1 from The Apache Organization. Documentation is included in the download.
Start Solr in cloud mode.
SolrCloud mode uses a ZooKeeper service as a central location for cluster management. For a small cluster, you can use an existing ZooKeeper quorum. For larger clusters, use a separate ZooKeeper quorum with at least 3 servers.
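As a minimal sketch of this step, the helper below only prints the command that would start Solr in cloud mode against an existing ZooKeeper quorum; it does not start anything. The function name `solr_cloud_start_cmd` and the `zk*.example.com` host names are hypothetical, and port 8983 is assumed because it is Solr's default. Run the printed command from the SOLR_HOME directory.

```shell
# Hypothetical dry-run helper: prints the bin/solr command for cloud mode.
# $1 = comma-separated ZooKeeper host:port list (assumed to exist already)
# $2 = Solr port (optional, defaults to 8983)
solr_cloud_start_cmd() {
  zk_hosts="$1"
  port="${2:-8983}"
  printf 'bin/solr start -cloud -z %s -p %s\n' "$zk_hosts" "$port"
}

# Example with placeholder host names; replace them with your own quorum.
solr_cloud_start_cmd "zk1.example.com:2181,zk2.example.com:2181,zk3.example.com:2181"
```

Printing the command first makes it easy to review the ZooKeeper connection string before starting Solr on each node.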
If the Atlas instance and the Solr instance are located on 2 different hosts, copy the required configuration files from the ATLAS_HOME/conf/solr directory on the Atlas instance host to the Solr instance host. Then run the following commands from the SOLR_HOME directory to create collections in Solr that correspond to the indexes that Atlas uses:
bin/solr create -c vertex_index -d SOLR_CONF -shards <#numShards> -replicationFactor <#replicationFactors>
bin/solr create -c edge_index -d SOLR_CONF -shards <#numShards> -replicationFactor <#replicationFactors>
bin/solr create -c fulltext_index -d SOLR_CONF -shards <#numShards> -replicationFactor <#replicationFactors>
Where SOLR_CONF refers to the directory on the Solr host to which the Solr configuration files were copied from the Atlas host.
If numShards and replicationFactor are not specified, they default to 1, which is adequate if you are testing Solr with Atlas on a single node instance. Otherwise, specify the number of hosts that are in the Solr cluster and the maxShardsPerNode configuration.
Important: The number of shards cannot exceed the total number of Solr nodes in your SolrCloud cluster.
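The shard limit can be checked before any collection is created. The following is a sketch only: `atlas_solr_create_cmds` is a hypothetical helper, not part of Atlas or Solr, and the node count is something you supply yourself. It prints the three `bin/solr create` commands for the Atlas indexes and refuses to do so when the requested number of shards exceeds the number of Solr nodes.

```shell
# Hypothetical helper: prints the bin/solr create commands for the Atlas indexes.
# Arguments: <solr_conf_dir> [num_shards] [replication_factor] [num_solr_nodes]
# Shards, replication factor, and node count default to 1 (single-node testing).
atlas_solr_create_cmds() {
  conf_dir="$1"
  shards="${2:-1}"
  replicas="${3:-1}"
  nodes="${4:-1}"
  # The number of shards cannot exceed the number of Solr nodes.
  if [ "$shards" -gt "$nodes" ]; then
    echo "error: $shards shards exceed $nodes Solr nodes" >&2
    return 1
  fi
  for collection in vertex_index edge_index fulltext_index; do
    printf 'bin/solr create -c %s -d %s -shards %s -replicationFactor %s\n' \
      "$collection" "$conf_dir" "$shards" "$replicas"
  done
}

# Example: 2 shards and 2 replicas on a 2-node cluster; the path is a placeholder.
atlas_solr_create_cmds /opt/solr/atlas_conf 2 2 2
```

The printed commands are meant to be reviewed and then run from the SOLR_HOME directory.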
Set the following Atlas configuration parameters in the application.properties file that is located in the /conf directory:
atlas.graph.index.search.backend=solr5
atlas.graph.index.search.solr.mode=cloud
atlas.graph.index.search.solr.zookeeper-url=<ZooKeeper_quorum>
Where ZooKeeper_quorum is the ZooKeeper quorum that is set up for Solr, as a comma-separated value. For example: 10.1.6.4:2181,10.1.6.5:2181
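A malformed quorum string is an easy mistake to make in this step. The sketch below uses a hypothetical helper, `atlas_solr_properties`, that checks whether the value is a comma-separated list of host:port entries and, if so, prints the three property lines to add to application.properties. It is an illustration under that assumption only; it does not accept other forms, such as a quorum with a path suffix.

```shell
# Hypothetical helper: validates a ZooKeeper quorum string and prints the
# Atlas properties for SolrCloud. $1 = host:port[,host:port...]
atlas_solr_properties() {
  quorum="$1"
  # Every comma-separated entry must look like host:port (numeric port).
  if ! printf '%s\n' "$quorum" |
      grep -Eq '^[^,:[:space:]]+:[0-9]+(,[^,:[:space:]]+:[0-9]+)*$'; then
    echo "error: expected host:port[,host:port...], got '$quorum'" >&2
    return 1
  fi
  printf 'atlas.graph.index.search.backend=solr5\n'
  printf 'atlas.graph.index.search.solr.mode=cloud\n'
  printf 'atlas.graph.index.search.solr.zookeeper-url=%s\n' "$quorum"
}

# Example using the quorum from the text above.
atlas_solr_properties "10.1.6.4:2181,10.1.6.5:2181"
```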
Restart Atlas.
/usr/hdp/current/atlas-server/bin/atlas_stop.py
/usr/hdp/current/atlas-server/bin/atlas_start.py