Non-Ambari Cluster Installation Guide

Configure Atlas to Use SolrCloud

Important

Currently, Atlas supports Solr in cloud mode only. HTTP mode is not supported. For more information, refer to the Solr documentation at https://cwiki.apache.org/confluence/display/solr/SolrCloud.

Prerequisites for Switching to SolrCloud as the Indexing Backend for Atlas

  • Memory: Solr is both memory and CPU intensive. Ensure that the server that is running Solr has adequate memory, CPU, and disk space. Solr works well with 32GB of RAM. Provide as much memory as possible to the Solr process.

  • Disk: If you expect to store a large number of entities, ensure that there is at least 500 GB of free space on the volume where Solr stores the index data.

  • SolrCloud support for replication and sharding: Hortonworks recommends that you use SolrCloud with at least 2 Solr nodes running on different servers with replication enabled. ZooKeeper must be installed and configured with 3 to 5 ZooKeeper nodes for SolrCloud.

  • Clear the storage backend data: When you switch the indexing backend, you must clear the storage backend data. The index data is lost when you switch backends, so the storage and indexing backends would otherwise fall out of sync, which can cause fulltext queries to return incorrect results on the existing data.

    To clear data for BerkeleyDB, delete the ATLAS_HOME/data/berkeley directory.

    To clear data for HBase, in the HBase shell, run:

    disable 'titan'
    drop 'titan'
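The disable/drop sequence above can also be run non-interactively by piping it into the HBase shell. The following sketch only prints the script (since it assumes an `hbase` client on the PATH); the table name titan is the Atlas default.

```shell
# Sketch: the HBase shell script that clears the Titan table.
# To execute it for real on a host with the HBase client installed:
#   echo "$HBASE_SCRIPT" | hbase shell
HBASE_SCRIPT="disable 'titan'
drop 'titan'
exit"
echo "$HBASE_SCRIPT"
```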

To configure the Titan graph database to use SolrCloud as the indexing backend on unsecured clusters:

  1. Download and install Solr version 5.2.1. You can download Solr 5.2.1 from the Apache Solr website; documentation is included in the download.

  2. Start Solr in cloud mode.

    SolrCloud mode uses a ZooKeeper service as a central location for cluster management. For a small cluster, you can use an existing ZooKeeper quorum. For larger clusters, use a separate ZooKeeper quorum with at least 3 servers.
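As one illustration (the hostnames and port below are placeholders, not values from this guide), a Solr 5.x node is started in cloud mode with `bin/solr start -cloud`, pointing the -z option at the ZooKeeper quorum. The sketch composes and prints the command rather than running it:

```shell
# Sketch: start a Solr node in SolrCloud mode against an existing
# ZooKeeper quorum. ZK_QUORUM and the port are illustrative placeholders.
ZK_QUORUM="zk1.example.com:2181,zk2.example.com:2181,zk3.example.com:2181"
SOLR_START="bin/solr start -cloud -z $ZK_QUORUM -p 8983"
echo "$SOLR_START"   # run this from SOLR_HOME on each Solr node
```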

  3. If the Atlas instance and the Solr instance are located on two different hosts, copy the required configuration files from the ATLAS_HOME/conf/solr directory on the Atlas host to the Solr host. Then run the following commands from the SOLR_HOME directory to create collections in Solr that correspond to the indexes that Atlas uses:

    bin/solr create -c vertex_index -d SOLR_CONF -shards <#numShards> -replicationFactor <#replicationFactors>
    bin/solr create -c edge_index -d SOLR_CONF -shards <#numShards> -replicationFactor <#replicationFactors>
    bin/solr create -c fulltext_index -d SOLR_CONF -shards <#numShards> -replicationFactor <#replicationFactors>

    Where SOLR_CONF is the directory on the Solr host to which you copied the Solr configuration files from the Atlas host.

    If numShards and replicationFactor are not specified, they default to 1, which is adequate if you are testing Solr with Atlas on a single-node instance. Otherwise, set numShards according to the number of hosts in the Solr cluster and adjust the maxShardsPerNode configuration accordingly.

    Important

    The number of shards cannot exceed the total number of Solr nodes in your SolrCloud cluster.
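For instance, on a hypothetical two-node SolrCloud (the shard and replica counts and the SOLR_CONF path below are illustrative, not prescribed by this guide), the three collections could be created in a loop. The sketch prints each `bin/solr create` command instead of running it:

```shell
# Sketch: create the three Atlas index collections with 2 shards and
# 2 replicas each. SOLR_CONF is a placeholder for the directory holding
# the configuration copied from ATLAS_HOME/conf/solr. Drop the `echo`
# and run from SOLR_HOME to create the collections for real.
SOLR_CONF=/opt/solr/atlas-conf
for coll in vertex_index edge_index fulltext_index; do
  echo "bin/solr create -c $coll -d $SOLR_CONF -shards 2 -replicationFactor 2"
done
```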

  4. Set the following Atlas configuration parameters in the application.properties file that is located in the Atlas conf directory:

    atlas.graph.index.search.backend=solr5
    atlas.graph.index.search.solr.mode=cloud
    atlas.graph.index.search.solr.zookeeper-url=<ZooKeeper_quorum>

    Where ZooKeeper_quorum is the ZooKeeper quorum that is set up for Solr, specified as a comma-separated list. For example: 10.1.6.4:2181,10.1.6.5:2181
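This edit can be scripted. In the sketch below the target is a temporary file for safety; on a real Atlas host, ATLAS_CONF would point at the application.properties file in the Atlas conf directory, and ZK_QUORUM at your actual quorum (the addresses reuse the example above).

```shell
# Sketch: append the SolrCloud settings to Atlas's application.properties.
# ATLAS_CONF is a temp file here for illustration; on a real host, point
# it at the actual application.properties under the Atlas conf directory.
ATLAS_CONF=$(mktemp)
ZK_QUORUM="10.1.6.4:2181,10.1.6.5:2181"
cat >> "$ATLAS_CONF" <<EOF
atlas.graph.index.search.backend=solr5
atlas.graph.index.search.solr.mode=cloud
atlas.graph.index.search.solr.zookeeper-url=$ZK_QUORUM
EOF
```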

  5. Restart Atlas.

    /usr/hdp/current/atlas-server/bin/atlas_stop.py
    /usr/hdp/current/atlas-server/bin/atlas_start.py