Dynamic handling of failure in updating index
The JanusGraph database transaction might fail in certain scenarios and this failure can be handled dynamically using a specific configuration.
There are certain scenarios in which JanusGraph transactions might fail, including, while indexing into Solr. When you create HIVE entities, the first level of storage includes persisting into HBase and the corresponding indexes are logged in Solr.
The index updation can be partially successful when data gets stored in HBase but Solr indexing fails. There could be inconsistencies with logged indexes for the failed transactions in Solr. These scenarios might lead to a mismatch between the basic and advanced search results in Atlas. Re-indexing the data is an option which can reindex the entire data; however, it is time consuming.
To have a better system resilience and evolution of data tracking, a transaction log processor option named write-ahead is implemented. When enabled, JanusGraph maintains all the transaction log information which can be used to recover indices in the event of failures as described earlier. The log data information consumes more storage space but having a robust system coupled with maintaining data movement overrides the need to have additional storage systems.
The write-ahead configuration inspects the Solr health and registers the time in the event of any issues with Solr. Once Solr is up and running, the indices can be recovered from the previously noted time.
About JanusGraph
Learn what JanusGraph is and how Apache Atlas uses the JanusGraph database to support search index capabilities.
JanusGraph is a scalable graph database optimized for storing and querying graphs containing hundreds of billions of vertices and edges distributed across a multi-machine cluster.
Apache Atlas uses the JanusGraph database at the heart of its metadata repository. This graph is used to show the interconnected relationships between data sources; the data sets they host; the business meaning of the data elements within each data set; the classification of these elements in terms of quality, confidentiality, retention; who (people and processes) are using them and for which purposes.
JanusGraph uses a pluggable persistence store to save the metadata content and a search index for its search API. Apache Atlas takes advantage of this configurability to support a range of size, scalability, and performance requirements.