Dynamic handling of failure in updating index
The JanusGraph database transaction can fail in a couple of instances and this can be handled dynamically using a separate configuration.
There are a couple of scenarios through which JanusGraph transactions can fail, including, while indexing into HBASE and Solr. When you create any HIVE entities, the first level of storage includes HBASE and the corresponding indexes are logged in Solr.
While indexing into Solr, when the indexes are logged, it might be partially successful, owing to the graph details being present in HBASE. There might be inconsistencies with logged indexes for the failed transactions at Solr.
These scenarios might lead to a mismatch between the basic and advanced search results in Atlas.
Re-indexing the data is the only option but it involves consumption of almost the entire database storage.
For better system resilience and data tracking, a transaction log processor option named write-ahead is implemented. When enabled, JanusGraph maintains all the transaction log information which can be used to recover indices in the event of failures. Even though the log data information consumes more storage, having a robust system coupled with data movement maintenance, overrides the need to have additional storage systems.
The write-ahead configuration inspects the Solr health and registers the time in the event of any issues with Solr. Once Solr is up and running, the indices can be recovered from the previously noted time.
About JanusGraph
JanusGraph is a scalable graph database optimized for storing and querying graphs containing hundreds of billions of vertices and edges distributed across a multi-machine cluster.
Apache Atlas uses the JanusGraph database as its metadata repository. This graph is used to show the interconnected relationships between data sources; the data sets they host; the business meaning of the data elements within each data set; the classification of these elements in terms of quality, confidentiality, retention; who (people and processes) are using them and for which purposes.
JanusGraph uses a pluggable persistence store to save the metadata content and a search index for its search API. Apache Atlas can take advantage of this configurability to support a range of size, scalability, and performance requirements.