Migrating Atlas data

In the HDP intermediate bits, Atlas uses JanusGraph instead of Titan for data storage and processing. As part of the HDP intermediate bits upgrade, data must be migrated from Titan to JanusGraph.

Before upgrading HDP, estimate the amount of time that the export and import operations will take.

First, use one of the following methods to determine the number of entities in the Atlas metadata:

  • Within Solr, check the number of entities in the vertex_index collection.
  • Run the following Atlas metrics REST API query, substituting your own values for the user, password, Atlas server host, and Atlas port:
    curl -g -X GET -u admin:admin -H "Content-Type: application/json" \
    -H "Cache-Control: no-cache" \
    "http://<atlas_server>:21000/api/atlas/admin/metrics"
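
The metrics query returns a JSON document, so it can help to pull out just the entity count. The following is a minimal sketch, not part of the official procedure. It assumes that the response contains a numeric field named "entityCount". That field name is an assumption, so verify it against the output of your Atlas version. The function name extract_entity_count is hypothetical.

```shell
#!/bin/sh
# Read a metrics JSON response on stdin and print the first numeric
# "entityCount" value found in it.
# Assumption: the response has a field called "entityCount"; check the
# actual output of your Atlas version before relying on this.
extract_entity_count() {
    grep -o '"entityCount"[[:space:]]*:[[:space:]]*[0-9][0-9]*' \
        | head -n 1 \
        | grep -o '[0-9][0-9]*$'
}

# Possible usage (replace the credentials, host, and port with your values):
#   curl -s -g -u admin:admin "http://<atlas_server>:21000/api/atlas/admin/metrics" \
#       | extract_entity_count
```

If the function prints nothing, the field is probably named differently in your response, and you should read the count from the raw JSON instead.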

Then check the following Atlas property values:

  • atlas.migration.mode.batch.size: Recommended value is 3000.
  • atlas.migration.mode.workers: The value depends on the number of cores on the node on which Atlas runs. Typically, set it to (number of cores - 1) * 2. For example, on an 8-core node, set this property to (8 - 1) * 2 = 14.
  • Atlas heap space.
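
The worker guideline above, (number of cores - 1) * 2, can be sketched as a small shell function. This is an illustration only. The function name migration_workers is hypothetical, and the lower bound of 1 worker is an assumption that does not come from the documentation.

```shell
#!/bin/sh
# Suggest a value for atlas.migration.mode.workers from a core count,
# using the guideline (number of cores - 1) * 2.
migration_workers() {
    cores=$1
    workers=$(( (cores - 1) * 2 ))
    # Assumption (not from the documentation): never suggest fewer than 1 worker.
    if [ "$workers" -lt 1 ]; then
        workers=1
    fi
    echo "$workers"
}

# Example for an 8-core node, printed as property lines.
cores=8
echo "atlas.migration.mode.batch.size=3000"
echo "atlas.migration.mode.workers=$(migration_workers "$cores")"
```

On Linux, the core count can usually be read with nproc, for example: migration_workers "$(nproc)".
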
Based on these property values, the following estimates apply to a node with a quad-core processor and 4 GB of RAM, with both the Atlas and Solr servers running on the same node:
  • Estimated export rate: 2 million entities per hour.
  • Estimated import rate: 0.75 million entities per hour.
With 8 GB of heap space, the process runs faster. For example, with the property values above and an 8 GB heap, the estimated import rate increases to 2 million entities per hour.
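
To turn these rates into a rough time estimate, divide the entity count by the rate. The sketch below assumes that throughput stays constant for the whole run, which is a simplification. The function name estimate_hours and the count of 3 million entities are hypothetical, and the rates are the estimates quoted above.

```shell
#!/bin/sh
# Print the estimated duration in hours (one decimal place) for a given
# entity count and a throughput in entities per hour.
# Assumption: throughput is constant for the whole run.
estimate_hours() {
    awk -v n="$1" -v r="$2" 'BEGIN { printf "%.1f\n", n / r }'
}

entities=3000000   # hypothetical entity count

echo "Export:               $(estimate_hours "$entities" 2000000) hours"
echo "Import (4 GB node):   $(estimate_hours "$entities" 750000) hours"
echo "Import (8 GB heap):   $(estimate_hours "$entities" 2000000) hours"
```

For 3 million entities, this gives about 1.5 hours for export and about 4.0 hours for import on the 4 GB node. Treat the result as a planning figure only, because actual throughput depends on the hardware and on the data.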