Apache Ambari Upgrade

Migrating Atlas Data

In HDP 3.0, Atlas uses JanusGraph instead of Titan for data storage and processing. As part of the HDP 3.0 upgrade, the Atlas metadata must be migrated from Titan to JanusGraph. Perform the following steps to migrate it.

Before upgrading HDP, use one of the following methods to determine the size of the Atlas metadata:

  • Click SEARCH on the Atlas web UI, then slide the green toggle button from Basic to Advanced. Enter the following query in the Search by Query box, then click Search.

    Asset select count()

  • Run the following Atlas metrics REST API query:

    curl -g -X GET -u admin:admin -H "Content-Type: application/json" \
    -H "Cache-Control: no-cache" \
    "http://<atlas_server>:21000/api/atlas/admin/metrics"

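The entity count can also be read from the metrics response programmatically. The following is a minimal Python sketch, assuming the response body is JSON with the total stored under data.general.entityCount. That response shape is an assumption and should be checked against the actual output from your Atlas version. The function name entity_count and the sample payload are hypothetical.

```python
import json


def entity_count(metrics_json: str) -> int:
    """Return the total entity count from an Atlas metrics response body.

    Assumes the count lives at data -> general -> entityCount. This is an
    assumption about the response shape; verify it against your own output.
    """
    payload = json.loads(metrics_json)
    try:
        return int(payload["data"]["general"]["entityCount"])
    except (KeyError, TypeError) as exc:
        raise ValueError(
            "entityCount not found at data.general.entityCount"
        ) from exc


# Hypothetical response body, trimmed to the fields used here.
sample = '{"data": {"general": {"entityCount": 1500000, "typeCount": 120}}}'
print(entity_count(sample))
```

If the key is missing, the function raises ValueError rather than returning a misleading zero, so a changed response layout is noticed immediately.
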
Either method returns the number of Atlas entities, which you can use to estimate the time required to export and import the Atlas metadata. This time varies depending on the cluster configuration. The following estimates are for a node with 4 GB of RAM and a quad-core processor, with both the Atlas and Solr servers running on that node:

  • Estimated export rate: 2 million entities per hour.

  • Estimated import rate: 0.75 million entities per hour.
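To turn an entity count into a rough time budget, the two estimates above can be applied directly. The following Python sketch assumes the rates scale linearly with entity count, which may not hold on a differently sized cluster. The function and constant names are illustrative only.

```python
# Rates taken from the estimates above (4 GB RAM, quad-core, Atlas and Solr
# on the same node). Treat them as rough guides, not guarantees.
EXPORT_ENTITIES_PER_HOUR = 2_000_000
IMPORT_ENTITIES_PER_HOUR = 750_000


def estimate_migration_hours(entity_count: int) -> dict:
    """Estimate export, import, and total hours for a given entity count.

    Assumes throughput scales linearly with the number of entities.
    """
    if entity_count < 0:
        raise ValueError("entity_count must be non-negative")
    export_hours = entity_count / EXPORT_ENTITIES_PER_HOUR
    import_hours = entity_count / IMPORT_ENTITIES_PER_HOUR
    return {
        "export_hours": export_hours,
        "import_hours": import_hours,
        "total_hours": export_hours + import_hours,
    }


# 3 million entities: 1.5 hours to export, 4.0 hours to import.
print(estimate_migration_hours(3_000_000))
# → {'export_hours': 1.5, 'import_hours': 4.0, 'total_hours': 5.5}
```

Because import is the slower phase, it dominates the total and is the figure to plan the upgrade window around.
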

  1. In the Ambari Web UI, click Atlas, then select Actions > Stop.

  2. Make sure that HBase is Running. If it is not, in the Ambari Web UI, click HBase, then select Actions > Start.

  3. SSH to the host where your Atlas Metadata Server is running.

  4. Start exporting Atlas metadata, using the following command format:

    python /usr/hdp/3.0.0.0-<build_number>/atlas/tools/migration-exporter/atlas_migration_export.py -d <output directory>

For example, export the metadata to an output directory called /atlas_metadata.

While the Atlas migration tool is running, Atlas is unavailable: all REST APIs and Atlas hook notification processing are blocked. As described previously, the time it takes to export the Atlas metadata depends on the number of entities and your cluster configuration. You can use the following command to display the export status:

    tail -f /var/log/atlas/atlas-migration-exporter.log

When the export is complete, a file named atlas-migration-data.json is created in the output directory specified using the -d parameter. This file contains the exported Atlas entity data.
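Before continuing with the upgrade, it can be worth confirming that the export file is actually present. The following Python sketch checks only that atlas-migration-data.json exists in the output directory and is not empty. It does not validate the contents, and the function name find_export_file is hypothetical.

```python
from pathlib import Path

# File name created by the export, as stated in the text above.
EXPORT_FILE_NAME = "atlas-migration-data.json"


def find_export_file(output_dir: str) -> Path:
    """Return the path of the export file, or raise if it is missing or empty.

    This is a presence check only; it does not inspect the exported data.
    """
    export_file = Path(output_dir) / EXPORT_FILE_NAME
    if not export_file.is_file():
        raise FileNotFoundError(
            f"{export_file} does not exist; the export may not have finished"
        )
    if export_file.stat().st_size == 0:
        raise ValueError(f"{export_file} is empty")
    return export_file
```

For the example directory used earlier, the call would be find_export_file("/atlas_metadata").
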

The HDP upgrade starts Atlas automatically, which initiates the migration of the exported HDP-2.x Atlas metadata into HDP-3.x. During the migration import process, Atlas blocks all REST API calls and Atlas hook notification processing. For this migration to succeed, Atlas must be configured with the location of the exported data. Use the following steps to configure the atlas.migration.data.filename property.

  1. In Ambari, select Services > Atlas > Configs > Advanced > Custom application-properties.

  2. Click Add Property, and add the atlas.migration.data.filename property. Set the value to the location of the directory containing your exported Atlas metadata.

    For example, if you exported the metadata to /atlas_metadata, set the value to /atlas_metadata.

  3. Save the configuration.

  4. Click Services > Atlas > Restart > Restart All Affected.

  5. Because the configuration has changed, re-run the Atlas service check by clicking Actions > Run Service Check.

Note

During the HDP upgrade, you can use the following Atlas API URL to display the migration status:

http://<atlas_server>:21000/api/atlas/admin/status

The migration status is displayed in the browser window:

{"Status":"Migration","currentIndex":139,"percent":67,"startTimeUTC":"2018-04-06T00:54:53.399Z"}
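The percent and startTimeUTC fields in this status response are enough for a rough estimate of the time remaining. The following Python sketch assumes that percent is the fraction of the import completed and that progress is roughly linear. Both are assumptions, not documented behavior. The function name is hypothetical.

```python
import json
from datetime import datetime, timezone
from typing import Optional


def estimate_remaining_seconds(status_json: str,
                               now: Optional[datetime] = None) -> float:
    """Extrapolate the remaining migration time from a status response.

    Assumes linear progress: remaining = elapsed * (100 - percent) / percent.
    """
    status = json.loads(status_json)
    percent = status["percent"]
    if not 0 < percent <= 100:
        raise ValueError("percent must be in the range (0, 100]")
    # Timestamp format as shown in the sample status output above.
    start = datetime.strptime(
        status["startTimeUTC"], "%Y-%m-%dT%H:%M:%S.%fZ"
    ).replace(tzinfo=timezone.utc)
    current = now or datetime.now(timezone.utc)
    elapsed = (current - start).total_seconds()
    return elapsed * (100 - percent) / percent


# Sample status taken from the output shown above.
sample = ('{"Status":"Migration","currentIndex":139,"percent":67,'
          '"startTimeUTC":"2018-04-06T00:54:53.399Z"}')
# Pretend the status was read 67 minutes after the migration started.
read_at = datetime(2018, 4, 6, 2, 1, 53, 399000, tzinfo=timezone.utc)
print(estimate_remaining_seconds(sample, now=read_at) / 60, "minutes left")
```

Passing now explicitly keeps the calculation reproducible. Omitting it uses the current UTC time, which is what you would want when polling a live migration.
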

Next Steps

Perform the Upgrade