How the migration process works

Before migrating from Navigator to Apache Atlas, review the migration paths. You must extract, transform, and import the content from Navigator to Apache Atlas. After the migration is completed, services start producing metadata for Atlas and audit information for Ranger.

There are two main paths that describe a Navigator-to-Atlas migration scenario:

  • Upgrading Cloudera Manager to CDP 7.0.0 and upgrading all your CDH clusters to CDP Runtime. In this case, you can stop Cloudera Navigator after migrating its content to Atlas.

  • Upgrading Cloudera Manager to CDP 7.0.0 but managing some or all of your existing CDH clusters as CDH 5.x or 6.x. In this case, the CDP cluster running Cloudera Navigator continues to extract metadata and audit information from existing CDH clusters and runs Atlas and Ranger to support metadata and audit extraction from new or potential new CDP runtime clusters.

In both the scenarios, you shall complete the upgrade of Cloudera Manager first. While Cloudera Manager is upgrading, Navigator pauses collection of metadata and audit information from cluster activities. After the upgrade is complete, Navigator processes the queued metadata and audit information.

In the timeline diagrams that follow, the blue color indicates steps and because you trigger the steps manually, you can control their timing.

The migration of Navigator content to Atlas occurs during the upgrade from CDH to CDP. The migration involves three phases:

  • Extracting metadata from Navigator

    The Atlas installation includes a script (cnav.sh) that calls Navigator APIs to extract all technical and business metadata from Navigator. The process takes about 4 minutes per one million Navigator entities. The script compresses the result and writes it to the local file system on the host where the Atlas server is installed. Plan for about 100 MB for every one million Navigator entities; lower requirements for larger numbers of entities.

  • Transforming the Navigator metadata into a form that Atlas can consume. Including time and resources. The Atlas installation includes a script (nav2atlas.sh) that converts the extracted content and again compresses it and writes it to the local file system. This process takes about 1.5 minutes per million Navigator entities. The script compresses the results and writes it to the local file system on the host where the Atlas server is installed. Plan for about 100 to 150 MB for every million Navigator entities; higher end of the range for larger numbers of entities.
  • Importing the transformed metadata into Atlas.

    After the CDP upgrade completes, Atlas starts in "migration mode," where it waits to find the transformed data file and does not collect metadata from cluster services. When the transformation is complete, Atlas begins importing the content, creating equivalent Atlas entities for each Navigator entity. This process takes about 35 minutes for a million Navigator entities, counting only the entities that are migrated into Atlas.

To make sure you do not miss metadata for cluster operations, provide time after the Cloudera Manager upgrade and before the CDH upgrade for Navigator, to process all the metadata produced by CDH service operations. See Navigator Extraction Timing for more information.

You can start extracting metadata from Navigator as soon as the CDP parcel is deployed on the cluster. After CDP is started, Navigator no longer collects metadata or audit information from the services on that cluster; instead services produce metadata for Atlas and audit information for Ranger.

Migration from Navigator to Atlas can be run only in non-HA mode

Migration import works only with a single Atlas instance.

If Atlas has been set up in HA mode before migration, you must remove the additional instances of Atlas, so that Atlas service has only one instance.

Later, start Atlas in the migration mode and complete the migration. Perform the necessary checks to verify if the data has been imported correctly.

Restart Atlas in non-migration mode.

  • If you have Atlas setup in HA mode, retain only one instance and remove the others.
  • Ensure that the ZIP files generated as an output from the Nav2Atlas conversion are placed at the same location where the Atlas node is present.