Assumptions and prerequisites

Before you transition your cluster to CDP Private Cloud Base or migrate content from Navigator to Apache Atlas, ensure that you have collected all the required credentials and set expectations for the time the transition will take. The prerequisites in this section help you prepare in advance.

In addition to the prerequisites outlined for the Cloudera Manager and CDP upgrades, you'll need the following for the Navigator to Atlas transition:

  • Deleted entities in Navigator. Check the Navigator Administration page to make sure that a successful purge has run recently. If it hasn't, consider running a purge before the transition. See Managing Metadata Storage with Purge.
  • Role to host assignments. Before you begin upgrading to CDP, make a plan for where you will install the Atlas server. In addition, Atlas depends upon HBase, Kafka, and Solr services; your plan should include host assignments for installing the components of these services. See Runtime Cluster Hosts and Role Assignments.
  • Resources for Atlas service. Atlas requires 16 GB of Java heap (Atlas Max Heapsize property) and 4 Solr shards (Initial Solr Shards for Atlas Collections property). Make sure the host you choose for Atlas has enough resources for all the services' requirements.
  • Resources for Solr service. During the transition, the Solr instance that serves as the Atlas index requires 12 GB of Java heap (Java Heap Size of Solr Server in Bytes property). You can reset this value after the transition completes. Make sure the host you choose for Atlas has enough resources for all the services' requirements.
  • Navigator credentials. The transition requires the username and password for a Navigator user with administrator privileges.
  • Local disk space needed for intermediate processing. The first two phases of the Navigator-to-Atlas transition produce intermediate files in /tmp in the local file system where Atlas is installed. See Estimating the time and resources needed for transition.
  • Local disk space for transition staging files. The first two phases of the Navigator-to-Atlas transition produce staging files on the local disk where Atlas is installed. See Estimating the time and resources needed for transition.
  • Time estimates for transition phases. Each phase of the transition runs independently from the upgrade. You can trigger them to run when convenient. See Estimating the time and resources needed for transition.
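The local disk space prerequisite can be checked ahead of time on the Atlas host. The following is a minimal sketch, not part of the Cloudera tooling: the function names and the 2500 MB figure are hypothetical, and the real requirement should come from your own estimate of the intermediate file sizes.

```python
import shutil


def free_space_mb(path: str) -> float:
    """Return the free space, in MB, on the file system that holds `path`."""
    return shutil.disk_usage(path).free / (1024 * 1024)


def has_enough_space(path: str, required_mb: float) -> bool:
    """Return True if the file system holding `path` has at least `required_mb` MB free."""
    return free_space_mb(path) >= required_mb


if __name__ == "__main__":
    # Hypothetical requirement; replace it with your own estimate.
    required_mb = 2500
    status = "OK" if has_enough_space("/tmp", required_mb) else "INSUFFICIENT"
    print(f"/tmp: {free_space_mb('/tmp'):.0f} MB free, {required_mb} MB needed: {status}")
```

Run the script on the host where Atlas will be installed, because the intermediate files are written to that host's local file system.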

Estimating the time and resources needed for transition

While the cluster is starting up, you can plan for and start the transition process.

  1. Inspect the Navigator installation to determine the number of Navigator entities that will be transitioned. See How many Navigator entities are transitioned?
  2. Estimate the time and disk space required for each phase of the transition.

    The following transition rates are approximate and depend on the resources available on the Atlas host and on other factors. Note that the number of entities actually imported may be considerably less than the number of entities extracted. The transition process discards HDFS entities that are not referenced by processes that are transitioned (Hive, Impala, Spark).

    Transition Phase | Transition Rate | Disk Space | Output File Size | Trial Data Points
    Extraction | 4 minutes / 1 million entities | 100 MB / 1 million entities, less as volumes increase | 65 MB / 1 million entities | 10 million entities takes about 30 minutes; 256 million takes about 18 hours.
    Transformation | 1.5 minutes / 1 million entities | 100 to 150 MB / 1 million entities, higher end of range with larger volumes | 150 MB / 1 million entities | 10 million entities takes about 20 minutes; 256 million takes about 6 hours.
    Import | 35 minutes / 1 million migrated entities | N/A | N/A | 10 million entities takes about 4 hours; 256 million takes about 6 days.
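As a rough planning aid, the extraction and transformation rates above can be applied linearly to an entity count. This is only a sketch: the `RATES` table and the `estimate_phase` function are illustrative names, the transformation disk figure uses the upper end of the 100 to 150 MB range, and a linear projection will not exactly match the trial data points (for example, it gives 40 minutes for extracting 10 million entities, where the trial took about 30).

```python
# Approximate rates from the table above, per 1 million entities.
RATES = {
    "extraction": {"minutes": 4.0, "disk_mb": 100.0},
    # Upper end of the 100-150 MB range, chosen as a conservative assumption.
    "transformation": {"minutes": 1.5, "disk_mb": 150.0},
}


def estimate_phase(phase: str, entity_count: int) -> dict:
    """Linearly project duration and local disk usage for one phase."""
    millions = entity_count / 1_000_000
    rate = RATES[phase]
    return {
        "minutes": millions * rate["minutes"],
        "disk_mb": millions * rate["disk_mb"],
    }


if __name__ == "__main__":
    entity_count = 10_000_000  # example value; use the count shown in Navigator
    for phase in RATES:
        estimate = estimate_phase(phase, entity_count)
        print(f"{phase}: about {estimate['minutes']:.0f} minutes, "
              f"about {estimate['disk_mb']:.0f} MB of local disk")
```

The import phase is left out here because it depends on the smaller count of entities that are actually migrated, not on the total extracted.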

How many Navigator entities are transitioned?

When preparing to transition content from Navigator to Atlas, it helps to know how many Navigator entities will be extracted. Use Navigator's search facets to find this number.

To determine the number of Navigator entities processed in the extraction and transformation phases of the transition:

  1. Log into Navigator.
  2. In the Cluster Group facet in the left panel, select the cluster you are migrating from.

    The main panel displays the count of entities in that cluster. Use this value for estimating the extraction and transformation phase durations.

Not all Navigator entities are imported into Atlas. To estimate the subset of entities included in the import phase:

  1. Log into Navigator.
  2. In the Cluster Group facet in the left panel, select the cluster you are migrating from.
  3. In the Source Type facet in the left panel, select "Hive", "Impala", and "Spark".

    The main panel displays the count of entities from these sources in this cluster.

  4. Double the number from the search results to account for the physical files that correspond to the tables and jobs. The HDFS entities referenced by the Hive, Impala, and Spark entities are included in the transition.
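The doubling step and the resulting import duration can be worked through as follows. This is a sketch under stated assumptions: `estimate_import` is a hypothetical helper, the 35 minutes per 1 million migrated entities comes from the approximate rate given for the import phase, and the example count is made up.

```python
IMPORT_MINUTES_PER_MILLION = 35.0  # approximate import rate from this document


def estimate_import(hive_impala_spark_count: int) -> dict:
    """Estimate imported entities and import duration from the Navigator search count."""
    # Double the count to account for the HDFS files behind the tables and jobs.
    imported_entities = 2 * hive_impala_spark_count
    minutes = imported_entities / 1_000_000 * IMPORT_MINUTES_PER_MILLION
    return {"entities": imported_entities, "minutes": minutes}


if __name__ == "__main__":
    # Example value only; use the count shown in the Navigator main panel.
    result = estimate_import(2_000_000)
    print(f"about {result['entities']:,} entities, "
          f"about {result['minutes'] / 60:.1f} hours to import")
```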

The transition brings over all business metadata definitions and associations with transitioned entities. To determine the number of Navigator managed properties to transition:

  1. Log into Navigator.
  2. In the left Search panel, find the Tags facet.

    This facet lists all the tags defined in Navigator. Navigator tags are imported into Atlas as labels.

  3. Go to Administration > Managed Properties.

    The Navigator namespaces are imported as Atlas business metadata collections. Each managed property is imported as a business metadata attribute.