Assumptions and prerequisites
Before you transition your cluster to CDP Private Cloud Base or migrate content from Navigator to Apache Atlas, ensure that you have collected all the required credentials and set expectations for the time required to complete the transition. The prerequisites in this section help you prepare for the transition in advance.
In addition to the prerequisites outlined for the Cloudera Manager and CDP upgrades, you'll need the following for the Navigator to Atlas transition:
- Deleted entities in Navigator. Check the Navigator Administration page to make sure that a successful purge has run recently. If it hasn't, consider running a purge before the transition. See Managing Metadata Storage with Purge.
- Role to host assignments. Before you begin upgrading to CDP, make a plan for where you will install the Atlas server. In addition, Atlas depends upon HBase, Kafka, and Solr services; your plan should include host assignments for installing the components of these services. See Runtime Cluster Hosts and Role Assignments.
- Resources for Atlas service. Atlas requires 16 GB of Java heap (Atlas Max Heapsize property) and 4 Solr shards (Initial Solr Shards for Atlas Collections property). Make sure the host you choose for Atlas has enough resources for all the services' requirements.
- Resources for Solr service. During the transition, the Solr instance that serves as the Atlas index requires 12 GB of Java heap (Java Heap Size of Solr Server in Bytes property). You can reset this value after the transition completes. Make sure the host you choose for Atlas has enough resources for all the services' requirements.
- Navigator credentials. The transition requires the username and password for a Navigator user with administrator privileges.
- Local disk space needed for intermediate processing. The first two phases of the Navigator-to-Atlas transition produce intermediate files in /tmp in the local file system where Atlas is installed. See Estimating the time and resources needed for transition.
- Local disk space for transition staging files. The first two phases of the Navigator-to-Atlas transition produce staging files on the local disk where Atlas is installed. See Estimating the time and resources needed for transition.
- Time estimates for transition phases. Each phase of the transition runs independently from the upgrade. You can trigger them to run when convenient. See Estimating the time and resources needed for transition.
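As a quick pre-check for the local disk space items above, the following minimal sketch reports whether a directory has a given amount of free space. The function names and the 2 GiB threshold are hypothetical and chosen only for illustration; substitute the figure you derive from your own entity counts, and run the check on the host where Atlas is installed.

```python
import shutil
import tempfile


def free_bytes(path):
    """Return the number of free bytes on the file system that holds path."""
    return shutil.disk_usage(path).free


def has_free_space(path, required_bytes):
    """Return True if path has at least required_bytes of free space."""
    return free_bytes(path) >= required_bytes


if __name__ == "__main__":
    # Assumption: the system temporary directory stands in for /tmp here.
    staging_dir = tempfile.gettempdir()
    # Hypothetical requirement (2 GiB); replace with your own estimate.
    required = 2 * 1024**3
    status = "OK" if has_free_space(staging_dir, required) else "INSUFFICIENT"
    print(f"{staging_dir}: {free_bytes(staging_dir)} bytes free, {status}")
```

This only inspects free space at the moment it runs; other services on the same host may consume space while the transition is in progress.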
Estimating the time and resources needed for transition
While the cluster is starting up, you can plan for and start the transition process.
- Inspect the Navigator installation to determine the number of Navigator entities that will be transitioned. See How many Navigator entities are transitioned?
- Estimate the time and disk space required for each phase of the transition.
The following transition rates are approximate and depend on the resources available on the Atlas host and other unknown factors. Note that the number of entities actually imported may be considerably less than the number of entities extracted. The transition process discards HDFS entities that are not referenced by processes that are transitioned (Hive, Impala, Spark).
| Transition Phase | Transition Rate | Disk Space | Output File Size | Trial Data Points |
|---|---|---|---|---|
| Extraction | 4 minutes / 1 million entities | 100 MB / 1 million entities, less as volumes increase | 65 MB / 1 million entities | 10 million entities takes about 30 minutes; 256 million takes about 18 hours. |
| Transformation | 1.5 minutes / 1 million entities | 100 to 150 MB / 1 million entities, higher end of range with larger volumes | 150 MB / 1 million entities | 10 million entities takes about 20 minutes; 256 million takes about 6 hours. |
| Import | 35 minutes / 1 million migrated entities | N/A | N/A | 10 million entities takes about 4 hours; 256 million takes about 6 days. |
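To turn the approximate rates listed above into a rough plan, the following sketch scales them linearly by entity count. The names `RATES` and `estimate` are hypothetical, and the sketch assumes the upper end (150 MB) of the transformation disk-space range. Linear scaling is itself an assumption: the trial data points show that real runs deviate from it (for example, 10 million entities extracted in about 30 minutes rather than 40), so treat the results as ballpark figures only.

```python
# Approximate per-phase rates taken from the documented estimates.
# disk_mb_per_million is None where no disk figure is given (N/A).
RATES = {
    "extraction": {"minutes_per_million": 4.0, "disk_mb_per_million": 100.0},
    # Assumption: use the upper end of the 100 to 150 MB range.
    "transformation": {"minutes_per_million": 1.5, "disk_mb_per_million": 150.0},
    "import": {"minutes_per_million": 35.0, "disk_mb_per_million": None},
}


def estimate(phase, entities):
    """Return (minutes, disk_mb) for a phase, scaled linearly by entity count.

    disk_mb is None when no disk-space figure is available for the phase.
    """
    rate = RATES[phase]
    millions = entities / 1_000_000
    minutes = rate["minutes_per_million"] * millions
    per_million = rate["disk_mb_per_million"]
    disk_mb = None if per_million is None else per_million * millions
    return minutes, disk_mb


if __name__ == "__main__":
    entity_count = 10_000_000  # hypothetical entity count for illustration
    for phase in RATES:
        minutes, disk_mb = estimate(phase, entity_count)
        disk = "N/A" if disk_mb is None else f"{disk_mb:.0f} MB"
        print(f"{phase}: about {minutes:.0f} minutes, disk {disk}")
```

For the import phase, pass the number of entities you expect to be migrated rather than the number extracted, because unreferenced HDFS entities are discarded before import.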