Repurposing CDP Private Cloud Base Nodes for CDP Private Cloud Data Services on ECS

Separating compute and storage using CDP Private Cloud (PvC) Data Services provides cloud-native capabilities to help optimize compute resources.

Workloads that are candidates for migration from CDP PvC Base to CDP PvC Data Services include CDSW, Hive, Spark on YARN, and Impala. You can identify underutilized nodes in CDP PvC Base, repurpose them as PvC Data Services nodes, and then migrate existing workloads from PvC Base to PvC Data Services.

Review the following hardware requirements before repurposing the PvC Base nodes for PvC Data Services.

  1. Check the CDP Private Cloud Data Services Hardware Requirements and confirm that the nodes you intend to repurpose meet the requirements. This may include new requirements that necessitate added storage such as fast cache devices, increased RAM, or upgraded network cards.

  2. Ensure that your existing PvC Base cluster complies with PvC Data Services system requirements. If not, you may need to upgrade your Private Cloud Base cluster to a supported version.

  3. Ensure that your existing PvC Base cluster will still function properly without the repurposed nodes. Remember that removing nodes reduces overall storage and compute capacity in the PvC Base cluster.
    • Target existing worker nodes – avoid repurposing “master” nodes (i.e., the NameNode).

    • Avoid using edge nodes, or gateways.

    • Calculate how much storage is being removed (the combined footprint of the JBOD drives on nodes). The revised utilized capacity for HDFS, Ozone, and Kudu should generally not exceed 75% after the nodes are removed.

    • Review the PvC Base cluster utilization reports and ensure that the remaining compute capacity can support the remaining PvC Base workloads:
      1. HBase, Solr, MapReduce, and NiFi workloads will remain on PvC Base. Ensure that your remaining compute capacity can support these workloads.

      2. Ranger, Atlas, Ozone, HMS, Zookeeper, and HDFS will also remain on PvC Base. Ensure that your remaining compute capacity can support these workloads.

Repurposing the nodes

If the high availability (HA) configuration is not changing on the PvC Base cluster, you can run the following steps to repurpose nodes without any downtime.

  1. Remove the host from the PvC Base cluster. Using the Cloudera Manager Hosts Decommission feature ensures that workloads are shut down gracefully on each node, and no new work is accepted.
    • When the target includes a HDFS Datanode role, HDFS blocks will be evicted from that host and copied to available space in the cluster. This can take some time, depending on how much data is stored and how your environment is set up.

    • After removing a host from the PvC Base cluster, it is highly recommended that you treat this node according to your normal procedures for redeployed hardware. This may involve wiping drives and reimaging the operating system (OS).

  2. Treat the old node as a new server, commissioning it by adding it to an existing PvC Data Services Cluster or including it in a new PvC Data Services cluster. See Install CDP Private Cloud Data Services for more information.