Repurposing Cloudera Base on premises Nodes for Cloudera Data Services on premises on Cloudera Embedded Container Service

Separating compute and storage using Cloudera Data Services on premises provides cloud-native capabilities to help optimize compute resources.

Workloads that are candidates for migration from Cloudera Base on premises to Cloudera Data Services on premises include CDSW, Hive, Spark on YARN, and Impala. You can identify underutilized nodes in Cloudera Base on premises, repurpose them as Cloudera Data Services on premises nodes, and then migrate existing workloads from Cloudera Base on premises to Cloudera Data Services on premises.

Review the following hardware requirements before repurposing the Cloudera Base on premises nodes for Cloudera Data Services on premises.

  1. Check the Cloudera on Premises Data Services Hardware Requirements and confirm that the nodes you intend to repurpose meet them. Meeting the requirements may call for hardware upgrades, such as added storage (for example, fast cache devices), increased RAM, or upgraded network cards.

  2. Ensure that your existing Cloudera Base on premises cluster meets the Cloudera Data Services on premises system requirements. If it does not, you may need to upgrade your Cloudera Base on premises cluster to a supported version.

  3. Ensure that your existing Cloudera Base on premises cluster will still function properly without the repurposed nodes. Remember that removing nodes reduces overall storage and compute capacity in the Cloudera Base on premises cluster.
    • Target existing worker nodes. Avoid repurposing “master” nodes (for example, the node that hosts the NameNode).

    • Avoid repurposing edge nodes or gateways.

    • Calculate how much storage is being removed (the combined footprint of the JBOD drives on the nodes you plan to repurpose). After the nodes are removed, the utilized capacity for HDFS, Ozone, and Kudu should generally not exceed 75% of the remaining capacity.

    • Review the Cloudera Base on premises cluster utilization reports and ensure that the remaining compute capacity can support the workloads and services that stay on Cloudera Base on premises:
      1. HBase, Solr, MapReduce, and NiFi workloads will remain on Cloudera Base on premises. Ensure that your remaining compute capacity can support these workloads.

      2. Ranger, Atlas, Ozone, HMS, ZooKeeper, and HDFS will also remain on Cloudera Base on premises. Ensure that your remaining compute capacity can support these services.
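
The 75% storage guideline above can be checked with simple arithmetic before any host is removed. The following is a minimal sketch in Python, not a Cloudera tool: the `Node` class, the function name, and the sample figures are hypothetical, and it assumes that used and total capacity are measured on the same basis (for example, raw capacity including replication).

```python
from dataclasses import dataclass


@dataclass
class Node:
    """A candidate node; raw_capacity_tb is the combined size of its JBOD drives."""
    name: str
    raw_capacity_tb: float


def projected_utilization(used_tb: float, total_capacity_tb: float, removed_nodes: list) -> float:
    """Return the storage utilization (0.0-1.0) after the given nodes are removed.

    Assumes used_tb and total_capacity_tb are measured on the same basis.
    """
    removed_tb = sum(node.raw_capacity_tb for node in removed_nodes)
    remaining_tb = total_capacity_tb - removed_tb
    if remaining_tb <= 0:
        raise ValueError("Removing these nodes leaves no storage capacity")
    return used_tb / remaining_tb


# Hypothetical figures: a 1000 TB cluster with 600 TB in use.
THRESHOLD = 0.75  # guideline from the checklist above
two_nodes = [Node(f"worker{i}", 48.0) for i in range(2)]
five_nodes = [Node(f"worker{i}", 48.0) for i in range(5)]

for label, nodes in (("2 nodes", two_nodes), ("5 nodes", five_nodes)):
    utilization = projected_utilization(600.0, 1000.0, nodes)
    verdict = "OK" if utilization <= THRESHOLD else "exceeds guideline"
    print(f"Removing {label}: {utilization:.1%} utilized ({verdict})")
```

With these sample numbers, removing two nodes keeps utilization under the guideline, while removing five pushes it over. Run the check separately for HDFS, Ozone, and Kudu if they use different drives.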

Repurposing the nodes

If the high availability (HA) configuration of the Cloudera Base on premises cluster is not changing, you can perform the following steps to repurpose nodes without any downtime.

  1. Remove the host from the Cloudera Base on premises cluster. Using the Cloudera Manager Hosts Decommission feature ensures that workloads are shut down gracefully on each node, and no new work is accepted.
    • When the target host includes an HDFS DataNode role, the HDFS blocks on that host are copied to available space elsewhere in the cluster before the host is released. This can take some time, depending on how much data is stored and how your environment is set up.

    • After removing a host from the Cloudera Base on premises cluster, it is highly recommended that you treat this node according to your normal procedures for redeployed hardware. This may involve wiping drives and reimaging the operating system (OS).

  2. Treat the old node as a new server: commission it by adding it to an existing Cloudera Data Services on premises cluster or by including it in a new Cloudera Data Services on premises cluster. See Install Cloudera on Premises Data Services for more information.
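
If you prefer to script the decommission in step 1 rather than use the Cloudera Manager UI, the request might look like the sketch below. It assumes the Cloudera Manager REST API exposes a `hostsDecommission` command under `/cm/commands` that accepts a JSON list of host names. Verify the endpoint, the API version, and the authentication method against the API documentation for your Cloudera Manager release. The URL, version string, and host names are placeholders. The sketch only builds the request and does not send it.

```python
import json
from urllib.request import Request


def build_decommission_request(cm_base_url: str, api_version: str, hostnames: list) -> Request:
    """Build (but do not send) a POST request asking Cloudera Manager to decommission hosts.

    Assumption: the endpoint path and the {"items": [...]} body match your
    Cloudera Manager API version. Authentication is deliberately omitted.
    """
    url = f"{cm_base_url.rstrip('/')}/api/{api_version}/cm/commands/hostsDecommission"
    body = json.dumps({"items": list(hostnames)}).encode("utf-8")
    return Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Placeholder values for illustration only.
request = build_decommission_request(
    "https://cm.example.com:7183/", "v51", ["worker7.example.com", "worker8.example.com"]
)
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

Building the request separately from sending it lets you review the target host list before anything changes on the cluster.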