Modifying a Cluster with the Configuration File

This section describes how to make changes to the cluster through Cloudera Director, using the client and the configuration file.

Growing or Shrinking a Cluster with the Configuration File

After launching a cluster with the bootstrap command (using the stand-alone Cloudera Director client), you can add or remove instances with the update command:

  1. Open the cluster.conf file that you used to launch the cluster.
  2. Change the count value for the instance group you want to resize. For example, the following sets the number of workers to 15:
    workers {
      count: 15
      minCount: 5

      instance: ${instances.hs18} {
        tags {
          group: worker
        }
      }
    }
  3. Enter the following command:
    cloudera-director update cluster.conf
    Cloudera Director increases the number of worker instances.
  4. If you added master instances, assign roles to them through Cloudera Manager. Cloudera Director does not automatically assign roles.
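
If you resize clusters often, the edit in step 2 can be scripted. The following is a minimal Python sketch and is not part of Cloudera Director. The helper name set_instance_count is hypothetical, and the sketch assumes that the group's count line appears before any nested block, as in the example above. After writing the result back to cluster.conf, you would run the update command from step 3.

```python
import re


def set_instance_count(conf_text: str, group: str, new_count: int) -> str:
    """Return conf_text with the `count` of the named instance group replaced.

    Hypothetical helper: it does a plain text substitution and does not
    parse the configuration file format.
    """
    # Match "<group> {" followed by its first `count: N`, without crossing
    # into a nested block. The \b before `count` keeps `minCount` from matching.
    pattern = re.compile(
        r"(\b" + re.escape(group) + r"\s*\{[^{}]*?\bcount\s*:\s*)\d+"
    )
    updated, replaced = pattern.subn(
        lambda m: m.group(1) + str(new_count), conf_text, count=1
    )
    if replaced == 0:
        raise ValueError(f"no count found for instance group {group!r}")
    return updated


# Illustrative fragment only; a real cluster.conf contains much more.
sample = """\
workers {
  count: 10
  minCount: 5
}
"""

print(set_instance_count(sample, "workers", 15))
```

The function raises an error rather than returning the text unchanged when the group is not found, so a typo in the group name cannot silently result in an update that changes nothing.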

Rebalancing the Cluster After Adding or Removing Hosts

After hosts are added to or removed from a cluster, HDFS data is likely to be distributed unevenly across DataNodes. Cloudera Director does not rebalance HDFS when hosts are added or removed, so after growing or shrinking the cluster you must rebalance manually in Cloudera Manager, as described under HDFS Balancers in the Cloudera Manager documentation.

The need for rebalancing depends on the amount of data in HDFS and the number of hosts added to or removed from the cluster. During a shrink operation, Cloudera Director decommissions hosts before removing them from the cluster. As part of decommissioning a DataNode, Cloudera Manager moves all the blocks from that host to other hosts so that the replication factor is maintained after the host is removed. There is therefore no risk of data loss even if the cluster is shrunk by more than two instances at a time. Rebalancing only optimizes block placement: it is not required when a small number of hosts have been removed from a cluster, but only when there has been a large movement of data.
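
To make "unevenly distributed" concrete, the following Python sketch flags DataNodes whose utilization differs from the cluster-wide utilization by more than a threshold. It is an illustration only: the function name, host names, and usage figures are hypothetical, and the 10 percent default is assumed here to mirror the HDFS balancer's default threshold. Check the HDFS Balancers documentation for the setting that applies to your cluster.

```python
def nodes_out_of_balance(usage, threshold_pct=10.0):
    """Return {host: deviation} for hosts outside the threshold.

    usage maps a host name to (used, capacity) in the same unit.
    The deviation is the node's utilization minus the cluster's, in
    percentage points.
    """
    total_used = sum(used for used, _ in usage.values())
    total_capacity = sum(capacity for _, capacity in usage.values())
    cluster_pct = 100.0 * total_used / total_capacity

    flagged = {}
    for host, (used, capacity) in usage.items():
        node_pct = 100.0 * used / capacity
        if abs(node_pct - cluster_pct) > threshold_pct:
            flagged[host] = round(node_pct - cluster_pct, 1)
    return flagged


# Hypothetical figures: three existing DataNodes at 60% and one new, empty host.
usage = {
    "dn1": (60, 100),
    "dn2": (60, 100),
    "dn3": (60, 100),
    "dn4-new": (0, 100),
}

print(nodes_out_of_balance(usage))
```

With these figures the cluster-wide utilization is 45 percent, so every host is flagged: the three existing DataNodes sit 15 points above it and the new host sits 45 points below it. Adding an empty host to a lightly used cluster would move the numbers far less, which is why small changes often need no rebalance.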