Specifying Racks for Hosts
To get maximum performance, it is important to configure Cloudera Manager so that it knows the topology of your network. Network locations such as hosts and racks are represented in a tree, which reflects the network “distance” between locations. HDFS uses the network location to place block replicas in a way that balances performance and resilience.
Minimum Required Role: Cluster Administrator (also provided by Full Administrator)
When placing jobs on hosts, CDP prefers within-rack transfers (where more bandwidth is available) to off-rack transfers; the MapReduce and YARN schedulers use network location to determine which replica is closest when choosing the input for a map task. These computations are performed with the assistance of rack awareness scripts.
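The idea of network "distance" in a location tree can be sketched as follows. This is a minimal illustration, not the code that Cloudera Manager or Hadoop runs: it assumes distance is counted as the number of tree hops from each location up to their closest common ancestor, and the function names `split_location`, `network_distance`, and `closest_replica` are hypothetical.

```python
def split_location(location):
    """Split a location such as '/rack1/host1' into ['rack1', 'host1']."""
    if not location.startswith("/"):
        raise ValueError(f"location must start with '/': {location!r}")
    return [part for part in location.strip("/").split("/") if part]


def network_distance(loc_a, loc_b):
    """Count the hops from each location up to their closest common ancestor.

    Assumption for this sketch: every edge in the tree costs one hop.
    """
    a = split_location(loc_a)
    b = split_location(loc_b)
    common = 0
    for x, y in zip(a, b):
        if x != y:
            break
        common += 1
    return (len(a) - common) + (len(b) - common)


def closest_replica(reader, replicas):
    """Return the replica location with the smallest distance to the reader."""
    return min(replicas, key=lambda replica: network_distance(reader, replica))
```

Under these assumptions, two hosts in the same rack are closer to each other (2 hops) than two hosts in different racks (4 hops), which is why a scheduler would favor a replica in the reader's own rack.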
Cloudera Manager includes internal rack awareness scripts, but you must specify the racks where the hosts in your cluster are located. If your cluster contains more than 10 hosts, Cloudera recommends that you specify the rack for each host. HDFS, MapReduce, and YARN will automatically use the racks you specify.
Cloudera Manager supports nested rack specifications. For example, you could specify the rack /rack3, or /group5/rack3 to indicate the third rack in the fifth group. All hosts in a cluster must have the same number of path components in their rack specifications.
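Before entering rack names, it may help to check a planned assignment against the two rules stated here: each rack starts with a slash, and all hosts use the same number of path components. The following is a minimal sketch of such a check; the helper names `rack_depth` and `check_consistent_depth` and the host names are hypothetical, and the script does not call Cloudera Manager.

```python
def rack_depth(rack):
    """Return the number of path components in a rack such as '/group5/rack3'."""
    if not rack.startswith("/"):
        raise ValueError(f"rack must start with '/': {rack!r}")
    parts = rack[1:].split("/")
    if any(part == "" for part in parts):
        raise ValueError(f"rack has an empty path component: {rack!r}")
    return len(parts)


def check_consistent_depth(host_racks):
    """Verify that every host's rack has the same number of path components.

    host_racks maps a host name to its planned rack. Returns the shared depth,
    or raises ValueError describing the mismatch.
    """
    depths = {host: rack_depth(rack) for host, rack in host_racks.items()}
    if len(set(depths.values())) > 1:
        raise ValueError(f"inconsistent rack depths: {depths}")
    return next(iter(depths.values()), 0)


# Hypothetical plan: both hosts use two path components, so the check passes.
plan = {
    "host1.example.com": "/group5/rack3",
    "host2.example.com": "/group5/rack4",
}
depth = check_consistent_depth(plan)
```

A plan that mixed /rack3 with /group5/rack3 would raise an error in this sketch, since the two racks have different numbers of path components.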
- Click .
- Select the hosts that you want to assign to a rack.
- Click .
- Enter a rack name or ID that starts with a slash /, such as /rack123 or /aisle1/rack123.
- Click Confirm.
- Optionally, restart any affected services. Rack assignments are not automatically updated for running services.