Weight
The storage density of the latest generation of servers means that rack weight must be taken into account. Verify that the weight of a fully loaded rack does not exceed the load capacity of the datacenter’s floor.
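As a rough illustration of this check, the sketch below sums the weight of a populated rack and compares the resulting load with a floor rating. The function names and every figure in it (rack and server weights, the 0.6 m by 1.2 m footprint, the 1,000 kg/m² rating) are hypothetical placeholders, not vendor or datacenter data.

```python
def rack_weight_kg(empty_rack_kg, server_kg, server_count, other_kg=0.0):
    """Total weight of a populated rack: frame, servers, and other gear
    such as switches, PDUs, and cabling."""
    return empty_rack_kg + server_kg * server_count + other_kg


def floor_load_ok(total_kg, footprint_m2, floor_rating_kg_per_m2):
    """True if the weight spread over the rack footprint is within the rating."""
    return total_kg / footprint_m2 <= floor_rating_kg_per_m2


# Hypothetical example values; replace with figures from your vendor
# and your datacenter operator.
total = rack_weight_kg(empty_rack_kg=125.0, server_kg=30.0,
                       server_count=20, other_kg=40.0)
footprint = 0.6 * 1.2  # rack footprint in square metres
print(f"rack weight: {total:.0f} kg, "
      f"load: {total / footprint:.0f} kg/m^2, "
      f"within rating: {floor_load_ok(total, footprint, 1000.0)}")
```

This treats the load as evenly distributed. Floor ratings may also specify point loads, so confirm the applicable figure with the datacenter operator.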
Scalability
It is easy to scale a Hadoop cluster by adding new servers or whole server racks and by increasing the memory in the master nodes to deal with the increased load. This generates a lot of “rebalancing traffic” at first, but delivers extra storage and computation. The master nodes do matter, so we recommend that you pay the premium prices for those machines.
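To get a feel for how much rebalancing traffic an expansion might generate, the following sketch estimates the data that has to move and how long that could take. It assumes that data is spread evenly over the existing nodes, that the new nodes are empty and identical to the old ones, and that the bandwidth figure is a sustained aggregate rate. The function names and example numbers are hypothetical.

```python
def rebalance_volume_tb(used_tb, existing_nodes, new_nodes):
    """Data (TB) that must move onto the new nodes so that every node
    ends up holding an equal share.

    Assumes an even spread on the existing nodes and empty,
    identically sized new nodes.
    """
    if existing_nodes <= 0 or new_nodes < 0:
        raise ValueError("node counts must be positive")
    return used_tb * new_nodes / (existing_nodes + new_nodes)


def transfer_hours(data_tb, gbit_per_s):
    """Hours needed to move data_tb (decimal terabytes) at a sustained
    aggregate rate of gbit_per_s gigabits per second."""
    seconds = data_tb * 8e12 / (gbit_per_s * 1e9)
    return seconds / 3600


# Hypothetical example: 400 TB stored on 40 nodes, 10 nodes added.
moved = rebalance_volume_tb(used_tb=400, existing_nodes=40, new_nodes=10)
print(f"data to move: {moved:.0f} TB, "
      f"about {transfer_hours(moved, gbit_per_s=10):.1f} h at 10 Gbit/s")
```

In practice rebalancing is usually throttled so that it does not starve running jobs, so the real duration is likely to be longer than this estimate.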
Use the following guidelines to scale your existing Hadoop cluster:
Ensure there is free space in the datacenter near the Hadoop cluster. This space should be able to accommodate the power budget for more racks.
Plan the network to cope with more servers.
It might be possible to add more disks and RAM to the existing servers, and extra CPUs if the servers have spare sockets. This can expand an existing cluster without adding more racks or making network changes.
Performing this hardware upgrade in a live cluster can take considerable time and effort, so we recommend that you plan the expansion one server at a time.
CPU parts do not remain on the vendor’s price list forever. If you plan to add a second CPU, ask your reseller when they will cut the price of CPUs that match your existing parts, and buy those parts when they become available. This typically takes at least 18 months.
You are likely to need more memory in the master servers.
Support contracts
The concept to consider here is “care for the master nodes, keep an eye on the slave nodes”. You do not need traditional enterprise-class support contracts for the majority of the nodes in the cluster, as their failures are more of a statistical issue than a crisis. The money saved on support can go into more slave nodes.
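The “statistics issue” can be made concrete with a small calculation. The sketch below assumes that nodes fail independently and at the same annual failure rate. Both the 5% rate and the function names are hypothetical, so substitute figures from your own hardware history.

```python
def expected_failures(node_count, annual_failure_rate):
    """Expected number of node failures per year."""
    return node_count * annual_failure_rate


def prob_at_least_one_failure(node_count, annual_failure_rate):
    """Probability that at least one node fails within a year,
    assuming independent failures."""
    return 1.0 - (1.0 - annual_failure_rate) ** node_count


# Hypothetical example: 200 slave nodes, 5% annual failure rate.
nodes, rate = 200, 0.05
print(f"expected failures per year: {expected_failures(nodes, rate):.1f}")
print(f"chance of at least one failure: "
      f"{prob_at_least_one_failure(nodes, rate):.4f}")
```

With numbers like these, a slave-node failure is a routine event to budget for, while the loss of a master node remains an exceptional one.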
Commissioning
Hortonworks plans to cover the best practices for commissioning a Hadoop cluster in a future document. For now, note that the “smoke tests” that come with the Hadoop cluster are a good initial test, followed by TeraSort. Some of the major server vendors offer in-factory commissioning of Hadoop clusters for an extra fee. This has the direct benefit of ensuring that the cluster is working before you receive and pay for it. There is also an indirect benefit: if TeraSort performance is lower on-site than in-factory, the network is the likely culprit, so you can track down the problem faster.
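The in-factory versus on-site comparison can be sketched as a simple check on two TeraSort run times. This assumes that both runs used the same hardware, data size, and Hadoop configuration. The 10% tolerance and the function names are hypothetical choices, not a Hortonworks recommendation.

```python
def terasort_slowdown(factory_seconds, onsite_seconds):
    """Relative slowdown of the on-site run versus the in-factory run.
    0.25 means the on-site run took 25% longer."""
    if factory_seconds <= 0:
        raise ValueError("factory_seconds must be positive")
    return (onsite_seconds - factory_seconds) / factory_seconds


def network_suspected(factory_seconds, onsite_seconds, tolerance=0.10):
    """True if the on-site run is slower than the in-factory run by more
    than the tolerance, which points at the on-site network as the
    first thing to investigate."""
    return terasort_slowdown(factory_seconds, onsite_seconds) > tolerance


# Hypothetical run times in seconds.
factory, onsite = 1800, 2340
print(f"slowdown: {terasort_slowdown(factory, onsite):.0%}, "
      f"check network first: {network_suspected(factory, onsite)}")
```

A slowdown inside the tolerance is treated as normal run-to-run variation. Anything larger is only a hint about where to look first, not proof of a network fault.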