File system partitioning recommendations

This section explains the recommended file system partitions for master and worker nodes in a CDP Private Cloud Base cluster and how to set them up.

Partitioning recommendations for all nodes

  • Root partition: OS and core program files
  • Swap: Size 2X system memory

Partitioning recommendations for worker nodes

  • Hadoop worker node: Hadoop must have its own partitions for Hadoop files and logs. Drives must be formatted with XFS, ext4, or ext3, in that order of preference.
  • Worker nodes: Each Hadoop partition must be mounted individually, one per drive, at a mount point of the form /grid/[0-n].
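
The mount convention above can be illustrated with a short Python sketch that picks a file system in the stated order of preference and prints one /etc/fstab line per drive. The device names, the noatime mount option, and the function names are assumptions made for this illustration, not values taken from this guide. Adapt them to your hosts.

```python
# Preference order from the recommendation above: XFS, then ext4, then ext3.
FS_PREFERENCE = ["xfs", "ext4", "ext3"]


def pick_filesystem(available):
    """Return the most preferred file system that the host supports."""
    for fs in FS_PREFERENCE:
        if fs in available:
            return fs
    raise ValueError("none of xfs, ext4, ext3 is available")


def grid_fstab_lines(devices, fstype, options="defaults,noatime"):
    """Build one fstab line per drive, mounting drive i at /grid/i.

    The default options string is an assumption for this sketch.
    """
    lines = []
    for index, device in enumerate(devices):
        # fstab fields: device, mount point, type, options, dump, fsck pass
        lines.append(f"{device} /grid/{index} {fstype} {options} 0 0")
    return lines


if __name__ == "__main__":
    # Hypothetical device names; real hosts often use UUID= or LABEL= instead.
    drives = ["/dev/sdb1", "/dev/sdc1", "/dev/sdd1"]
    fstype = pick_filesystem({"ext4", "xfs"})
    for line in grid_fstab_lines(drives, fstype):
        print(line)
```

The script only prints the lines. Review them before adding anything to /etc/fstab.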

Hadoop worker node partitioning configuration example

  • /swap: Cloudera recommends following the guidelines provided by your operating system vendor to configure the swap space on each host. If your vendor recommends a swap space range, then use the lowest recommended value.
  • / (root partition): 20 GB (sufficient space for existing files, future log file growth, and OS upgrades)
  • /grid/0/: [full disk GB] first partition for Hadoop to use for local storage
  • /grid/1/: second partition for Hadoop to use
  • /grid/2/: third partition for Hadoop to use, and so on
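
One way to check a host against the example layout is to scan a mount listing for the /grid/N mount points. The following Python sketch assumes text in the format of /proc/mounts on Linux (device, mount point, file system type, options). The function name and the sample data are hypothetical.

```python
import re

GRID_RE = re.compile(r"^/grid/(\d+)$")
ALLOWED_FS = ("xfs", "ext4", "ext3")


def check_grid_mounts(mounts_text):
    """Return a list of problems found in /proc/mounts-style text."""
    found = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        device, mount_point, fstype = fields[0], fields[1], fields[2]
        match = GRID_RE.match(mount_point)
        if match:
            found[int(match.group(1))] = (device, fstype)

    if not found:
        return ["no /grid/N mount points found"]

    problems = []
    # Mount points should be numbered 0..n without gaps.
    for index in range(max(found) + 1):
        if index not in found:
            problems.append(f"/grid/{index} is missing")
    # Each mount point should use one of the recommended file systems.
    for index, (device, fstype) in sorted(found.items()):
        if fstype not in ALLOWED_FS:
            problems.append(
                f"/grid/{index} uses {fstype}; expected xfs, ext4, or ext3"
            )
    # Each mount point should be backed by its own device.
    devices = [device for device, _ in found.values()]
    if len(set(devices)) != len(devices):
        problems.append("two /grid mount points share one device")
    return problems


if __name__ == "__main__":
    # Hypothetical sample; on a Linux host the text could be read
    # from /proc/mounts instead.
    sample = (
        "/dev/sda1 / xfs rw 0 0\n"
        "/dev/sdb1 /grid/0 xfs rw,noatime 0 0\n"
        "/dev/sdc1 /grid/1 ext4 rw,noatime 0 0\n"
    )
    print(check_grid_mounts(sample) or "layout looks consistent")
```

The check is read-only. An empty result means only that the listing follows the /grid/[0-n] convention, not that the partitions are sized correctly.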

Redundancy (RAID) recommendations

  • Master nodes: Configure for reliability (RAID 10, dual Ethernet cards, dual power supplies, and so on).
  • Worker nodes: RAID is not necessary because the cluster handles worker node failures automatically. Data is stored on at least three different hosts, so redundancy is built in. Build worker nodes for speed and low cost.
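
Because each block of data is stored on at least three hosts, usable capacity on the worker nodes is roughly the raw disk capacity divided by the number of copies. The following Python sketch works through that arithmetic. The node count, drive count, and drive size are hypothetical, and the calculation ignores space reserved for logs, temporary files, and other non-Hadoop use.

```python
def usable_capacity_tb(nodes, drives_per_node, drive_tb, replication=3):
    """Estimate usable storage when every block is kept `replication` times.

    The default of 3 reflects the "at least three different hosts"
    statement above; all other inputs are supplied by the caller.
    """
    if replication < 1:
        raise ValueError("replication must be at least 1")
    raw_tb = nodes * drives_per_node * drive_tb
    return raw_tb / replication


if __name__ == "__main__":
    # Hypothetical cluster: 10 worker nodes, 12 drives each, 4 TB per drive.
    raw = 10 * 12 * 4
    usable = usable_capacity_tb(10, 12, 4)
    print(f"raw: {raw} TB, usable with 3 copies: {usable} TB")
    # prints "raw: 480 TB, usable with 3 copies: 160.0 TB"
```

With three copies on different hosts, the loss of one or two worker nodes still leaves at least one copy of each block, which is why RAID on the worker drives adds cost without adding much protection.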