Managing Data Hub and Data Lake Clusters

File system partitioning recommendations

This section describes the recommended file system partitions for master and worker nodes in a CDP Private Cloud Base cluster and how to set them up.

  • Root partition: OS and core program files
  • Swap: Size 2X system memory
  • Hadoop worker node: Hadoop must have its own partitions for Hadoop files and logs. Drives must be formatted with XFS, ext4, or ext3, in that order of preference.
  • Worker nodes: Each Hadoop partition must be mounted individually from its drive, at a mount point of the form /grid/[0-n].
  • /swap: Cloudera recommends following the guidelines provided by your operating system vendor to configure the swap space on each host. If your vendor recommends a swap space range, then use the lowest recommended value.
  • /root: 20 GB (sufficient space for existing files, future log file growth, and OS upgrades)
  • /grid/0/: [full disk GB] first partition for Hadoop to use for local storage
  • /grid/1/: second partition for Hadoop to use
  • /grid/2/: third partition for Hadoop to use, and so on
  • Master nodes: Configured for reliability (RAID 10, dual Ethernet cards, dual power supplies, and so on).
  • Worker nodes: RAID is not necessary because the cluster handles worker node failures automatically. Data is stored on at least three different hosts, so redundancy is built in. Worker nodes must be built for speed and low cost.
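
The /grid/[0-n] mount layout and the file system preference order listed above can be illustrated with a short Python sketch that generates candidate /etc/fstab entries for a worker node. This is only an illustration under stated assumptions: the device names, the helper names `pick_filesystem` and `grid_fstab_lines`, and the `defaults` mount option are hypothetical and do not come from Cloudera; substitute the values that match your hosts and your operating system vendor's guidance.

```python
from typing import List

# Preference order stated in the list above: XFS first, then ext4, then ext3.
FS_PREFERENCE = ["xfs", "ext4", "ext3"]


def pick_filesystem(available: List[str]) -> str:
    """Return the most preferred file system that the host supports."""
    for fs in FS_PREFERENCE:
        if fs in available:
            return fs
    raise ValueError("none of xfs, ext4, ext3 is available")


def grid_fstab_lines(devices: List[str], fs_type: str,
                     options: str = "defaults") -> List[str]:
    """Build one fstab line per data drive, mounted at /grid/0 .. /grid/n."""
    if fs_type not in FS_PREFERENCE:
        raise ValueError(f"unsupported file system: {fs_type}")
    lines = []
    for index, device in enumerate(devices):
        # fstab fields: device, mount point, type, options, dump, fsck pass.
        lines.append(f"{device} /grid/{index} {fs_type} {options} 0 0")
    return lines


if __name__ == "__main__":
    # Hypothetical device names; replace with the drives on the worker node.
    devices = ["/dev/sdb1", "/dev/sdc1", "/dev/sdd1"]
    fs_type = pick_filesystem(["ext4", "xfs"])
    for line in grid_fstab_lines(devices, fs_type):
        print(line)
```

The sketch only prints the lines so that they can be reviewed before anything is written to /etc/fstab. Each drive gets its own mount point, which matches the recommendation to mount Hadoop partitions individually rather than combining them with RAID.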