Filesystems
In Linux, there are several choices for formatting and organizing drives. However, only a few choices are optimal for Hadoop.
In RHEL and CentOS, do not use the Logical Volume Manager (LVM) for data drives. LVM can combine multiple drives into one logical disk, which directly conflicts with how Hadoop manages fault tolerance across HDFS. Do not use LVM to stripe or span logical volumes across multiple physical volumes to mimic RAID.
Keeping LVM enabled on the OS drives is beneficial. Any performance impact is outweighed by the improvement in system manageability. With LVM on the OS drives, the admin can avoid over-allocating space on partitions: space needs change over time, and growing a filesystem dynamically is better than rebuilding a system.
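One way to check that no data drive has ended up on LVM is to scan the block-device listing. The sketch below is illustrative only: the function name is hypothetical, it assumes data volumes are mounted under paths beginning with /data, and it expects rows in the form printed by `lsblk -rno NAME,TYPE,MOUNTPOINT`.

```shell
# Print data mount points that sit on an LVM logical volume.
# Reads "NAME TYPE MOUNTPOINT" rows on stdin, for example from:
#   lsblk -rno NAME,TYPE,MOUNTPOINT
# Assumption: data volumes are mounted under /data*; adjust the pattern otherwise.
lvm_data_mounts() {
    awk '$2 == "lvm" && $3 ~ /^\/data/ { print $3 }'
}
```

Running `lsblk -rno NAME,TYPE,MOUNTPOINT | lvm_data_mounts` on a DataNode should print nothing if the data drives are laid out as recommended; LVM volumes used for the OS are ignored.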
Cloudera recommends using an extent-based filesystem. This includes ext3, ext4, and xfs. Most new Hadoop clusters use the ext4 filesystem by default. RHEL 7 uses xfs as its default filesystem.
Filesystem creation options
When creating ext4 filesystems for use with Hadoop data volumes, Cloudera recommends reducing the superuser block reservation from 5% to 1% for root (using the -m 1 option) as well as setting the following options:
- use one inode per 1 MB (largefile)
- minimize the number of super block backups (sparse_super)
- enable journaling (has_journal)
- use b-tree indexes for directory trees (dir_index)
- use extent-based allocations (extent)
To create such an ext4 filesystem:
    mkfs -t ext4 -m 1 -T largefile -O sparse_super,dir_index,extent,has_journal /dev/sdb1
To create an xfs filesystem:
    mkfs -t xfs /dev/sdb1
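When a DataNode has many data drives, it can help to generate the mkfs commands and review them before running anything. The following is a minimal dry-run sketch, not a definitive tool: the function name is hypothetical, the device names are placeholders, and it only prints commands built from the options listed above.

```shell
# Print (do not run) one mkfs command per device.
# Usage: print_mkfs_commands ext4|xfs DEVICE...
print_mkfs_commands() {
    fstype=$1
    shift
    case $fstype in
        # ext4: 1% root reservation, one inode per 1 MB, recommended features
        ext4) opts="-m 1 -T largefile -O sparse_super,dir_index,extent,has_journal" ;;
        # xfs: defaults
        xfs)  opts="" ;;
        *)    echo "unsupported filesystem: $fstype" >&2; return 1 ;;
    esac
    for dev in "$@"; do
        echo "mkfs -t $fstype${opts:+ $opts} $dev"
    done
}
```

For example, `print_mkfs_commands ext4 /dev/sdb1 /dev/sdc1` prints two commands that can be inspected and then executed by hand. Printing rather than executing is deliberate, because mkfs destroys any existing data on the target device.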
Disk mount options
By design, HDFS is a fault-tolerant filesystem. All drives used by DataNode machines for data need to be mounted without the use of RAID. Drives should be mounted in the /etc/fstab filesystem table using the noatime option (which also implies nodiratime). For SSD or flash drives, turn on TRIM by specifying the discard option when mounting. This reduces premature SSD wear and device failures, while primarily avoiding long garbage collection pauses.
In /etc/fstab, ensure that the appropriate filesystems have the noatime mount option specified:
    /dev/sda1 / ext4 noatime 0 0
To enable TRIM, also specify the discard mount option:
    /dev/sdb1 /data ext4 noatime,discard 0 0
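A quick audit can confirm that the mount options above are actually in place. The sketch below is one possible approach under stated assumptions: the function name is hypothetical, input is in standard fstab column order (device, mount point, type, options, dump, pass), and swap entries are skipped because noatime does not apply to them.

```shell
# Print mount points from fstab-formatted input whose options lack noatime.
# Usage: mounts_missing_noatime < /etc/fstab
mounts_missing_noatime() {
    awk '
        NF == 0 || $1 ~ /^#/ { next }   # skip blank lines and comments
        $3 == "swap"         { next }   # swap has no access times
        {
            n = split($4, opts, ",")
            found = 0
            for (i = 1; i <= n; i++)
                if (opts[i] == "noatime") found = 1
            if (!found) print $2
        }'
}
```

An empty result means every listed filesystem already carries noatime. The options field is split on commas so that an entry such as noatime,discard is matched exactly rather than by substring.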
Disk mount naming convention
/data1
/data2
/data3
/data4
/data5
/data6
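The numbered mount points combine naturally with the noatime option when writing fstab entries. As a rough sketch, assuming each data device maps to the next /dataN in the order given (the function name and device names are illustrative, not part of any tool):

```shell
# Print one fstab line per device, mounted at /data1, /data2, ... with noatime.
# Usage: print_data_fstab FSTYPE DEVICE...
print_data_fstab() {
    fstype=$1
    shift
    i=1
    for dev in "$@"; do
        echo "$dev /data$i $fstype noatime 0 0"
        i=$((i + 1))
    done
}
```

The output is meant to be reviewed before it is appended to /etc/fstab. For SSD or flash devices, the options field would be noatime,discard instead.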