Configuring Mountable HDFS

CDH includes a FUSE (Filesystem in Userspace) interface into HDFS. The hadoop-hdfs-fuse package enables you to use your HDFS cluster as if it were a traditional filesystem on Linux. Proceed as follows.

Before you start: You must have a working HDFS cluster and know the hostname and port that your NameNode exposes. If you use parcels to install CDH, you do not need to install the FUSE packages.

To install hadoop-hdfs-fuse on Red Hat-compatible systems:

sudo yum install hadoop-hdfs-fuse

To install hadoop-hdfs-fuse on Ubuntu systems:

sudo apt-get install hadoop-hdfs-fuse

To install hadoop-hdfs-fuse on SLES systems:

sudo zypper install hadoop-hdfs-fuse

You now have everything you need to begin mounting HDFS on Linux.

To set up and test your mount point in a non-HA installation:

mkdir -p <mount_point>
hadoop-fuse-dfs dfs://<name_node_hostname>:<namenode_port> <mount_point>

where namenode_port is the NameNode's RPC port, as set in the dfs.namenode.servicerpc-address property.
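
As a minimal sketch, the URI passed to hadoop-fuse-dfs can be assembled and sanity-checked before mounting. The helper name build_dfs_uri is hypothetical and is not part of CDH; it only validates that a host was given and that the port is numeric.

```shell
# Hypothetical helper (not shipped with CDH): build the dfs:// URI
# for a non-HA NameNode from a hostname and an RPC port.
build_dfs_uri() {
    host="$1"
    port="$2"
    # A host name is required.
    if [ -z "$host" ]; then
        echo "missing NameNode hostname" >&2
        return 1
    fi
    # The port must be non-empty and contain digits only.
    case "$port" in
        ''|*[!0-9]*)
            echo "invalid port: $port" >&2
            return 1
            ;;
    esac
    printf 'dfs://%s:%s\n' "$host" "$port"
}
```

For example, with a NameNode on localhost port 8020 and a mount point of /mnt/hdfs (both illustrative values), the mount command becomes: hadoop-fuse-dfs "$(build_dfs_uri localhost 8020)" /mnt/hdfs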

To set up and test your mount point in an HA installation:

mkdir -p <mount_point>
hadoop-fuse-dfs dfs://<nameservice_id> <mount_point>

where nameservice_id is the value of fs.defaultFS. In this case the port defined for dfs.namenode.rpc-address.[nameservice ID].[name node ID] is used automatically. See Enabling HDFS HA for more information about these properties.
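
The following sketch derives the HA mount URI from an fs.defaultFS value. It assumes the value has the form hdfs://<nameservice_id>; the helper name defaultfs_to_dfs_uri is hypothetical.

```shell
# Hypothetical helper: turn an fs.defaultFS value such as
# hdfs://nameservice1 into the dfs:// form used by hadoop-fuse-dfs.
defaultfs_to_dfs_uri() {
    case "$1" in
        hdfs://*)
            authority="${1#hdfs://}"     # drop the hdfs:// scheme
            authority="${authority%/}"   # drop one trailing slash, if any
            printf 'dfs://%s\n' "$authority"
            ;;
        *)
            echo "expected an hdfs:// URI, got: $1" >&2
            return 1
            ;;
    esac
}
```

On a host with the cluster's client configuration deployed, the value can be read with hdfs getconf -confKey fs.defaultFS and passed through the helper: hadoop-fuse-dfs "$(defaultfs_to_dfs_uri "$(hdfs getconf -confKey fs.defaultFS)")" <mount_point>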

You can now run filesystem operations on the mount point as if it were a local directory. Press Ctrl+C to end the fuse-dfs program, and unmount the partition if it is still mounted.

To clean up your test:

umount <mount_point>

You can now add a permanent HDFS mount which persists through reboots.

To add a system mount:

  1. Open /etc/fstab and add lines to the bottom similar to these:
    hadoop-fuse-dfs#dfs://<name_node_hostname>:<namenode_port> <mount_point> fuse allow_other,usetrash,rw 2 0

    For example:

    hadoop-fuse-dfs#dfs://localhost:8020 /mnt/hdfs fuse allow_other,usetrash,rw 2 0
  2. Test to make sure everything is working properly:
    mount <mount_point>

Your system is now configured so that you can run commands such as ls against the mount point as if it were a normal system disk.
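
The /etc/fstab entry from step 1 can also be generated rather than typed by hand. This is a sketch only: fstab_entry is a hypothetical helper, and it emits the same options and trailing fields used in the example above.

```shell
# Hypothetical helper: print an /etc/fstab line for a FUSE-mounted HDFS.
#   $1 = dfs:// URI (for example dfs://localhost:8020)
#   $2 = local mount point (for example /mnt/hdfs)
fstab_entry() {
    uri="$1"
    mount_point="$2"
    if [ -z "$uri" ] || [ -z "$mount_point" ]; then
        echo "usage: fstab_entry <dfs_uri> <mount_point>" >&2
        return 1
    fi
    printf 'hadoop-fuse-dfs#%s %s fuse allow_other,usetrash,rw 2 0\n' \
        "$uri" "$mount_point"
}
```

To append the generated line, one option is: fstab_entry dfs://localhost:8020 /mnt/hdfs | sudo tee -a /etc/fstab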

For more information, see the help for hadoop-fuse-dfs:

hadoop-fuse-dfs --help

Optimizing Mountable HDFS

  • Cloudera recommends that you use the -obig_writes option on kernels later than 2.6.26. This option allows for better performance of writes.
  • By default, the CDH package installation creates the /etc/default/hadoop-fuse file with a maximum heap size of 128 MB. You might need to change the JVM minimum and maximum heap size for better performance. For example:
    export LIBHDFS_OPTS="-Xms64m -Xmx256m"

    Be careful not to set the minimum to a higher value than the maximum.
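
A quick check along these lines can catch a minimum heap larger than the maximum before the value goes into /etc/default/hadoop-fuse. This is a sketch under stated assumptions: check_heap_opts is a hypothetical helper, and it only understands -Xms and -Xmx values written in megabytes with a lowercase m suffix.

```shell
# Hypothetical helper: verify that -Xms does not exceed -Xmx.
# Assumes both sizes are given in megabytes, e.g. "-Xms64m -Xmx256m".
check_heap_opts() {
    opts="$1"
    # Extract the numeric part of each option; empty if not found.
    xms=$(printf '%s\n' "$opts" | sed -n 's/.*-Xms\([0-9][0-9]*\)m.*/\1/p')
    xmx=$(printf '%s\n' "$opts" | sed -n 's/.*-Xmx\([0-9][0-9]*\)m.*/\1/p')
    if [ -z "$xms" ] || [ -z "$xmx" ]; then
        echo "could not find both -Xms<N>m and -Xmx<N>m" >&2
        return 2
    fi
    if [ "$xms" -gt "$xmx" ]; then
        echo "minimum heap (${xms}m) exceeds maximum heap (${xmx}m)" >&2
        return 1
    fi
    echo "ok: -Xms${xms}m <= -Xmx${xmx}m"
}
```

For instance, check_heap_opts "$LIBHDFS_OPTS" can be run after sourcing the file to confirm the exported value is consistent.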