Mountable HDFS
CDH 5 includes a FUSE (Filesystem in Userspace) interface into HDFS. FUSE enables you to write a normal userland application as a bridge for a traditional filesystem interface. The hadoop-hdfs-fuse package enables you to use your HDFS cluster as if it were a traditional filesystem on Linux. It is assumed that you have a working HDFS cluster and know the hostname and port that your NameNode exposes.
To install fuse-dfs on Red Hat-compatible systems:
$ sudo yum install hadoop-hdfs-fuse
To install fuse-dfs on Ubuntu systems:
$ sudo apt-get install hadoop-hdfs-fuse
To install fuse-dfs on SLES systems:
$ sudo zypper install hadoop-hdfs-fuse
You now have everything you need to begin mounting HDFS on Linux.
To set up and test your mount point in a non-HA installation:
$ mkdir -p <mount_point>
$ hadoop-fuse-dfs dfs://<name_node_hostname>:<namenode_port> <mount_point>
where namenode_port is the NameNode's RPC port, as set in dfs.namenode.servicerpc-address.
To set up and test your mount point in an HA installation:
$ mkdir -p <mount_point>
$ hadoop-fuse-dfs dfs://<nameservice_id> <mount_point>
where nameservice_id is the value of fs.defaultFS. In this case the port defined for dfs.namenode.rpc-address.[nameservice ID].[name node ID] is used automatically. See Configuring Software for HDFS HA for more information about these properties.
To find its configuration directory, hadoop-fuse-dfs uses the HADOOP_CONF_DIR configured at the time the mount command is invoked.
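The sketch below shows how the two URI forms and HADOOP_CONF_DIR fit together on one command line. It only prints the mount command rather than running it, so it can be tried on a machine without Hadoop installed. The helper names dfs_uri and mount_command are hypothetical, and the values /etc/hadoop/conf, /mnt/hdfs, localhost, 8020, and nameservice1 are assumptions for illustration; substitute the values for your cluster.

```shell
# Hypothetical helper: print the dfs:// URI for either kind of cluster.
#   dfs_uri <nameservice_id>          -> HA form
#   dfs_uri <namenode_host> <port>    -> non-HA form
dfs_uri() {
    if [ "$#" -eq 2 ]; then
        printf 'dfs://%s:%s\n' "$1" "$2"
    else
        printf 'dfs://%s\n' "$1"
    fi
}

# Hypothetical helper: print the mount command instead of running it,
# with HADOOP_CONF_DIR set for that one invocation.
# $1 = configuration directory, $2 = mount point, rest = dfs_uri arguments
mount_command() {
    conf_dir=$1
    mount_point=$2
    shift 2
    printf 'HADOOP_CONF_DIR=%s hadoop-fuse-dfs %s %s\n' \
        "$conf_dir" "$(dfs_uri "$@")" "$mount_point"
}

# Assumed example values; replace them with your own.
mount_command /etc/hadoop/conf /mnt/hdfs localhost 8020
mount_command /etc/hadoop/conf /mnt/hdfs nameservice1
```

Setting the variable as a prefix to the command limits it to that single invocation, which avoids changing HADOOP_CONF_DIR for the rest of the shell session.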
To clean up your test:
$ umount <mount_point>
You can now add a permanent HDFS mount which persists through reboots. To add a system mount:
- Open /etc/fstab and add a line to the bottom similar to this:
hadoop-fuse-dfs#dfs://<name_node_hostname>:<namenode_port> <mount_point> fuse allow_other,usetrash,rw 2 0
For example:
hadoop-fuse-dfs#dfs://localhost:8020 /mnt/hdfs fuse allow_other,usetrash,rw 2 0
- Test to make sure everything is working properly:
$ mount <mount_point>
You can now use ls and other standard commands on that mount point as if it were a normal system disk.
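As a sketch of the /etc/fstab step, the helpers below format an entry in the layout shown above and append it only if that exact line is not already present. The function names fstab_entry and add_fstab_entry are hypothetical, and the example writes to a scratch file rather than /etc/fstab so that it can be tried without root privileges.

```shell
# Hypothetical helper: print an fstab entry in the format shown above.
# $1 = NameNode host, $2 = NameNode RPC port, $3 = mount point
fstab_entry() {
    printf 'hadoop-fuse-dfs#dfs://%s:%s %s fuse allow_other,usetrash,rw 2 0\n' \
        "$1" "$2" "$3"
}

# Hypothetical helper: append the entry to a file unless it is already there.
# $1 = target file, remaining arguments are passed to fstab_entry
add_fstab_entry() {
    file=$1
    shift
    entry=$(fstab_entry "$@")
    # -F: fixed string, -x: whole-line match, -q: quiet
    if ! grep -Fxq -- "$entry" "$file" 2>/dev/null; then
        printf '%s\n' "$entry" >> "$file"
    fi
}

# Try it against a scratch file instead of the real /etc/fstab.
scratch=$(mktemp)
add_fstab_entry "$scratch" localhost 8020 /mnt/hdfs
add_fstab_entry "$scratch" localhost 8020 /mnt/hdfs   # second call adds nothing
cat "$scratch"
```

Checking for the line before appending keeps repeated runs from leaving duplicate entries in the file.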
By default, the CDH 5 package installation creates the /etc/default/hadoop-fuse file with a maximum heap size of 128 MB. You can change the JVM minimum and maximum heap size by setting LIBHDFS_OPTS in that file; for example:
export LIBHDFS_OPTS="-Xms64m -Xmx256m"
Be careful not to set the minimum to a higher value than the maximum.
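The check below is a minimal sketch of how that constraint could be verified before saving the file. The function name check_heap_opts is hypothetical, and it assumes both values are written as whole megabytes with an m suffix, as in the example above; other units are reported as unparseable rather than converted.

```shell
# Hypothetical helper: succeed only if -Xms does not exceed -Xmx.
# Assumes both values are whole numbers of megabytes with an "m" suffix.
# Returns 0 if consistent, 1 if -Xms is larger, 2 if a value cannot be parsed.
check_heap_opts() {
    opts=$1
    xms=$(printf '%s\n' "$opts" | sed -n 's/.*-Xms\([0-9][0-9]*\)m.*/\1/p')
    xmx=$(printf '%s\n' "$opts" | sed -n 's/.*-Xmx\([0-9][0-9]*\)m.*/\1/p')
    # Reject strings where either value is missing or uses another unit.
    if [ -z "$xms" ] || [ -z "$xmx" ]; then
        echo "could not parse -Xms/-Xmx in: $opts" >&2
        return 2
    fi
    [ "$xms" -le "$xmx" ]
}

if check_heap_opts "-Xms64m -Xmx256m"; then
    echo "heap settings are consistent"
fi
```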
For more information, see the help for hadoop-fuse-dfs:
$ hadoop-fuse-dfs --help