Access HDFS
To access HDFS, first mount the export "/". Currently, NFS v3 is supported, with TCP as the transport protocol.
Mount the HDFS namespace as follows:
mount -t nfs -o vers=3,proto=tcp,nolock,sync,rsize=1048576,wsize=1048576 $server:/ $mount_point
Access HDFS as part of the local file system. Hard links, symbolic links, and random writes are not supported in this release.
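As a rough illustration of a full session built around the mount command above, the sketch below prints each step instead of running it. The host name nfsgw.example.com, the mount point /mnt/hdfs, the file /tmp/local.txt, and the HDFS directory /tmp are hypothetical placeholders, not values from this guide.

```shell
#!/bin/sh
# Hypothetical values: replace with your NFS gateway host and an empty local directory.
server="nfsgw.example.com"
mount_point="/mnt/hdfs"

# Same mount options as the mount command shown earlier in this section.
opts="vers=3,proto=tcp,nolock,sync,rsize=1048576,wsize=1048576"

# Dry-run helper: prints the command line instead of executing it.
# To execute for real, change the body to: "$@"  (mount typically requires root).
run() { echo "$@"; }

run mkdir -p "$mount_point"
run mount -t nfs -o "$opts" "$server:/" "$mount_point"
run ls -l "$mount_point"                      # browse the HDFS namespace
run cp /tmp/local.txt "$mount_point/tmp/"     # sequential write of a whole file
run umount "$mount_point"
```

Copying a whole file is a sequential write, which is the supported pattern. Tools that rewrite a file in place rely on random writes and can be expected to fail on this mount.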
Note: Because NLM is not supported, the mount option nolock is needed.
We recommend using the sync option for performance when writing large files.
Here is additional information about sync, rtmax/wtmax, and HADOOP_NFS3_OPTS (for gateway heap space):
Option: sync
The sync mount option to the NFS client improves the performance and reliability of writing large files to HDFS via the NFS gateway. If the sync option is specified, the NFS client machine will flush write operations to the NFS gateway before returning control to the client application. A useful side effect of sync is that the client will not issue reordered writes which reduces buffering requirements on the NFS gateway.
Note: sync is specified on the client machine when mounting the NFS share.
Option: rtmax, wtmax
The dfs.nfs.rtmax and dfs.nfs.wtmax properties are HDFS configuration settings on the HDFS NFS gateway server. These options change the maximum read and write request size supported by the gateway. The default value for both settings is 1 MB. Increasing these values may improve the performance of large file transfers. The defaults are expected to work well for most deployments.
Option: HADOOP_NFS3_OPTS
Specify JVM heap space for the NFS Gateway. This option is useful for increasing heap space in fairly large environments where the gateway might run out of heap space, leading to an "out of memory" error.
To set this option, specify the following in the hadoop-env.sh file:
export HADOOP_NFS3_OPTS=<memory-setting(s)>
The following example specifies a 2 GB process heap (2GB starting size and 2GB maximum):
export HADOOP_NFS3_OPTS="-Xms2048m -Xmx2048m"
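If the heap size needs to change later, one option is to derive both flags from a single number so the starting and maximum sizes stay equal, as in the 2 GB example. This is only a sketch: the helper name heap_opts and the 4096 MB value are illustrative assumptions, not recommendations from this guide.

```shell
# Hypothetical helper for hadoop-env.sh: build equal -Xms/-Xmx flags
# from a heap size given in megabytes.
heap_opts() {
    printf '%s\n' "-Xms${1}m -Xmx${1}m"
}

# Illustrative value only: a 4 GB heap (4096 MB starting size and maximum).
export HADOOP_NFS3_OPTS="$(heap_opts 4096)"
echo "$HADOOP_NFS3_OPTS"
```

Calling heap_opts 2048 reproduces the setting from the 2 GB example.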
User authentication and mapping:
The NFS gateway uses AUTH_UNIX-style authentication, which means that the login user on the client is the same user that NFS passes to HDFS. For example, if the current user on the NFS client is admin, then when that user accesses the mounted directory, the NFS gateway accesses HDFS as user admin. To access HDFS as the hdfs user, you must first switch the current user to hdfs on the client system before accessing the mounted directory.
Set up client machine users to interact with HDFS through NFS.
The NFS gateway converts the User Identifier (UID) to a username, and HDFS uses the username to check permissions.
The system administrator must ensure that the user on the NFS client machine has the same name and UID as the user on the NFS gateway machine. This is usually not a problem if you use the same user management system, such as LDAP/NIS, to create and deploy users to the HDP nodes and to the client node.
If the user is created manually, you might need to modify the UID on either the client or the NFS gateway host to make them the same:
usermod -u 123 $myusername
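Before changing a UID, it can help to confirm that the two hosts actually disagree. The sketch below compares the current user's UID on the client with a UID obtained from the gateway. The check_uid helper and the gateway_uid variable are hypothetical; how you collect the gateway's value (for example, by running id -u there for the same user name) depends on your environment.

```shell
# UID of the current login user on the NFS client.
local_uid="$(id -u)"

# Assumption: gateway_uid holds the UID that the same user name has on the
# NFS gateway host. It defaults to the local UID so the sketch runs unchanged.
gateway_uid="${gateway_uid:-$local_uid}"

# Hypothetical helper: report whether the two UIDs agree.
check_uid() {
    if [ "$1" = "$2" ]; then
        echo "match"
    else
        echo "mismatch: client=$1 gateway=$2"
    fi
}

check_uid "$local_uid" "$gateway_uid"
```

A mismatch indicates that the gateway may resolve the client's UID to a different username, so HDFS permission checks would apply to that other user.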
The following diagram illustrates how the UID and name are communicated between the NFS client, NFS gateway, and NameNode.