Chapter 2. User Guide - HDFS NFS Gateway

The NFS Gateway for HDFS allows HDFS to be mounted as part of the client's local file system.

This release of the NFS Gateway supports the following usage patterns:

  • Users can browse the HDFS file system through their local file system on operating systems that have an NFSv3-compatible client.

  • Users can download files from the HDFS file system to their local file system.

  • Users can upload files from their local file system directly to the HDFS file system.

Note

NFS access to HDFS does not support random writes or file appends in this release of HDP. If you need file appends to stream data to HDFS through NFS, upgrade to HDP 2.0.

Prerequisites:

  • The NFS gateway machine needs everything required to run an HDFS client, such as the Hadoop core JAR file and the HADOOP_CONF directory.

  • The NFS gateway can be on any DataNode, NameNode, or any HDP client machine. Start the NFS server on that machine.
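A quick way to confirm these prerequisites before starting the gateway is a small Python check such as the sketch below. Beyond what this guide states, it assumes that the Hadoop configuration directory is given by the HADOOP_CONF_DIR environment variable and that the `hadoop` launcher is on the PATH; the function name check_gateway_prereqs is hypothetical.

```python
import os
import shutil
from pathlib import Path


def check_gateway_prereqs(conf_dir=None):
    """Report whether basic HDFS client prerequisites appear to be present.

    Assumption: the Hadoop configuration directory comes from the
    HADOOP_CONF_DIR environment variable unless passed explicitly.
    """
    conf_dir = conf_dir or os.environ.get("HADOOP_CONF_DIR")
    conf_path = Path(conf_dir) if conf_dir else None
    return {
        # The `hadoop` launcher is used later to start portmap and nfs3.
        "hadoop_on_path": shutil.which("hadoop") is not None,
        # The gateway reads the same configuration files as other HDFS clients.
        "conf_dir_exists": conf_path is not None and conf_path.is_dir(),
        "hdfs_site_present": conf_path is not None
        and (conf_path / "hdfs-site.xml").is_file(),
    }


if __name__ == "__main__":
    for name, ok in check_gateway_prereqs().items():
        print(f"{name}: {'ok' if ok else 'MISSING'}")
```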

Instructions: Use the following steps to configure and use the HDFS NFS gateway:

  1. Configure settings for the HDFS NFS gateway:

    The NFS gateway uses the same configuration as the NameNode and DataNodes. Configure the following three properties based on your application's requirements:

    1. Edit the hdfs-site.xml file on your NFS gateway machine and modify the following property:

      <property>
        <name>dfs.access.time.precision</name>
        <value>3600000</value>
        <description>The access time for an HDFS file is precise up to this value.
                     The default value is 1 hour. Setting a value of 0 disables
                     access times for HDFS.
        </description>
      </property>

      Note

      If the export is mounted with access time updates allowed, make sure this property is not disabled (set to 0) in the configuration file. Only the NameNode needs to be restarted after this property is changed. If you have disabled access time updates by mounting with the "noatime" option, you do NOT have to change this property or restart your NameNode.
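      If you manage configuration files with scripts rather than by hand, the property can be set programmatically. The following Python sketch uses the standard library's xml.etree.ElementTree to update or add a <property> entry in Hadoop-style configuration XML. set_hadoop_property is a hypothetical helper, and it does not preserve comments or formatting from the input file.

```python
import xml.etree.ElementTree as ET


def set_hadoop_property(xml_text, name, value):
    """Return configuration XML with property `name` set to `value`.

    Updates the matching <property> if one exists, otherwise appends a new one.
    Comments and original formatting are not preserved.
    """
    root = ET.fromstring(xml_text)
    for prop in root.findall("property"):
        if prop.findtext("name") == name:
            value_el = prop.find("value")
            if value_el is None:
                value_el = ET.SubElement(prop, "value")
            value_el.text = str(value)
            break
    else:
        # No existing entry: append <property><name/><value/></property>.
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = str(value)
    return ET.tostring(root, encoding="unicode")


# 3600000 ms is one hour; 0 would disable access times altogether.
conf = set_hadoop_property("<configuration></configuration>",
                           "dfs.access.time.precision", 3600000)
print(conf)
```

Writing the result back to the file and restarting the NameNode remain manual steps, as described in the note above.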

    2. Update the following property in hdfs-site.xml:

      <property>    
          <name>dfs.datanode.max.xcievers</name>    
          <value>1024</value> 
      </property>

      Note

      If the number of files being uploaded in parallel through the NFS gateway exceeds this value (1024), increase the value of this property accordingly. Base the new value on the maximum number of files uploaded in parallel.

      Restart your DataNodes after making this change to the configuration file.
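      To decide whether the value needs raising, you can compare the configured value with your expected upload concurrency. The Python sketch below reads dfs.datanode.max.xcievers from hdfs-site.xml text and applies the rule from the note above: raise it only when parallel uploads exceed it. Both helper names are hypothetical, and using the upload count itself as the new value is a simplifying assumption; leave headroom for other DataNode clients.

```python
import xml.etree.ElementTree as ET

DEFAULT_XCIEVERS = 1024  # value used in the step above
XCIEVERS_KEY = "dfs.datanode.max.xcievers"


def read_xcievers(site_xml):
    """Return the configured dfs.datanode.max.xcievers, or None if unset."""
    root = ET.fromstring(site_xml)
    for prop in root.findall("property"):
        if prop.findtext("name") == XCIEVERS_KEY:
            text = prop.findtext("value")
            return int(text.strip()) if text else None
    return None


def xcievers_needed(max_parallel_uploads, current=None):
    """Value to configure: keep the current one unless parallel uploads exceed it.

    Using the upload count itself as the new value is a simplifying assumption;
    leave headroom for other clients of the DataNodes.
    """
    current = DEFAULT_XCIEVERS if current is None else current
    return max(current, max_parallel_uploads)
```

For example, with 1024 configured and 2000 expected parallel uploads, xcievers_needed returns 2000, so the property would be raised and the DataNodes restarted.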

    3. Add the following property to hdfs-site.xml:

      <property>    
          <name>dfs.nfs3.dump.dir</name>    
          <value>/tmp/.hdfs-nfs</value> 
      </property>

      Note

      The NFS client often reorders writes, so sequential writes can arrive at the NFS gateway in random order. This directory is used to temporarily save out-of-order writes before they are written to HDFS. Make sure the directory has enough space. For example, if the application uploads 10 files of 100MB each, this directory should have 1GB of space in case a worst-case write reorder happens to every file.
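      The sizing rule in the note is simple arithmetic: in the worst case, every file being uploaded is buffered in full. The Python sketch below computes that estimate and compares it with the free space reported by shutil.disk_usage. The helper names are hypothetical, and because disk_usage needs an existing path, pass the dump directory's parent if the directory has not been created yet.

```python
import shutil

MB = 1024 * 1024


def worst_case_dump_bytes(file_sizes):
    """Worst case from the note: every file's writes arrive out of order
    and must be buffered in full in the dump directory."""
    return sum(file_sizes)


def dump_dir_has_room(path, file_sizes):
    """True if the filesystem holding `path` has room for the worst case."""
    return shutil.disk_usage(path).free >= worst_case_dump_bytes(file_sizes)


uploads = [100 * MB] * 10  # the note's example: 10 files of 100 MB each
print(worst_case_dump_bytes(uploads) / MB)  # 1000.0, i.e. roughly 1 GB
```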

    4. Optional: Customize log settings.

      Edit the log4j.properties file as follows.

      To change the trace level, add the following:

      log4j.logger.org.apache.hadoop.hdfs.nfs=DEBUG

      To get more details on RPC requests, add the following:

      log4j.logger.org.apache.hadoop.oncrpc=DEBUG
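      If you apply these settings on several gateway hosts, an idempotent edit avoids duplicate entries. The following Python sketch appends the two logger lines shown above to a log4j.properties file only if they are not already present. enable_nfs_debug_logging is a hypothetical helper, and it matches lines exactly, so an existing entry for the same logger with a different level is neither detected nor replaced.

```python
from pathlib import Path

NFS_DEBUG_LINES = [
    "log4j.logger.org.apache.hadoop.hdfs.nfs=DEBUG",  # gateway trace level
    "log4j.logger.org.apache.hadoop.oncrpc=DEBUG",    # RPC request details
]


def enable_nfs_debug_logging(properties_path, lines=NFS_DEBUG_LINES):
    """Append `lines` to a log4j.properties file unless already present.

    Returns the lines that were added. Matching is exact, so an existing
    entry for the same logger with a different level is left alone.
    """
    path = Path(properties_path)
    text = path.read_text() if path.exists() else ""
    existing = set(text.splitlines())
    missing = [line for line in lines if line not in existing]
    if missing:
        # Start on a fresh line if the file does not end with a newline.
        prefix = "\n" if text and not text.endswith("\n") else ""
        with path.open("a") as f:
            f.write(prefix + "\n".join(missing) + "\n")
    return missing
```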

  2. Start NFS gateway service.

    Three daemons are required to provide NFS service: rpcbind (or portmap), mountd, and nfsd. The NFS gateway process runs both nfsd and mountd, and it shares the HDFS root "/" as the only export. We recommend using the portmap included in the NFS gateway package, as shown below:

    1. Stop nfs/rpcbind/portmap services provided by the platform:

      service nfs stop
      service rpcbind stop

    2. Start the portmap included in the package (requires root privileges):

      hadoop portmap

      OR

      hadoop-daemon.sh start portmap

    3. Start mountd and nfsd.

      No root privileges are required for this command. However, verify that the user starting the Hadoop cluster and the user starting the NFS gateway are the same.

      hadoop nfs3
      

      OR

      hadoop-daemon.sh start nfs3

      Note

      If you start the NFS gateway with the hadoop-daemon.sh script, its log is written to the Hadoop log directory.

    4. Stop NFS gateway services.

      hadoop-daemon.sh stop nfs3
      hadoop-daemon.sh stop portmap
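    The start and stop order above can be captured in a small wrapper. The Python sketch below lists the commands from this section and runs them with subprocess.run, defaulting to a dry run that only returns the command strings. The list names and run_sequence are hypothetical. It keeps the privilege split described above: the platform-service and portmap steps need root, while nfs3 should be started by the same user that runs the Hadoop cluster.

```python
import subprocess

# Run as root: stop the platform NFS services, then start the bundled
# portmap first so that mountd and nfsd can register with it.
ROOT_START = [
    ["service", "nfs", "stop"],
    ["service", "rpcbind", "stop"],
    ["hadoop-daemon.sh", "start", "portmap"],
]
# Run as the same user that runs the Hadoop cluster (no root needed).
USER_START = [
    ["hadoop-daemon.sh", "start", "nfs3"],
]
# Shutdown mirrors startup: stop nfs3 before portmap. Stopping portmap
# likely needs the same (root) privileges that were used to start it.
STOP = [
    ["hadoop-daemon.sh", "stop", "nfs3"],
    ["hadoop-daemon.sh", "stop", "portmap"],
]


def run_sequence(commands, dry_run=True):
    """Run commands in order, raising CalledProcessError on the first failure.

    With dry_run=True (the default) nothing is executed; the command lines
    are only returned so they can be reviewed.
    """
    executed = []
    for cmd in commands:
        if not dry_run:
            subprocess.run(cmd, check=True)
        executed.append(" ".join(cmd))
    return executed
```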

  3. Verify that the NFS-related services are running.

    1. Run the following command to verify that all the services are up and running:

      rpcinfo -p $nfs_server_ip

      You should see output similar to the following:

      program vers proto   port  service
       100005    1   tcp   4242  mountd
       100005    2   udp   4242  mountd
       100005    2   tcp   4242  mountd
       100000    2   tcp    111  portmapper
       100000    2   udp    111  portmapper
       100005    3   udp   4242  mountd
       100005    1   udp   4242  mountd
       100003    3   tcp   2049  nfs
       100005    3   tcp   4242  mountd

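      Checking this output by eye is error-prone, so a script can do it. The Python sketch below parses `rpcinfo -p` output and reports which of portmapper, mountd, and nfs are missing, and whether NFS version 3 over TCP is registered. The function names are hypothetical, and the parser assumes the five-column layout shown above (program, version, protocol, port, service name).

```python
REQUIRED = {"portmapper", "mountd", "nfs"}


def registered_services(rpcinfo_output):
    """Parse `rpcinfo -p` output into (service, version, protocol) tuples."""
    services = set()
    for line in rpcinfo_output.splitlines():
        fields = line.split()
        # Data rows: program vers proto port service; header rows are skipped.
        if len(fields) == 5 and fields[0].isdigit():
            _program, vers, proto, _port, name = fields
            services.add((name, int(vers), proto))
    return services


def missing_services(rpcinfo_output):
    """Required services that do not appear in the output at all."""
    names = {name for name, _vers, _proto in registered_services(rpcinfo_output)}
    return REQUIRED - names


def gateway_ready(rpcinfo_output):
    """All required services are present and NFSv3 is registered over TCP."""
    return (not missing_services(rpcinfo_output)
            and ("nfs", 3, "tcp") in registered_services(rpcinfo_output))
```

For example, pass it the stdout of subprocess.run(["rpcinfo", "-p", server_ip], capture_output=True, text=True), where server_ip is your gateway's address.
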
    2. Verify that the HDFS namespace is exported and can be mounted by any client.

      showmount -e $nfs_server_ip                         

      You should see output similar to the following:

      Exports list on $nfs_server_ip :
      / (everyone)
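      The same check can be scripted for the export list. This Python sketch, with hypothetical function names, assumes the layout shown above: a header line followed by one line per export that begins with the exported path.

```python
def exported_paths(showmount_output):
    """Export paths from `showmount -e` output.

    Skips the "Exports list on ...:" header; each remaining line starts
    with the exported path, followed by the clients allowed to mount it.
    """
    paths = []
    for line in showmount_output.splitlines()[1:]:
        fields = line.split()
        if fields:
            paths.append(fields[0])
    return paths


def root_is_exported(showmount_output):
    """True if the HDFS root "/" is listed as an export."""
    return "/" in exported_paths(showmount_output)
```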

  4. Mount the export “/”.

    Currently, NFSv3 is supported, using TCP as the transport protocol. Users can mount the HDFS namespace as shown below:

    mount -t nfs -o vers=3,proto=tcp,nolock $server:/  $mount_point

    Users can then access HDFS as part of the local file system, except that hard links, symbolic links, and random writes are not supported in this release. We do not recommend using tools such as vim to create files in the mounted directory. The supported use cases for this release are file browsing, uploading, and downloading.
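    To mount from a script, the command above can be built as an argument list and run through subprocess. In the Python sketch below, nfs_mount_command and mount_hdfs are hypothetical helpers, and the server name and mount point you pass are placeholders. Mounting normally requires root on the client, and the mount point must already exist.

```python
import subprocess


def nfs_mount_command(server, mount_point, options="vers=3,proto=tcp,nolock"):
    """Argument list for mounting the gateway's only export, "/".

    The default options are the ones used in the mount command above:
    NFSv3 over TCP with nolock.
    """
    return ["mount", "-t", "nfs", "-o", options, f"{server}:/", mount_point]


def mount_hdfs(server, mount_point):
    """Mount HDFS; needs root on the client and an existing mount point.

    Raises subprocess.CalledProcessError if mount fails.
    """
    subprocess.run(nfs_mount_command(server, mount_point), check=True)
```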

    User authentication and mapping:

    The NFS gateway in this release uses AUTH_UNIX-style authentication, which means that the login user on the client is the same user that NFS passes to HDFS. For example, if the current user on the NFS client is admin, the NFS gateway accesses HDFS as user admin when that user accesses the mounted directory. To access HDFS as the hdfs user, you must first switch the current user to hdfs on the client system before accessing the mounted directory.
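    Because AUTH_UNIX credentials carry numeric IDs rather than names, it helps to see exactly what a client process presents. The Python sketch below (Unix only, using the standard os and pwd modules) returns the current UID and GID, plus the user name that the client machine itself maps that UID to; auth_unix_identity is a hypothetical name. The gateway performs its own UID-to-name lookup, which is why the next step matters.

```python
import os
import pwd


def auth_unix_identity():
    """(uid, gid, local user name) for the current process (Unix only).

    AUTH_UNIX credentials carry the numeric IDs; the name returned here is
    only how this machine resolves the UID. It is None if the UID has no
    entry in the local account database.
    """
    uid, gid = os.getuid(), os.getgid()
    try:
        name = pwd.getpwuid(uid).pw_name
    except KeyError:
        name = None
    return uid, gid, name


if __name__ == "__main__":
    print(auth_unix_identity())
```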

  5. Set up client machine users to interact with HDFS through NFS.

    The NFS gateway converts the UID to a user name, and HDFS uses the user name to check permissions.

    The system administrator must ensure that the user on the NFS client machine has the same name and UID as on the NFS gateway machine. This is usually not a problem if you use the same user management system (e.g., LDAP/NIS) to create and deploy users to the HDP nodes and to the client node.

    If users are created manually, you might need to modify the UID on either the client or the NFS gateway host so that they match.
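    One way to spot mismatches before users hit permission errors is to compare the account databases of both machines. The Python sketch below parses /etc/passwd-format text (for example, the output of `getent passwd` on the client and on the gateway) and reports users whose UIDs differ or are missing on either side. The function names are hypothetical.

```python
def parse_passwd(text):
    """Map user name -> UID from /etc/passwd-formatted text."""
    users = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split(":")
        # Skip entries without a numeric UID (e.g. NIS "+" compat lines).
        if len(fields) >= 3 and fields[2].isdigit():
            users[fields[0]] = int(fields[2])
    return users


def uid_mismatches(client_passwd, gateway_passwd, users):
    """{user: (client_uid, gateway_uid)} for users that differ or are missing."""
    client = parse_passwd(client_passwd)
    gateway = parse_passwd(gateway_passwd)
    mismatches = {}
    for user in users:
        client_uid, gateway_uid = client.get(user), gateway.get(user)
        # Reported if missing on either side or if the UIDs differ.
        if client_uid is None or client_uid != gateway_uid:
            mismatches[user] = (client_uid, gateway_uid)
    return mismatches
```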

    The following illustrates how the user ID and name are communicated between NFS client, NFS gateway, and NameNode.