Configuring Short-Circuit Reads

So-called "short-circuit" reads bypass the DataNode, allowing a client to read the file directly, as long as the client is co-located with the data. Short-circuit reads provide a substantial performance boost to many applications and help improve HBase random read profile and Impala performance.

Short-circuit reads require libhadoop.so (the Hadoop Native Library) to be accessible to both the server and the client. libhadoop.so is not available if you have installed from a tarball. You must install from an .rpm, .deb, or parcel in order to use short-circuit local reads.

Configuring Short-Circuit Reads Using Cloudera Manager

Minimum Required Role: Configurator (also provided by Cluster Administrator, Full Administrator)

  1. Go to the HDFS service.
  2. Click the Configuration tab.
  3. Select Scope > Gateway or HDFS (Service-Wide).
  4. Select Category > Performance.
  5. Locate the Enable HDFS Short Circuit Read property or search for it by typing its name in the Search box. Check the box to enable it.

    If more than one role group applies to this configuration, edit the value for the appropriate role group. See Modifying Configuration Properties Using Cloudera Manager.

  6. Click Save Changes to commit the changes.

Configuring Short-Circuit Reads Using the Command Line

Configure the following properties in hdfs-site.xml to enable short-circuit reads in a cluster that is not managed by Cloudera Manager:
<property>
    <name>dfs.client.read.shortcircuit</name>
    <value>true</value>
</property>

<property>
    <name>dfs.client.read.shortcircuit.streams.cache.size</name>
    <value>1000</value>
</property>


<property>
    <name>dfs.client.read.shortcircuit.streams.cache.expiry.ms</name>
    <value>10000</value>
</property>

<property>
    <name>dfs.domain.socket.path</name>
    <value>/var/run/hadoop-hdfs/dn._PORT</value>
</property>

If /var/run/hadoop-hdfs/ is group-writable, make sure its group is root.