Configuring Short-Circuit Reads
So-called "short-circuit" reads bypass the DataNode, allowing a client to read the file directly, as long as the client is co-located with the data. Short-circuit reads provide a substantial performance boost to many applications, and in particular improve HBase random-read performance and Impala performance.
Short-circuit reads require libhadoop.so (the Hadoop Native Library) to be accessible to both the server and the client. libhadoop.so is not available if you have installed from a tarball. You must install from an .rpm, .deb, or parcel in order to use short-circuit local reads.
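As a quick way to confirm that the native library is present on a host, the following minimal sketch searches a list of directories for libhadoop.so. The directory names in CANDIDATE_DIRS and the helper name find_libhadoop are assumptions for illustration, not part of the product; substitute the native library locations used by your installation.

```python
from pathlib import Path

# Assumed search locations (not taken from this guide); adjust them to
# wherever your packages or parcels place the Hadoop native libraries.
CANDIDATE_DIRS = [
    "/usr/lib/hadoop/lib/native",
    "/opt/cloudera/parcels/CDH/lib/hadoop/lib/native",
]


def find_libhadoop(dirs):
    """Return paths of files named libhadoop.so* found directly in dirs."""
    found = []
    for d in dirs:
        p = Path(d)
        if p.is_dir():
            # Versioned names such as libhadoop.so.1.0.0 also match.
            found.extend(sorted(str(f) for f in p.glob("libhadoop.so*")))
    return found


if __name__ == "__main__":
    hits = find_libhadoop(CANDIDATE_DIRS)
    if hits:
        print("\n".join(hits))
    else:
        print("libhadoop.so not found in the searched directories")
```

Run the script on both the DataNode hosts and the client hosts, since the library must be accessible to both. The `hadoop checknative` command reports similar information from Hadoop itself.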
Configuring Short-Circuit Reads Using Cloudera Manager
Minimum Required Role: Configurator (also provided by Cluster Administrator, Full Administrator)
- Go to the HDFS service.
- Click the Configuration tab.
- Select .
- Select .
- Locate the Enable HDFS Short Circuit Read property or search for it by typing its name in the Search box. Check the box to enable it.
If more than one role group applies to this configuration, edit the value for the appropriate role group. See Modifying Configuration Properties.
- Click Save Changes to commit the changes.
Configuring Short-Circuit Reads Using the Command Line
Add the following properties to hdfs-site.xml:

<property>
  <name>dfs.client.read.shortcircuit</name>
  <value>true</value>
</property>
<property>
  <name>dfs.client.read.shortcircuit.streams.cache.size</name>
  <value>1000</value>
</property>
<property>
  <name>dfs.client.read.shortcircuit.streams.cache.expiry.ms</name>
  <value>10000</value>
</property>
<property>
  <name>dfs.domain.socket.path</name>
  <value>/var/run/hadoop-hdfs/dn._PORT</value>
</property>
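To double-check that a configuration file actually carries these settings, a small script can parse it and report anything missing. The sketch below is illustrative only: the function names read_properties and short_circuit_problems are hypothetical, and it assumes the properties sit inside the usual <configuration> root element of a Hadoop configuration file.

```python
import xml.etree.ElementTree as ET


def read_properties(xml_text):
    """Parse Hadoop-style configuration XML into a {name: value} dict."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        if name is not None:
            props[name.strip()] = (prop.findtext("value") or "").strip()
    return props


def short_circuit_problems(props):
    """Return a list of human-readable problems; empty means the basics are set."""
    problems = []
    if props.get("dfs.client.read.shortcircuit", "").lower() != "true":
        problems.append("dfs.client.read.shortcircuit is not set to true")
    if not props.get("dfs.domain.socket.path"):
        problems.append("dfs.domain.socket.path is not set")
    return problems


if __name__ == "__main__":
    # Inline sample for demonstration; in practice, read your hdfs-site.xml.
    sample = """
    <configuration>
      <property>
        <name>dfs.client.read.shortcircuit</name>
        <value>true</value>
      </property>
      <property>
        <name>dfs.domain.socket.path</name>
        <value>/var/run/hadoop-hdfs/dn._PORT</value>
      </property>
    </configuration>
    """
    issues = short_circuit_problems(read_properties(sample))
    print("OK" if not issues else "\n".join(issues))
```

The check covers only the two properties that switch the feature on and locate the domain socket; the stream cache properties tune behavior and are left out of the check.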
If /var/run/hadoop-hdfs/ is group-writable, make sure its group is root.
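The group ownership rule above can be checked with a short script. This is a minimal sketch under two assumptions: the root group has GID 0 (typical on Linux, but verify on your system), and the helper names socket_dir_is_safe and check_dir are made up for this example.

```python
import os
import stat

ROOT_GID = 0  # Assumption: the root group has GID 0 on this system.


def socket_dir_is_safe(mode, gid):
    """Apply the rule: a group-writable directory must belong to group root."""
    group_writable = bool(mode & stat.S_IWGRP)
    return (not group_writable) or gid == ROOT_GID


def check_dir(path):
    """Stat the directory and apply the rule to its mode and group."""
    st = os.stat(path)
    return socket_dir_is_safe(st.st_mode, st.st_gid)


if __name__ == "__main__":
    path = "/var/run/hadoop-hdfs"
    if os.path.isdir(path):
        print(path, "passes" if check_dir(path) else "is group-writable but not owned by group root")
    else:
        print(path, "does not exist on this host")
```

A directory that is not group-writable passes regardless of its group, which mirrors the conditional wording of the rule.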