Configuring Short-Circuit Reads
So-called "short-circuit" reads bypass the DataNode, allowing a client to read the file directly, as long as the client is co-located with the data. Short-circuit reads provide a substantial performance boost to many applications; in particular, they improve HBase random-read performance and Impala performance.
Short-circuit reads require libhadoop.so (the Hadoop Native Library) to be accessible to both the server and the client. libhadoop.so is not available if you have installed from a tarball. You must install from an .rpm, .deb, or parcel in order to use short-circuit local reads.
Configuring Short-Circuit Reads Using Cloudera Manager
Minimum Required Role: Configurator (also provided by Cluster Administrator, Full Administrator)
- Go to the HDFS service.
- Click the Configuration tab.
- Type "shortcircuit" into the Search field to display the Enable HDFS Short Circuit Read property, and verify that this feature is enabled (set to True).
- Go to the HBase service.
- Click the Configuration tab.
- Search for "shortcircuit".
- Verify that the Enable HDFS Short Circuit Read property is enabled.
Configuring Short-Circuit Reads Using the Command Line
To enable short-circuit reads in a cluster that is not managed by Cloudera Manager, configure the following properties in hdfs-site.xml on both the DataNodes and the clients:
<property>
  <name>dfs.client.read.shortcircuit</name>
  <value>true</value>
</property>
<property>
  <name>dfs.client.read.shortcircuit.streams.cache.size</name>
  <value>1000</value>
</property>
<property>
  <name>dfs.client.read.shortcircuit.streams.cache.expiry.ms</name>
  <value>10000</value>
</property>
<property>
  <name>dfs.domain.socket.path</name>
  <value>/var/run/hadoop-hdfs/dn._PORT</value>
</property>
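To sanity-check an edited hdfs-site.xml before restarting services, you might read the properties back and confirm that short-circuit reads are switched on. The sketch below is a minimal illustration, not a Hadoop tool: it assumes the properties sit under a <configuration> root element, as they do in hdfs-site.xml, and the helpers load_properties and short_circuit_enabled are hypothetical names. It only checks that dfs.client.read.shortcircuit is true and that dfs.domain.socket.path is set; it does not validate the other values.

```python
import xml.etree.ElementTree as ET

# The properties from the fragment above, wrapped in the <configuration>
# root element used by hdfs-site.xml.
HDFS_SITE = """<configuration>
  <property><name>dfs.client.read.shortcircuit</name><value>true</value></property>
  <property><name>dfs.client.read.shortcircuit.streams.cache.size</name><value>1000</value></property>
  <property><name>dfs.client.read.shortcircuit.streams.cache.expiry.ms</name><value>10000</value></property>
  <property><name>dfs.domain.socket.path</name><value>/var/run/hadoop-hdfs/dn._PORT</value></property>
</configuration>"""


def load_properties(xml_text: str) -> dict[str, str]:
    """Hypothetical helper: return a {name: value} mapping of every <property>."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name is not None and value is not None:
            props[name.strip()] = value.strip()
    return props


def short_circuit_enabled(props: dict[str, str]) -> bool:
    """Hypothetical check: the flag must be true and a socket path must be set."""
    flag = props.get("dfs.client.read.shortcircuit", "false").strip().lower()
    return flag == "true" and bool(props.get("dfs.domain.socket.path"))


if __name__ == "__main__":
    props = load_properties(HDFS_SITE)
    print(short_circuit_enabled(props))  # prints True for the fragment above
```

To check a real file, you could pass the contents of your hdfs-site.xml to load_properties instead of the embedded HDFS_SITE string.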
In dfs.domain.socket.path, the string _PORT is replaced by the TCP port of the DataNode. If /var/run/hadoop-hdfs/ is group-writable, make sure its group is root.
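The socket path and directory rule can be checked on a host with a short script. This is a minimal sketch under stated assumptions: resolve_socket_path, violates_group_rule, and offending_parents are hypothetical helpers; 50010 is only an example port; and the script applies just the rule stated here (a group-writable directory must belong to group root, gid 0) to each existing parent of the socket path. The permission check that Hadoop itself performs may examine more than this.

```python
import os
import stat
from pathlib import Path


def resolve_socket_path(template: str, datanode_port: int) -> str:
    """Replace the _PORT placeholder with the DataNode's TCP port."""
    return template.replace("_PORT", str(datanode_port))


def violates_group_rule(directory: str) -> bool:
    """True if `directory` is group-writable but its group is not root (gid 0)."""
    st = os.stat(directory)
    return bool(st.st_mode & stat.S_IWGRP) and st.st_gid != 0


def offending_parents(socket_path: str) -> list[str]:
    """Return the existing parent directories of `socket_path` that break the rule."""
    return [
        str(parent)
        for parent in Path(socket_path).parents
        if parent.is_dir() and violates_group_rule(str(parent))
    ]


if __name__ == "__main__":
    # Example port only (an assumption); substitute your DataNode's actual port.
    path = resolve_socket_path("/var/run/hadoop-hdfs/dn._PORT", 50010)
    print(path)  # prints /var/run/hadoop-hdfs/dn.50010
    bad = offending_parents(path)
    print(bad if bad else "no group-writable parent with a non-root group")
```

Checking every parent rather than only /var/run/hadoop-hdfs/ is a deliberate, conservative choice in this sketch; narrow it if you only care about the socket's own directory.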