Configure WebHDFS for Knox
WebHDFS provides REST API access to HDFS in a Hadoop cluster. The WebHDFS REST API documentation is available online. For Knox to use WebHDFS, the following properties
must be set in the /etc/hadoop/conf/hdfs-site.xml
configuration file. The example values shown in these properties are from an installed
instance of the Hortonworks Sandbox.
<property>
    <name>dfs.webhdfs.enabled</name>
    <value>true</value>
</property>
<property>
    <name>dfs.namenode.rpc-address</name>
    <value>sandbox.hortonworks.com:8020</value>
</property>
<property>
    <name>dfs.namenode.http-address</name>
    <value>sandbox.hortonworks.com:50070</value>
</property>
<property>
    <name>dfs.https.namenode.https-address</name>
    <value>sandbox.hortonworks.com:50470</value>
</property>
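A quick way to confirm that these properties are present is to parse the file and look for each name. The following is a minimal sketch, not part of Knox or Hadoop: the helper names `load_properties` and `missing_properties` are hypothetical, and the list of required names is simply copied from the listing above.

```python
import xml.etree.ElementTree as ET

# Property names copied from the hdfs-site.xml listing above.
REQUIRED = (
    "dfs.webhdfs.enabled",
    "dfs.namenode.rpc-address",
    "dfs.namenode.http-address",
    "dfs.https.namenode.https-address",
)


def load_properties(xml_text):
    """Return a {name: value} dict built from Hadoop-style <property> elements."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        if name:
            # A missing <value> element is treated as an empty string.
            props[name.strip()] = (prop.findtext("value") or "").strip()
    return props


def missing_properties(props):
    """List the required properties that are absent or have an empty value."""
    return [name for name in REQUIRED if not props.get(name)]
```

To use it, pass the contents of /etc/hadoop/conf/hdfs-site.xml to `load_properties` and check that `missing_properties` returns an empty list and that `dfs.webhdfs.enabled` is `true`.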
The values above must be reflected in each topology descriptor file deployed to the
gateway. By default, the gateway includes a sample topology descriptor file located at
{GATEWAY_HOME}/deployments/sandbox.xml. The values in the following
sample are also configured to work with an installed Hortonworks Sandbox VM.
<service>
    <role>NAMENODE</role>
    <url>hdfs://localhost:8020</url>
</service>
<service>
    <role>WEBHDFS</role>
    <url>http://localhost:50070/webhdfs</url>
</service>
The URL provided for the NAMENODE role does not result in an endpoint being exposed by the gateway. It is required only so that other URLs that reference the NameNode's RPC address can be rewritten. This prevents clients from needing to be aware of the internal cluster details.
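The correspondence between the hdfs-site.xml values and the topology service URLs can be sketched as follows. This is an illustrative helper only, assuming the mapping seen in the samples above (RPC address to an hdfs:// URL, HTTP address to an http:// URL ending in /webhdfs); the function names `topology_service_urls` and `service_xml` are hypothetical and not part of Knox.

```python
def topology_service_urls(props):
    """Derive the NAMENODE and WEBHDFS service URLs from hdfs-site.xml values.

    `props` is a plain dict of property name to value. The mapping mirrors
    the samples in this section and is an assumption, not a Knox API.
    """
    return {
        "NAMENODE": "hdfs://" + props["dfs.namenode.rpc-address"],
        "WEBHDFS": "http://" + props["dfs.namenode.http-address"] + "/webhdfs",
    }


def service_xml(role, url):
    """Render one <service> element in the layout used by the topology sample."""
    return (
        "<service>\n"
        f"    <role>{role}</role>\n"
        f"    <url>{url}</url>\n"
        "</service>"
    )
```

With the Sandbox values shown earlier, the derived URLs use the host sandbox.hortonworks.com, whereas the sample topology uses localhost. Both refer to the same machine in a Sandbox installation; in other environments, use a host name that the gateway can resolve.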