Configure WebHDFS for Knox
WebHDFS provides REST API access to HDFS in a Hadoop cluster. The WebHDFS REST API documentation is available online. The following properties for Knox WebHDFS must be set in the /etc/hadoop/conf/hdfs-site.xml configuration file. The example values shown in these properties are from an installed instance of the Hortonworks Sandbox.
<property>
    <name>dfs.webhdfs.enabled</name>
    <value>true</value>
</property>
<property>
    <name>dfs.namenode.rpc-address</name>
    <value>sandbox.hortonworks.com:8020</value>
</property>
<property>
    <name>dfs.namenode.http-address</name>
    <value>sandbox.hortonworks.com:50070</value>
</property>
<property>
    <name>dfs.https.namenode.https-address</name>
    <value>sandbox.hortonworks.com:50470</value>
</property>
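One way to confirm these settings is to read the configuration and check each property programmatically. The following is a minimal Python sketch, not part of Knox or Hadoop. The helper names `load_properties` and `webhdfs_problems` are hypothetical, and the code assumes the usual Hadoop layout of `<property>` elements that each contain a `<name>` and a `<value>`.

```python
import xml.etree.ElementTree as ET

# Property names taken from the listing above.
REQUIRED = (
    "dfs.webhdfs.enabled",
    "dfs.namenode.rpc-address",
    "dfs.namenode.http-address",
    "dfs.https.namenode.https-address",
)


def load_properties(xml_text):
    """Return a {name: value} dict from Hadoop-style configuration XML."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        if name is not None:
            props[name.strip()] = (prop.findtext("value") or "").strip()
    return props


def webhdfs_problems(props):
    """Return human-readable messages for missing or disabled settings."""
    problems = [f"{name} is not set" for name in REQUIRED if not props.get(name)]
    enabled = props.get("dfs.webhdfs.enabled")
    if enabled and enabled.lower() != "true":
        problems.append("dfs.webhdfs.enabled is not true")
    return problems
```

Passing the contents of /etc/hadoop/conf/hdfs-site.xml to `load_properties` and the result to `webhdfs_problems` yields an empty list when all four properties are present and WebHDFS is enabled.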
The values above must be reflected in each topology descriptor file deployed to the gateway. By default, the gateway includes a sample topology descriptor file located at {GATEWAY_HOME}/deployments/sandbox.xml. The values in the following sample are also configured to work with an installed Hortonworks Sandbox VM.
<service>
    <role>NAMENODE</role>
    <url>hdfs://localhost:8020</url>
</service>
<service>
    <role>WEBHDFS</role>
    <url>http://localhost:50070/webhdfs</url>
</service>
The URL provided for the NAMENODE role does not result in an endpoint being exposed by the gateway. It is required only so that the gateway can rewrite other URLs that reference the NameNode's RPC address. This rewriting means clients do not need to be aware of internal cluster details.
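To check that a topology descriptor reflects the hdfs-site.xml values, the service URLs can be compared with the cluster addresses. The sketch below is an illustration under stated assumptions, not a Knox utility. The function names `service_urls` and `port_mismatches` are hypothetical. It assumes a complete topology descriptor with a single root element, and addresses in `host:port` form. It compares ports only, because the sample values above use different host names (sandbox.hortonworks.com and localhost) for the same machine.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlsplit

# Maps each topology role to the hdfs-site.xml property it should agree with.
ROLE_TO_PROPERTY = {
    "NAMENODE": "dfs.namenode.rpc-address",
    "WEBHDFS": "dfs.namenode.http-address",
}


def service_urls(topology_xml):
    """Return {role: url} for every <service> element in a topology descriptor."""
    root = ET.fromstring(topology_xml)
    urls = {}
    for service in root.iter("service"):
        role = service.findtext("role")
        url = service.findtext("url")
        if role and url:
            urls[role.strip()] = url.strip()
    return urls


def port_mismatches(topology_xml, hdfs_props):
    """Return roles whose URL port differs from the matching hdfs-site.xml address."""
    mismatched = []
    urls = service_urls(topology_xml)
    for role, prop in ROLE_TO_PROPERTY.items():
        if role not in urls or prop not in hdfs_props:
            continue  # nothing to compare for this role
        topology_port = urlsplit(urls[role]).port
        # Assumes the property value has the form host:port.
        cluster_port = int(hdfs_props[prop].rsplit(":", 1)[1])
        if topology_port != cluster_port:
            mismatched.append(role)
    return mismatched
```

Here `hdfs_props` is a plain dictionary of property names to values read from hdfs-site.xml. An empty result means the NAMENODE and WEBHDFS ports in the topology agree with the cluster configuration.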