1. Configure WebHDFS for Knox

REST API access to HDFS in a Hadoop cluster is provided by WebHDFS. The WebHDFS REST API documentation is available online. The following properties for Knox WebHDFS must be enabled in the /etc/hadoop/conf/hdfs-site.xml configuration file. The example values shown in these properties are from an installed instance of the Hortonworks Sandbox.

<property>
    <name>dfs.webhdfs.enabled</name>
    <value>true</value>
</property>
<property>
    <name>dfs.namenode.rpc-address</name>
    <value>sandbox.hortonworks.com:8020</value>
</property>
<property>
    <name>dfs.namenode.http-address</name>
    <value>sandbox.hortonworks.com:50070</value>
</property>
<property>
    <name>dfs.https.namenode.https-address</name>
    <value>sandbox.hortonworks.com:50470</value>
</property>

The values above must be reflected in each topology descriptor file deployed to the gateway. The gateway by default includes a sample topology descriptor file located at {GATEWAY_HOME}/deployments/sandbox.xml. The values in the following sample are also configured to work with an installed Hortonworks Sandbox VM.

<service>
    <role>NAMENODE</role>
    <url>hdfs://localhost:8020</url>
</service>
<service>
    <role>WEBHDFS</role>
    <url>http://localhost:50070/webhdfs</url>
</service>

The URL provided for the NAMENODE role does not result in an endpoint being exposed by the gateway. This information is only required so that other URLs can be rewritten that reference the Name Node’s RPC address. This prevents clients from needing to be aware of the internal cluster details.


loading table of contents...