Scaling Namespaces and Optimizing Data Storage

Configure an HDFS federation

All the nodes in a federation share a common set of configuration files. To support the common configuration, you must configure a NameService ID for all the NameNodes that you want to include in the federation.

  • Ensure that you configure the federation during a cluster maintenance window.
  • Verify that you have configured HA for all the NameNodes that you want to include in the federation. In addition, ensure that you have configured the value of dfs.nameservices for the NameNodes in hdfs-site.xml.
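
As a reference point for the HA prerequisite, the following is a minimal sketch of how an existing NameService ns1 might be declared in hdfs-site.xml. The NameNode IDs nn1 and nn2 and the host and port placeholders are hypothetical. Other settings that HA requires, such as the shared edits directory, the failover proxy provider, and fencing, are omitted. In HA mode, per-NameNode keys carry both the NameService ID and the NameNode ID.

```xml
<configuration>
  <!-- Existing NameService; a federation later extends this list -->
  <property>
    <name>dfs.nameservices</name>
    <value>ns1</value>
  </property>
  <!-- Hypothetical IDs for the active and standby NameNodes of ns1 -->
  <property>
    <name>dfs.ha.namenodes.ns1</name>
    <value>nn1,nn2</value>
  </property>
  <!-- HA keys are suffixed with <NameService ID>.<NameNode ID> -->
  <property>
    <name>dfs.namenode.rpc-address.ns1.nn1</name>
    <value>ns1-active-host:rpc-port</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.ns1.nn2</name>
    <value>ns1-standby-host:rpc-port</value>
  </property>
</configuration>
```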

A federation allows you to add new NameService IDs to a cluster. Each NameService denotes a new filesystem namespace. You can configure a maximum of four namespaces in a federated environment.

An active NameNode and its standby belong to the same NameService ID. Because all the nodes share common configuration files, you must suffix the NameNode-specific parameters in a federation with the NameService ID so that each NameNode reads its own values.

For example, if you define ns2 as the NameService ID for a federation, then you must add ns2 as a suffix to parameters such as dfs.namenode.rpc-address, dfs.namenode.http-address, and dfs.namenode.secondary.http-address.
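
A minimal sketch of that suffixing is shown below: the base parameter name is followed by a period and the NameService ID. The host names (nn-host2, snn-host2) and port placeholders are hypothetical.

```xml
<configuration>
  <!-- Each key ends in ".ns2", so only the NameNode(s) of ns2 use these values -->
  <property>
    <name>dfs.namenode.rpc-address.ns2</name>
    <value>nn-host2:rpc-port</value>
  </property>
  <property>
    <name>dfs.namenode.http-address.ns2</name>
    <value>nn-host2:http-port</value>
  </property>
  <property>
    <name>dfs.namenode.secondary.http-address.ns2</name>
    <value>snn-host2:http-port</value>
  </property>
</configuration>
```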
Note
This task explains how you can configure an HDFS federation using the command line interface. For information about using Ambari to configure a federation, see the topic Configure HDFS Federation in the Ambari documentation.
  1. Verify that the newly added namespaces are listed in the dfs.internal.nameservices parameter in hdfs-site.xml.
    This parameter lists all the namespaces that belong to the local cluster.
  2. In hdfs-site.xml, add the following configuration parameters for the active and standby NameNodes, each suffixed with the correct NameService ID.
    The following configuration parameters apply to the NameNode daemon:
    • dfs.namenode.rpc-address
    • dfs.namenode.http-address
    • dfs.namenode.https-address
    • dfs.namenode.servicerpc-address
    • dfs.namenode.keytab.file
    • dfs.namenode.name.dir
    • dfs.namenode.checkpoint.dir
    • dfs.namenode.checkpoint.edits.dir
    Note
    The parameters dfs.namenode.http-address and dfs.namenode.https-address are optional, depending on the configured HTTP policy (dfs.http.policy). The parameters dfs.namenode.checkpoint.dir and dfs.namenode.checkpoint.edits.dir are also optional.
  3. Propagate the configuration file updates to all the nodes in the cluster.
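
The steps above can be sketched for a new NameService ns2 as follows. This is a minimal sketch, assuming ns1 already exists; the host names, ports, directory path, and keytab location are hypothetical placeholders. In an HA deployment, Hadoop also accepts these keys with the NameNode ID appended after the NameService ID (for example, dfs.namenode.rpc-address.ns2.nn1), which is how the active and standby NameNodes receive separate values.

```xml
<configuration>
  <!-- Step 1: the local cluster's namespaces, including the new ns2 -->
  <property>
    <name>dfs.internal.nameservices</name>
    <value>ns1,ns2</value>
  </property>
  <!-- Step 2: NameNode parameters suffixed with the NameService ID ns2 -->
  <property>
    <name>dfs.namenode.servicerpc-address.ns2</name>
    <value>nn-host2:servicerpc-port</value>
  </property>
  <property>
    <name>dfs.namenode.https-address.ns2</name>
    <value>nn-host2:https-port</value>
  </property>
  <property>
    <name>dfs.namenode.keytab.file.ns2</name>
    <value>/etc/security/keytabs/nn.service.keytab</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir.ns2</name>
    <value>file:///hadoop/hdfs/ns2/namenode</value>
  </property>
</configuration>
```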

The following example shows the configuration for two NameNodes in a federation:

<configuration>
  <property>
    <name>dfs.nameservices</name>
    <value>ns1,ns2</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.ns1</name>
    <value>nn-host1:rpc-port</value>
  </property>
  <property>
    <name>dfs.namenode.http-address.ns1</name>
    <value>nn-host1:http-port</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.ns2</name>
    <value>nn-host2:rpc-port</value>
  </property>
  <property>
    <name>dfs.namenode.http-address.ns2</name>
    <value>nn-host2:http-port</value>
  </property>

  <!-- Other common configuration -->
</configuration>
Format every new NameNode that you want to include in the federation.