3.1. Configure HA Cluster

Add High Availability configurations to your HDFS configuration files. Start by taking the HDFS configuration files from the original NameNode in your HDP cluster and using them as the base, then add the properties described below to those files.

After you have added the configurations below, ensure that the same set of HDFS configuration files is propagated to all nodes in the HDP cluster. This ensures that all the nodes and services are able to interact with the highly available NameNode.

Add the following configuration options to your hdfs-site.xml file:

  • dfs.nameservices:

    Choose an arbitrary but logical name (for example, mycluster) as the value for the dfs.nameservices option. This name is used both in configuration and as the authority component of absolute HDFS paths in the cluster.

    <property>
      <name>dfs.nameservices</name>
      <value>mycluster</value>
      <description>Logical name for this new nameservice</description>
    </property>

    If you are also using HDFS Federation, this configuration setting should also include the list of other nameservices, HA or otherwise, as a comma-separated list.
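
    For example, if the cluster also hosted a second federated nameservice named yourcluster (a hypothetical name used here for illustration), the value would be:

    <property>
      <name>dfs.nameservices</name>
      <value>mycluster,yourcluster</value>
    </property>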

  • dfs.ha.namenodes.[$nameservice ID]:

    Provide a comma-separated list of NameNode IDs. DataNodes use this property to determine all the NameNodes in the cluster.

    For example, for the nameservice ID mycluster and individual NameNode IDs nn1 and nn2, the value of this property will be:

    <property>
      <name>dfs.ha.namenodes.mycluster</name>
      <value>nn1,nn2</value>
      <description>Unique identifiers for each NameNode in the nameservice</description>
    </property>

    Note

    Currently, a maximum of two NameNodes may be configured per nameservice.

  • dfs.namenode.rpc-address.[$nameservice ID].[$name node ID]:

    Use this property to specify the fully-qualified RPC address for each NameNode to listen on.

    Continuing with the previous example, set the full address and IPC port of the NameNode process for the two NameNode IDs above, nn1 and nn2.

    Note that there will be two separate configuration options.

    <property>
      <name>dfs.namenode.rpc-address.mycluster.nn1</name>
      <value>machine1.example.com:8020</value>
    </property>
    <property>
      <name>dfs.namenode.rpc-address.mycluster.nn2</name>
      <value>machine2.example.com:8020</value>
    </property>

  • dfs.namenode.http-address.[$nameservice ID].[$name node ID]:

    Use this property to specify the fully-qualified HTTP address for each NameNode to listen on.

    Set the addresses for both NameNodes' HTTP servers to listen on. For example:

    <property>
      <name>dfs.namenode.http-address.mycluster.nn1</name>
      <value>machine1.example.com:50070</value>
    </property>
    <property>
      <name>dfs.namenode.http-address.mycluster.nn2</name>
      <value>machine2.example.com:50070</value>
    </property>
    

    Note

    If you have Hadoop security features enabled, set the https-address for each NameNode.
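
    For example, continuing with the two NameNode IDs above (50470 is the conventional default HTTPS port for the NameNode; adjust it if your cluster uses a different port):

    <property>
      <name>dfs.namenode.https-address.mycluster.nn1</name>
      <value>machine1.example.com:50470</value>
    </property>
    <property>
      <name>dfs.namenode.https-address.mycluster.nn2</name>
      <value>machine2.example.com:50470</value>
    </property>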

  • dfs.namenode.shared.edits.dir:

    Use this property to specify the URI that identifies a group of JournalNodes (JNs) where the NameNode will write/read edits.

    Configure the addresses of the JNs that provide the shared edits storage. The Active NameNode writes to this shared storage, and the Standby NameNode reads from this location to stay up to date with all file system changes.

    Although you must list several JournalNode addresses in the URI, you configure only one such URI for your cluster.

    The URI should be of the form:

    qjournal://host1:port1;host2:port2;host3:port3/journalId

    The Journal ID is a unique identifier for this nameservice, which allows a single set of JournalNodes to provide storage for multiple federated namesystems. You can reuse the nameservice ID for the journal identifier.

    For example, if the JournalNodes for a cluster were running on the node1.example.com, node2.example.com, and node3.example.com machines and the nameservice ID were mycluster, you would use the following as the value for this setting:

    <property>
      <name>dfs.namenode.shared.edits.dir</name>
      <value>qjournal://node1.example.com:8485;node2.example.com:8485;node3.example.com:8485/mycluster</value>
    </property>

    Note

    The default port for the JournalNode is 8485.

  • dfs.client.failover.proxy.provider.[$nameservice ID]:

    This property specifies the Java class that HDFS clients use to contact the Active NameNode. The DFS client uses this class to determine which NameNode is currently Active, and therefore which NameNode is currently serving client requests.

    Unless you are using a custom implementation, use the ConfiguredFailoverProxyProvider implementation.

    For example:

    <property>
      <name>dfs.client.failover.proxy.provider.mycluster</name>
      <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
    </property>

  • dfs.ha.fencing.methods:

    This property specifies a list of scripts or Java classes that will be used to fence the Active NameNode during a failover.

    It is important for the correctness of the system that only one NameNode be in the Active state at any given time. When using the Quorum Journal Manager, only one NameNode will ever be allowed to write to the JournalNodes, so there is no potential for corrupting the file system metadata in a split-brain scenario. However, when a failover occurs, the previous Active NameNode may still serve read requests to clients, and those reads may be out of date until that NameNode shuts down when it tries to write to the JournalNodes. For this reason, it is still recommended to configure some fencing methods even when using the Quorum Journal Manager.

    To improve the availability of the system in the event that the fencing mechanisms fail, it is advisable to configure a fencing method that is guaranteed to return success as the last fencing method in the list. Note that if you choose to use no actual fencing methods, you must still set some value for this setting, for example shell(/bin/true).

    The fencing methods used during a failover are configured as a carriage-return-separated list, and the methods are attempted in order until one indicates that fencing has succeeded. The following two methods are packaged with Hadoop: shell and sshfence. For information on implementing a custom fencing method, see the org.apache.hadoop.ha.NodeFencer class.

    • sshfence: SSH to the Active NameNode and kill the process.

      The sshfence option SSHes to the target node and uses fuser to kill the process listening on the service's TCP port. For this fencing option to work, it must be able to SSH to the target node without providing a passphrase. Ensure that you configure the dfs.ha.fencing.ssh.private-key-files option, which is a comma-separated list of SSH private key files.

      For example:

      <property>
        <name>dfs.ha.fencing.methods</name>
        <value>sshfence</value>
      </property>
      
      <property>
        <name>dfs.ha.fencing.ssh.private-key-files</name>
        <value>/home/exampleuser/.ssh/id_rsa</value>
      </property>

      Optionally, you can configure a non-standard username or port for the SSH connection, as well as a timeout in milliseconds after which this fencing method is considered to have failed. To configure these, use the properties shown below:

      <property>
        <name>dfs.ha.fencing.methods</name>
        <value>sshfence([[username][:port]])</value>
      </property>
      <property>
        <name>dfs.ha.fencing.ssh.connect-timeout</name>
        <value>30000</value>
      </property>
    • shell: Run an arbitrary shell command to fence the Active NameNode.

      The shell fencing method runs an arbitrary shell command:

      <property>
        <name>dfs.ha.fencing.methods</name>
        <value>shell(/path/to/my/script.sh arg1 arg2 ...)</value>
      </property>

      The string between '(' and ')' is passed directly to a bash shell and may not include any closing parentheses.

      The shell command will be run with an environment set up to contain all of the current Hadoop configuration variables, with the '_' character replacing any '.' characters in the configuration keys. The configuration used has already had any namenode-specific configurations promoted to their generic forms -- for example dfs_namenode_rpc-address will contain the RPC address of the target node, even though the configuration may specify that variable as dfs.namenode.rpc-address.ns1.nn1.

      Additionally, the following variables (referring to the target node to be fenced) are also available:

      • $target_host: Hostname of the node to be fenced

      • $target_port: IPC port of the node to be fenced

      • $target_address: The combination of $target_host and $target_port as host:port

      • $target_nameserviceid: The nameservice ID of the NN to be fenced

      • $target_namenodeid: The namenode ID of the NN to be fenced

      These environment variables may also be used as substitutions in the shell command. For example:

      <property>
        <name>dfs.ha.fencing.methods</name>
        <value>shell(/path/to/my/script.sh --nameservice=$target_nameserviceid $target_host:$target_port)</value>
      </property>

      If the shell command returns an exit code of 0, the fencing is successful.

      Note

      This fencing method does not implement any timeout. If timeouts are necessary, they should be implemented in the shell script itself (for example, by forking a subshell to kill its parent in some number of seconds).
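
    Putting these pieces together, one reasonable configuration (following the recommendation above to end with a method that is guaranteed to succeed) tries sshfence first and falls back to shell(/bin/true), with the two methods separated by a newline in the value:

    <property>
      <name>dfs.ha.fencing.methods</name>
      <value>sshfence
      shell(/bin/true)</value>
    </property>
    <property>
      <name>dfs.ha.fencing.ssh.private-key-files</name>
      <value>/home/exampleuser/.ssh/id_rsa</value>
    </property>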

  • fs.defaultFS:

    The default path prefix used by the Hadoop FS client. Optionally, you may now configure Hadoop clients to use the new HA-enabled logical URI as the default path. For example, for the nameservice ID mycluster, this value will be the authority portion of all of your HDFS paths.

    Configure this property in the core-site.xml file:

    <property>
      <name>fs.defaultFS</name>
      <value>hdfs://mycluster</value>
    </property>
  • dfs.journalnode.edits.dir:

    This is the absolute path on the JournalNode machines where the edits and other local state used by the JNs will be stored. You may only use a single path for this configuration. Redundancy for this data is provided either by running multiple separate JournalNodes or by configuring this directory on a locally-attached RAID array. For example:

    <property>
      <name>dfs.journalnode.edits.dir</name>
      <value>/path/to/journal/node/local/data</value>
    </property>