Use the instructions in this section to configure Full-Stack HA failover resiliency for the HDP clients.
Note: Your Hadoop configuration directories are defined during the HDP installation. For details, see Setting Up Hadoop Configuration.
Step 1: Edit the $HADOOP_CONF_DIR/hdfs-site.xml file to add the following properties:
Enable the HDFS client retry policy.
<property>
  <name>dfs.client.retry.policy.enabled</name>
  <value>true</value>
  <description>Enables HDFS client retry in case of NameNode failure.</description>
</property>
Configure protection for NameNode edit log.
<property>
  <name>dfs.namenode.edits.toleration.length</name>
  <value>8192</value>
  <description>Prevents corruption of NameNode edit log.</description>
</property>
Configure safe mode extension time.
<property>
  <name>dfs.safemode.extension</name>
  <value>10</value>
  <description>The default value (30 seconds) is applicable for very large clusters. For small to large clusters (up to 200 nodes), the recommended value is 10 seconds.</description>
</property>
Ensure that the allocated DFS blocks persist across multiple failovers.
<property>
  <name>dfs.persist.blocks</name>
  <value>true</value>
  <description>Ensure that the allocated DFS blocks persist across multiple failovers.</description>
</property>
Configure delay for first block report.
<property>
  <name>dfs.blockreport.initialDelay</name>
  <value>10</value>
  <description>Delay (in seconds) for first block report.</description>
</property>
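After editing the file, it can help to confirm that the five properties from Step 1 are actually present with the intended values. The following is a minimal sketch, not part of HDP: the helper names `load_properties` and `find_mismatches` and the `EXPECTED_HDFS` table are illustrative assumptions, and the expected values are simply the ones listed in Step 1.

```python
import xml.etree.ElementTree as ET

# Expected values copied from Step 1 (names and values as listed above).
EXPECTED_HDFS = {
    "dfs.client.retry.policy.enabled": "true",
    "dfs.namenode.edits.toleration.length": "8192",
    "dfs.safemode.extension": "10",
    "dfs.persist.blocks": "true",
    "dfs.blockreport.initialDelay": "10",
}


def load_properties(xml_text):
    """Return {name: value} for every <property> in a Hadoop config document."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        if name is not None:
            props[name.strip()] = (prop.findtext("value") or "").strip()
    return props


def find_mismatches(xml_text, expected):
    """Return {name: (expected, actual)} for properties that are missing or differ.

    A missing property is reported with an actual value of None.
    """
    actual = load_properties(xml_text)
    return {
        name: (want, actual.get(name))
        for name, want in expected.items()
        if actual.get(name) != want
    }
```

To check a real file, read it as bytes and pass the contents in, for example `find_mismatches(open(path, "rb").read(), EXPECTED_HDFS)`, where `path` points at `$HADOOP_CONF_DIR/hdfs-site.xml`. An empty result means every listed property matches.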
Step 2: Modify the following property in the $HADOOP_CONF_DIR/core-site.xml file:
<property>
  <name>fs.checkpoint.period</name>
  <value>3600</value>
  <description>The number of seconds between two periodic checkpoints.</description>
</property>
With this value (3600 seconds), a checkpoint is performed once every hour.
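Because Step 2 modifies a property that may already exist, a scripted edit should update the existing entry rather than append a second one. The sketch below illustrates one way to do that with the Python standard library. The function names `set_property` and `update_checkpoint_period` are hypothetical helpers, not part of Hadoop or HDP.

```python
import xml.etree.ElementTree as ET


def set_property(root, name, value, description=None):
    """Set <value> of the named property, appending a new <property> if absent."""
    for prop in root.iter("property"):
        if (prop.findtext("name") or "").strip() == name:
            value_el = prop.find("value")
            if value_el is None:
                value_el = ET.SubElement(prop, "value")
            value_el.text = value
            return prop
    # No existing entry was found, so create one under the root element.
    prop = ET.SubElement(root, "property")
    ET.SubElement(prop, "name").text = name
    ET.SubElement(prop, "value").text = value
    if description:
        ET.SubElement(prop, "description").text = description
    return prop


def update_checkpoint_period(xml_text, seconds=3600):
    """Return the configuration XML with fs.checkpoint.period set to `seconds`."""
    root = ET.fromstring(xml_text)
    set_property(
        root,
        "fs.checkpoint.period",
        str(seconds),
        "The number of seconds between two periodic checkpoints.",
    )
    return ET.tostring(root, encoding="unicode")
```

Note that parsing with `ET.fromstring` discards comments and anything outside the root element (such as the XML declaration), so keep a backup of the original core-site.xml before writing the result back.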
Step 3: Edit the $HADOOP_CONF_DIR/mapred-site.xml file to add the following properties:
Enable the JobTracker’s safe mode functionality.
<property>
  <name>mapreduce.jt.hdfs.monitor.enable</name>
  <value>true</value>
  <description>Enable the JobTracker to go into safe mode when the NameNode is not responding.</description>
</property>
Enable retry for JobTracker clients (when the JobTracker is in safe mode).
<property>
  <name>mapreduce.jobclient.retry.policy.enabled</name>
  <value>true</value>
  <description>Enable the MapReduce job client to retry job submission when the JobTracker is in safe mode.</description>
</property>
Enable recovery of JobTracker’s queue after it is restarted.
<property>
  <name>mapred.jobtracker.restart.recover</name>
  <value>true</value>
  <description>Enable the JobTracker to recover its queue after it is restarted.</description>
</property>
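When the same settings must be applied on several client machines, the three MapReduce properties above can be generated from a single table instead of being typed by hand. The following sketch shows one possible approach. `MAPRED_HA_PROPERTIES`, `render_property`, and `render_properties` are illustrative names introduced here, and the names, values, and descriptions are copied from Step 3.

```python
from xml.sax.saxutils import escape

# (name, value, description) triples copied from Step 3 above.
MAPRED_HA_PROPERTIES = [
    ("mapreduce.jt.hdfs.monitor.enable", "true",
     "Enable the JobTracker to go into safe mode when the NameNode is not responding."),
    ("mapreduce.jobclient.retry.policy.enabled", "true",
     "Enable the MapReduce job client to retry job submission when the JobTracker is in safe mode."),
    ("mapred.jobtracker.restart.recover", "true",
     "Enable the JobTracker to recover its queue after it is restarted."),
]


def render_property(name, value, description):
    """Render one <property> element, escaping &, < and > in the text."""
    return "\n".join([
        "<property>",
        f"  <name>{escape(name)}</name>",
        f"  <value>{escape(value)}</value>",
        f"  <description>{escape(description)}</description>",
        "</property>",
    ])


def render_properties(props):
    """Render a sequence of (name, value, description) triples."""
    return "\n".join(render_property(*triple) for triple in props)
```

Calling `print(render_properties(MAPRED_HA_PROPERTIES))` produces a fragment that can be pasted inside the `<configuration>` element of mapred-site.xml.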