2. Configure Cluster Properties

HDP provides the option to customize different parameters to tune your Hadoop cluster.

Note

You can modify these properties in the master-install-location/gsInstaller/gsCluster.properties file. Any changes to this file override the default configurations for your Hadoop cluster, so exercise caution when changing it.

Important

The NameNode new generation size (the default size of the Java new generation for the NameNode, set with the Java option -XX:NewSize) should be 1/8 of the maximum heap size (-Xmx). Verify this value, because the default setting may not be accurate. It is specified in the namenode_opt_newsize property.
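The 1/8 rule above can be checked with a few lines of arithmetic. The following is a minimal sketch, not part of the installer: the helper names `parse_size_mb` and `recommended_newsize` are hypothetical, and it assumes sizes are written in the `4G` / `640m` style used in the tables below.

```python
def parse_size_mb(value: str) -> int:
    """Convert a JVM-style size such as '4G' or '640m' to megabytes."""
    units = {"m": 1, "g": 1024}
    number, unit = value[:-1], value[-1].lower()
    if unit not in units:
        raise ValueError(f"unsupported size suffix in {value!r}")
    return int(number) * units[unit]


def recommended_newsize(max_heap: str) -> str:
    """Return one eighth of the maximum heap, formatted like '512m'."""
    return f"{parse_size_mb(max_heap) // 8}m"


# Hypothetical check against the default namenode_javaheap of 4G.
print(recommended_newsize("4G"))  # 512m
```

For the default 4G NameNode heap this gives 512m, which differs from the 640m default of namenode_opt_newsize, so the value is worth reviewing.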

Table 5.12. Hadoop-HDFS Properties
Property Name Notes
hadoop_heap_size JVM heap size for the balancer. (Default: 1000 MB (1000m))
namenode_javaheap NameNode's Java heap size. (Default: 4 GB (4G))
namenode_opt_newsize Bound for new generation size (in bytes). (Default: 640 MB (640m)).
dt_heapsize DataNodes' heap size. (Default: 1024 MB (1024m))
jtnode_opt_newsize Lower bound for JobTracker newgen size. (Default: 200 MB (200m))
jtnode_opt_maxnewsize Upper bound for JobTracker newgen size. (Default: 200 MB (200m))
jt_heapsize JobTracker heap size. (Default: 24000 MB (24000m))
fs_inmemory_size Memory allocated for in-memory file-system. Used to merge map-outputs for the reduces. (Default: 256)
datanode_du_reserved Reserved space in bytes per volume. Ensure that you always leave this amount of space free for non-HDFS use. (Default: 1,073,741,824)
dfs_datanode_failed_volume_tolerated The number of volumes that are allowed to fail before a DataNode stops offering service. By default, any volume failure causes a DataNode to shut down. (Default: 0)
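Because datanode_du_reserved is expressed in bytes, it can help to derive the value instead of typing it by hand. This is a minimal sketch with a hypothetical helper name, `reserved_bytes`; it assumes the reservation is planned in whole gibibytes.

```python
GIB = 1024 ** 3  # bytes in one gibibyte


def reserved_bytes(gib_per_volume: int) -> int:
    """Bytes to keep free on each volume for non-HDFS use."""
    if gib_per_volume < 0:
        raise ValueError("reservation cannot be negative")
    return gib_per_volume * GIB


print(reserved_bytes(1))   # 1073741824, the documented default
print(reserved_bytes(10))  # 10737418240
```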

Table 5.13. Hadoop-MapReduce Properties
Property Name Notes
mapred_cluster_map_mem_mb Size (in terms of virtual memory) for a single map slot in the MapReduce framework. (Default: -1)
mapred_cluster_red_mem_mb Size (in terms of virtual memory) for a single reduce slot in the MapReduce framework. (Default: -1)
mapred_cluster_max_map_mem_mb Maximum number of map and reduce tasks that can be executed in parallel. This parameter depends on the mapred_cluster_map_mem_mb and mapred_cluster_red_mem_mb properties. Ensure that the number of map tasks is always greater than the number of reduce tasks. (Default: -1)
mapred_job_map_mem_mb Size (in terms of virtual memory) of a single map task for the job. (Default: -1)
mapred_job_red_mem_mb Size (in terms of virtual memory) of a single reduce task for the job. (Default: -1)
mapred_child_java_opts_sz Java opts for the map and reduce tasks. (Default: -Xmx768m)
mapred_map_tasks_max Number of map tasks that run concurrently per TaskTracker. Ensure that the maximum slots are greater than the number of CPU cores because map tasks consume the majority of free slots. For example: If you have 6 CPU cores and 8 slots, you must set the value for this parameter to 6. (Default: 4)
mapred_red_tasks_max Number of reduce tasks that run concurrently per TaskTracker. (Default: 4)
io_sort_mb Buffer memory used while sorting files. In order to minimize seek time, each merge stream is assigned 1 MB by default. (Default: 200 MB (200m))
io_sort_spill_percent Soft limit for either the buffer or the record collection buffers. Once reached, a background thread starts spilling the contents to disk. Note that this does not imply any chunking of data to the spill. A value less than 0.5 is not recommended. (Default: 0.9)
mapreduce_userlog_retainhours Maximum retention period for user-logs, post job completion. (Default: 24 hrs. (24))
maxtasks_per_job Maximum number of tasks allowed for a single job (map and reduce). (Default: -1)
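To see how io_sort_mb and io_sort_spill_percent interact, the sketch below computes the buffer fill level at which the background spill would start. The function names are hypothetical, and the calculation assumes the soft limit is simply the buffer size multiplied by the spill percentage.

```python
def spill_threshold_mb(io_sort_mb: int, spill_percent: float) -> float:
    """Buffer usage (in MB) at which spilling to disk begins."""
    if not 0.0 < spill_percent <= 1.0:
        raise ValueError("spill_percent must be in the range (0, 1]")
    return io_sort_mb * spill_percent


def is_recommended_spill_percent(spill_percent: float) -> bool:
    """Values below 0.5 are not recommended, per the table above."""
    return spill_percent >= 0.5


# With the documented defaults the spill starts at about 180 MB of a 200 MB buffer.
threshold = spill_threshold_mb(200, 0.9)
print(f"{threshold:.0f} MB, recommended: {is_recommended_spill_percent(0.9)}")
```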

Table 5.14. Hadoop-ZooKeeper Properties
Property Name Notes
tickTime ZooKeeper uses this time unit to regulate heartbeats and timeouts. For example, if the tickTime is set to 2000, the minimum session timeout will be two ticks. (Default: 2000 milliseconds). Required only if installzookeeper is set to yes. Note, you must also set installhbase to yes.
initLimit Amount of time (in ticks) to allow followers to connect and sync to a leader. Increase this value only if ZooKeeper manages a large amount of data in your cluster. (Default: 10). Required only if installzookeeper is set to yes. Note, you must also set installhbase to yes.
syncLimit Amount of time (in ticks) to allow followers to sync with ZooKeeper. Followers that fall too far behind a leader are dropped. (Default: 5). Required only if installzookeeper is set to yes. Note, you must also set installhbase to yes.
clientPort Default port used for listening to the client connections. (Default: 2181). Required only if installzookeeper is set to yes. Note, you must also set installhbase to yes.
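Because initLimit and syncLimit are expressed in ticks, their effective durations depend on tickTime. The following minimal sketch converts the tick-based settings above into milliseconds; the function name `zookeeper_timeouts_ms` and the dictionary keys are hypothetical and exist only for illustration.

```python
def zookeeper_timeouts_ms(tick_time_ms: int = 2000,
                          init_limit: int = 10,
                          sync_limit: int = 5) -> dict:
    """Translate tick-based ZooKeeper settings into milliseconds."""
    return {
        # The minimum session timeout is two ticks.
        "min_session_timeout_ms": 2 * tick_time_ms,
        # Time followers have to connect and sync to a leader.
        "init_timeout_ms": init_limit * tick_time_ms,
        # Time followers have to stay in sync before being dropped.
        "sync_timeout_ms": sync_limit * tick_time_ms,
    }


print(zookeeper_timeouts_ms())
# {'min_session_timeout_ms': 4000, 'init_timeout_ms': 20000, 'sync_timeout_ms': 10000}
```

With the defaults, followers get 20 seconds to connect and sync and 10 seconds to stay in sync; raising tickTime scales all three values together.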

