HDFS heap sizing
You can provision an HDFS cluster for optimal performance based on the desired storage capacity. The following table lists the memory, CPU, and disk recommendations for the JournalNode, NameNode, and DataNode roles.
Component | Memory | CPU | Disk |
---|---|---|---|
JournalNode | 1 GB (default). Set this value using the Java Heap Size of JournalNode in Bytes HDFS configuration property. | 1 core minimum | 1 dedicated disk |
NameNode | See Sizing NameNode Heap Memory. Set this value using the Java Heap Size of NameNode in Bytes HDFS configuration property. | Minimum of 4 dedicated cores; more may be required for larger clusters. | |
DataNode | Minimum: 4 GB. Maximum: 8 GB. Increase the memory for higher replica counts or a higher number of blocks per DataNode. When increasing the memory, Cloudera recommends an additional 1 GB of memory for every 1 million replicas above 4 million on the DataNodes. For example, 5 million replicas require 5 GB of memory (a minimal sketch of this calculation follows the table). Set this value using the Java Heap Size of DataNode in Bytes HDFS configuration property. | Minimum: 4 cores. Add more cores for highly active clusters. | Minimum: 4 disks. Maximum: 24 disks. The maximum acceptable size varies with the average block size; the DataNode's scalability limits are mostly a function of the number of replicas per DataNode, not the overall number of bytes stored. That said, ultra-dense DataNodes lengthen recovery times in the event of machine or rack failure. Cloudera does not support exceeding 100 TB per DataNode. You could use 12 x 8 TB spindles or 24 x 4 TB spindles. Cloudera does not support drives larger than 8 TB. Configure the disks in JBOD mode. Do not use RAID, LVM, or ZFS. |
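The DataNode heap recommendation above follows a simple formula. The sketch below is a minimal illustration of that rule (4 GB base, plus 1 GB per 1 million replicas above 4 million); the helper name `datanode_heap_gb` and the choice to round partial millions up are assumptions for illustration, not part of the documented recommendation.

```python
import math

def datanode_heap_gb(replica_count: int) -> int:
    """Recommended DataNode heap (GB) for a given number of replicas.

    Applies the rule from the table: a 4 GB minimum, plus 1 GB for
    every 1 million replicas above 4 million.
    """
    base_gb = 4                 # minimum DataNode heap from the table
    base_replicas = 4_000_000   # replica count covered by the base heap

    extra_millions = max(0, replica_count - base_replicas) / 1_000_000
    # Assumption: round a partial million up so the heap is never undersized.
    return base_gb + math.ceil(extra_millions)

# Example from the table: 5 million replicas -> 5 GB of heap.
print(datanode_heap_gb(5_000_000))  # 5
```

Whatever value you arrive at, apply it through the Java Heap Size of DataNode in Bytes HDFS configuration property described in the table.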