HDFS heap sizing

You can provision an HDFS cluster for optimal performance based on the desired storage capacity.

Component Memory CPU Disk
JournalNode 1 GB (default)

Set this value using the Java Heap Size of JournalNode in Bytes HDFS configuration property.

1 core minimum 1 dedicated disk
  • Minimum: 1 GB (for proof-of-concept deployments)
  • Add an additional 1 GB for each additional 1,000,000 blocks

    Snapshots and encryption can increase the required heap memory.

See Sizing NameNode Heap Memory.

Set this value using the Java Heap Size of NameNode in Bytes HDFS configuration property.

Minimum of 4 dedicated cores; more may be required for larger clusters
  • Minimum of 2 dedicated disks for metadata
  • 1 dedicated disk for log files (This disk may be shared with the operating system.)
  • Maximum disks: 4

Minimum: 4 GB

Maximum: 8 GB

Increase the memory for higher replica counts or a higher number of blocks per DataNode. When increasing the memory, Cloudera recommends an additional 1 GB of memory for every 1 million replicas above 4 million on the DataNodes. For example, 5 million replicas require 5 GB of memory.

Set this value using the Java Heap Size of DataNode in Bytes HDFS configuration property.

Minimum: 4 cores. Add more cores for highly active clusters.

Minimum: 4

Maximum: 24

The maximum acceptable size will vary depending upon how large average block size is. The DN’s scalability limits are mostly a function of the number of replicas per DN, not the overall number of bytes stored. That said, having ultra-dense DNs will affect recovery times in the event of machine or rack failure. Cloudera does not support exceeding 100 TB per data node. You could use 12 x 8 TB spindles or 24 x 4TB spindles. Cloudera does not support drives larger than 8 TB.

Configure the disks in JBOD mode. Do not use RAID/LVM/ZFS.