Chapter 10. HBase Cluster Capacity and Region Sizing

This chapter covers capacity planning for an Apache HBase cluster and sizing of its region servers. Capacity planning and initial configuration involve several considerations:

  • Node Count and JVM Configuration

  • Region Count and Size

  • Initial Configuration and Tuning

    This chapter requires an understanding of the following HBase concepts:

     

    Table 10.1. HBase Concepts

    HBase Concept | Description
    Region | A group of contiguous HBase table rows. A table starts with one region, and additional regions are added dynamically as the table grows. Regions can be spread across multiple hosts to provide load balancing and quick recovery from failure. There are two types of region: primary and secondary. A secondary region is a replica of a primary region, located on a different region server.
    Region server | Serves data requests for one or more regions. A single region is serviced by only one region server, but a region server may serve multiple regions.
    Column family | A group of semantically related columns stored together.
    Memstore | In-memory storage for a region server. Region servers write files to HDFS after the memstore reaches a configurable maximum size, specified with the hbase.hregion.memstore.flush.size property in the hbase-site.xml configuration file.
    Write Ahead Log (WAL) | A log, persisted to HDFS, in which operations are recorded before they are stored in the memstore.
    Compaction storm | When the operations stored in the memstore are flushed to disk, HBase consolidates and merges many smaller files into fewer large files. This consolidation is called compaction, and it is usually very fast. However, if many region servers reach the memstore size limit at the same time, HBase performance may degrade from the large number of simultaneous major compactions. Administrators can avoid this by manually splitting tables over time.
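
As an illustration of the Memstore entry above, the flush threshold might be set in hbase-site.xml as in the following sketch. The value shown (128 MB, expressed in bytes) is an assumption chosen for illustration and is not a sizing recommendation. The property element belongs inside the file's top-level configuration element.

```xml
<!-- hbase-site.xml fragment: illustrative value only, not a recommendation -->
<property>
  <name>hbase.hregion.memstore.flush.size</name>
  <!-- Flush threshold in bytes; 134217728 = 128 * 1024 * 1024 (128 MB) -->
  <value>134217728</value>
</property>
```

A larger value generally means fewer, larger flushes at the cost of more heap held by memstores. The right setting depends on the heap size and the number of regions per region server.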


