HDP 2.2 enables HBase administrators to configure HBase clusters with read-only high availability (HA). This feature benefits HBase applications that require low-latency queries and can tolerate minimal staleness for read operations. Examples include queries on remote sensor data, distributed messaging, object stores, and user profile management.
High Availability for HBase features the following functionality:
Data is safely protected in HDFS
Failed nodes are automatically recovered
No single point of failure
All HBase API and region operations are supported, including scans, region split/merge, and META table support. The META table stores information about regions.
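The read-only HA behavior described above works roughly like this. Reads normally go to the primary region. When the primary is unavailable, a secondary replica can serve the read instead, but its data may lag behind. The following Python sketch is a deliberately simplified, hypothetical illustration of that idea. The names `Replica`, `ReadResult`, `timeline_read`, and `replicate` are invented for this example and are not part of the HBase client API. The sketch also does not model how a real HBase client chooses between replicas or handles timeouts.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Replica:
    """Hypothetical in-memory stand-in for one copy of a region."""
    replica_id: int                     # 0 = primary, >0 = secondary (illustrative convention)
    data: Dict[str, str] = field(default_factory=dict)
    available: bool = True


@dataclass
class ReadResult:
    value: Optional[str]
    stale: bool                         # True when served by a secondary replica


def replicate(primary: Replica, secondary: Replica) -> None:
    """Copy the primary's current edits to a secondary (asynchronous in a real cluster)."""
    secondary.data.update(primary.data)


def timeline_read(replicas: List[Replica], row_key: str) -> ReadResult:
    """Serve from the primary when it is up; otherwise fall back to the first
    available secondary and flag the result as possibly stale."""
    primary = replicas[0]
    if primary.available:
        return ReadResult(primary.data.get(row_key), stale=False)
    for secondary in replicas[1:]:
        if secondary.available:
            return ReadResult(secondary.data.get(row_key), stale=True)
    raise RuntimeError("no replica of this region is available")
```

For example, suppose a newer write reaches the primary but is not yet replicated, and then the primary fails. The read is still answered, but from the older copy:

```python
primary = Replica(0, {"sensor-1": "21.5"})
secondary = Replica(1)
replicate(primary, secondary)
primary.data["sensor-1"] = "22.0"   # not yet replicated
primary.available = False
print(timeline_read([primary, secondary], "sensor-1"))
# ReadResult(value='21.5', stale=True)
```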
However, HBase administrators should carefully consider the following costs associated with using high availability features:
Double or triple MemStore usage
Increased BlockCache usage
Increased network traffic for log replication
Extra backup RPCs for secondary region replicas
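The first cost above can be estimated with some simple arithmetic. The following back-of-the-envelope Python sketch assumes that each replica of a region (the primary plus one or two secondaries) holds about as much MemStore data as the primary. Under that assumption, the estimate scales linearly with the number of replicas. The function name and the example figures are hypothetical. Actual usage depends on write load and flush settings.

```python
MIB = 2 ** 20


def estimated_memstore_bytes(regions: int, bytes_per_region: int, replicas: int) -> int:
    """Upper-bound MemStore estimate, assuming every replica (primary plus
    secondaries) buffers roughly the same amount of data as the primary."""
    if replicas < 1:
        raise ValueError("replicas must be at least 1 (the primary alone)")
    return regions * bytes_per_region * replicas


# Hypothetical cluster: 100 regions, each buffering up to 128 MiB in its MemStore.
for replicas in (1, 2, 3):
    total = estimated_memstore_bytes(100, 128 * MIB, replicas)
    print(f"{replicas} replica(s): {total // MIB} MiB")
```

With these figures, the loop prints 12800, 25600, and 38400 MiB for one, two, and three replicas. This matches the doubling and tripling of MemStore usage listed above.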
HBase is a distributed key-value store designed for fast table scans and read operations at petabyte scale. Before configuring HA for HBase, you should understand the concepts in the following table.
HBase Concept | Description |
---|---|
Region | A group of contiguous rows in an HBase table. Tables start with one region. Additional regions are added dynamically as the table grows. Regions can be spread across multiple hosts to balance workloads and recover quickly from failure. There are two types of regions: primary and secondary. A secondary region is a copy of a primary region that is replicated on a different Region Server. |
Region Server | A Region Server serves data requests for one or more regions. A single region is serviced by only one Region Server, but a Region Server may serve multiple regions. When region replication is enabled, a Region Server can serve regions in primary and secondary mode concurrently. |
Column family | A column family is a group of semantically related columns that are stored together. |
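Write Ahead Log (WAL) | The WAL is a log file that records all changes to data until they are successfully written to disk, that is, until the MemStore is flushed. This protects against data loss if a Region Server fails before the MemStore contents are written to disk. |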
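Compaction | When operations stored in the MemStore are flushed to disk, HBase consolidates and merges many smaller files into fewer, larger files. This process is called compaction. |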
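The WAL, the MemStore, and compaction work together in a single region's write path, which the following toy model illustrates. The Python class `RegionStoreModel` and its methods are hypothetical, invented only for this illustration. In the model:

- Each edit is appended to a log before it is applied in memory.
- A flush turns the in-memory edits into an immutable file and discards the log entries that file now covers.
- A simulated crash rebuilds the in-memory edits by replaying the log.
- Compaction merges the flushed files into one.

This is a simplification. Real HBase typically shares one WAL across the regions a Region Server hosts, keeps flushed data in HFiles per column family, and triggers flushes mainly on MemStore size rather than on the edit count used here.

```python
from typing import Dict, List, Optional, Tuple


class RegionStoreModel:
    """Toy, hypothetical model of one region's write path."""

    def __init__(self, flush_threshold: int):
        self.flush_threshold = flush_threshold      # distinct rows per flush (illustrative unit)
        self.wal: List[Tuple[str, str]] = []        # log of edits not yet flushed
        self.memstore: Dict[str, str] = {}          # in-memory edits
        self.files: List[Dict[str, str]] = []       # flushed, immutable files (oldest first)

    def put(self, row: str, value: str) -> None:
        self.wal.append((row, value))               # log first so the edit survives a crash
        self.memstore[row] = value
        if len(self.memstore) >= self.flush_threshold:
            self.flush()

    def flush(self) -> None:
        if self.memstore:
            self.files.append(dict(sorted(self.memstore.items())))
            self.memstore.clear()
            self.wal.clear()                        # flushed edits no longer need the log

    def crash_and_recover(self) -> None:
        """Simulate losing memory, then rebuild the MemStore by replaying the WAL."""
        self.memstore = {}
        for row, value in self.wal:
            self.memstore[row] = value

    def compact(self) -> None:
        merged: Dict[str, str] = {}
        for f in self.files:                        # oldest first, so newer values win
            merged.update(f)
        self.files = [dict(sorted(merged.items()))] if merged else []

    def get(self, row: str) -> Optional[str]:
        if row in self.memstore:
            return self.memstore[row]
        for f in reversed(self.files):              # newest file first
            if row in f:
                return f[row]
        return None
```

With `flush_threshold=2`, writing `r1`, `r2`, `r1`, `r3` produces two flushed files. After `compact()`, a single file remains, and `get("r1")` returns the newer value.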
For information about configuring regions, see "HBase Cluster Capacity and Region Sizing" in the System Administration Guides.
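The Region and Region Server definitions in the table can be made concrete with a short Python sketch. It treats each region as a contiguous range of row keys identified by its start key, and uses a binary search to find the region that holds a given row. It then places each region's primary and secondary replicas on distinct Region Servers. Both function names are hypothetical. The round-robin placement is an assumption made for illustration and is not how HBase actually assigns or balances regions. Python strings stand in for HBase row keys, which are byte arrays compared lexicographically.

```python
import bisect
from typing import Dict, List


def find_region(start_keys: List[str], row_key: str) -> int:
    """Return the index of the region whose [start, next start) range holds row_key.
    start_keys must be sorted, and the first entry must be "" (open start)."""
    return bisect.bisect_right(start_keys, row_key) - 1


def place_replicas(num_regions: int, servers: List[str], replication: int) -> Dict[int, List[str]]:
    """Assign each region's replicas to distinct servers, round-robin.
    Element 0 of each list hosts the primary; the rest host secondaries."""
    if replication > len(servers):
        raise ValueError("need at least as many Region Servers as replicas")
    return {
        region: [servers[(region + i) % len(servers)] for i in range(replication)]
        for region in range(num_regions)
    }
```

As an example, take three regions with start keys `""`, `"g"`, and `"p"` spread over three servers with two replicas each. The key `"apple"` falls in region 0. Each server ends up hosting one primary and one secondary, matching the note that a Region Server can serve regions in primary and secondary mode concurrently.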