Checking and Repairing HBase Tables
HBaseFsck (hbck) is a command-line tool that checks for region consistency and table integrity problems and repairs corruption. It works in two basic
modes: a read-only mode that identifies inconsistencies and a multi-phase read-write repair mode.
- Read-only inconsistency identification: In this mode, which is the default, a report is generated but no repairs are attempted.
- Read-write repair mode: In this mode, if errors are found, hbck attempts to repair them.
You can run hbck manually or configure the hbck poller to run hbck periodically.
Always run HBase administrative commands such as the HBase Shell, hbck, or bulk-load commands as the HBase user (typically hbase).
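One way to follow this rule is to wrap administrative commands in a small helper that switches to the HBase user through sudo. This is a minimal sketch: the function name as_hbase_user and the HBASE_USER variable are illustrative and not part of HBase, and it assumes that sudo is available and that your account may run commands as that user.

```shell
# Run any command as the HBase user.
# HBASE_USER is a hypothetical override; it defaults to "hbase".
as_hbase_user() {
  sudo -u "${HBASE_USER:-hbase}" "$@"
}

# Example (not executed here): a read-only hbck run as the hbase user.
#   as_hbase_user hbase hbck
```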
Running hbck Manually
The hbck command is located in the bin directory of the HBase install.
- With no arguments, hbck checks HBase for inconsistencies and prints OK if no inconsistencies are found, or the number of inconsistencies otherwise.
- With the -details argument, hbck checks HBase for inconsistencies and prints a detailed report.
- To limit hbck to only checking specific tables, provide them as a space-separated list: hbck <table1> <table2>
- If region-level inconsistencies are found, use the -fix argument to direct hbck to try to fix them. hbck then performs the following steps in order:
  - The standard check for inconsistencies is run.
  - If needed, repairs are made to tables.
  - If needed, repairs are made to regions. Regions are closed during repair.
- You can also fix individual region-level inconsistencies separately, rather than fixing them automatically with the -fix argument.
  - -fixAssignments repairs unassigned, incorrectly assigned or multiply assigned regions.
  - -fixMeta removes rows from hbase:meta when their corresponding regions are not present in HDFS and adds new meta rows if regions are present in HDFS but not in hbase:meta.
  - -repairHoles creates HFiles for new empty regions on the filesystem and ensures that the new regions are consistent.
  - -fixHdfsOrphans repairs a region directory that is missing a region metadata file (the .regioninfo file).
  - -fixHdfsOverlaps fixes overlapping regions. You can further tune this argument using the following options:
    - -maxMerge <n> controls the maximum number of regions to merge.
    - -sidelineBigOverlaps attempts to sideline the regions which overlap the largest number of other regions.
    - -maxOverlapsToSideline <n> limits the maximum number of regions to sideline.
- To try to repair all inconsistencies and corruption at once, use the -repair option, which includes all the region and table consistency options.
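The invocations described above can be sketched as small shell functions. The function names and the table names in the comments are placeholders chosen for illustration. The sketch assumes the hbase launcher script from the bin directory is on your PATH and that you run the functions as the HBase user.

```shell
# Read-only check. Pass table names to limit the check to those tables.
hbck_check() {
  hbase hbck "$@"
}

# Read-only check that prints a detailed report.
hbck_details() {
  hbase hbck -details "$@"
}

# Targeted repair: fix region assignments and hbase:meta rows only.
hbck_fix_assignments_and_meta() {
  hbase hbck -fixAssignments -fixMeta "$@"
}

# Targeted repair: fix overlapping regions, merging at most $1 regions.
hbck_fix_overlaps() {
  max_merge="$1"
  shift
  hbase hbck -fixHdfsOverlaps -maxMerge "$max_merge" "$@"
}

# Examples (not executed here; table1 and table2 are placeholder names):
#   hbck_check
#   hbck_details table1 table2
#   hbck_fix_overlaps 5 table1
```

Starting with the read-only functions and moving to a targeted repair only after reviewing the detailed report keeps the number of regions that are closed during repair as small as possible.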
For more details about the hbck command, see Appendix C of the HBase Reference Guide.
Configuring the hbck Poller
The hbck poller is a feature of Cloudera Manager, which can be configured to run hbck automatically, in read-only mode, and send alerts if errors are found. By default, it runs every 30 minutes. Several configuration settings are available for the hbck poller. The hbck poller is not provided if you use CDH without Cloudera Manager.
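If you do not use Cloudera Manager, you can approximate a single read-only poll with a script and run it from a scheduler of your choice. The following is a rough sketch, not the poller's implementation. It assumes that hbck prints a summary line of the form "<N> inconsistencies detected." on standard output; verify this against the output of your HBase version. The function names are illustrative.

```shell
# Read hbck output on stdin and print the reported inconsistency count.
# ASSUMPTION: the summary contains a line "<N> inconsistencies detected."
hbck_inconsistency_count() {
  sed -n 's/^\([0-9][0-9]*\) inconsistencies detected\.*$/\1/p' | tail -n 1
}

# Run one read-only check. Returns 0 when no inconsistencies are reported,
# 1 otherwise (including when the summary line cannot be found).
hbck_poll_once() {
  count="$(hbase hbck 2>/dev/null | hbck_inconsistency_count)"
  if [ "${count:-unknown}" = "0" ]; then
    echo "hbck: OK"
  else
    echo "hbck: ${count:-unknown} inconsistencies" >&2
    return 1
  fi
}
```

Because the function only runs hbck without repair arguments, it stays in read-only mode. A non-zero return status can be used to trigger whatever alerting mechanism you already have.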
- Go to the HBase service and click Configuration.
- Configure the alert behavior with the following settings:
  - HBase Hbck Poller Maximum Error Count: The maximum number of errors that the hbck poller will retain through a given run.
  - HBase Hbck Region Error Count: An alert is published if at least this number of regions is detected with errors across all regions in this service. If the value is not set, alerts will not be published based on the count of regions with errors.
  - Alert Threshold: An alert is published if the number of errors reaches this threshold.
  - HBase Hbck Error Count Alert Threshold: An alert is published if at least this number of tables is detected with errors across all tables in this service. Some errors are not associated with a region, such as RS_CONNECT_FAILURE. If the value is not set, alerts will not be published based on the count of tables with errors.
  - HBase Hbck Alert Error Codes: An alert is published if errors match any of the specified codes. The default behavior is not to limit the error codes that trigger an alert. May be set to one or more of the following:
    - UNKNOWN
    - NO_META_REGION
    - NULL_ROOT_REGION
    - NO_VERSION_FILE
    - NOT_IN_META_HDFS
    - NOT_IN_META
    - NOT_IN_META_OR_DEPLOYED
    - NOT_IN_HDFS_OR_DEPLOYED
    - NOT_IN_HDFS
    - SERVER_DOES_NOT_MATCH_META
    - NOT_DEPLOYED
    - MULTI_DEPLOYED
    - SHOULD_NOT_BE_DEPLOYED
    - MULTI_META_REGION
    - RS_CONNECT_FAILURE
    - FIRST_REGION_STARTKEY_NOT_EMPTY
    - LAST_REGION_ENDKEY_NOT_EMPTY
    - DUPE_STARTKEYS
    - HOLE_IN_REGION_CHAIN
    - OVERLAP_IN_REGION_CHAIN
    - REGION_CYCLE
    - DEGENERATE_REGION
    - ORPHAN_HDFS_REGION
    - LINGERING_SPLIT_PARENT
    - NO_TABLEINFO_FILE
- To configure the polling interval, edit the Service Monitor Derived Configs Advanced Configuration Snippet with a setting such as the following, which sets the polling interval to 60 minutes. Restart the RegionServers for the changes to take effect.
  <property>
    <name>smon.hbase.fsckpoller.interval.ms</name>
    <value>3600000</value>
  </property>
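The interval property takes milliseconds, so 60 minutes becomes 60 × 60 × 1000 = 3600000. The sketch below does that conversion and prints the property, and it also illustrates the threshold rule described for the alert settings: an alert fires when the count reaches the threshold, and never fires when no threshold is set. The function names are illustrative, and should_alert only mirrors the documented rule; it is not how Cloudera Manager evaluates alerts internally.

```shell
# Convert a polling interval from minutes to milliseconds.
minutes_to_ms() {
  echo $(( $1 * 60 * 1000 ))
}

# Print the snippet property for an interval given in minutes.
fsckpoller_interval_property() {
  printf '<property>\n  <name>smon.hbase.fsckpoller.interval.ms</name>\n  <value>%s</value>\n</property>\n' \
    "$(minutes_to_ms "$1")"
}

# Illustration of the threshold rule: succeed (alert) when the count is at
# least the threshold; fail (no alert) when it is lower or no threshold is set.
should_alert() {
  count="$1"
  threshold="${2:-}"
  [ -n "$threshold" ] && [ "$count" -ge "$threshold" ]
}

# Example (not executed here): property for the default 30-minute interval.
#   fsckpoller_interval_property 30
```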