Use the HBase command-line utilities
Besides the HBase Shell, HBase includes several other command-line utilities, which are
available in the hbase/bin/
directory of each HBase host. This topic provides
basic usage instructions for the most commonly used utilities.
PerformanceEvaluation
The PerformanceEvaluation
utility allows you to run several preconfigured
tests on your cluster and reports its performance. To run the
PerformanceEvaluation
tool, use the bin/hbase
pe
command.
$ hbase pe
Usage: java org.apache.hadoop.hbase.PerformanceEvaluation \
<OPTIONS> [-D<property=value>]* <command> <nclients>
Options:
nomapred Run multiple clients using threads (rather than use mapreduce)
rows Rows each client runs. Default: One million
size Total size in GiB. Mutually exclusive with --rows. Default: 1.0.
sampleRate Execute test on a sample of total rows. Only supported by randomRead.
Default: 1.0
traceRate Enable HTrace spans. Initiate tracing every N rows. Default: 0
table Alternate table name. Default: 'TestTable'
multiGet If >0, when doing RandomRead, perform multiple gets instead of single
gets.
Default: 0
compress Compression type to use (GZ, LZO, ...). Default: 'NONE'
flushCommits Used to determine if the test should flush the table. Default: false
writeToWAL Set writeToWAL on puts. Default: True
autoFlush Set autoFlush on htable. Default: False
oneCon all the threads share the same connection. Default: False
presplit Create presplit table. Recommended for accurate perf analysis (see
guide). Default: disabled
inmemory Tries to keep the HFiles of the CF inmemory as far as possible. Not
guaranteed that reads are always served from memory. Default: false
usetags Writes tags along with KVs. Use with HFile V3. Default: false
numoftags Specify the no of tags that would be needed. This works only if usetags
is true.
filterAll Helps to filter out all the rows on the server side there by not returning
anything back to the client. Helps to check the server side performance.
Uses FilterAllFilter internally.
latency Set to report operation latencies. Default: False
bloomFilter Bloom filter type, one of [NONE, ROW, ROWCOL]
valueSize Pass value size to use: Default: 1024
valueRandom Set if we should vary value size between 0 and 'valueSize'; set on read
for stats on size: Default: Not set.
valueZipf Set if we should vary value size between 0 and 'valueSize' in zipf form:
Default: Not set.
period Report every 'period' rows: Default: opts.perClientRunRows / 10
multiGet Batch gets together into groups of N. Only supported by randomRead.
Default: disabled
addColumns Adds columns to scans/gets explicitly. Default: true
replicas Enable region replica testing. Defaults: 1.
splitPolicy Specify a custom RegionSplitPolicy for the table.
randomSleep Do a random sleep before each get between 0 and entered value. Defaults: 0
columns Columns to write per row. Default: 1
caching Scan caching to use. Default: 30
Note: -D properties will be applied to the conf used.
For example:
-Dmapreduce.output.fileoutputformat.compress=true
-Dmapreduce.task.timeout=60000
Command:
append Append on each row; clients overlap on keyspace so some concurrent
operations
checkAndDelete CheckAndDelete on each row; clients overlap on keyspace so some concurrent
operations
checkAndMutate CheckAndMutate on each row; clients overlap on keyspace so some concurrent
operations
checkAndPut CheckAndPut on each row; clients overlap on keyspace so some concurrent
operations
filterScan Run scan test using a filter to find a specific row based on it's value
(make sure to use --rows=20)
increment Increment on each row; clients overlap on keyspace so some concurrent
operations
randomRead Run random read test
randomSeekScan Run random seek and scan 100 test
randomWrite Run random write test
scan Run scan test (read every row)
scanRange10 Run random seek scan with both start and stop row (max 10 rows)
scanRange100 Run random seek scan with both start and stop row (max 100 rows)
scanRange1000 Run random seek scan with both start and stop row (max 1000 rows)
scanRange10000 Run random seek scan with both start and stop row (max 10000 rows)
sequentialRead Run sequential read test
sequentialWrite Run sequential write test
Args:
nclients Integer. Required. Total number of clients (and HRegionServers)
running: 1 <= value <= 500
Examples:
To run a single client doing the default 1M sequentialWrites:
$ bin/hbase org.apache.hadoop.hbase.PerformanceEvaluation sequentialWrite 1
To run 10 clients doing increments over ten rows:
$ bin/hbase org.apache.hadoop.hbase.PerformanceEvaluation --rows=10 --nomapred increment 10
LoadTestTool
The LoadTestTool
utility load-tests your cluster by performing writes,
updates, or reads on it. To run the LoadTestTool
, use the bin/hbase
ltt
command. To print general usage information, use the -h
option.
$ bin/hbase ltt -h
Options:
-batchupdate Whether to use batch as opposed to separate updates for every column
in a row
-bloom <arg> Bloom filter type, one of [NONE, ROW, ROWCOL]
-compression <arg> Compression type, one of [LZO, GZ, NONE, SNAPPY, LZ4]
-data_block_encoding <arg> Encoding algorithm (e.g. prefix compression) to use for data blocks
in the test column family, one of
[NONE, PREFIX, DIFF, FAST_DIFF, PREFIX_TREE].
-deferredlogflush Enable deferred log flush.
-encryption <arg> Enables transparent encryption on the test table, one of [AES]
-families <arg> The name of the column families to use separated by comma
-generator <arg> The class which generates load for the tool. Any args for this class
can be passed as colon separated after class name
-h,--help Show usage
-in_memory Tries to keep the HFiles of the CF inmemory as far as possible. Not
guaranteed that reads are always served from inmemory
-init_only Initialize the test table only, don't do any loading
-key_window <arg> The 'key window' to maintain between reads and writes for concurrent
write/read workload. The default is 0.
-max_read_errors <arg> The maximum number of read errors to tolerate before terminating all
reader threads. The default is 10.
-mob_threshold <arg> Desired cell size to exceed in bytes that will use the MOB write path
-multiget_batchsize <arg> Whether to use multi-gets as opposed to separate gets for every
column in a row
-multiput Whether to use multi-puts as opposed to separate puts for every
column in a row
-num_keys <arg> The number of keys to read/write
-num_regions_per_server <arg> Desired number of regions per region server. Defaults to 5.
-num_tables <arg> A positive integer number. When a number n is specified, load test tool
will load n table parallely. -tn parameter value becomes table name prefix.
Each table name is in format <tn>_1...<tn>_n
-read <arg> <verify_percent>[:<#threads=20>]
-reader <arg> The class for executing the read requests
-region_replica_id <arg> Region replica id to do the reads from
-region_replication <arg> Desired number of replicas per region
-regions_per_server <arg> A positive integer number. When a number n is specified, load test tool
will create the test table with n regions per server
-skip_init Skip the initialization; assume test table already exists
-start_key <arg> The first key to read/write (a 0-based index). The default value is 0.
-tn <arg> The name of the table to read or write
-update <arg> <update_percent>[:<#threads=20>][:<#whether to ignore nonce collisions=0>]
-updater <arg> The class for executing the update requests
-write <arg> <avg_cols_per_key>:<avg_data_size>[:<#threads=20>]
-writer <arg> The class for executing the write requests
-zk <arg> ZK quorum as comma-separated host names without port numbers
-zk_root <arg> name of parent znode in zookeeper
wal
The wal
utility prints information about the contents of a specified WAL
file. To get a list of all WAL files, use the HDFS command hadoop fs -ls -R
/hbase/WALs
. To run the wal
utility, use the bin/hbase
wal
command. Run it without options to get usage information.
hbase wal
usage: WAL <filename...> [-h] [-j] [-p] [-r <arg>] [-s <arg>] [-w <arg>]
-h,--help Output help message
-j,--json Output JSON
-p,--printvals Print values
-r,--region <arg> Region to filter by. Pass encoded region name; e.g.
'9192caead6a5a20acb4454ffbc79fa14'
-s,--sequence <arg> Sequence to filter by. Pass sequence number.
-w,--row <arg> Row to filter by. Pass row name.
hfile
The hfile
utility prints diagnostic information about a specified hfile,
such as block headers or statistics. To get a list of all hfiles, use the HDFS command
hadoop fs -ls -R /hbase/data
. To run the hfile
utility,
use the bin/hbase hfile
command. Run it without options to get usage
information.
$ hbase hfile
usage: HFile [-a] [-b] [-e] [-f <arg> | -r <arg>] [-h] [-i] [-k] [-m] [-p]
[-s] [-v] [-w <arg>]
-a,--checkfamily Enable family check
-b,--printblocks Print block index meta data
-e,--printkey Print keys
-f,--file <arg> File to scan. Pass full-path; e.g.
hdfs://a:9000/hbase/hbase:meta/12/34
-h,--printblockheaders Print block headers for each block.
-i,--checkMobIntegrity Print all cells whose mob files are missing
-k,--checkrow Enable row order check; looks for out-of-order
keys
-m,--printmeta Print meta data of file
-p,--printkv Print key/value pairs
-r,--region <arg> Region to scan. Pass region name; e.g.
'hbase:meta,,1'
-s,--stats Print statistics
-v,--verbose Verbose output; emits file and meta data
delimiters
-w,--seekToRow <arg> Seek to this row and print all the kvs for this
row only
hbck
The hbck
utility checks and optionally repairs errors in HFiles.
To run hbck
, use the bin/hbase hbck
command. Run it with
the -h
option to get more usage information.
-----------------------------------------------------------------------
NOTE: As of HBase version 2.0, the hbck tool is significantly changed.
In general, all Read-Only options are supported and can be be used
safely. Most -fix/ -repair options are NOT supported. Please see usage
below for details on which options are not supported.
-----------------------------------------------------------------------
Usage: fsck [opts] {only tables}
where [opts] are:
-help Display help options (this)
-details Display full report of all regions.
-timelag <timeInSeconds> Process only regions that have not experienced any metadata updates in the last <timeInSeconds> seconds.
-sleepBeforeRerun <timeInSeconds> Sleep this many seconds before checking if the fix worked if run with -fix
-summary Print only summary of the tables and status.
-metaonly Only check the state of the hbase:meta table.
-sidelineDir <hdfs://> HDFS path to backup existing meta.
-boundaries Verify that regions boundaries are the same between META and store files.
-exclusive Abort if another hbck is exclusive or fixing.
Datafile Repair options: (expert features, use with caution!)
-checkCorruptHFiles Check all Hfiles by opening them to make sure they are valid
-sidelineCorruptHFiles Quarantine corrupted HFiles. implies -checkCorruptHFiles
Replication options
-fixReplication Deletes replication queues for removed peers
Metadata Repair options supported as of version 2.0: (expert features, use with caution!)
-fixVersionFile Try to fix missing hbase.version file in hdfs.
-fixReferenceFiles Try to offline lingering reference store files
-fixHFileLinks Try to offline lingering HFileLinks
-noHdfsChecking Don't load/check region info from HDFS. Assumes hbase:meta region info is good. Won't check/fix any HDFS issue, e.g. hole, orphan, or overlap
-ignorePreCheckPermission ignore filesystem permission pre-check
NOTE: Following options are NOT supported as of HBase version 2.0+.
UNSUPPORTED Metadata Repair options: (expert features, use with caution!)
-fix Try to fix region assignments. This is for backwards compatiblity
-fixAssignments Try to fix region assignments. Replaces the old -fix
-fixMeta Try to fix meta problems. This assumes HDFS region info is good.
-fixHdfsHoles Try to fix region holes in hdfs.
-fixHdfsOrphans Try to fix region dirs with no .regioninfo file in hdfs
-fixTableOrphans Try to fix table dirs with no .tableinfo file in hdfs (online mode only)
-fixHdfsOverlaps Try to fix region overlaps in hdfs.
-maxMerge <n> When fixing region overlaps, allow at most <n> regions to merge. (n=5 by default)
-sidelineBigOverlaps When fixing region overlaps, allow to sideline big overlaps
-maxOverlapsToSideline <n> When fixing region overlaps, allow at most <n> regions to sideline per group. (n=2 by default)
-fixSplitParents Try to force offline split parents to be online.
-removeParents Try to offline and sideline lingering parents and keep daughter regions.
-fixEmptyMetaCells Try to fix hbase:meta entries not referencing any region (empty REGIONINFO_QUALIFIER rows)
UNSUPPORTED Metadata Repair shortcuts
-repair Shortcut for -fixAssignments -fixMeta -fixHdfsHoles -fixHdfsOrphans -fixHdfsOverlaps -fixVersionFile -sidelineBigOverlaps -fixReferenceFiles-fixHFileLinks
-repairHoles Shortcut for -fixAssignments -fixMeta -fixHdfsHoles
clean
After you have finished using a test or proof-of-concept cluster, the hbase
clean
utility can remove all HBase-related data from ZooKeeper and HDFS.
To run the hbase clean
utility, use the bin/hbase clean command. Run it
with no options for usage information.
$ bin/hbase clean
Usage: hbase clean (--cleanZk|--cleanHdfs|--cleanAll)
Options:
--cleanZk cleans hbase related data from zookeeper.
--cleanHdfs cleans hbase related data from hdfs.
--cleanAll cleans hbase related data from both zookeeper and hdfs.