Versions
When you put data into HBase, a timestamp is required.
The timestamp can be generated automatically by the RegionServer or can be supplied by you. The timestamp must be unique per version of a given cell, because the timestamp identifies the version. To modify a previous version of a cell, for instance, you would issue a Put with a different value for the data itself, but the same timestamp.
HBase's behavior regarding versions is highly configurable. The maximum
number of versions defaults to 1. You can change the default value for
HBase by configuring hbase.column.max.version
in
hbase-site.xml
, either using an advanced
configuration snippet if you use Cloudera Manager, or by editing the
file directly otherwise.
alter
statements in HBase Shell to
create new column families with the given characteristics, but you can
use the same syntax when creating a new table or to alter an existing
column family. This is only a fraction of the options you can specify
for a given column
family.hbase> alter ‘t1′, NAME => ‘f1′, VERSIONS => 5
hbase> alter ‘t1′, NAME => ‘f1′, MIN_VERSIONS => 2
hbase> alter ‘t1′, NAME => ‘f1′, TTL => 15
HBase sorts the versions of a cell from newest to oldest, by sorting
the timestamps lexicographically. When a version needs to be deleted
because a threshold has been reached, HBase always chooses the "oldest"
version, even if it is in fact the most recent version to be inserted.
Keep this in mind when designing your timestamps. Consider using the
default generated timestamps and storing other version-specific data
elsewhere in the row, such as in the row key. If
MIN_VERSIONS
and TTL
conflict,
MIN_VERSIONS
takes precedence.