Reading data from HBase
Get and Scan are the two ways to read data from HBase, aside from manually
parsing HFiles.
A Get is simply a Scan limited by the API to one row. A Scan fetches zero or more
rows of a table. By default, a Scan reads the entire table from start to end. You
can limit your Scan results in several different ways, each of which affects the
Scan's load in terms of IO, network, or both, as well as the processing load on the
client side. This topic is provided as a quick reference. Refer to the API
documentation for Scan for more in-depth information. You can also perform Get and
Scan using the HBase Shell, the REST API, or the Thrift API.
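The relationship between Get and Scan can be illustrated without a cluster. The following is a toy model only, not HBase client code: a sorted java.util.TreeMap stands in for a table, and the class name GetAsScanSketch and its methods sampleTable, scanAll, and get are hypothetical names introduced for this illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

// Toy model only: a sorted in-memory map stands in for an HBase table.
final class GetAsScanSketch {

    // Hypothetical sample table with row keys a through f.
    static NavigableMap<String, String> sampleTable() {
        NavigableMap<String, String> table = new TreeMap<>();
        for (String key : new String[] {"a", "b", "c", "d", "e", "f"}) {
            table.put(key, "value-" + key);
        }
        return table;
    }

    // Models a default Scan: no bounds, so every row is visited in key order.
    static List<String> scanAll(NavigableMap<String, String> table) {
        return new ArrayList<>(table.keySet());
    }

    // Models a Get: the same kind of range read, narrowed so that only the
    // requested row key can match. The result holds zero or one row.
    static List<String> get(NavigableMap<String, String> table, String row) {
        return new ArrayList<>(table.subMap(row, true, row, true).keySet());
    }

    public static void main(String[] args) {
        NavigableMap<String, String> table = sampleTable();
        System.out.println(scanAll(table));   // [a, b, c, d, e, f]
        System.out.println(get(table, "c"));  // [c]
        System.out.println(get(table, "x"));  // []
    }
}
```

In this model a Get for a row that does not exist simply yields an empty result, in the same way that a Scan can return zero rows.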
- Specify a startrow or stoprow or both. Neither startrow nor stoprow needs to exist
  in the table. Because HBase sorts rows lexicographically, the Scan starts at the
  first row at or after the position where startrow would occur, and stops before
  the position where stoprow would occur. The goal is to reduce IO and network.
  - The startrow is inclusive and the stoprow is exclusive. Given a table with rows
    a, b, c, d, e, f, and startrow of c and stoprow of f, rows c-e are returned.
  - If you omit startrow, the first row of the table is the startrow.
  - If you omit the stoprow, all results after startrow (including startrow) are
    returned.
  - If startrow is lexicographically after stoprow, and you set
    Scan setReversed(boolean reversed) to true, the results are returned in reverse
    order. Given the same table above, with rows a-f, if you specify c as the
    stoprow and f as the startrow, rows f, e, and d are returned.

  Scan()
  Scan(byte[] startRow)
  Scan(byte[] startRow, byte[] stopRow)
- Specify a scanner cache that is filled before the Scan result is returned, by
  calling setCaching with the number of rows to cache before returning the result.
  By default, the caching setting on the table is used. The goal is to balance IO
  and network load.
    public Scan setCaching(int caching)
- To limit the number of columns if your table has very wide rows (rows with a
  large number of columns), use setBatch(int batch) and set it to the number of
  columns you want to return in one batch. A large number of columns is not a
  recommended design pattern.
    public Scan setBatch(int batch)
- To specify a maximum result size, use setMaxResultSize(long), with the number of
  bytes. The goal is to reduce IO and network.
    public Scan setMaxResultSize(long maxResultSize)
- When you use setCaching and setMaxResultSize together, single server requests are
  limited by either number of rows or maximum result size, whichever limit comes
  first.
- You can limit the scan to specific column families or columns by using addFamily
  or addColumn. The goal is to reduce IO and network. IO is reduced because each
  column family is represented by a Store on each RegionServer, and only the Stores
  representing the specific column families in question need to be accessed.
    public Scan addColumn(byte[] family, byte[] qualifier)
    public Scan addFamily(byte[] family)
- You can specify a range of timestamps or a single timestamp by using setTimeRange
  or setTimeStamp.
    public Scan setTimeRange(long minStamp, long maxStamp) throws IOException
    public Scan setTimeStamp(long timestamp) throws IOException
- You can retrieve a maximum number of versions by using setMaxVersions.
    public Scan setMaxVersions(int maxVersions)
- You can use a filter by using setFilter.
    public Scan setFilter(Filter filter)
- You can disable the server-side block cache for a specific scan using the API
  setCacheBlocks(boolean). This is an expert setting and should only be used if you
  know what you are doing.
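
The startrow and stoprow rules in the list above can be checked without a cluster. The following is a minimal sketch, not HBase client code: a java.util.TreeMap of String keys stands in for a table, and the class name ScanRangeSketch and its methods sampleTable and scan are hypothetical names introduced for illustration. It assumes ASCII row keys, so that Java string ordering agrees with the byte-wise ordering HBase uses, and it treats an empty or wrongly ordered range as returning no rows, which is a simplification of this model rather than a statement about HBase behavior.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

// Toy model only: a sorted in-memory map stands in for an HBase table.
final class ScanRangeSketch {

    // Hypothetical sample table with row keys a through f.
    static NavigableMap<String, String> sampleTable() {
        NavigableMap<String, String> table = new TreeMap<>();
        for (String key : new String[] {"a", "b", "c", "d", "e", "f"}) {
            table.put(key, "value-" + key);
        }
        return table;
    }

    // Returns the row keys a scan would visit. A null bound means "omitted".
    static List<String> scan(NavigableMap<String, String> table,
                             String startRow, String stopRow, boolean reversed) {
        if (startRow != null && stopRow != null) {
            int cmp = startRow.compareTo(stopRow);
            // Assumption of this model: a range that is empty or points the
            // wrong way for the scan direction yields no rows.
            if (reversed ? cmp <= 0 : cmp >= 0) {
                return new ArrayList<>();
            }
        }
        // A reversed scan walks from high keys to low keys.
        NavigableMap<String, String> view = reversed ? table.descendingMap() : table;
        if (startRow != null) {
            view = view.tailMap(startRow, true);   // startrow is inclusive
        }
        if (stopRow != null) {
            view = view.headMap(stopRow, false);   // stoprow is exclusive
        }
        return new ArrayList<>(view.keySet());
    }

    public static void main(String[] args) {
        NavigableMap<String, String> table = sampleTable();
        System.out.println(scan(table, "c", "f", false));   // [c, d, e]
        System.out.println(scan(table, null, "c", false));  // [a, b]
        System.out.println(scan(table, "e", null, false));  // [e, f]
        System.out.println(scan(table, "bb", "ee", false)); // [c, d, e]
        System.out.println(scan(table, "f", "c", true));    // [f, e, d]
    }
}
```

The fourth call uses bounds that are not row keys in the table, which shows that neither bound needs to exist. Against a real cluster, the same bounds would be supplied through the Scan constructors and setReversed described above.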
