@InterfaceAudience.Public @InterfaceStability.Stable public class Scan extends Query
All operations are identical to Get
with the exception of
instantiation. Rather than specifying a single row, an optional startRow
and stopRow may be defined. If rows are not specified, the Scanner will
iterate over all rows.
To scan everything for each row, instantiate a Scan object.
To modify scanner caching for just this scan, use setCaching. If caching is NOT set, we will use the caching value of the hosting Table.
In addition to row caching, it is possible to specify a maximum result size, using setMaxResultSize(long). When both are used, single server requests are limited by either number of rows or maximum result size, whichever limit comes first.
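The "whichever limit comes first" rule can be illustrated with a small model. This is only a sketch in plain Java, not HBase code: the class `BatchLimitSketch` and the method `rowsInOneRequest` are hypothetical names, and the sketch assumes that the row which crosses the size limit is still included in the response. The real server-side accounting may differ.

```java
// Hypothetical model of a single server request; this is not HBase code.
final class BatchLimitSketch {

    /**
     * Returns how many rows one request carries when both a row-count limit
     * (caching) and a byte-size limit (maxResultSize) are in effect.
     * Assumption: the row that crosses the size limit is still returned.
     */
    static int rowsInOneRequest(long[] rowSizesInBytes, int caching, long maxResultSize) {
        int rows = 0;
        long bytes = 0;
        for (long size : rowSizesInBytes) {
            if (rows >= caching || bytes >= maxResultSize) {
                break; // whichever limit is reached first ends the request
            }
            rows++;
            bytes += size;
        }
        return rows;
    }

    public static void main(String[] args) {
        long[] sizes = {400, 400, 400, 400, 400};
        System.out.println(rowsInOneRequest(sizes, 3, 10_000)); // row limit reached first
        System.out.println(rowsInOneRequest(sizes, 100, 700));  // size limit reached first
    }
}
```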
To further define the scope of what to get when scanning, use the additional methods outlined below.
To get all columns from specific families, execute addFamily for each family to retrieve.
To get specific columns, execute addColumn for each column to retrieve.
To only retrieve columns within a specific range of version timestamps, execute setTimeRange.
To only retrieve columns with a specific timestamp, execute setTimeStamp.
To limit the number of versions of each column to be returned, execute setMaxVersions.
To limit the maximum number of values returned for each call to next(), execute setBatch.
To add a filter, execute setFilter.
Expert: To explicitly disable server-side block caching for this scan, execute setCacheBlocks(boolean).
Note: Usage alters Scan instances. Internally, attributes are updated as the Scan runs and, if enabled, metrics accumulate in the Scan instance. Keep this in mind when cloning a Scan instance or reusing one that was already used; it is safer to create a new Scan instance per usage.
Modifier and Type | Field and Description |
---|---|
static String | HINT_LOOKAHEAD: Deprecated. Without replacement. This is now a no-op; SEEKs and SKIPs are optimized automatically. Will be removed in 2.0+ |
static String | SCAN_ATTRIBUTES_METRICS_DATA: Deprecated. |
static String | SCAN_ATTRIBUTES_METRICS_ENABLE: Deprecated. Since 1.0.0. Use setScanMetricsEnabled(boolean) |
static String | SCAN_ATTRIBUTES_TABLE_NAME |
consistency, filter, targetReplicaId
ID_ATRIBUTE
Constructor and Description |
---|
Scan(): Create a Scan operation across all rows. |
Scan(byte[] startRow): Create a Scan operation starting at the specified row. |
Scan(byte[] startRow, byte[] stopRow): Create a Scan operation for the range of rows specified. |
Scan(byte[] startRow, Filter filter) |
Scan(Get get): Builds a scan object with the same specs as get. |
Scan(Scan scan): Creates a new instance of this class while copying all values. |
Modifier and Type | Method and Description |
---|---|
Scan | addColumn(byte[] family, byte[] qualifier): Get the column from the specified family with the specified qualifier. |
Scan | addFamily(byte[] family): Get all columns from the specified family. |
boolean | doLoadColumnFamiliesOnDemand(): Get the logical value indicating whether on-demand CF loading should be allowed. |
boolean | getAllowPartialResults() |
int | getBatch() |
boolean | getCacheBlocks(): Get whether blocks should be cached for this Scan. |
int | getCaching() |
byte[][] | getFamilies() |
Map<byte[],NavigableSet<byte[]>> | getFamilyMap(): Getting the familyMap |
Filter | getFilter() |
Map<String,Object> | getFingerprint(): Compile the table and column family (i.e. |
Boolean | getLoadColumnFamiliesOnDemandValue(): Get the raw loadColumnFamiliesOnDemand setting; if it's not set, can be null. |
long | getMaxResultSize() |
int | getMaxResultsPerColumnFamily() |
int | getMaxVersions() |
int | getRowOffsetPerColumnFamily(): Method for retrieving the scan's offset per row per column family (#kvs to be skipped) |
ScanMetrics | getScanMetrics() |
byte[] | getStartRow() |
byte[] | getStopRow() |
TimeRange | getTimeRange() |
boolean | hasFamilies() |
boolean | hasFilter() |
boolean | isGetScan() |
boolean | isRaw() |
boolean | isReversed(): Get whether this scan is a reversed one. |
boolean | isScanMetricsEnabled() |
boolean | isSmall(): Get whether this scan is a small scan |
int | numFamilies() |
Scan | setACL(Map<String,Permission> perms) |
Scan | setACL(String user, Permission perms) |
Scan | setAllowPartialResults(boolean allowPartialResults): Setting whether the caller wants to see the partial results that may be returned from the server. |
Scan | setAttribute(String name, byte[] value): Sets an attribute. |
Scan | setAuthorizations(Authorizations authorizations): Sets the authorizations to be used by this Query |
Scan | setBatch(int batch): Set the maximum number of values to return for each call to next() |
Scan | setCacheBlocks(boolean cacheBlocks): Set whether blocks should be cached for this Scan. |
Scan | setCaching(int caching): Set the number of rows for caching that will be passed to scanners. |
Scan | setConsistency(Consistency consistency): Sets the consistency level for this operation |
Scan | setFamilyMap(Map<byte[],NavigableSet<byte[]>> familyMap): Setting the familyMap |
Scan | setFilter(Filter filter): Apply the specified server-side filter when performing the Query. |
Scan | setId(String id): This method allows you to set an identifier on an operation. |
Scan | setIsolationLevel(IsolationLevel level): Set the isolation level for this query. |
Scan | setLoadColumnFamiliesOnDemand(boolean value): Set the value indicating whether loading CFs on demand should be allowed (cluster default is false). |
Scan | setMaxResultSize(long maxResultSize): Set the maximum result size. |
Scan | setMaxResultsPerColumnFamily(int limit): Set the maximum number of values to return per row per Column Family |
Scan | setMaxVersions(): Get all available versions. |
Scan | setMaxVersions(int maxVersions): Get up to the specified number of versions of each column. |
Scan | setRaw(boolean raw): Enable/disable "raw" mode for this scan. |
Scan | setReplicaId(int Id): Specify region replica id where Query will fetch data from. |
Scan | setReversed(boolean reversed): Set whether this scan is a reversed one |
Scan | setRowOffsetPerColumnFamily(int offset): Set offset for the row per Column Family. |
Scan | setRowPrefixFilter(byte[] rowPrefix): Set a filter (using stopRow and startRow) so the result set only contains rows where the rowKey starts with the specified prefix. |
Scan | setScanMetricsEnabled(boolean enabled): Enable collection of ScanMetrics. |
Scan | setSmall(boolean small): Set whether this scan is a small scan |
Scan | setStartRow(byte[] startRow): Set the start row of the scan. |
Scan | setStopRow(byte[] stopRow): Set the stop row. |
Scan | setTimeRange(long minStamp, long maxStamp): Get versions of columns only within the specified timestamp range, [minStamp, maxStamp). |
Scan | setTimeStamp(long timestamp): Get versions of columns with the specified timestamp. |
Map<String,Object> | toMap(int maxCols): Compile the details beyond the scope of getFingerprint (row, columns, timestamps, etc.) into a Map along with the fingerprinted information. |
getACL, getAuthorizations, getConsistency, getIsolationLevel, getReplicaId
getAttribute, getAttributeSize, getAttributesMap, getId
@Deprecated public static final String SCAN_ATTRIBUTES_METRICS_ENABLE
Deprecated. Since 1.0.0. Use setScanMetricsEnabled(boolean)
@Deprecated public static final String SCAN_ATTRIBUTES_METRICS_DATA
Deprecated. Use getScanMetrics()
public static final String SCAN_ATTRIBUTES_TABLE_NAME
@Deprecated public static final String HINT_LOOKAHEAD
public Scan()
public Scan(byte[] startRow, Filter filter)
public Scan(byte[] startRow)
If the specified row does not exist, the Scanner will start from the next closest row after the specified row.
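The start and stop semantics can be modeled with a sorted set. The sketch below is a simplification, not HBase code: it uses `String` row keys instead of `byte[]` (the ordering matches only for plain ASCII keys), and `ScanRangeSketch`, `rowsFrom`, and `rowsInRange` are hypothetical names chosen for illustration.

```java
import java.util.Arrays;
import java.util.NavigableSet;
import java.util.TreeSet;

// Hypothetical model of row selection; String keys stand in for byte[] row keys.
final class ScanRangeSketch {

    /** Rows visited when only startRow is given: begins at startRow or the next row after it. */
    static NavigableSet<String> rowsFrom(NavigableSet<String> table, String startRow) {
        return table.tailSet(startRow, true);
    }

    /** Rows visited for [startRow, stopRow): start inclusive, stop exclusive. */
    static NavigableSet<String> rowsInRange(NavigableSet<String> table, String startRow, String stopRow) {
        return table.subSet(startRow, true, stopRow, false);
    }

    public static void main(String[] args) {
        NavigableSet<String> table = new TreeSet<>(Arrays.asList("row1", "row3", "row5"));
        // "row2" does not exist, so iteration begins at the next row after it.
        System.out.println(rowsFrom(table, "row2"));
        // The stop row itself is not part of the result.
        System.out.println(rowsInRange(table, "row1", "row5"));
    }
}
```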
startRow - row to start scanner at or after
public Scan(byte[] startRow, byte[] stopRow)
startRow - row to start scanner at or after (inclusive)
stopRow - row to stop scanner before (exclusive)
public Scan(Scan scan) throws IOException
scan - The scan instance to copy from.
IOException - When copying the values fails.
public Scan(Get get)
get - get to model scan after
public boolean isGetScan()
public Scan addFamily(byte[] family)
Overrides previous calls to addColumn for this family.
family - family name
public Scan addColumn(byte[] family, byte[] qualifier)
Overrides previous calls to addFamily for this family.
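How addFamily and addColumn override each other can be sketched with a plain map from family to qualifiers. This is a hypothetical model, not the HBase implementation: `FamilyMapSketch` is an illustrative name, `String` keys stand in for `byte[]`, and the sketch assumes that a `null` qualifier set means "all columns of the family".

```java
import java.util.Map;
import java.util.NavigableSet;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical model of the family map; String keys stand in for byte[].
final class FamilyMapSketch {

    // family -> requested qualifiers; assumption: null means "all columns of the family"
    final Map<String, NavigableSet<String>> familyMap = new TreeMap<>();

    FamilyMapSketch addFamily(String family) {
        familyMap.put(family, null); // drops qualifiers added earlier for this family
        return this;
    }

    FamilyMapSketch addColumn(String family, String qualifier) {
        NavigableSet<String> qualifiers = familyMap.get(family);
        if (qualifiers == null) {
            qualifiers = new TreeSet<>(); // replaces an earlier whole-family request
            familyMap.put(family, qualifiers);
        }
        qualifiers.add(qualifier);
        return this;
    }

    public static void main(String[] args) {
        // Whole family first, then one column: only that column remains selected.
        System.out.println(new FamilyMapSketch().addFamily("cf").addColumn("cf", "a").familyMap);
        // One column first, then the whole family: the column restriction is gone.
        System.out.println(new FamilyMapSketch().addColumn("cf", "a").addFamily("cf").familyMap);
    }
}
```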
family - family name
qualifier - column qualifier
public Scan setTimeRange(long minStamp, long maxStamp) throws IOException
minStamp - minimum timestamp value, inclusive
maxStamp - maximum timestamp value, exclusive
IOException - if invalid time range
See also: setMaxVersions(), setMaxVersions(int)
public Scan setTimeStamp(long timestamp) throws IOException
timestamp - version timestamp
IOException
See also: setMaxVersions(), setMaxVersions(int)
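The half-open range [minStamp, maxStamp) used by setTimeRange can be expressed as a simple predicate. The sketch below is plain Java with hypothetical names (`TimeRangeSketch`, `inTimeRange`, `matchesTimestamp`). Treating a single timestamp as the range [t, t + 1) is an assumption of this sketch, and overflow at `Long.MAX_VALUE` is ignored.

```java
// Hypothetical model of timestamp filtering; this is not HBase code.
final class TimeRangeSketch {

    /** True when ts lies in [minStamp, maxStamp): minimum inclusive, maximum exclusive. */
    static boolean inTimeRange(long ts, long minStamp, long maxStamp) {
        return ts >= minStamp && ts < maxStamp;
    }

    /** Assumption: a single timestamp t is modeled as the range [t, t + 1). */
    static boolean matchesTimestamp(long ts, long wanted) {
        return inTimeRange(ts, wanted, wanted + 1);
    }

    public static void main(String[] args) {
        System.out.println(inTimeRange(10, 10, 20)); // lower bound is inclusive
        System.out.println(inTimeRange(20, 10, 20)); // upper bound is exclusive
        System.out.println(matchesTimestamp(15, 15));
    }
}
```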
public Scan setStartRow(byte[] startRow)
startRow - row to start scan on (inclusive)
Note: In order to make startRow exclusive add a trailing 0 byte
public Scan setStopRow(byte[] stopRow)
stopRow - row to end at (exclusive)
Note: In order to make stopRow inclusive add a trailing 0 byte
Note: When doing a filter for a rowKey Prefix use setRowPrefixFilter(byte[]). The 'trailing 0' will not yield the desired result.
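The trailing 0 byte trick works because row keys are ordered byte by byte, so `row + 0x00` is the smallest key that sorts after `row`. A minimal sketch in plain Java, assuming unsigned lexicographic ordering of row keys; `RowBoundSketch` and its methods are hypothetical helpers, not HBase API.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helpers; assumes row keys sort as unsigned bytes, shorter key first on a tie.
final class RowBoundSketch {

    /** Returns row with one 0x00 byte appended: the smallest key that sorts after row. */
    static byte[] withTrailingZero(byte[] row) {
        return Arrays.copyOf(row, row.length + 1); // copyOf pads with 0x00
    }

    /** Unsigned lexicographic comparison of two row keys. */
    static int compareRows(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xFF) - (b[i] & 0xFF);
            if (diff != 0) {
                return diff;
            }
        }
        return a.length - b.length;
    }

    /** True when row lies in [start, stop): start inclusive, stop exclusive. */
    static boolean inRange(byte[] row, byte[] start, byte[] stop) {
        return compareRows(row, start) >= 0 && compareRows(row, stop) < 0;
    }

    public static void main(String[] args) {
        byte[] first = "a".getBytes(StandardCharsets.UTF_8);
        byte[] last = "c".getBytes(StandardCharsets.UTF_8);
        System.out.println(inRange(last, first, last));                    // stop row is exclusive
        System.out.println(inRange(last, first, withTrailingZero(last)));  // stop row made inclusive
        System.out.println(inRange(first, withTrailingZero(first), last)); // start row made exclusive
    }
}
```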
public Scan setRowPrefixFilter(byte[] rowPrefix)
Set a filter (using stopRow and startRow) so the result set only contains rows where the rowKey starts with the specified prefix.
This is a utility method that converts the desired rowPrefix into the appropriate values for the startRow and stopRow to achieve the desired result.
This can safely be used in combination with setFilter.
NOTE: Doing a setStartRow(byte[]) and/or setStopRow(byte[]) after this method will yield undefined results.
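One way to turn a prefix into a start row and stop row is to use the prefix itself as the start row and the "next" key after all keys with that prefix as the stop row. The sketch below shows that idea in plain Java. It is an illustration under stated assumptions, not the library's implementation: `PrefixRangeSketch` and `stopRowForPrefix` are hypothetical names, and an empty stop row is assumed to mean "no upper bound".

```java
import java.util.Arrays;

// Hypothetical helper; assumes an empty stop row means "scan to the end of the table".
final class PrefixRangeSketch {

    /** Returns the smallest row key that sorts after every key starting with prefix. */
    static byte[] stopRowForPrefix(byte[] prefix) {
        int end = prefix.length;
        while (end > 0 && prefix[end - 1] == (byte) 0xFF) {
            end--; // trailing 0xFF bytes cannot be incremented, so drop them
        }
        if (end == 0) {
            return new byte[0]; // empty or all-0xFF prefix: no upper bound
        }
        byte[] stop = Arrays.copyOf(prefix, end);
        stop[end - 1]++; // increment the last byte that is not 0xFF
        return stop;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(stopRowForPrefix(new byte[] {0x61, 0x62})));
        System.out.println(Arrays.toString(stopRowForPrefix(new byte[] {0x61, (byte) 0xFF})));
    }
}
```

This also shows why appending a trailing 0 byte is not enough for prefix matching: the stop row has to sort after every possible suffix, not just after the prefix itself.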
rowPrefix - the prefix all rows must start with. (Set null to remove the filter.)
public Scan setMaxVersions()
public Scan setMaxVersions(int maxVersions)
maxVersions - maximum versions for each column
public Scan setBatch(int batch)
batch - the maximum number of values
public Scan setMaxResultsPerColumnFamily(int limit)
limit - the maximum number of values returned / row / CF
public Scan setRowOffsetPerColumnFamily(int offset)
offset - is the number of kvs that will be skipped.
public Scan setCaching(int caching)
Set the number of rows for caching that will be passed to scanners. If not set, the Configuration setting HConstants.HBASE_CLIENT_SCANNER_CACHING will apply. Higher caching values will enable faster scanners but will use more memory.
caching - the number of rows for caching
public long getMaxResultSize()
See also: setMaxResultSize(long)
public Scan setMaxResultSize(long maxResultSize)
maxResultSize - The maximum result size in bytes.
public Scan setFilter(Filter filter)
Apply the specified server-side filter when performing the Query. Filter.filterKeyValue(Cell) is called AFTER all tests for ttl, column match, deletes and max versions have been run.
public Scan setFamilyMap(Map<byte[],NavigableSet<byte[]>> familyMap)
familyMap - map of family to qualifier
public Map<byte[],NavigableSet<byte[]>> getFamilyMap()
public int numFamilies()
public boolean hasFamilies()
public byte[][] getFamilies()
public byte[] getStartRow()
public byte[] getStopRow()
public int getMaxVersions()
public int getBatch()
public int getMaxResultsPerColumnFamily()
public int getRowOffsetPerColumnFamily()
public int getCaching()
public TimeRange getTimeRange()
public boolean hasFilter()
public Scan setCacheBlocks(boolean cacheBlocks)
This is true by default. When true, default settings of the table and family are used (this will never override caching blocks if the block cache is disabled for that family or entirely).
cacheBlocks - if false, default settings are overridden and blocks will not be cached
public boolean getCacheBlocks()
public Scan setReversed(boolean reversed)
This is false by default, which means a forward (normal) scan.
reversed - if true, the scan will be in backward order
public boolean isReversed()
public Scan setAllowPartialResults(boolean allowPartialResults)
allowPartialResults
public boolean getAllowPartialResults()
ResultScanner.next()
public Scan setLoadColumnFamiliesOnDemand(boolean value)
public Boolean getLoadColumnFamiliesOnDemandValue()
public boolean doLoadColumnFamiliesOnDemand()
public Map<String,Object> getFingerprint()
Overrides: getFingerprint in class Operation
public Map<String,Object> toMap(int maxCols)
public Scan setRaw(boolean raw)
raw - True/False to enable/disable "raw" mode.
public boolean isRaw()
public Scan setSmall(boolean small)
A small scan should use pread, while a big scan can use seek + read. seek + read is fast but can cause two problems: (1) resource contention and (2) too much network IO. See [89-fb] Using pread for non-compaction read request, https://issues.apache.org/jira/browse/HBASE-7266. On the other hand, when this is set to true, openScanner, next, and closeScanner are done in one RPC call, which means better performance for a small scan [HBASE-9488]. Generally, if the scan range is within one data block (64KB), it can be considered a small scan.
small
public boolean isSmall()
public Scan setAttribute(String name, byte[] value)
Sets an attribute.
Specified by: setAttribute in interface Attributes
Overrides: setAttribute in class OperationWithAttributes
name - attribute name
value - attribute value
public Scan setId(String id)
This method allows you to set an identifier on an operation.
Overrides: setId in class OperationWithAttributes
id - id to set for the scan
public Scan setAuthorizations(Authorizations authorizations)
Sets the authorizations to be used by this Query
Overrides: setAuthorizations in class Query
public Scan setACL(Map<String,Permission> perms)
public Scan setACL(String user, Permission perms)
public Scan setConsistency(Consistency consistency)
Sets the consistency level for this operation
Overrides: setConsistency in class Query
consistency - the consistency level
public Scan setReplicaId(int Id)
Specify region replica id where Query will fetch data from. Use this together with Query.setConsistency(Consistency) passing Consistency.TIMELINE to read data from a specific replicaId.
Overrides: setReplicaId in class Query
public Scan setIsolationLevel(IsolationLevel level)
Set the isolation level for this query.
Overrides: setIsolationLevel in class Query
level - IsolationLevel for this query
public Scan setScanMetricsEnabled(boolean enabled)
Enable collection of ScanMetrics. For advanced users.
enabled - Set to true to enable accumulating scan metrics
public boolean isScanMetricsEnabled()
public ScanMetrics getScanMetrics()
See also: setScanMetricsEnabled(boolean)