org.apache.hadoop.hive.ql.exec
Class TopNHash

java.lang.Object
  extended by org.apache.hadoop.hive.ql.exec.TopNHash

public class TopNHash
extends Object

Stores binary key/value pairs in sorted order so that the top-n key/value pairs can be retrieved. TODO: rename to TopNHeap?
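The idea of keeping only the top n binary keys in sorted order can be illustrated with a bounded heap. The following is a minimal sketch, not the Hive implementation: the class TopNSketch, its methods, and the choice of unsigned lexicographic byte order are all assumptions made for illustration, and it requires Java 9 or later for Arrays.compareUnsigned.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical illustration only; this is NOT how TopNHash is implemented.
final class TopNSketch {
    // Assumed ordering: unsigned lexicographic comparison of raw bytes (Java 9+).
    static final Comparator<byte[]> ORDER = Arrays::compareUnsigned;

    private final int topN;
    // Max-heap: the largest retained key sits at the head, so it is evicted first.
    private final PriorityQueue<byte[]> heap;

    TopNSketch(int topN) {
        this.topN = topN;
        this.heap = new PriorityQueue<>(ORDER.reversed());
    }

    /** Returns true if the key was retained, false if it was rejected. */
    boolean offer(byte[] key) {
        if (topN <= 0) {
            return false;
        }
        if (heap.size() < topN) {
            heap.add(key);
            return true;
        }
        if (ORDER.compare(key, heap.peek()) >= 0) {
            return false; // not smaller than the largest key currently retained
        }
        heap.poll();      // evict the largest retained key
        heap.add(key);
        return true;
    }

    /** Returns the retained keys in ascending order. */
    List<byte[]> sorted() {
        List<byte[]> out = new ArrayList<>(heap);
        out.sort(ORDER);
        return out;
    }
}
```

With topN = 2, offering the single-byte keys 5, 3, 9, 1 in that order retains 1 and 3: key 9 is rejected outright and key 5 is evicted when 1 arrives.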


Nested Class Summary
static interface TopNHash.BinaryCollector
          For interaction between operator and top-n hash.
 
Field Summary
static int EXCLUDE
           
static int FORWARD
           
static org.apache.commons.logging.Log LOG
           
 
Constructor Summary
TopNHash()
           
 
Method Summary
 void flush()
          Flushes all the rows cached in the heap.
 int getVectorizedBatchResult(int batchIndex)
          Gets the vectorized batch result for a particular index.
 int getVectorizedKeyDistLength(int batchIndex)
          After a vectorized batch is processed, returns the distribution key length of a key.
 HiveKey getVectorizedKeyToForward(int batchIndex)
          After a vectorized batch is processed, returns the key that caused a particular row to be forwarded.
 void initialize(int topN, float memUsage, boolean isMapGroupBy, TopNHash.BinaryCollector collector)
           
 int startVectorizedBatch(int size)
          Perform basic checks and initialize TopNHash for the new vectorized row batch.
 void storeValue(int index, org.apache.hadoop.io.BytesWritable value, int keyHash, boolean vectorized)
          Stores the value for the key in the heap.
 int tryStoreKey(HiveKey key)
          Tries to store the non-vectorized key.
 void tryStoreVectorizedKey(HiveKey key, int batchIndex)
          Try to put the key from the current vectorized batch into the heap.
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

LOG

public static org.apache.commons.logging.Log LOG

FORWARD

public static final int FORWARD
See Also:
Constant Field Values

EXCLUDE

public static final int EXCLUDE
See Also:
Constant Field Values
Constructor Detail

TopNHash

public TopNHash()
Method Detail

initialize

public void initialize(int topN,
                       float memUsage,
                       boolean isMapGroupBy,
                       TopNHash.BinaryCollector collector)

tryStoreKey

public int tryStoreKey(HiveKey key)
                throws HiveException,
                       IOException
Tries to store the non-vectorized key.

Parameters:
key - Serialized key.
Returns:
TopNHash.FORWARD if the row should be forwarded; TopNHash.EXCLUDE if the row should be discarded; any other value is the index at which the row is to be stored, and that index should be passed to storeValue.
Throws:
HiveException
IOException
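
The three-way return contract of tryStoreKey can be sketched from the caller's side as follows. This is a minimal, self-contained illustration under stated assumptions: KeyStore is a hypothetical stand-in for TopNHash that uses byte[] in place of HiveKey and BytesWritable, and the numeric values given to FORWARD and EXCLUDE are placeholders (the real values are listed under Constant Field Values).

```java
// Hypothetical caller-side sketch; KeyStore stands in for TopNHash.
final class TryStoreKeyDispatch {
    // Assumed placeholder values, chosen only so the sketch compiles and runs.
    static final int FORWARD = -1;
    static final int EXCLUDE = -2;

    /** Minimal stand-in exposing the two calls used below. */
    interface KeyStore {
        int tryStoreKey(byte[] key);
        void storeValue(int index, byte[] value, int keyHash, boolean vectorized);
    }

    /** Returns what happened to the row: "forwarded", "excluded", or "stored". */
    static String process(KeyStore store, byte[] key, byte[] value, int keyHash) {
        int result = store.tryStoreKey(key);
        if (result == FORWARD) {
            return "forwarded"; // the caller would emit the row downstream
        }
        if (result == EXCLUDE) {
            return "excluded";  // the caller would drop the row
        }
        // Any other value is an index that must be handed back with the value.
        store.storeValue(result, value, keyHash, false);
        return "stored";
    }
}
```

The point of the sketch is that storeValue is called only for the third case, with the index returned by tryStoreKey and vectorized set to false.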

startVectorizedBatch

public int startVectorizedBatch(int size)
                         throws IOException,
                                HiveException
Perform basic checks and initialize TopNHash for the new vectorized row batch.

Parameters:
size - batch size
Returns:
TopNHash.FORWARD if all rows should be forwarded without calling TopN; TopNHash.EXCLUDE if all rows should be discarded without calling TopN; any other result means the batch has been started.
Throws:
IOException
HiveException

tryStoreVectorizedKey

public void tryStoreVectorizedKey(HiveKey key,
                                  int batchIndex)
                           throws HiveException,
                                  IOException
Try to put the key from the current vectorized batch into the heap.

Parameters:
key - the key.
batchIndex - The sequential index of the key in the vectorized batch (not an index taken from .selected).
Throws:
HiveException
IOException

getVectorizedBatchResult

public int getVectorizedBatchResult(int batchIndex)
Gets the vectorized batch result for a particular index.

Parameters:
batchIndex - index of the key in the batch.
Returns:
the result, same as from tryStoreKey(HiveKey)

getVectorizedKeyToForward

public HiveKey getVectorizedKeyToForward(int batchIndex)
After a vectorized batch is processed, returns the key that caused a particular row to be forwarded. A row can only be marked for forwarding because it has the same key as some row already in the heap (for GBY), so that key from the heap can be used to emit the forwarded row.

Parameters:
batchIndex - index of the key in the batch.
Returns:
The key corresponding to the index.

getVectorizedKeyDistLength

public int getVectorizedKeyDistLength(int batchIndex)
After a vectorized batch is processed, returns the distribution key length of a key.

Parameters:
batchIndex - index of the key in the batch.
Returns:
The distribution length corresponding to the key.

storeValue

public void storeValue(int index,
                       org.apache.hadoop.io.BytesWritable value,
                       int keyHash,
                       boolean vectorized)
Stores the value for the key in the heap.

Parameters:
index - The index, either returned by tryStoreKey or, for a key offered through tryStoreVectorizedKey, obtained from getVectorizedBatchResult.
value - The value to store.
keyHash - The key hash to store.
vectorized - Whether the result is coming from a vectorized batch.
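
The vectorized methods appear intended to be used in two passes: every key in the batch is offered first, and per-row results are read back afterwards. The sketch below illustrates that reading of the documentation and is not taken from Hive itself. VectorizedStore is a hypothetical stand-in for TopNHash using byte[] keys and values, the constant values are placeholders, and Arrays.hashCode is used only as a stand-in key hash.

```java
import java.util.Arrays;

// Hypothetical caller-side sketch; VectorizedStore stands in for TopNHash.
final class VectorizedBatchFlow {
    // Assumed placeholder values; see Constant Field Values for the real ones.
    static final int FORWARD = -1;
    static final int EXCLUDE = -2;

    /** Minimal stand-in exposing the calls used below. */
    interface VectorizedStore {
        int startVectorizedBatch(int size);
        void tryStoreVectorizedKey(byte[] key, int batchIndex);
        int getVectorizedBatchResult(int batchIndex);
        void storeValue(int index, byte[] value, int keyHash, boolean vectorized);
    }

    /** Returns one outcome per row: 'F' forwarded, 'X' excluded, 'S' stored. */
    static char[] processBatch(VectorizedStore store, byte[][] keys, byte[][] values) {
        char[] outcome = new char[keys.length];
        int start = store.startVectorizedBatch(keys.length);
        if (start == FORWARD || start == EXCLUDE) {
            // The whole batch is decided up front; no per-key calls are made.
            Arrays.fill(outcome, start == FORWARD ? 'F' : 'X');
            return outcome;
        }
        // Pass 1: offer every key, identified by its sequential batch index.
        for (int i = 0; i < keys.length; i++) {
            store.tryStoreVectorizedKey(keys[i], i);
        }
        // Pass 2: read back the per-row result and store values where asked.
        for (int i = 0; i < keys.length; i++) {
            int result = store.getVectorizedBatchResult(i);
            if (result == FORWARD) {
                outcome[i] = 'F';
            } else if (result == EXCLUDE) {
                outcome[i] = 'X';
            } else {
                // Stand-in hash; a real caller would pass the key's own hash.
                store.storeValue(result, values[i], Arrays.hashCode(keys[i]), true);
                outcome[i] = 'S';
            }
        }
        return outcome;
    }
}
```

For rows marked as forwarded, a real caller could additionally consult getVectorizedKeyToForward and getVectorizedKeyDistLength with the same batch index; those calls are omitted here to keep the sketch short.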

flush

public void flush()
           throws HiveException
Flushes all the rows cached in the heap.

Throws:
HiveException


Copyright © 2014 The Apache Software Foundation. All rights reserved.