org.apache.hadoop.hive.ql.exec.vector
Class BytesColumnVector

java.lang.Object
  extended by org.apache.hadoop.hive.ql.exec.vector.ColumnVector
      extended by org.apache.hadoop.hive.ql.exec.vector.BytesColumnVector

public class BytesColumnVector
extends ColumnVector

This class supports string and binary data by value reference, i.e. each field is explicitly present rather than provided by a dictionary reference. In some cases all the values will be in the same byte array to begin with, but this need not be the case. If each value starts out in a separate byte array, or the values are spread across several byte arrays, you can still assign data by reference into this column vector, so the class can be used in a range of situations.

When setting data by reference, the caller is responsible for allocating the byte arrays used to hold the data. You can also set data by value, as long as you call the initBuffer() method first. You can mix "by value" and "by reference" in the same column vector, though that use is probably not typical.
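
The difference between the two styles can be illustrated with a small standalone model. This is a minimal sketch, not the Hive class: RefVsValueSketch, its private buffer layout, the get() helper, and demo() are hypothetical names introduced here, and only the method names setRef, setVal, and initBuffer mirror this page.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical stand-in for a bytes column; not the Hive class itself. */
class RefVsValueSketch {
    final byte[][] vector;   // per-element backing array
    final int[] start;       // per-element offset into vector[i]
    final int[] length;      // per-element byte count
    private byte[] buffer;   // local buffer used only by setVal (assumed layout)
    private int nextFree;    // next unused position in buffer

    RefVsValueSketch(int size) {
        vector = new byte[size][];
        start = new int[size];
        length = new int[size];
    }

    void initBuffer(int estimatedValueSize) {
        buffer = new byte[estimatedValueSize];
        nextFree = 0;
    }

    /** By reference: remember the caller's array; no bytes are copied. */
    void setRef(int elementNum, byte[] sourceBuf, int start, int length) {
        vector[elementNum] = sourceBuf;
        this.start[elementNum] = start;
        this.length[elementNum] = length;
    }

    /** By value: copy the bytes into the local buffer (this sketch never grows it). */
    void setVal(int elementNum, byte[] sourceBuf, int start, int length) {
        if (buffer == null) {
            throw new IllegalStateException("initBuffer must be called before setVal");
        }
        if (nextFree + length > buffer.length) {
            throw new IllegalStateException("sketch buffer is full");
        }
        System.arraycopy(sourceBuf, start, buffer, nextFree, length);
        setRef(elementNum, buffer, nextFree, length);
        nextFree += length;
    }

    String get(int elementNum) {
        return new String(vector[elementNum], start[elementNum], length[elementNum],
                StandardCharsets.UTF_8);
    }

    /** Mixes both styles, then mutates the source to expose the difference. */
    static String demo() {
        byte[] source = "hive".getBytes(StandardCharsets.UTF_8);
        RefVsValueSketch col = new RefVsValueSketch(2);
        col.initBuffer(16);
        col.setRef(0, source, 0, source.length);  // aliases source
        col.setVal(1, source, 0, source.length);  // copies source
        source[0] = (byte) 'f';                   // caller changes its own array
        return col.get(0) + "," + col.get(1);
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints "five,hive"
    }
}
```

In this model the element set by reference follows later changes to the caller's array, while the element set by value does not. That is why a caller using the by-reference style has to keep its byte arrays alive and unmodified for as long as the column vector is in use.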


Field Summary
 int[] length
           
 int[] start
           
 byte[][] vector
           
 
Fields inherited from class org.apache.hadoop.hive.ql.exec.vector.ColumnVector
isNull, isRepeating, noNulls
 
Constructor Summary
BytesColumnVector()
          Use this constructor for normal operation.
BytesColumnVector(int size)
          Don't call this constructor except for testing purposes.
 
Method Summary
 int bufferSize()
           
 void copySelected(boolean selectedInUse, int[] sel, int size, BytesColumnVector output)
          Copy the current object contents into the output.
 void fill(byte[] value)
           
 void flatten(boolean selectedInUse, int[] sel, int size)
          Simplify vector by brute-force flattening noNulls and isRepeating. This can be used to reduce combinatorial explosion of code paths in VectorExpressions with many arguments, at the expense of some performance.
 org.apache.hadoop.io.Writable getWritableObject(int index)
           
 void increaseBufferSpace(int nextElemLength)
          Increase buffer space enough to accommodate next element.
 void init()
          Initialize the column vector.
 void initBuffer()
          Initialize buffer to default size.
 void initBuffer(int estimatedValueSize)
          You must call initBuffer before using setVal().
 void setConcat(int elementNum, byte[] leftSourceBuf, int leftStart, int leftLen, byte[] rightSourceBuf, int rightStart, int rightLen)
          Set a field to the concatenation of two string values.
 void setElement(int outElementNum, int inputElementNum, ColumnVector inputVector)
          Set the element in this column vector from the given input vector.
 void setRef(int elementNum, byte[] sourceBuf, int start, int length)
          Set a field by reference.
 void setVal(int elementNum, byte[] sourceBuf, int start, int length)
          Set a field by copying the data into a local buffer.
 
Methods inherited from class org.apache.hadoop.hive.ql.exec.vector.ColumnVector
flattenNoNulls, flattenRepeatingNulls, reset, unFlatten
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

vector

public byte[][] vector

start

public int[] start

length

public int[] length

Constructor Detail

BytesColumnVector

public BytesColumnVector()
Use this constructor for normal operation. Normally, all column vectors should be the default size.


BytesColumnVector

public BytesColumnVector(int size)
Don't call this constructor except for testing purposes.

Parameters:
size - number of elements in the column vector

Method Detail

setRef

public void setRef(int elementNum,
                   byte[] sourceBuf,
                   int start,
                   int length)
Set a field by reference.

Parameters:
elementNum - index within column vector to set
sourceBuf - container of source data
start - start byte position within source
length - length of source byte sequence
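
The start and length parameters make it possible for several fields to share one source array. The following is a minimal sketch of that idea using plain arrays in place of the vector, start, and length fields; SharedArraySketch, the comma-separated sample row, and demo() are hypothetical and only illustrate how offsets into a shared array could be recorded.

```java
import java.nio.charset.StandardCharsets;

class SharedArraySketch {
    /** Splits one byte array into fields without copying any bytes. */
    static String[] demo() {
        byte[] row = "2014,hive,orc".getBytes(StandardCharsets.UTF_8);

        // Plain arrays standing in for the vector, start, and length fields.
        byte[][] vector = new byte[3][];
        int[] start = new int[3];
        int[] length = new int[3];

        int field = 0;
        int from = 0;
        for (int pos = 0; pos <= row.length; pos++) {
            if (pos == row.length || row[pos] == ',') {
                // Equivalent in spirit to setRef(field, row, from, pos - from).
                vector[field] = row;
                start[field] = from;
                length[field] = pos - from;
                field++;
                from = pos + 1;
            }
        }

        // Decode each field from its (array, start, length) triple.
        String[] decoded = new String[3];
        for (int i = 0; i < 3; i++) {
            decoded[i] = new String(vector[i], start[i], length[i], StandardCharsets.UTF_8);
        }
        return decoded;
    }

    public static void main(String[] args) {
        System.out.println(String.join(" | ", demo())); // prints "2014 | hive | orc"
    }
}
```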

initBuffer

public void initBuffer(int estimatedValueSize)
You must call initBuffer before using setVal(). Provide the estimated number of bytes needed to hold a full column vector's worth of byte string data.

Parameters:
estimatedValueSize - Estimated size of buffer space needed

initBuffer

public void initBuffer()
Initialize buffer to default size.


bufferSize

public int bufferSize()
Returns:
amount of buffer space currently allocated

setVal

public void setVal(int elementNum,
                   byte[] sourceBuf,
                   int start,
                   int length)
Set a field by copying the data into a local buffer. Use this method only when it is not practical to set data by reference with setRef(); setting data by reference tends to run a lot faster than copying data in.

Parameters:
elementNum - index within column vector to set
sourceBuf - container of source data
start - start byte position within source
length - length of source byte sequence

setConcat

public void setConcat(int elementNum,
                      byte[] leftSourceBuf,
                      int leftStart,
                      int leftLen,
                      byte[] rightSourceBuf,
                      int rightStart,
                      int rightLen)
Set a field to the concatenation of two string values. Result data is copied into the internal buffer.

Parameters:
elementNum - index within column vector to set
leftSourceBuf - container of left argument
leftStart - start of left argument
leftLen - length of left argument
rightSourceBuf - container of right argument
rightStart - start of right argument
rightLen - length of right argument
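
The copy performed by a concatenation of this shape can be sketched with two System.arraycopy calls. ConcatSketch, concatInto, and the dest/destPos parameters are hypothetical; the sketch assumes the destination already has enough room and does not model how the column vector manages its internal buffer.

```java
import java.nio.charset.StandardCharsets;

class ConcatSketch {
    /** Copies the left slice, then the right slice, into dest; returns the combined length. */
    static int concatInto(byte[] dest, int destPos,
                          byte[] leftSourceBuf, int leftStart, int leftLen,
                          byte[] rightSourceBuf, int rightStart, int rightLen) {
        System.arraycopy(leftSourceBuf, leftStart, dest, destPos, leftLen);
        System.arraycopy(rightSourceBuf, rightStart, dest, destPos + leftLen, rightLen);
        return leftLen + rightLen;
    }

    static String demo() {
        byte[] left = "foo_bar".getBytes(StandardCharsets.UTF_8);
        byte[] right = "bazqux".getBytes(StandardCharsets.UTF_8);
        byte[] buffer = new byte[16];
        // Concatenate "bar" (left[4..7)) with "baz" (right[0..3)) at offset 2.
        int len = concatInto(buffer, 2, left, 4, 3, right, 0, 3);
        return new String(buffer, 2, len, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints "barbaz"
    }
}
```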

increaseBufferSpace

public void increaseBufferSpace(int nextElemLength)
Increase buffer space enough to accommodate next element. The buffer grows exponentially, so it quickly becomes large enough to hold all the data. As batches are reloaded, the allocated buffer space quickly stabilizes.

Parameters:
nextElemLength - size of next element to be added
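
One way to picture exponential growth and its stabilization is a buffer that doubles until the next element fits. The sketch below assumes a doubling factor, which this page does not specify; BufferGrowthSketch, append, and resetBatch are hypothetical helpers, and only increaseBufferSpace and bufferSize borrow names from this class.

```java
class BufferGrowthSketch {
    private byte[] buffer;
    private int nextFree;   // next unused position in buffer

    BufferGrowthSketch(int initialSize) {
        buffer = new byte[Math.max(1, initialSize)]; // avoid a zero-length start
    }

    int bufferSize() {
        return buffer.length;
    }

    /** Doubles the buffer (assumed growth factor) until the next element fits. */
    void increaseBufferSpace(int nextElemLength) {
        int newLength = 2 * buffer.length;
        while (nextFree + nextElemLength > newLength) {
            newLength *= 2;
        }
        byte[] newBuffer = new byte[newLength];
        System.arraycopy(buffer, 0, newBuffer, 0, nextFree);
        buffer = newBuffer;
    }

    void append(byte[] src) {
        if (nextFree + src.length > buffer.length) {
            increaseBufferSpace(src.length);
        }
        System.arraycopy(src, 0, buffer, nextFree, src.length);
        nextFree += src.length;
    }

    /** Starts a new batch but keeps the space already allocated. */
    void resetBatch() {
        nextFree = 0;
    }

    static String demo() {
        BufferGrowthSketch b = new BufferGrowthSketch(4);
        b.append(new byte[3]);    // fits in the initial 4 bytes
        b.append(new byte[10]);   // 13 bytes needed: 8 is too small, 16 is enough
        int afterFirstBatch = b.bufferSize();
        b.resetBatch();
        b.append(new byte[3]);
        b.append(new byte[10]);   // same load again: no further growth
        return afterFirstBatch + "," + b.bufferSize();
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints "16,16"
    }
}
```

In this model the second batch reuses the space allocated for the first, which is the stabilizing behavior the description refers to.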

getWritableObject

public org.apache.hadoop.io.Writable getWritableObject(int index)
Specified by:
getWritableObject in class ColumnVector

copySelected

public void copySelected(boolean selectedInUse,
                         int[] sel,
                         int size,
                         BytesColumnVector output)
Copy the current object contents into the output. Only copy selected entries, as indicated by selectedInUse and the sel array.
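
The role of selectedInUse and sel can be sketched on plain arrays. This is a simplified model under two assumptions that this page does not state: that sel holds the indices of the selected rows in its first size positions, and that a copied row keeps the same index in the output. CopySelectedSketch and the byte[][] arguments are hypothetical stand-ins for the column vectors.

```java
import java.util.Arrays;

class CopySelectedSketch {
    static void copySelected(boolean selectedInUse, int[] sel, int size,
                             byte[][] input, byte[][] output) {
        if (selectedInUse) {
            // Only the rows listed in sel[0..size) are copied.
            for (int j = 0; j < size; j++) {
                int i = sel[j];
                output[i] = Arrays.copyOf(input[i], input[i].length);
            }
        } else {
            // No selection in use: copy the first size rows.
            for (int i = 0; i < size; i++) {
                output[i] = Arrays.copyOf(input[i], input[i].length);
            }
        }
    }

    static String demo() {
        byte[][] input = { {1}, {2}, {3}, {4} };
        byte[][] output = new byte[4][];
        int[] sel = {1, 3};
        copySelected(true, sel, 2, input, output);
        return Arrays.deepToString(output);
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints "[null, [2], null, [4]]"
    }
}
```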


flatten

public void flatten(boolean selectedInUse,
                    int[] sel,
                    int size)
Simplify vector by brute-force flattening noNulls and isRepeating. This can be used to reduce combinatorial explosion of code paths in VectorExpressions with many arguments, at the expense of some performance.

Specified by:
flatten in class ColumnVector
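
What flattening buys can be sketched with a simplified model. It assumes that isRepeating means element 0 stands for every row and that noNulls means the isNull array can be ignored; neither meaning is spelled out on this page. FlattenSketch uses a String[] in place of the byte data and is not the Hive implementation.

```java
class FlattenSketch {
    final String[] values;   // simplified stand-in for the byte data
    final boolean[] isNull;
    boolean isRepeating;     // assumed meaning: element 0 stands for every row
    boolean noNulls;         // assumed meaning: isNull can be ignored

    FlattenSketch(int size) {
        values = new String[size];
        isNull = new boolean[size];
    }

    void flatten(boolean selectedInUse, int[] sel, int size) {
        if (isRepeating) {
            // Expand the single repeated entry into every row in use.
            isRepeating = false;
            boolean repeatedNull = !noNulls && isNull[0];
            String repeated = repeatedNull ? null : values[0];
            for (int j = 0; j < size; j++) {
                int i = selectedInUse ? sel[j] : j;
                values[i] = repeated;
                isNull[i] = repeatedNull;
            }
        }
        if (noNulls) {
            // Make isNull authoritative so readers need not check noNulls.
            noNulls = false;
            for (int j = 0; j < size; j++) {
                int i = selectedInUse ? sel[j] : j;
                isNull[i] = false;
            }
        }
    }

    static String demo() {
        FlattenSketch col = new FlattenSketch(4);
        col.values[0] = "x";
        col.isRepeating = true;
        col.noNulls = true;
        col.flatten(false, null, 4);
        return String.join(",", col.values) + "|" + col.isRepeating + "|" + col.noNulls;
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints "x,x,x,x|false|false"
    }
}
```

After flattening, a consumer can read every row through the same code path, checking only isNull[i]. The cost is the extra per-row writes, which is the performance trade-off mentioned in the description.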

fill

public void fill(byte[] value)

setElement

public void setElement(int outElementNum,
                       int inputElementNum,
                       ColumnVector inputVector)
Description copied from class: ColumnVector
Set the element in this column vector from the given input vector.

Specified by:
setElement in class ColumnVector

init

public void init()
Description copied from class: ColumnVector
Initialize the column vector. This method can be overridden by specific column vector types. Use this method only if the individual type of the column vector is not known; otherwise it is preferable to call specific initialization methods.

Overrides:
init in class ColumnVector


Copyright © 2014 The Apache Software Foundation. All rights reserved.