org.apache.hadoop.hive.ql.exec.persistence
Class PTFRowContainer<Row extends List<Object>>
java.lang.Object
org.apache.hadoop.hive.ql.exec.persistence.RowContainer<Row>
org.apache.hadoop.hive.ql.exec.persistence.PTFRowContainer<Row>
- All Implemented Interfaces:
- AbstractRowContainer<Row>, AbstractRowContainer.RowIterator<Row>
public class PTFRowContainer<Row extends List<Object>>
- extends RowContainer<Row>
Extends the RowContainer functionality to provide random access via getAt(i).
It extends RowContainer behavior in the following ways:
- You must continue to call first to signal the transition from writing to the
Container to reading from it.
- As rows are being added, the position at which a spill occurs is captured as a
BlockInfo object. The BlockInfo records the offset in the File at which the current
Block will be written.
- When first is called, we associate with each BlockInfo the File Split that it
occurs in.
- To read a random row from the Container, we do the following:
- Convert the row index into a block number. This is easy because all blocks are
the same size, given by blockSize.
- The corresponding BlockInfo tells us the Split that this block starts in; by
looking at the next BlockInfo in the list, we also know which Split this block ends in.
- We then arrange to read all the Splits that contain rows for this block. For the
first Split we seek to the startOffset that was captured in the BlockInfo.
- After reading the Splits, all rows in this block are in the 'currentReadBlock'.
- We track the span of the currentReadBlock using currentReadBlockStartRow and
blockSize, so if a row is requested within this span we don't need to read rows
from disk.
- If the requested row is in the 'last' block, we point the currentReadBlock to
the currentWriteBlock, the same as what RowContainer does.
- A getAt call leaves the Container in the same state as a next call, so getAt
and next calls can be interspersed.
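The index arithmetic described above can be sketched as follows. This is a hypothetical standalone illustration, not Hive's implementation: the class name BlockIndexSketch is invented, while blockSize and currentReadBlockStartRow come from the field names the description mentions.

```java
// Hypothetical sketch of the row-to-block arithmetic described above.
// The real PTFRowContainer fields and methods may differ.
class BlockIndexSketch {
    final int blockSize;
    int currentReadBlockStartRow = -1; // start row of the block cached in memory

    BlockIndexSketch(int blockSize) {
        this.blockSize = blockSize;
    }

    // Every block holds exactly blockSize rows, so the block number for a
    // row index is a simple integer division.
    int blockFor(int rowIdx) {
        return rowIdx / blockSize;
    }

    // A requested row can be served from memory when it falls inside the span
    // [currentReadBlockStartRow, currentReadBlockStartRow + blockSize);
    // otherwise the block must be read from its Splits on disk.
    boolean inCurrentReadBlock(int rowIdx) {
        return rowIdx >= currentReadBlockStartRow
            && rowIdx < currentReadBlockStartRow + blockSize;
    }
}
```
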
Constructor Summary
PTFRowContainer(int bs,
org.apache.hadoop.conf.Configuration jc,
org.apache.hadoop.mapred.Reporter reporter)
PTFRowContainer
public PTFRowContainer(int bs,
org.apache.hadoop.conf.Configuration jc,
org.apache.hadoop.mapred.Reporter reporter)
throws HiveException
- Throws:
HiveException
addRow
public void addRow(Row t)
throws HiveException
- Description copied from interface:
AbstractRowContainer
- Add a row into the RowContainer.
- Specified by:
addRow
in interface AbstractRowContainer<Row extends List<Object>>
- Overrides:
addRow
in class RowContainer<Row extends List<Object>>
- Parameters:
t
- row
- Throws:
HiveException
first
public Row first()
throws HiveException
- Specified by:
first
in interface AbstractRowContainer.RowIterator<Row extends List<Object>>
- Overrides:
first
in class RowContainer<Row extends List<Object>>
- Throws:
HiveException
next
public Row next()
throws HiveException
- Specified by:
next
in interface AbstractRowContainer.RowIterator<Row extends List<Object>>
- Overrides:
next
in class RowContainer<Row extends List<Object>>
- Throws:
HiveException
clearRows
public void clearRows()
throws HiveException
- Description copied from class:
RowContainer
- Remove all elements in the RowContainer.
- Specified by:
clearRows
in interface AbstractRowContainer<Row extends List<Object>>
- Overrides:
clearRows
in class RowContainer<Row extends List<Object>>
- Throws:
HiveException
close
public void close()
throws HiveException
- Throws:
HiveException
getAt
public Row getAt(int rowIdx)
throws HiveException
- Throws:
HiveException
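The contract stated in the class description, that getAt leaves the Container in the same state as a next call, can be illustrated with a minimal in-memory stand-in. This is a hypothetical sketch of the iteration contract only, not the Hive class, which spills blocks to disk:

```java
import java.util.ArrayList;
import java.util.List;

// Minimal in-memory stand-in illustrating the getAt/next contract:
// getAt(i) repositions the cursor as if next() had just returned row i,
// so getAt and next calls can be freely interspersed.
class RowIteratorSketch<Row> {
    private final List<Row> rows = new ArrayList<>();
    private int readIdx;

    void addRow(Row r) { rows.add(r); }

    // first() resets the cursor and returns the first row.
    Row first() { readIdx = 0; return next(); }

    // next() returns the row at the cursor and advances it, or null at the end.
    Row next() { return readIdx < rows.size() ? rows.get(readIdx++) : null; }

    // getAt(i) moves the cursor to row i, then behaves like next().
    Row getAt(int rowIdx) {
        readIdx = rowIdx;
        return next();
    }
}
```

With rows "a", "b", "c": first() returns "a", getAt(2) returns "c", and after getAt(0) a subsequent next() returns "b", since the cursor was left just past row 0.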
createTableDesc
public static TableDesc createTableDesc(StructObjectInspector oI)
Copyright © 2014 The Apache Software Foundation. All rights reserved.