org.apache.hadoop.hive.ql.io
Class RCFile.Reader

java.lang.Object
  extended by org.apache.hadoop.hive.ql.io.RCFile.Reader
Enclosing class:
RCFile

public static class RCFile.Reader
extends Object

Reads KeyBuffer/ValueBuffer pairs from an RCFile.


Constructor Summary
RCFile.Reader(org.apache.hadoop.fs.FileSystem fs, org.apache.hadoop.fs.Path file, org.apache.hadoop.conf.Configuration conf)
          Create a new RCFile reader.
RCFile.Reader(org.apache.hadoop.fs.FileSystem fs, org.apache.hadoop.fs.Path file, int bufferSize, org.apache.hadoop.conf.Configuration conf, long start, long length)
          Create a new RCFile reader.
 
Method Summary
 void close()
          Close the reader.
 BytesRefArrayWritable getColumn(int columnID, BytesRefArrayWritable rest)
          Fetch all data in the buffer for a given column.
 org.apache.hadoop.io.compress.CompressionCodec getCompressionCodec()
           
 int getCurrentBlockLength()
           
 int getCurrentCompressedKeyLen()
           
 RCFile.KeyBuffer getCurrentKeyBufferObj()
          Returns the KeyBuffer object used in the reader.
 int getCurrentKeyLength()
           
 void getCurrentRow(BytesRefArrayWritable ret)
          Gets the current row. Make sure next(LongWritable) has been called first.
 RCFile.ValueBuffer getCurrentValueBufferObj()
          Returns the ValueBuffer object used in the reader.
 org.apache.hadoop.io.SequenceFile.Metadata getMetadata()
          Return the metadata (Text to Text map) that was written into the file.
 org.apache.hadoop.io.Text getMetadataValueOf(org.apache.hadoop.io.Text key)
          Return the metadata value associated with the given key.
 long getPosition()
          Return the current byte position in the input file.
 boolean hasRecordsInBuffer()
           
 boolean isCompressedRCFile()
           
 long lastSeenSyncPos()
          Returns the last seen sync position.
 boolean next(org.apache.hadoop.io.LongWritable readRows)
          Advances to the next row and stores in readRows how many rows have been fetched with next().
 boolean nextBlock()
           
 boolean nextColumnsBatch()
          Deprecated. 
 void resetBuffer()
          Resets the values that determine whether there are more rows in the buffer. Call this after seek or sync if next was called beforehand.
 void seek(long position)
          Set the current byte position in the input file.
 void sync(long position)
          Seek to the next sync mark past a given position.
 boolean syncSeen()
          Returns true iff the previous call to next passed a sync mark.
 String toString()
          Returns the name of the file.
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Constructor Detail

RCFile.Reader

public RCFile.Reader(org.apache.hadoop.fs.FileSystem fs,
                     org.apache.hadoop.fs.Path file,
                     org.apache.hadoop.conf.Configuration conf)
              throws IOException
Create a new RCFile reader.

Throws:
IOException

RCFile.Reader

public RCFile.Reader(org.apache.hadoop.fs.FileSystem fs,
                     org.apache.hadoop.fs.Path file,
                     int bufferSize,
                     org.apache.hadoop.conf.Configuration conf,
                     long start,
                     long length)
              throws IOException
Create a new RCFile reader.

Throws:
IOException
Method Detail

getMetadata

public org.apache.hadoop.io.SequenceFile.Metadata getMetadata()
Return the metadata (Text to Text map) that was written into the file.


getMetadataValueOf

public org.apache.hadoop.io.Text getMetadataValueOf(org.apache.hadoop.io.Text key)
Return the metadata value associated with the given key.

Parameters:
key - the metadata key to retrieve

getPosition

public long getPosition()
                 throws IOException
Return the current byte position in the input file.

Throws:
IOException

seek

public void seek(long position)
          throws IOException
Set the current byte position in the input file.

The position passed must be a position returned by RCFile.Writer.getLength() when writing this file. In other words, the current implementation of seek can only seek to the end of the file. To seek to any other position, use sync(long).

Throws:
IOException

resetBuffer

public void resetBuffer()
Resets the values that determine whether there are more rows in the buffer. Call this after seek or sync if next was called beforehand. Otherwise the seek or sync has no effect, and the reader continues to return rows from the buffer built up by the earlier call to next.
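
The interaction between buffered rows and repositioning can be pictured with a toy model. This is only a sketch, not Hive's implementation: the class BufferedBlockReaderSketch, its seekToBlock method, and the use of block indexes in place of byte positions are all hypothetical simplifications. In this model a seek is deferred until the stale buffer drains rather than ignored outright.

```java
// Toy model only: a reader that buffers one block of rows at a time.
// All names are hypothetical and the logic is a simplification, not Hive code.
final class BufferedBlockReaderSketch {
    private final String[][] blocks;          // blocks[b] holds the rows of block b
    private int nextBlock = 0;                // stands in for the file position
    private String[] buffer = new String[0];  // rows of the block read most recently
    private int indexInBuffer = 0;            // next row to hand out from the buffer

    BufferedBlockReaderSketch(String[][] blocks) {
        this.blocks = blocks;
    }

    /** Moves the "file position" to the start of the given block. */
    void seekToBlock(int block) {
        nextBlock = block;
    }

    /** Forgets buffered rows so the next read starts from the file position. */
    void resetBuffer() {
        buffer = new String[0];
        indexInBuffer = 0;
    }

    /** Returns the next row, or null when no rows remain. */
    String next() {
        // Buffered rows are always served first; the position is consulted
        // only once the buffer is exhausted.
        while (indexInBuffer >= buffer.length) {
            if (nextBlock >= blocks.length) {
                return null;
            }
            buffer = blocks[nextBlock++];
            indexInBuffer = 0;
        }
        return buffer[indexInBuffer++];
    }

    public static void main(String[] args) {
        String[][] blocks = {{"a0", "a1"}, {"b0", "b1"}, {"c0", "c1"}};

        BufferedBlockReaderSketch withoutReset = new BufferedBlockReaderSketch(blocks);
        withoutReset.next();                      // returns a0 and buffers block 0
        withoutReset.seekToBlock(2);
        System.out.println(withoutReset.next());  // prints a1: the stale buffer is still served

        BufferedBlockReaderSketch withReset = new BufferedBlockReaderSketch(blocks);
        withReset.next();
        withReset.seekToBlock(2);
        withReset.resetBuffer();
        System.out.println(withReset.next());     // prints c0
    }
}
```

The design point is that next consults the buffer before the position, so repositioning only becomes visible once the buffer is empty or has been reset.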


sync

public void sync(long position)
          throws IOException
Seek to the next sync mark past a given position.

Throws:
IOException
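
As a rough illustration of seeking to the next sync mark, the sketch below scans an in-memory byte array for a marker. It is a toy model and not Hive's implementation: the class SyncMarkSketch, the method nextSyncMark, and the two-byte marker are hypothetical. It also assumes that "past a given position" means the first marker starting at or after that position, and that the end of the data is returned when no marker is left.

```java
// Toy model only: scans a byte array for a marker, the way a reader might
// look for a sync mark. All names here are hypothetical, not Hive API.
final class SyncMarkSketch {

    private SyncMarkSketch() {
    }

    /**
     * Returns the offset of the first occurrence of marker that starts at or
     * after position, or data.length if there is none (assumed here to stand
     * for "end of file").
     */
    static int nextSyncMark(byte[] data, byte[] marker, int position) {
        if (marker.length == 0) {
            throw new IllegalArgumentException("marker must not be empty");
        }
        int lastStart = data.length - marker.length;
        for (int start = Math.max(position, 0); start <= lastStart; start++) {
            if (matchesAt(data, marker, start)) {
                return start;
            }
        }
        return data.length;
    }

    /** Checks whether marker occurs in data beginning at offset start. */
    private static boolean matchesAt(byte[] data, byte[] marker, int start) {
        for (int i = 0; i < marker.length; i++) {
            if (data[start + i] != marker[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        byte[] marker = {(byte) 0xAB, (byte) 0xCD};
        // Layout: 3 data bytes, marker, 2 data bytes, marker, 1 data byte.
        byte[] data = {1, 2, 3, (byte) 0xAB, (byte) 0xCD, 4, 5, (byte) 0xAB, (byte) 0xCD, 6};
        System.out.println(nextSyncMark(data, marker, 0)); // prints 3
        System.out.println(nextSyncMark(data, marker, 4)); // prints 7
        System.out.println(nextSyncMark(data, marker, 8)); // prints 10 (no marker left)
    }
}
```

Because the scan accepts any starting offset, a caller can begin from an arbitrary position, which is the property that distinguishes sync(long) from seek(long) in the descriptions above.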

nextBlock

public boolean nextBlock()
                  throws IOException
Throws:
IOException

getColumn

public BytesRefArrayWritable getColumn(int columnID,
                                       BytesRefArrayWritable rest)
                                throws IOException
Fetch all data in the buffer for a given column. This is useful for columnar operators, which perform operations on an array of data from one column. It should be used together with nextColumnsBatch(). Calling getColumn() will not change the result of next(LongWritable) and getCurrentRow(BytesRefArrayWritable).

Parameters:
columnID - the index of the column to get, from 0 to N-1
Throws:
IOException
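
The difference between the column view offered by getColumn and the row view offered by getCurrentRow can be sketched with a toy row group held in memory. This is an illustration under assumptions, not Hive's data structures: the class RowGroupSketch, its getRow method, and the use of String values are hypothetical.

```java
// Toy model only: one buffered row group stored column by column.
// Names are hypothetical; this is not how Hive's classes are declared.
final class RowGroupSketch {
    private final String[][] columns; // columns[c][r] = value of column c in row r

    RowGroupSketch(String[][] columns) {
        this.columns = columns;
    }

    /** Column view: every buffered value of one column, columnID in 0..N-1. */
    String[] getColumn(int columnID) {
        return columns[columnID].clone();
    }

    /** Row view: one value taken from each column at the same row index. */
    String[] getRow(int rowIndex) {
        String[] row = new String[columns.length];
        for (int c = 0; c < columns.length; c++) {
            row[c] = columns[c][rowIndex];
        }
        return row;
    }

    public static void main(String[] args) {
        RowGroupSketch group = new RowGroupSketch(new String[][] {
            {"1", "2", "3"},       // column 0: id
            {"ann", "bob", "cy"}   // column 1: name
        });
        System.out.println(String.join(",", group.getColumn(1))); // prints ann,bob,cy
        System.out.println(String.join(",", group.getRow(1)));    // prints 2,bob
    }
}
```

In this model the column view is a plain copy of the stored array, while the row view has to touch every column, which is why a columnar operator would prefer the former.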

nextColumnsBatch

@Deprecated
public boolean nextColumnsBatch()
                         throws IOException
Deprecated. 

Reads in the next key buffer and discards any data in the current key buffer and current value buffer. This affects the results of next(LongWritable) and getCurrentRow(BytesRefArrayWritable).

Returns:
whether there are still records to read
Throws:
IOException

next

public boolean next(org.apache.hadoop.io.LongWritable readRows)
             throws IOException
Advances to the next row and stores in readRows the number of rows fetched with next(). This count covers only rows read by next(). It may be smaller than the actual number of rows passed over, because seek(long) and nextColumnsBatch() can change the underlying key buffer and value buffer.

Returns:
true if another row was read; false if no rows remain
Throws:
IOException

hasRecordsInBuffer

public boolean hasRecordsInBuffer()

getCurrentRow

public void getCurrentRow(BytesRefArrayWritable ret)
                   throws IOException
Gets the current row. Make sure next(LongWritable) has been called first.

Throws:
IOException

syncSeen

public boolean syncSeen()
Returns true iff the previous call to next passed a sync mark.


lastSeenSyncPos

public long lastSeenSyncPos()
Returns the last seen sync position.


toString

public String toString()
Returns the name of the file.

Overrides:
toString in class Object

isCompressedRCFile

public boolean isCompressedRCFile()

close

public void close()
Close the reader.


getCurrentKeyBufferObj

public RCFile.KeyBuffer getCurrentKeyBufferObj()
Returns the KeyBuffer object used in the reader. Internally, each reader has only one KeyBuffer object, which is reused for every block.


getCurrentValueBufferObj

public RCFile.ValueBuffer getCurrentValueBufferObj()
Returns the ValueBuffer object used in the reader. Internally, each reader has only one ValueBuffer object, which is reused for every block.


getCurrentBlockLength

public int getCurrentBlockLength()

getCurrentKeyLength

public int getCurrentKeyLength()

getCurrentCompressedKeyLen

public int getCurrentCompressedKeyLen()

getCompressionCodec

public org.apache.hadoop.io.compress.CompressionCodec getCompressionCodec()


Copyright © 2014 The Apache Software Foundation. All rights reserved.