org.apache.hadoop.hive.ql.io.orc
Interface Reader


public interface Reader

The interface for reading ORC files. One Reader can support multiple concurrent RecordReaders.


Nested Class Summary
static class Reader.Options
          Options for creating a RecordReader.
 
Method Summary
 CompressionKind getCompression()
          Get the compression kind.
 int getCompressionSize()
          Get the buffer size for the compression.
 long getContentLength()
          Get the length of the file.
 Metadata getMetadata()
          Get the metadata information, such as stripe-level column statistics.
 List<String> getMetadataKeys()
          Get the user metadata keys.
 ByteBuffer getMetadataValue(String key)
          Get a user metadata value.
 long getNumberOfRows()
          Get the number of rows in the file.
 ObjectInspector getObjectInspector()
          Get the object inspector for looking at the objects.
 long getRawDataSize()
          Get the deserialized data size of the file.
 long getRawDataSizeOfColumns(List<String> colNames)
          Get the deserialized data size of the specified columns.
 int getRowIndexStride()
          Get the number of rows per entry in the row index.
 ColumnStatistics[] getStatistics()
          Get the statistics about the columns in the file.
 List<StripeInformation> getStripes()
          Get the list of stripes.
 List<org.apache.hadoop.hive.ql.io.orc.OrcProto.Type> getTypes()
          Get the list of types contained in the file.
 boolean hasMetadataValue(String key)
          Check whether the user set a metadata value for the given key.
 RecordReader rows()
          Create a RecordReader that reads everything with the default options.
 RecordReader rows(boolean[] include)
          Create a RecordReader that will scan the entire file.
 RecordReader rows(long offset, long length, boolean[] include)
          Create a RecordReader that will start reading at the first stripe after offset up to the stripe that starts at offset + length.
 RecordReader rows(long offset, long length, boolean[] include, SearchArgument sarg, String[] neededColumns)
          Create a RecordReader that will read a section of a file.
 RecordReader rowsOptions(Reader.Options options)
          Create a RecordReader that uses the options given.
 

Method Detail

getNumberOfRows

long getNumberOfRows()
Get the number of rows in the file.

Returns:
the number of rows

getRawDataSize

long getRawDataSize()
Get the deserialized data size of the file.

Returns:
raw data size

getRawDataSizeOfColumns

long getRawDataSizeOfColumns(List<String> colNames)
Get the deserialized data size of the specified columns.

Parameters:
colNames - the names of the columns to measure
Returns:
raw data size of columns

getMetadataKeys

List<String> getMetadataKeys()
Get the user metadata keys.

Returns:
the list of user metadata keys

getMetadataValue

ByteBuffer getMetadataValue(String key)
Get a user metadata value.

Parameters:
key - a key given by the user
Returns:
the bytes associated with the given key

hasMetadataValue

boolean hasMetadataValue(String key)
Check whether the user set a metadata value for the given key.

Parameters:
key - the key to check
Returns:
true if the metadata value was set
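
User metadata values come back as raw bytes, so the caller decides how to interpret them. A minimal sketch, assuming the writer stored a UTF-8 string under the key (ORC itself does not require any encoding); the class and method names are hypothetical and a plain Map stands in for the Reader. It decodes a duplicate of the buffer so the caller's position is left untouched, and follows the same check-then-fetch pattern as hasMetadataValue and getMetadataValue.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper; the Map below stands in for a Reader's user metadata.
// Assumes each value was written as UTF-8 text (ORC does not require this).
final class MetadataText {
    private MetadataText() {}

    // Decodes the remaining bytes as UTF-8. duplicate() shares the bytes but has
    // its own position, so the caller's buffer is left unchanged.
    static String decodeUtf8(ByteBuffer value) {
        return StandardCharsets.UTF_8.decode(value.duplicate()).toString();
    }

    // Same check-then-fetch pattern as hasMetadataValue / getMetadataValue.
    static String lookup(Map<String, ByteBuffer> userMetadata, String key, String fallback) {
        if (!userMetadata.containsKey(key)) {
            return fallback;
        }
        return decodeUtf8(userMetadata.get(key));
    }

    public static void main(String[] args) {
        Map<String, ByteBuffer> meta = new HashMap<>();
        meta.put("writer", ByteBuffer.wrap("etl-job".getBytes(StandardCharsets.UTF_8)));
        System.out.println(lookup(meta, "writer", "unknown")); // etl-job
        System.out.println(lookup(meta, "owner", "unknown"));  // unknown
    }
}
```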

getCompression

CompressionKind getCompression()
Get the compression kind.

Returns:
the kind of compression in the file

getCompressionSize

int getCompressionSize()
Get the buffer size for the compression.

Returns:
number of bytes to buffer for the compression codec.
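
The compression size bounds how many bytes are buffered before a chunk is handed to the codec. In the ORC file format specification, each compressed chunk is preceded by a 3-byte little-endian header holding chunkLength * 2 + isOriginal, where isOriginal is set when the chunk is stored uncompressed because compression did not shrink it. A minimal sketch of that header; the class name is hypothetical, and the check against compressionSize reflects the expectation that a chunk never exceeds the buffer size.

```java
// Hypothetical helper modelling the 3-byte ORC compression chunk header.
final class ChunkHeader {
    private ChunkHeader() {}

    // Encodes chunkLength * 2 + isOriginal as 3 little-endian bytes.
    static byte[] encode(int chunkLength, boolean isOriginal, int compressionSize) {
        if (chunkLength < 0 || chunkLength > compressionSize || chunkLength >= (1 << 23)) {
            throw new IllegalArgumentException("chunk length out of range: " + chunkLength);
        }
        int value = chunkLength * 2 + (isOriginal ? 1 : 0);
        return new byte[] {(byte) value, (byte) (value >>> 8), (byte) (value >>> 16)};
    }

    // Reassembles the 24-bit value and drops the isOriginal bit.
    static int decodeLength(byte[] header) {
        int value = (header[0] & 0xff) | (header[1] & 0xff) << 8 | (header[2] & 0xff) << 16;
        return value >>> 1;
    }

    // The lowest bit of the first byte is the isOriginal flag.
    static boolean decodeIsOriginal(byte[] header) {
        return (header[0] & 1) == 1;
    }
}
```

For example, a 100,000-byte compressed chunk encodes as the bytes 0x40, 0x0d, 0x03.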

getRowIndexStride

int getRowIndexStride()
Get the number of rows per entry in the row index.

Returns:
the number of rows per entry in the row index, or 0 if there is no row index.
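
Because each row index entry covers at most getRowIndexStride() rows, the number of entries (row groups) in a stripe follows from its row count. A minimal sketch with hypothetical helper names; it treats a stride of 0 as "no row index", as the return value above describes.

```java
// Hypothetical helpers relating a stripe's row count to its row index entries.
final class RowIndexMath {
    private RowIndexMath() {}

    // Number of row index entries in a stripe: ceil(stripeRows / stride),
    // or 0 when the file has no row index (stride == 0).
    static long entriesPerStripe(long stripeRows, int rowIndexStride) {
        if (rowIndexStride == 0) {
            return 0;
        }
        return (stripeRows + rowIndexStride - 1) / rowIndexStride;
    }

    // Index of the row group containing a row, counted from the start of its stripe.
    static long rowGroupOf(long rowInStripe, int rowIndexStride) {
        if (rowIndexStride == 0) {
            throw new IllegalArgumentException("file has no row index");
        }
        return rowInStripe / rowIndexStride;
    }
}
```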

getStripes

List<StripeInformation> getStripes()
Get the list of stripes.

Returns:
the information about the stripes in order
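
Since the stripes are returned in file order, simple consistency checks are easy to express: the per-stripe row counts add up to getNumberOfRows(), and each stripe starts at or after the end of the previous one. A minimal sketch; StripeInfo is a hypothetical stand-in carrying only the offset, length and row count that StripeInformation reports, and StripeChecks is likewise hypothetical.

```java
import java.util.List;

// Hypothetical stand-in for the StripeInformation fields used below.
final class StripeInfo {
    final long offset;
    final long length;
    final long numberOfRows;

    StripeInfo(long offset, long length, long numberOfRows) {
        this.offset = offset;
        this.length = length;
        this.numberOfRows = numberOfRows;
    }
}

final class StripeChecks {
    private StripeChecks() {}

    // Sum of per-stripe row counts; for a well-formed file this equals getNumberOfRows().
    static long totalRows(List<StripeInfo> stripes) {
        long total = 0;
        for (StripeInfo s : stripes) {
            total += s.numberOfRows;
        }
        return total;
    }

    // True when stripes appear in increasing offset order without overlapping.
    static boolean inFileOrder(List<StripeInfo> stripes) {
        long nextFree = 0;
        for (StripeInfo s : stripes) {
            if (s.offset < nextFree) {
                return false;
            }
            nextFree = s.offset + s.length;
        }
        return true;
    }
}
```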

getObjectInspector

ObjectInspector getObjectInspector()
Get the object inspector for looking at the objects.

Returns:
an object inspector for the rows returned

getContentLength

long getContentLength()
Get the length of the file.

Returns:
the number of bytes in the file

getStatistics

ColumnStatistics[] getStatistics()
Get the statistics about the columns in the file.

Returns:
the statistics for each column in the file

getMetadata

Metadata getMetadata()
                     throws IOException
Get the metadata information, such as stripe-level column statistics.

Returns:
the file metadata, including stripe-level column statistics
Throws:
IOException
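
Stripe-level statistics describe the same columns as the file-level statistics from getStatistics(), but per stripe. A rough model, assuming an integer column whose summary is a count, minimum, maximum and sum: summaries of the same column combine by adding counts and sums and taking the extreme min and max. IntStats is hypothetical, assumes every stripe holds at least one value, and ignores nulls and overflow.

```java
import java.util.List;

// Hypothetical summary of an integer column's statistics within one stripe.
final class IntStats {
    final long count;
    final long min;
    final long max;
    final long sum;

    IntStats(long count, long min, long max, long sum) {
        this.count = count;
        this.min = min;
        this.max = max;
        this.sum = sum;
    }

    // Combines two summaries of the same column (overflow is not handled).
    IntStats merge(IntStats other) {
        return new IntStats(count + other.count, Math.min(min, other.min),
                Math.max(max, other.max), sum + other.sum);
    }

    // Folds per-stripe summaries into one file-level summary; expects a non-empty list.
    static IntStats mergeAll(List<IntStats> perStripe) {
        IntStats acc = perStripe.get(0);
        for (int i = 1; i < perStripe.size(); i++) {
            acc = acc.merge(perStripe.get(i));
        }
        return acc;
    }
}
```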

getTypes

List<org.apache.hadoop.hive.ql.io.orc.OrcProto.Type> getTypes()
Get the list of types contained in the file. The root type is the first type in the list.

Returns:
the list of flattened types
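
In the flattened list, each type's position is its column id, and compound types refer to their children by id. ORC assigns these ids in pre-order, which is why the root is always first. A minimal sketch of that flattening for a schema such as struct<a:int,b:map<string,double>>; TypeNode, FlatType and TypeFlattener are hypothetical, and FlatType only loosely mirrors the kind and subtypes fields of OrcProto.Type.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical nested type tree, as a schema would be written by hand.
final class TypeNode {
    final String kind;
    final List<TypeNode> children;

    TypeNode(String kind, TypeNode... children) {
        this.kind = kind;
        this.children = Arrays.asList(children);
    }
}

// Hypothetical flattened entry: a kind plus the column ids of its children.
final class FlatType {
    final String kind;
    final List<Integer> subtypes = new ArrayList<>();

    FlatType(String kind) {
        this.kind = kind;
    }
}

final class TypeFlattener {
    private TypeFlattener() {}

    static List<FlatType> flatten(TypeNode root) {
        List<FlatType> out = new ArrayList<>();
        visit(root, out);
        return out;
    }

    // Pre-order: a node takes the next id before any of its children do.
    private static int visit(TypeNode node, List<FlatType> out) {
        int id = out.size();
        FlatType flat = new FlatType(node.kind);
        out.add(flat);
        for (TypeNode child : node.children) {
            flat.subtypes.add(visit(child, out));
        }
        return id;
    }
}
```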

rows

RecordReader rows()
                  throws IOException
Create a RecordReader that reads everything with the default options.

Returns:
a new RecordReader
Throws:
IOException

rowsOptions

RecordReader rowsOptions(Reader.Options options)
                         throws IOException
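Create a RecordReader that uses the options given. This method cannot be named rows: before the no-argument rows() method was introduced, many callers passed null to rows(boolean[] include), and adding a rows(Reader.Options) overload would make those rows(null) calls ambiguous.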

Parameters:
options - the options to read with
Returns:
a new RecordReader
Throws:
IOException
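
The naming constraint comes from Java overload resolution: null converts to both boolean[] and an options type, and neither parameter type is more specific than the other. A minimal sketch with hypothetical MiniReader and MiniOptions classes standing in for Reader and Reader.Options.

```java
// Hypothetical stand-in for Reader.Options.
final class MiniOptions {}

// Hypothetical stand-in for Reader, showing only the overload question.
final class MiniReader {
    String rows(boolean[] include) {
        return "rows(boolean[])";
    }

    String rowsOptions(MiniOptions options) {
        return "rowsOptions(Options)";
    }

    // Compiles and resolves to rows(boolean[]). If a second overload
    // rows(MiniOptions) existed, javac would reject this call with
    // "reference to rows is ambiguous".
    String legacyCaller() {
        return rows(null);
    }
}
```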

rows

RecordReader rows(boolean[] include)
                  throws IOException
Create a RecordReader that will scan the entire file. This is a legacy method and rowsOptions is preferred.

Parameters:
include - true for each column that should be included
Returns:
a new RecordReader
Throws:
IOException
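
The include array is indexed by column id in the flattened type list returned by getTypes(). When selecting a top-level field, its whole subtree has to be marked so that compound columns are read together with their children. A minimal sketch; IncludeBuilder is hypothetical, subtypes[i] lists the child ids of column i, and marking the root struct as included is an assumption that matches common practice rather than a documented requirement.

```java
// Hypothetical helper that builds an include array from the flattened schema.
final class IncludeBuilder {
    private IncludeBuilder() {}

    // fieldIndexes are positions among the root struct's children.
    static boolean[] includeTopLevel(int[][] subtypes, int... fieldIndexes) {
        boolean[] include = new boolean[subtypes.length];
        include[0] = true; // keep the root struct (assumption)
        for (int field : fieldIndexes) {
            markSubtree(subtypes, subtypes[0][field], include);
        }
        return include;
    }

    // Marks a column and, recursively, every column nested under it.
    private static void markSubtree(int[][] subtypes, int column, boolean[] include) {
        include[column] = true;
        for (int child : subtypes[column]) {
            markSubtree(subtypes, child, include);
        }
    }
}
```

For struct<a:int,b:map<string,double>,c:string> the ids are 0 (struct), 1 (a), 2 (b), 3 and 4 (map key and value) and 5 (c), so selecting field b yields true for ids 0, 2, 3 and 4.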

rows

RecordReader rows(long offset,
                  long length,
                  boolean[] include)
                  throws IOException
Create a RecordReader that will start reading at the first stripe after offset, up to the stripe that starts at offset + length. This is intended to work with MapReduce's FileInputFormat, where split boundaries are chosen without regard to stripe boundaries but the splits together must cover all of the rows. This is a legacy method; rowsOptions is preferred.

Parameters:
offset - a byte offset in the file
length - a number of bytes in the file
include - true for each column that should be included
Returns:
a new RecordReader that will read the specified rows.
Throws:
IOException
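
One way to see why blindly chosen byte ranges still cover every row exactly once: if a stripe belongs to the split whose range contains the stripe's starting offset, then splits that partition the file assign each stripe to exactly one reader. A minimal sketch of that rule; SplitStripes is hypothetical and models the behaviour described above rather than reproducing the reader's code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model: a stripe is read by the split whose byte range
// [offset, offset + length) contains the stripe's first byte.
final class SplitStripes {
    private SplitStripes() {}

    static List<Integer> stripesFor(long[] stripeOffsets, long offset, long length) {
        List<Integer> selected = new ArrayList<>();
        for (int i = 0; i < stripeOffsets.length; i++) {
            long start = stripeOffsets[i];
            if (start >= offset && start < offset + length) {
                selected.add(i);
            }
        }
        return selected;
    }
}
```

With stripes starting at bytes 3, 1000 and 2000, the split (0, 1500) reads stripes 0 and 1, and the split (1500, 1500) reads stripe 2.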

rows

RecordReader rows(long offset,
                  long length,
                  boolean[] include,
                  SearchArgument sarg,
                  String[] neededColumns)
                  throws IOException
Create a RecordReader that will read a section of a file. It starts reading at the first stripe after the offset and continues to the stripe that starts at offset + length. It also accepts a list of columns to read and a search argument. This is a legacy method and rowsOptions is preferred.

Parameters:
offset - the minimum offset of the first stripe to read
length - the distance in bytes from offset to the first address at which to stop reading
include - true for each column that should be included
sarg - a search argument that limits the rows that should be read.
neededColumns - the names of the included columns
Returns:
the record reader for the rows
Throws:
IOException
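
A search argument lets the reader skip data whose statistics prove that no row can match, for example whole row groups of getRowIndexStride() rows. A simplified model for a single predicate "column < literal" evaluated against each row group's minimum and maximum; MinMaxPruning and its Verdict values are hypothetical, and the real evaluation also handles nulls and many other operators.

```java
// Hypothetical, simplified model of min/max pruning for "column < literal".
final class MinMaxPruning {
    private MinMaxPruning() {}

    enum Verdict { NO, YES, MAYBE }

    // Evaluates the predicate against one row group's [min, max] range.
    static Verdict lessThan(long min, long max, long literal) {
        if (min >= literal) {
            return Verdict.NO;    // no value in the group can satisfy it
        }
        if (max < literal) {
            return Verdict.YES;   // every value in the group satisfies it
        }
        return Verdict.MAYBE;     // rows must be read and checked
    }

    // A row group is read unless its statistics prove that no row can match.
    static boolean[] rowGroupsToRead(long[] mins, long[] maxs, long literal) {
        boolean[] read = new boolean[mins.length];
        for (int i = 0; i < mins.length; i++) {
            read[i] = lessThan(mins[i], maxs[i], literal) != Verdict.NO;
        }
        return read;
    }
}
```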


Copyright © 2014 The Apache Software Foundation. All rights reserved.