Reader (Hive Query Language 0.13.0.2.1.2.0-402 API)

org.apache.hadoop.hive.ql.io.orc
Interface Reader

public interface Reader

The interface for reading ORC files. One Reader can support multiple concurrent RecordReaders.

Nested Class Summary
`static class`	`Reader.Options` Options for creating a RecordReader.

Method Summary
`CompressionKind`	`getCompression()` Get the compression kind.
`int`	`getCompressionSize()` Get the buffer size for the compression.
`long`	`getContentLength()` Get the length of the file.
`Metadata`	`getMetadata()` Get the file metadata, such as stripe-level column statistics.
`List<String>`	`getMetadataKeys()` Get the user metadata keys.
`ByteBuffer`	`getMetadataValue(String key)` Get a user metadata value.
`long`	`getNumberOfRows()` Get the number of rows in the file.
`ObjectInspector`	`getObjectInspector()` Get the object inspector for looking at the objects.
`long`	`getRawDataSize()` Get the deserialized data size of the file.
`long`	`getRawDataSizeOfColumns(List<String> colNames)` Get the deserialized data size of the specified columns.
`int`	`getRowIndexStride()` Get the number of rows per entry in the row index.
`ColumnStatistics[]`	`getStatistics()` Get the statistics about the columns in the file.
`List<StripeInformation>`	`getStripes()` Get the list of stripes.
`List<org.apache.hadoop.hive.ql.io.orc.OrcProto.Type>`	`getTypes()` Get the list of types contained in the file.
`boolean`	`hasMetadataValue(String key)` Check whether the user set the given metadata value.
`RecordReader`	`rows()` Create a RecordReader that reads everything with the default options.
`RecordReader`	`rows(boolean[] include)` Create a RecordReader that will scan the entire file.
`RecordReader`	`rows(long offset, long length, boolean[] include)` Create a RecordReader that will start reading at the first stripe after offset up to the stripe that starts at offset + length.
`RecordReader`	`rows(long offset, long length, boolean[] include, SearchArgument sarg, String[] neededColumns)` Create a RecordReader that will read a section of a file.
`RecordReader`	`rowsOptions(Reader.Options options)` Create a RecordReader that uses the options given.

Method Detail

getNumberOfRows

long getNumberOfRows()

Get the number of rows in the file.

Returns:: the number of rows

getRawDataSize

long getRawDataSize()

Get the deserialized data size of the file.

Returns:: raw data size

getRawDataSizeOfColumns

long getRawDataSizeOfColumns(List<String> colNames)

Get the deserialized data size of the specified columns.

Parameters:: colNames - the names of the columns to measure
Returns:: raw data size of columns

getMetadataKeys

List<String> getMetadataKeys()

Get the user metadata keys.

Returns:: the set of metadata keys

getMetadataValue

ByteBuffer getMetadataValue(String key)

Get a user metadata value.

Parameters:: key - a key given by the user
Returns:: the bytes associated with the given key
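Because the value comes back as raw bytes in a ByteBuffer, callers typically have to decode it themselves. The sketch below uses only JDK classes and assumes the value was written as UTF-8 text, which this page does not guarantee. The class name, the helper `asUtf8`, and the sample value are hypothetical; the wrapped buffer stands in for the result of `getMetadataValue(key)`.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

final class MetadataValueDecoding {
    // Decodes the remaining bytes as UTF-8. Working on a duplicate leaves the
    // caller's position and limit untouched, so the buffer can be read again.
    static String asUtf8(ByteBuffer value) {
        return StandardCharsets.UTF_8.decode(value.duplicate()).toString();
    }

    public static void main(String[] args) {
        // Stand-in for reader.getMetadataValue("creator"); key and value are made up.
        ByteBuffer value = ByteBuffer.wrap("etl-job-7".getBytes(StandardCharsets.UTF_8));
        System.out.println(asUtf8(value));
        System.out.println(value.remaining() + " bytes still readable");
    }
}
```

Checking `hasMetadataValue(key)` before fetching the value is a reasonable guard, since this page does not say what `getMetadataValue` does for an unknown key.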

hasMetadataValue

boolean hasMetadataValue(String key)

Check whether the user set the given metadata value.

Parameters:: key - the key to check
Returns:: true if the metadata value was set

getCompression

CompressionKind getCompression()

Get the compression kind.

Returns:: the kind of compression in the file

getCompressionSize

int getCompressionSize()

Get the buffer size for the compression.

Returns:: number of bytes to buffer for the compression codec.

getRowIndexStride

int getRowIndexStride()

Get the number of rows per entry in the row index.

Returns:: the number of rows per entry in the row index, or 0 if there is no row index.
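As a rough illustration of what the stride means, the sketch below computes how many index entries a stripe would need and which entry covers a given row. It assumes one entry per `stride` rows within a stripe, with the last entry possibly covering fewer rows; that layout is an assumption, not something this page states. The class and method names are hypothetical, and the stride value would come from `getRowIndexStride()`.

```java
final class RowIndexEntries {
    // Number of index entries for a stripe: ceil(rows / stride), or 0 when the
    // stride is 0, which this page defines as "no row index".
    static long entriesFor(long rowsInStripe, int stride) {
        if (stride <= 0) {
            return 0;
        }
        return (rowsInStripe + stride - 1) / stride;
    }

    // Zero-based index entry covering a row, or -1 when there is no row index.
    static long entryOf(long rowInStripe, int stride) {
        if (stride <= 0) {
            return -1;
        }
        return rowInStripe / stride;
    }

    public static void main(String[] args) {
        int stride = 10000; // stand-in for reader.getRowIndexStride()
        System.out.println(entriesFor(25000, stride)); // 3
        System.out.println(entryOf(12345, stride));    // 1
    }
}
```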

getStripes

List<StripeInformation> getStripes()

Get the list of stripes.

Returns:: the information about the stripes in order

getObjectInspector

ObjectInspector getObjectInspector()

Get the object inspector for looking at the objects.

Returns:: an object inspector for each row returned

getContentLength

long getContentLength()

Get the length of the file.

Returns:: the number of bytes in the file

getStatistics

ColumnStatistics[] getStatistics()

Get the statistics about the columns in the file.

Returns:: the statistics for each column in the file

getMetadata

Metadata getMetadata()
                     throws IOException

Get the file metadata, such as stripe-level column statistics.

Returns:: the metadata, including stripe-level column statistics
Throws:: IOException

getTypes

List<org.apache.hadoop.hive.ql.io.orc.OrcProto.Type> getTypes()

Get the list of types contained in the file. The root type is the first type in the list.

Returns:: the list of flattened types
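To make "flattened" concrete, the sketch below rebuilds a nested schema string from a flat list whose first element is the root. `FlatType` is a hypothetical stand-in for `OrcProto.Type`, and the sketch assumes each compound type refers to its children by their index in the same list; check the generated `OrcProto.Type` accessors before relying on that.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

final class FlattenedTypes {
    // Hypothetical stand-in for OrcProto.Type: a kind, child type ids, field names.
    static final class FlatType {
        final String kind;
        final List<Integer> subtypes;
        final List<String> fieldNames;

        FlatType(String kind, List<Integer> subtypes, List<String> fieldNames) {
            this.kind = kind;
            this.subtypes = subtypes;
            this.fieldNames = fieldNames;
        }
    }

    static FlatType primitive(String kind) {
        return new FlatType(kind, Collections.<Integer>emptyList(), Collections.<String>emptyList());
    }

    static FlatType struct(List<Integer> subtypes, List<String> fieldNames) {
        return new FlatType("struct", subtypes, fieldNames);
    }

    // Flat form of struct<id:int,name:string,address:struct<city:string,zip:int>>.
    static List<FlatType> sampleTypes() {
        return Arrays.asList(
            struct(Arrays.asList(1, 2, 3), Arrays.asList("id", "name", "address")), // id 0: root
            primitive("int"),                                                       // id 1
            primitive("string"),                                                    // id 2
            struct(Arrays.asList(4, 5), Arrays.asList("city", "zip")),              // id 3
            primitive("string"),                                                    // id 4
            primitive("int"));                                                      // id 5
    }

    // Recursively follows child ids to rebuild the nested description of type `id`.
    static String describe(List<FlatType> types, int id) {
        FlatType t = types.get(id);
        if (t.subtypes.isEmpty()) {
            return t.kind;
        }
        StringBuilder sb = new StringBuilder(t.kind).append('<');
        for (int i = 0; i < t.subtypes.size(); i++) {
            if (i > 0) {
                sb.append(',');
            }
            if (!t.fieldNames.isEmpty()) {
                sb.append(t.fieldNames.get(i)).append(':');
            }
            sb.append(describe(types, t.subtypes.get(i)));
        }
        return sb.append('>').toString();
    }

    public static void main(String[] args) {
        System.out.println(describe(sampleTypes(), 0));
    }
}
```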

rows

RecordReader rows()
                  throws IOException

Create a RecordReader that reads everything with the default options.

Returns:: a new RecordReader
Throws:: IOException

rowsOptions

RecordReader rowsOptions(Reader.Options options)
                         throws IOException

Create a RecordReader that uses the options given. This method can't be named rows, because many callers used rows(null) before the rows() method was introduced.

Parameters:: options - the options to read with
Returns:: a new RecordReader
Throws:: IOException

rows

RecordReader rows(boolean[] include)
                  throws IOException

Create a RecordReader that will scan the entire file. This is a legacy method and rowsOptions is preferred.

Parameters:: include - true for each column that should be included
Returns:: A new RecordReader
Throws:: IOException
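The page only says that `include` holds true for each column to read. The sketch below shows one plausible way to build such an array, assuming it has one slot per flattened type id (slot 0 being the root) and that selecting a nested column also selects all of its descendants. Both points are assumptions to verify against the implementation. The `children` table and all names here are hypothetical.

```java
import java.util.Arrays;

final class IncludeArray {
    // children[id] lists the type ids directly under type `id`; id 0 is the root.
    // topLevelFields are positions within the root's children, not type ids.
    static boolean[] include(int[][] children, int... topLevelFields) {
        boolean[] result = new boolean[children.length];
        result[0] = true; // the root is always needed to reach anything else
        for (int field : topLevelFields) {
            mark(children, result, children[0][field]);
        }
        return result;
    }

    // Marks a type and, recursively, everything nested beneath it.
    private static void mark(int[][] children, boolean[] result, int id) {
        result[id] = true;
        for (int child : children[id]) {
            mark(children, result, child);
        }
    }

    public static void main(String[] args) {
        // Flat layout of struct<id:int,name:string,address:struct<city:string,zip:int>>.
        int[][] children = { {1, 2, 3}, {}, {}, {4, 5}, {}, {} };
        // Select top-level fields 0 (id) and 2 (address); name is skipped.
        System.out.println(Arrays.toString(include(children, 0, 2)));
    }
}
```

The resulting array would be the argument passed as `include`; a slot left false marks a column the reader may skip.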

rows

RecordReader rows(long offset,
                  long length,
                  boolean[] include)
                  throws IOException

Create a RecordReader that will start reading at the first stripe after offset up to the stripe that starts at offset + length. This is intended to work with MapReduce's FileInputFormat where divisions are picked blindly, but they must cover all of the rows. This is a legacy method and rowsOptions is preferred.

Parameters:: offset - a byte offset in the file; length - a number of bytes in the file; include - true for each column that should be included
Returns:: a new RecordReader that will read the specified rows.
Throws:: IOException
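The point of the offset/length rule is that arbitrary byte ranges still assign every stripe to exactly one reader. The sketch below models that with plain arrays, assuming a stripe belongs to a range when its start offset falls in the half-open interval [offset, offset + length). The treatment of the boundaries is an assumption, and the class, method, and stripe offsets are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

final class StripeSelection {
    // Returns the indexes of stripes whose start offset lies in [offset, offset + length).
    static List<Integer> select(long[] stripeStarts, long offset, long length) {
        List<Integer> picked = new ArrayList<>();
        long end = offset + length;
        for (int i = 0; i < stripeStarts.length; i++) {
            if (stripeStarts[i] >= offset && stripeStarts[i] < end) {
                picked.add(i);
            }
        }
        return picked;
    }

    public static void main(String[] args) {
        // Made-up stripe start offsets; in practice they would come from getStripes().
        long[] stripeStarts = {3, 1000, 2500, 4100};
        // Three blind 2000-byte divisions, as FileInputFormat might produce.
        System.out.println(select(stripeStarts, 0, 2000));    // [0, 1]
        System.out.println(select(stripeStarts, 2000, 2000)); // [2]
        System.out.println(select(stripeStarts, 4000, 2000)); // [3]
    }
}
```

Under this rule a stripe that straddles a division boundary is read entirely by the division in which it starts, so no row is read twice or dropped.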

rows

RecordReader rows(long offset,
                  long length,
                  boolean[] include,
                  SearchArgument sarg,
                  String[] neededColumns)
                  throws IOException

Create a RecordReader that will read a section of a file. It starts reading at the first stripe after the offset and continues to the stripe that starts at offset + length. It also accepts a list of columns to read and a search argument. This is a legacy method and rowsOptions is preferred.

Parameters:: offset - the minimum offset of the first stripe to read; length - the distance from offset of the first address to stop reading at; include - true for each column that should be included; sarg - a search argument that limits the rows that should be read; neededColumns - the names of the included columns
Returns:: the record reader for the rows
Throws:: IOException
