OrcInputFormat (Hive Query Language 0.13.0.2.1.2.0-402 API)

org.apache.hadoop.hive.ql.io.orc
Class OrcInputFormat

java.lang.Object
  org.apache.hadoop.hive.ql.io.orc.OrcInputFormat

All Implemented Interfaces: VectorizedInputFormatInterface, AcidInputFormat<OrcStruct>, InputFormatChecker, org.apache.hadoop.mapred.InputFormat<org.apache.hadoop.io.NullWritable,OrcStruct>

public class OrcInputFormat
extends Object
implements org.apache.hadoop.mapred.InputFormat<org.apache.hadoop.io.NullWritable,OrcStruct>, InputFormatChecker, VectorizedInputFormatInterface, AcidInputFormat<OrcStruct>

A MapReduce/Hive input format for ORC files.

This class implements both the classic InputFormat, which stores the rows directly, and AcidInputFormat, which stores a series of events with the following schema:

   class AcidEvent<ROW> {
     enum ACTION {INSERT, UPDATE, DELETE}
     ACTION operation;
     long originalTransaction;
     int bucket;
     long rowId;
     long currentTransaction;
     ROW row;
   }

Each AcidEvent object represents a single change to a row. Together, originalTransaction, bucket, and rowId uniquely identify the row. The operation field records the kind of change, and currentTransaction records the transaction that added the event. Insert and update events carry the entire row, while delete events have null for row.
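As a rough illustration of the event model described here, the sketch below mirrors the AcidEvent schema in plain Java and replays a list of events into a merged view. Everything in it is a stand-in: `AcidEventSketch`, `rowKey`, and `collapse` are hypothetical names, the string key is only a convenience, and replaying events in `currentTransaction` order is an assumption about how events combine, not a description of how OrcInputFormat is implemented.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative stand-ins only: none of these types are part of the Hive API.
final class AcidEventSketch {

    enum Action { INSERT, UPDATE, DELETE }

    /** Hypothetical mirror of the AcidEvent schema shown in the class description. */
    static final class AcidEvent<ROW> {
        final Action operation;
        final long originalTransaction;
        final int bucket;
        final long rowId;
        final long currentTransaction;
        final ROW row; // null for DELETE events

        AcidEvent(Action operation, long originalTransaction, int bucket,
                  long rowId, long currentTransaction, ROW row) {
            this.operation = operation;
            this.originalTransaction = originalTransaction;
            this.bucket = bucket;
            this.rowId = rowId;
            this.currentTransaction = currentTransaction;
            this.row = row;
        }

        /** The (originalTransaction, bucket, rowId) triple that identifies the row. */
        String rowKey() {
            return originalTransaction + ":" + bucket + ":" + rowId;
        }
    }

    /**
     * Replays events in currentTransaction order (an assumption of this sketch)
     * and keeps, for each row identifier, the row left by the last event.
     */
    static <ROW> Map<String, ROW> collapse(List<AcidEvent<ROW>> events) {
        List<AcidEvent<ROW>> ordered = new ArrayList<>(events);
        // List.sort is stable, so events of one transaction keep their input order.
        ordered.sort(Comparator.comparingLong(e -> e.currentTransaction));

        Map<String, ROW> latest = new LinkedHashMap<>();
        for (AcidEvent<ROW> e : ordered) {
            if (e.operation == Action.DELETE) {
                latest.remove(e.rowKey());      // delete events carry no row
            } else {
                latest.put(e.rowKey(), e.row);  // insert and update carry the full row
            }
        }
        return latest;
    }

    public static void main(String[] args) {
        List<AcidEvent<String>> events = new ArrayList<>();
        events.add(new AcidEvent<>(Action.INSERT, 1L, 0, 0L, 1L, "a"));
        events.add(new AcidEvent<>(Action.INSERT, 1L, 0, 1L, 1L, "b"));
        events.add(new AcidEvent<>(Action.UPDATE, 1L, 0, 0L, 2L, "a2"));
        events.add(new AcidEvent<>(Action.DELETE, 1L, 0, 1L, 3L, null));
        System.out.println(collapse(events)); // prints {1:0:0=a2}
    }
}
```

In this example the update replaces the row inserted by transaction 1, and the delete removes the second row, so only one row survives in the merged view.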

Nested Class Summary

Nested classes/interfaces inherited from interface org.apache.hadoop.hive.ql.io.AcidInputFormat
`AcidInputFormat.Options, AcidInputFormat.RawReader<V>, AcidInputFormat.RowReader<V>`

Constructor Summary
`OrcInputFormat()`

Method Summary
`static RecordReader`	`createReaderFromFile(Reader file, org.apache.hadoop.conf.Configuration conf, long offset, long length)`
`AcidInputFormat.RawReader<OrcStruct>`	`getRawReader(org.apache.hadoop.conf.Configuration conf, boolean collapseEvents, int bucket, ValidTxnList validTxnList, org.apache.hadoop.fs.Path baseDirectory, org.apache.hadoop.fs.Path[] deltaDirectory)` Get a reader that returns the raw ACID events (insert, update, delete).
`AcidInputFormat.RowReader<OrcStruct>`	`getReader(org.apache.hadoop.mapred.InputSplit inputSplit, AcidInputFormat.Options options)` Get a record reader that provides the user-facing view of the data after it has been merged together.
`org.apache.hadoop.mapred.RecordReader<org.apache.hadoop.io.NullWritable,OrcStruct>`	`getRecordReader(org.apache.hadoop.mapred.InputSplit inputSplit, org.apache.hadoop.mapred.JobConf conf, org.apache.hadoop.mapred.Reporter reporter)`
`org.apache.hadoop.mapred.InputSplit[]`	`getSplits(org.apache.hadoop.mapred.JobConf job, int numSplits)`
`boolean`	`validateInput(org.apache.hadoop.fs.FileSystem fs, HiveConf conf, ArrayList<org.apache.hadoop.fs.FileStatus> files)` This method is used to validate the input files.

Methods inherited from class java.lang.Object
`equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait`

Constructor Detail

OrcInputFormat

public OrcInputFormat()

Method Detail

createReaderFromFile

public static RecordReader createReaderFromFile(Reader file,
                                                org.apache.hadoop.conf.Configuration conf,
                                                long offset,
                                                long length)
                                         throws IOException

Throws: IOException

validateInput

public boolean validateInput(org.apache.hadoop.fs.FileSystem fs,
                             HiveConf conf,
                             ArrayList<org.apache.hadoop.fs.FileStatus> files)
                      throws IOException

Description copied from interface: InputFormatChecker

This method is used to validate the input files.

Specified by: validateInput in interface InputFormatChecker

Throws: IOException

getSplits

public org.apache.hadoop.mapred.InputSplit[] getSplits(org.apache.hadoop.mapred.JobConf job,
                                                       int numSplits)
                                                throws IOException

Specified by: getSplits in interface org.apache.hadoop.mapred.InputFormat<org.apache.hadoop.io.NullWritable,OrcStruct>

Throws: IOException

getRecordReader

public org.apache.hadoop.mapred.RecordReader<org.apache.hadoop.io.NullWritable,OrcStruct> getRecordReader(org.apache.hadoop.mapred.InputSplit inputSplit,
                                                                                                          org.apache.hadoop.mapred.JobConf conf,
                                                                                                          org.apache.hadoop.mapred.Reporter reporter)
                                                                                                   throws IOException

Specified by: getRecordReader in interface org.apache.hadoop.mapred.InputFormat<org.apache.hadoop.io.NullWritable,OrcStruct>

Throws: IOException

getReader

public AcidInputFormat.RowReader<OrcStruct> getReader(org.apache.hadoop.mapred.InputSplit inputSplit,
                                                      AcidInputFormat.Options options)
                                               throws IOException

Description copied from interface: AcidInputFormat

Get a record reader that provides the user-facing view of the data after it has been merged together. The key provides information about the record's identifier (transaction, bucket, record id).

Specified by: getReader in interface AcidInputFormat<OrcStruct>

Parameters: inputSplit - the split to read; options - the options to read with
Returns: a record reader
Throws: IOException
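The description of getReader says that the key carries the record's identifier (transaction, bucket, record id). The sketch below shows one way such a key could be modelled as a comparable value type. `RecordKeySketch` and `sorted` are hypothetical names, not the key class that Hive returns, and the comparison order (transaction, then bucket, then record id) is an assumption made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Objects;

// Hypothetical key type for illustration; it is not the class Hive uses.
final class RecordKeySketch implements Comparable<RecordKeySketch> {
    final long transaction;
    final int bucket;
    final long recordId;

    RecordKeySketch(long transaction, int bucket, long recordId) {
        this.transaction = transaction;
        this.bucket = bucket;
        this.recordId = recordId;
    }

    // Assumed ordering: transaction first, then bucket, then record id.
    @Override
    public int compareTo(RecordKeySketch other) {
        int c = Long.compare(transaction, other.transaction);
        if (c != 0) {
            return c;
        }
        c = Integer.compare(bucket, other.bucket);
        if (c != 0) {
            return c;
        }
        return Long.compare(recordId, other.recordId);
    }

    // Two keys are equal only when all three components match.
    @Override
    public boolean equals(Object o) {
        if (!(o instanceof RecordKeySketch)) {
            return false;
        }
        RecordKeySketch k = (RecordKeySketch) o;
        return transaction == k.transaction && bucket == k.bucket && recordId == k.recordId;
    }

    @Override
    public int hashCode() {
        return Objects.hash(transaction, bucket, recordId);
    }

    @Override
    public String toString() {
        return "(" + transaction + ", " + bucket + ", " + recordId + ")";
    }

    /** Returns a sorted copy of the given keys, leaving the input untouched. */
    static List<RecordKeySketch> sorted(List<RecordKeySketch> keys) {
        List<RecordKeySketch> copy = new ArrayList<>(keys);
        Collections.sort(copy);
        return copy;
    }

    public static void main(String[] args) {
        List<RecordKeySketch> keys = Arrays.asList(
                new RecordKeySketch(2L, 0, 0L),
                new RecordKeySketch(1L, 1, 5L),
                new RecordKeySketch(1L, 0, 7L));
        System.out.println(sorted(keys)); // prints [(1, 0, 7), (1, 1, 5), (2, 0, 0)]
    }
}
```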

getRawReader

public AcidInputFormat.RawReader<OrcStruct> getRawReader(org.apache.hadoop.conf.Configuration conf,
                                                         boolean collapseEvents,
                                                         int bucket,
                                                         ValidTxnList validTxnList,
                                                         org.apache.hadoop.fs.Path baseDirectory,
                                                         org.apache.hadoop.fs.Path[] deltaDirectory)
                                                  throws IOException