org.apache.hadoop.hive.ql.io.orc
Class OrcInputFormat
java.lang.Object
org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
- All Implemented Interfaces:
- VectorizedInputFormatInterface, AcidInputFormat<OrcStruct>, InputFormatChecker, org.apache.hadoop.mapred.InputFormat<org.apache.hadoop.io.NullWritable,OrcStruct>
public class OrcInputFormat
- extends Object
- implements org.apache.hadoop.mapred.InputFormat<org.apache.hadoop.io.NullWritable,OrcStruct>, InputFormatChecker, VectorizedInputFormatInterface, AcidInputFormat<OrcStruct>
A MapReduce/Hive input format for ORC files.
This class implements both the classic InputFormat, which stores the rows
directly, and AcidInputFormat, which stores a series of events with the
following schema:
class AcidEvent<ROW> {
  enum ACTION {INSERT, UPDATE, DELETE}
  ACTION operation;
  long originalTransaction;
  int bucket;
  long rowId;
  long currentTransaction;
  ROW row;
}
Each AcidEvent object corresponds to a single row-level event. The
originalTransaction, bucket, and rowId together form the unique identifier for
the row. The operation and currentTransaction record the operation performed
and the transaction that added this event. Insert and update events include the
entire row, while delete events have null for row.
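The event schema above can be sketched as a plain Java class. This is an illustrative model only, not the actual Hive implementation; the field names follow the schema shown in this description, and the rowIdentity helper is a hypothetical addition:

```java
// Illustrative model of the ACID event schema described above.
// NOT the real Hive class; field names follow the documented schema,
// and rowIdentity() is a hypothetical helper added for clarity.
class AcidEventSketch<ROW> {
    enum Action { INSERT, UPDATE, DELETE }

    final Action operation;          // what happened to the row
    final long originalTransaction;  // with bucket and rowId: the row's unique identifier
    final int bucket;
    final long rowId;
    final long currentTransaction;   // the transaction that added this event
    final ROW row;                   // full row for INSERT/UPDATE, null for DELETE

    AcidEventSketch(Action operation, long originalTransaction, int bucket,
                    long rowId, long currentTransaction, ROW row) {
        this.operation = operation;
        this.originalTransaction = originalTransaction;
        this.bucket = bucket;
        this.rowId = rowId;
        this.currentTransaction = currentTransaction;
        this.row = row;
    }

    /** The (originalTransaction, bucket, rowId) triple identifies a row. */
    String rowIdentity() {
        return originalTransaction + "/" + bucket + "/" + rowId;
    }
}
```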
Method Summary

static RecordReader
  createReaderFromFile(Reader file, org.apache.hadoop.conf.Configuration conf,
      long offset, long length)

AcidInputFormat.RawReader<OrcStruct>
  getRawReader(org.apache.hadoop.conf.Configuration conf, boolean collapseEvents,
      int bucket, ValidTxnList validTxnList, org.apache.hadoop.fs.Path baseDirectory,
      org.apache.hadoop.fs.Path[] deltaDirectory)
  Get a reader that returns the raw ACID events (insert, update, delete).

AcidInputFormat.RowReader<OrcStruct>
  getReader(org.apache.hadoop.mapred.InputSplit inputSplit,
      AcidInputFormat.Options options)
  Get a record reader that provides the user-facing view of the data after
  it has been merged together.

org.apache.hadoop.mapred.RecordReader<org.apache.hadoop.io.NullWritable,OrcStruct>
  getRecordReader(org.apache.hadoop.mapred.InputSplit inputSplit,
      org.apache.hadoop.mapred.JobConf conf,
      org.apache.hadoop.mapred.Reporter reporter)

org.apache.hadoop.mapred.InputSplit[]
  getSplits(org.apache.hadoop.mapred.JobConf job, int numSplits)

boolean
  validateInput(org.apache.hadoop.fs.FileSystem fs, HiveConf conf,
      ArrayList<org.apache.hadoop.fs.FileStatus> files)
  This method is used to validate the input files.
OrcInputFormat
public OrcInputFormat()
createReaderFromFile
public static RecordReader createReaderFromFile(Reader file,
org.apache.hadoop.conf.Configuration conf,
long offset,
long length)
throws IOException
- Throws:
IOException
validateInput
public boolean validateInput(org.apache.hadoop.fs.FileSystem fs,
HiveConf conf,
ArrayList<org.apache.hadoop.fs.FileStatus> files)
throws IOException
- Description copied from interface:
InputFormatChecker
- This method is used to validate the input files.
- Specified by:
validateInput
in interface InputFormatChecker
- Throws:
IOException
getSplits
public org.apache.hadoop.mapred.InputSplit[] getSplits(org.apache.hadoop.mapred.JobConf job,
int numSplits)
throws IOException
- Specified by:
getSplits
in interface org.apache.hadoop.mapred.InputFormat<org.apache.hadoop.io.NullWritable,OrcStruct>
- Throws:
IOException
getRecordReader
public org.apache.hadoop.mapred.RecordReader<org.apache.hadoop.io.NullWritable,OrcStruct> getRecordReader(org.apache.hadoop.mapred.InputSplit inputSplit,
org.apache.hadoop.mapred.JobConf conf,
org.apache.hadoop.mapred.Reporter reporter)
throws IOException
- Specified by:
getRecordReader
in interface org.apache.hadoop.mapred.InputFormat<org.apache.hadoop.io.NullWritable,OrcStruct>
- Throws:
IOException
getReader
public AcidInputFormat.RowReader<OrcStruct> getReader(org.apache.hadoop.mapred.InputSplit inputSplit,
AcidInputFormat.Options options)
throws IOException
- Description copied from interface:
AcidInputFormat
- Get a record reader that provides the user-facing view of the data after
it has been merged together. The key provides information about the
record's identifier (transaction, bucket, record id).
- Specified by:
getReader
in interface AcidInputFormat<OrcStruct>
- Parameters:
  inputSplit - the split to read
  options - the options to read with
- Returns:
- a record reader
- Throws:
IOException
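The merged, user-facing view that getReader describes can be pictured with a small sketch. The types below are toy stand-ins (Object[] records keyed by a row-identity string), not the real OrcStruct/ROW__ID machinery; the point is only the merge rule: replay events in transaction order, let the last event per row identity win, and drop rows whose last event is a delete:

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of merging ACID events into the user-facing view.
// Events are modeled as Object[]{identityString, currentTransaction, isDelete, row};
// the real reader operates on OrcStruct, not these toy records.
class MergeSketch {
    /** Replay events in currentTransaction order; the last event per identity wins. */
    static Map<String, String> mergedView(List<Object[]> events) {
        events.sort((a, b) -> Long.compare((Long) a[1], (Long) b[1]));
        Map<String, String> rows = new LinkedHashMap<>();
        for (Object[] e : events) {
            String identity = (String) e[0];
            boolean isDelete = (Boolean) e[2];
            if (isDelete) {
                rows.remove(identity);              // delete events carry no row
            } else {
                rows.put(identity, (String) e[3]);  // insert/update carry the full row
            }
        }
        return rows;
    }
}
```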
getRawReader
public AcidInputFormat.RawReader<OrcStruct> getRawReader(org.apache.hadoop.conf.Configuration conf,
boolean collapseEvents,
int bucket,
ValidTxnList validTxnList,
org.apache.hadoop.fs.Path baseDirectory,
org.apache.hadoop.fs.Path[] deltaDirectory)
throws IOException
- Description copied from interface:
AcidInputFormat
- Get a reader that returns the raw ACID events (insert, update, delete).
Should only be used by the compactor.
- Specified by:
getRawReader
in interface AcidInputFormat<OrcStruct>
- Parameters:
  conf - the configuration
  collapseEvents - should the ACID events be collapsed so that only
      the last version of the row is kept
  bucket - the bucket to read
  validTxnList - the list of valid transactions to use
  baseDirectory - the base directory to read, or the root directory for
      old-style files
  deltaDirectory - a list of delta files to include in the merge
- Returns:
- a record reader
- Throws:
IOException
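The collapseEvents flag can be illustrated with a sketch in the same toy style (Object[] events holding an identity string and a currentTransaction; not the real API). With collapsing off, every event for every row is returned; with collapsing on, only the event with the highest currentTransaction per row identity survives:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of collapseEvents for the raw (compactor) view.
// Each event is Object[]{identityString, currentTransaction}; real events are OrcStructs.
class CollapseSketch {
    static List<Object[]> collapse(List<Object[]> events, boolean collapseEvents) {
        if (!collapseEvents) {
            return events;  // raw view: every version of every row
        }
        // Keep only the event with the highest currentTransaction per identity.
        Map<String, Object[]> last = new LinkedHashMap<>();
        for (Object[] e : events) {
            Object[] prev = last.get((String) e[0]);
            if (prev == null || (Long) e[1] > (Long) prev[1]) {
                last.put((String) e[0], e);
            }
        }
        return new ArrayList<>(last.values());
    }
}
```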
Copyright © 2014 The Apache Software Foundation. All rights reserved.