org.apache.hadoop.hive.ql.io
Class SymlinkTextInputFormat

java.lang.Object
  extended by org.apache.hadoop.hive.ql.io.SymbolicInputFormat
      extended by org.apache.hadoop.hive.ql.io.SymlinkTextInputFormat
All Implemented Interfaces:
ContentSummaryInputFormat, ReworkMapredInputFormat, org.apache.hadoop.mapred.InputFormat<org.apache.hadoop.io.LongWritable,org.apache.hadoop.io.Text>, org.apache.hadoop.mapred.JobConfigurable

public class SymlinkTextInputFormat
extends SymbolicInputFormat
implements org.apache.hadoop.mapred.InputFormat<org.apache.hadoop.io.LongWritable,org.apache.hadoop.io.Text>, org.apache.hadoop.mapred.JobConfigurable, ContentSummaryInputFormat, ReworkMapredInputFormat

A symlink file is a text file that contains a list of file or directory names, one per line. This input format reads symlink files from the specified job input paths and uses the files and directories listed in those symlink files as the actual MapReduce input. The target input data must be readable by TextInputFormat.
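
To make the symlink file layout concrete, the following is a minimal sketch that writes a symlink file and reads its target list back. It uses only the Java standard library and the local file system; it is not Hive's implementation. The class name SymlinkFileSketch, the method readTargets, the example hdfs://namenode/... paths, and the choice to skip blank lines are all assumptions made for illustration.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper, not part of Hive: models the layout of a symlink file. */
final class SymlinkFileSketch {

    /** Reads one symlink file and returns the target paths it lists, one per line. */
    static List<String> readTargets(Path symlinkFile) throws IOException {
        List<String> targets = new ArrayList<>();
        for (String line : Files.readAllLines(symlinkFile, StandardCharsets.UTF_8)) {
            String trimmed = line.trim();
            if (!trimmed.isEmpty()) { // assumption: blank lines carry no target
                targets.add(trimmed);
            }
        }
        return targets;
    }

    public static void main(String[] args) throws IOException {
        // Write a symlink file that names one target file and one target directory.
        // Both paths are made up for this example.
        Path symlinkFile = Files.createTempFile("symlink", ".txt");
        Files.write(symlinkFile, Arrays.asList(
                "hdfs://namenode/data/part-00000",
                "hdfs://namenode/data/archive"), StandardCharsets.UTF_8);

        // Each returned entry is what would be treated as actual job input.
        System.out.println(readTargets(symlinkFile));
        Files.delete(symlinkFile);
    }
}
```

In a real job the symlink file would live under the job input path, and the listed targets, not the symlink file itself, would supply the records.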


Nested Class Summary
static class SymlinkTextInputFormat.SymlinkTextInputSplit
          An input split that wraps the FileSplit generated by TextInputFormat.getSplits() and sets the original symlink file path as the job input path.
 
Constructor Summary
SymlinkTextInputFormat()
           
 
Method Summary
 void configure(org.apache.hadoop.mapred.JobConf job)
           
 org.apache.hadoop.fs.ContentSummary getContentSummary(org.apache.hadoop.fs.Path p, org.apache.hadoop.mapred.JobConf job)
           
 org.apache.hadoop.mapred.RecordReader<org.apache.hadoop.io.LongWritable,org.apache.hadoop.io.Text> getRecordReader(org.apache.hadoop.mapred.InputSplit split, org.apache.hadoop.mapred.JobConf job, org.apache.hadoop.mapred.Reporter reporter)
           
 org.apache.hadoop.mapred.InputSplit[] getSplits(org.apache.hadoop.mapred.JobConf job, int numSplits)
          Parses all target paths from the symlink files in the job input directory and splits the target data using TextInputFormat.
 void validateInput(org.apache.hadoop.mapred.JobConf job)
          Retained for backward compatibility with Hadoop 0.17.
 
Methods inherited from class org.apache.hadoop.hive.ql.io.SymbolicInputFormat
rework
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 
Methods inherited from interface org.apache.hadoop.hive.ql.io.ReworkMapredInputFormat
rework
 

Constructor Detail

SymlinkTextInputFormat

public SymlinkTextInputFormat()
Method Detail

getRecordReader

public org.apache.hadoop.mapred.RecordReader<org.apache.hadoop.io.LongWritable,org.apache.hadoop.io.Text> getRecordReader(org.apache.hadoop.mapred.InputSplit split,
                                                                                                                          org.apache.hadoop.mapred.JobConf job,
                                                                                                                          org.apache.hadoop.mapred.Reporter reporter)
                                                                                                                   throws IOException
Specified by:
getRecordReader in interface org.apache.hadoop.mapred.InputFormat<org.apache.hadoop.io.LongWritable,org.apache.hadoop.io.Text>
Throws:
IOException

getSplits

public org.apache.hadoop.mapred.InputSplit[] getSplits(org.apache.hadoop.mapred.JobConf job,
                                                       int numSplits)
                                                throws IOException
Parses all target paths from the symlink files in the job input directory and splits the target data using TextInputFormat.
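
The two stages described above (resolve the targets, then split them) can be modeled with a short sketch. It uses only the Java standard library on the local file system and is not Hive's code. The names SymlinkSplitSketch and Split are hypothetical, and the fixed splitSize parameter is a simplifying assumption that stands in for the numSplits hint and for the splitting that TextInputFormat performs. The sketch handles target files only, not target directories.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stdlib-only model of the getSplits() flow; not Hive's code. */
final class SymlinkSplitSketch {

    /** A byte range of a target file plus the symlink file that referenced it. */
    static final class Split {
        final Path symlink; // original link file, kept alongside the target
        final Path target;  // file that actually holds the data
        final long start;
        final long length;

        Split(Path symlink, Path target, long start, long length) {
            this.symlink = symlink;
            this.target = target;
            this.start = start;
            this.length = length;
        }
    }

    /** Resolves targets from every symlink file in inputDir, then splits each target. */
    static List<Split> getSplits(Path inputDir, long splitSize) throws IOException {
        if (splitSize <= 0) {
            throw new IllegalArgumentException("splitSize must be positive");
        }
        List<Split> splits = new ArrayList<>();
        try (DirectoryStream<Path> symlinks = Files.newDirectoryStream(inputDir)) {
            for (Path symlink : symlinks) {
                if (!Files.isRegularFile(symlink)) {
                    continue; // only plain files are treated as symlink files here
                }
                for (String line : Files.readAllLines(symlink, StandardCharsets.UTF_8)) {
                    String trimmed = line.trim();
                    if (trimmed.isEmpty()) {
                        continue; // assumption: blank lines carry no target
                    }
                    Path target = Paths.get(trimmed);
                    long size = Files.size(target);
                    // Cut the target into consecutive ranges of at most splitSize bytes.
                    for (long start = 0; start < size; start += splitSize) {
                        long length = Math.min(splitSize, size - start);
                        splits.add(new Split(symlink, target, start, length));
                    }
                }
            }
        }
        return splits;
    }
}
```

Keeping the symlink path on each split mirrors the role described for SymlinkTextInputFormat.SymlinkTextInputSplit, which wraps a FileSplit while keeping the original link file path as the job input path.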

Specified by:
getSplits in interface org.apache.hadoop.mapred.InputFormat<org.apache.hadoop.io.LongWritable,org.apache.hadoop.io.Text>
Throws:
IOException

configure

public void configure(org.apache.hadoop.mapred.JobConf job)
Specified by:
configure in interface org.apache.hadoop.mapred.JobConfigurable

validateInput

public void validateInput(org.apache.hadoop.mapred.JobConf job)
                   throws IOException
Retained for backward compatibility with Hadoop 0.17.

Throws:
IOException

getContentSummary

public org.apache.hadoop.fs.ContentSummary getContentSummary(org.apache.hadoop.fs.Path p,
                                                             org.apache.hadoop.mapred.JobConf job)
                                                      throws IOException
Specified by:
getContentSummary in interface ContentSummaryInputFormat
Throws:
IOException


Copyright © 2014 The Apache Software Foundation. All rights reserved.