org.apache.hadoop.hive.ql.io
Class HiveFileFormatUtils

java.lang.Object
  extended by org.apache.hadoop.hive.ql.io.HiveFileFormatUtils

public final class HiveFileFormatUtils
extends Object

A utility class for various Hive file format tasks. registerOutputFormatSubstitute(Class, Class) and getOutputFormatSubstitute(Class, boolean) are provided for backward compatibility: they map an older OutputFormat to the newer HiveOutputFormat that replaces it.


Method Summary
static boolean checkInputFormat(org.apache.hadoop.fs.FileSystem fs, HiveConf conf, Class<? extends org.apache.hadoop.mapred.InputFormat> inputFormatCls, ArrayList<org.apache.hadoop.fs.FileStatus> files)
          checks whether the files are in the same format as the given input format.
static List<String> doGetAliasesFromPath(Map<String,ArrayList<String>> pathToAliases, org.apache.hadoop.fs.Path dir)
          Get the list of aliases from the operator tree that are needed for the path
static List<Operator<? extends OperatorDesc>> doGetWorksFromPath(Map<String,ArrayList<String>> pathToAliases, Map<String,Operator<? extends OperatorDesc>> aliasToWork, org.apache.hadoop.fs.Path dir)
          Get the list of operators from the operator tree that are needed for the path
static FileSinkOperator.RecordWriter getHiveRecordWriter(org.apache.hadoop.mapred.JobConf jc, TableDesc tableInfo, Class<? extends org.apache.hadoop.io.Writable> outputClass, FileSinkDesc conf, org.apache.hadoop.fs.Path outPath, org.apache.hadoop.mapred.Reporter reporter)
           
static Class<? extends InputFormatChecker> getInputFormatChecker(Class<?> inputFormat)
          get an InputFormatChecker for a file format.
static org.apache.hadoop.fs.Path getOutputFormatFinalPath(org.apache.hadoop.fs.Path parent, String taskId, org.apache.hadoop.mapred.JobConf jc, HiveOutputFormat<?,?> hiveOutputFormat, boolean isCompressed, org.apache.hadoop.fs.Path defaultFinalPath)
          Deprecated.  
static Class<? extends HiveOutputFormat> getOutputFormatSubstitute(Class<?> origin, boolean storagehandlerflag)
          get an OutputFormat's substitute HiveOutputFormat.
static PartitionDesc getPartitionDescFromPathRecursively(Map<String,PartitionDesc> pathToPartitionInfo, org.apache.hadoop.fs.Path dir, Map<Map<String,PartitionDesc>,Map<String,PartitionDesc>> cacheMap)
           
static PartitionDesc getPartitionDescFromPathRecursively(Map<String,PartitionDesc> pathToPartitionInfo, org.apache.hadoop.fs.Path dir, Map<Map<String,PartitionDesc>,Map<String,PartitionDesc>> cacheMap, boolean ignoreSchema)
           
static String getRealOutputFormatClassName()
          get a RealOutputFormatClassName corresponding to the HivePassThroughOutputFormat
static FileSinkOperator.RecordWriter getRecordWriter(org.apache.hadoop.mapred.JobConf jc, HiveOutputFormat<?,?> hiveOutputFormat, Class<? extends org.apache.hadoop.io.Writable> valueClass, boolean isCompressed, Properties tableProp, org.apache.hadoop.fs.Path outPath, org.apache.hadoop.mapred.Reporter reporter)
           
static void registerInputFormatChecker(Class<? extends org.apache.hadoop.mapred.InputFormat> format, Class<? extends InputFormatChecker> checker)
          register an InputFormatChecker for a given InputFormat.
static void registerOutputFormatSubstitute(Class<? extends org.apache.hadoop.mapred.OutputFormat> origin, Class<? extends HiveOutputFormat> substitute)
          register a substitute.
static void setRealOutputFormatClassName(String destination)
          set a RealOutputFormatClassName corresponding to the HivePassThroughOutputFormat
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Method Detail

registerOutputFormatSubstitute

public static void registerOutputFormatSubstitute(Class<? extends org.apache.hadoop.mapred.OutputFormat> origin,
                                                  Class<? extends HiveOutputFormat> substitute)
register a substitute.

Parameters:
origin - the class that needs to be substituted
substitute - the HiveOutputFormat class to use in place of origin

getOutputFormatSubstitute

public static Class<? extends HiveOutputFormat> getOutputFormatSubstitute(Class<?> origin,
                                                                          boolean storagehandlerflag)
get an OutputFormat's substitute HiveOutputFormat.
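
The register/get pair above behaves like a class-to-class registry. The following is a minimal sketch of that pattern in plain Java, not Hive's actual implementation. Every name in it (SubstituteRegistry, LegacyFormat, ModernFormat, OldTextFormat, NewTextFormat) is hypothetical, and the rule that a class which is already a modern format is returned unchanged is an assumption. The storagehandlerflag parameter is not modeled.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-ins for an old-style OutputFormat and a HiveOutputFormat.
interface LegacyFormat {}
interface ModernFormat extends LegacyFormat {}
final class OldTextFormat implements LegacyFormat {}
final class NewTextFormat implements ModernFormat {}

final class SubstituteRegistry {
    // Maps an older format class to the newer class that should replace it.
    private static final Map<Class<? extends LegacyFormat>, Class<? extends ModernFormat>> SUBSTITUTES =
            new ConcurrentHashMap<>();

    private SubstituteRegistry() {}

    static void register(Class<? extends LegacyFormat> origin,
                         Class<? extends ModernFormat> substitute) {
        SUBSTITUTES.put(origin, substitute);
    }

    // Assumed rule: a class that already is a ModernFormat needs no substitute;
    // otherwise return the registered substitute, or null if none was registered.
    static Class<? extends ModernFormat> lookup(Class<?> origin) {
        if (ModernFormat.class.isAssignableFrom(origin)) {
            return origin.asSubclass(ModernFormat.class);
        }
        return SUBSTITUTES.get(origin);
    }
}
```

With this sketch, calling SubstituteRegistry.register(OldTextFormat.class, NewTextFormat.class) makes a later lookup(OldTextFormat.class) return NewTextFormat.class, so callers holding an older class can keep working against the newer interface.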


getRealOutputFormatClassName

public static String getRealOutputFormatClassName()
get a RealOutputFormatClassName corresponding to the HivePassThroughOutputFormat


setRealOutputFormatClassName

public static void setRealOutputFormatClassName(String destination)
set a RealOutputFormatClassName corresponding to the HivePassThroughOutputFormat


getOutputFormatFinalPath

@Deprecated
public static org.apache.hadoop.fs.Path getOutputFormatFinalPath(org.apache.hadoop.fs.Path parent,
                                                                            String taskId,
                                                                            org.apache.hadoop.mapred.JobConf jc,
                                                                            HiveOutputFormat<?,?> hiveOutputFormat,
                                                                            boolean isCompressed,
                                                                            org.apache.hadoop.fs.Path defaultFinalPath)
                                                          throws IOException
Deprecated. 

get the final output path of a given FileOutputFormat.

Parameters:
parent - parent dir of the expected final output path
jc - job configuration
Throws:
IOException

registerInputFormatChecker

public static void registerInputFormatChecker(Class<? extends org.apache.hadoop.mapred.InputFormat> format,
                                              Class<? extends InputFormatChecker> checker)
register an InputFormatChecker for a given InputFormat.

Parameters:
format - the InputFormat class to register the checker for
checker - the InputFormatChecker class to use for that format

getInputFormatChecker

public static Class<? extends InputFormatChecker> getInputFormatChecker(Class<?> inputFormat)
get an InputFormatChecker for a file format.


checkInputFormat

public static boolean checkInputFormat(org.apache.hadoop.fs.FileSystem fs,
                                       HiveConf conf,
                                       Class<? extends org.apache.hadoop.mapred.InputFormat> inputFormatCls,
                                       ArrayList<org.apache.hadoop.fs.FileStatus> files)
                                throws HiveException
checks whether the files are in the same format as the given input format.

Throws:
HiveException

getHiveRecordWriter

public static FileSinkOperator.RecordWriter getHiveRecordWriter(org.apache.hadoop.mapred.JobConf jc,
                                                                TableDesc tableInfo,
                                                                Class<? extends org.apache.hadoop.io.Writable> outputClass,
                                                                FileSinkDesc conf,
                                                                org.apache.hadoop.fs.Path outPath,
                                                                org.apache.hadoop.mapred.Reporter reporter)
                                                         throws HiveException
Throws:
HiveException

getRecordWriter

public static FileSinkOperator.RecordWriter getRecordWriter(org.apache.hadoop.mapred.JobConf jc,
                                                            HiveOutputFormat<?,?> hiveOutputFormat,
                                                            Class<? extends org.apache.hadoop.io.Writable> valueClass,
                                                            boolean isCompressed,
                                                            Properties tableProp,
                                                            org.apache.hadoop.fs.Path outPath,
                                                            org.apache.hadoop.mapred.Reporter reporter)
                                                     throws IOException,
                                                            HiveException
Throws:
IOException
HiveException

getPartitionDescFromPathRecursively

public static PartitionDesc getPartitionDescFromPathRecursively(Map<String,PartitionDesc> pathToPartitionInfo,
                                                                org.apache.hadoop.fs.Path dir,
                                                                Map<Map<String,PartitionDesc>,Map<String,PartitionDesc>> cacheMap)
                                                         throws IOException
Throws:
IOException

getPartitionDescFromPathRecursively

public static PartitionDesc getPartitionDescFromPathRecursively(Map<String,PartitionDesc> pathToPartitionInfo,
                                                                org.apache.hadoop.fs.Path dir,
                                                                Map<Map<String,PartitionDesc>,Map<String,PartitionDesc>> cacheMap,
                                                                boolean ignoreSchema)
                                                         throws IOException
Throws:
IOException
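
The method name suggests that when dir itself has no entry, the lookup is retried against its parent directories. The following is a minimal sketch of that idea under this assumption, not Hive's code. RecursivePathLookup and find are hypothetical names, paths are plain '/'-separated strings, and cacheMap and ignoreSchema are not modeled. The sketch returns null on a miss, whereas the documented methods declare IOException.

```java
import java.util.Map;

final class RecursivePathLookup {
    private RecursivePathLookup() {}

    // Returns the value registered for dir or for its nearest registered
    // ancestor directory, or null if neither dir nor any ancestor is registered.
    static <V> V find(Map<String, V> byPath, String dir) {
        String current = dir;
        while (current != null && !current.isEmpty()) {
            V hit = byPath.get(current);
            if (hit != null) {
                return hit;
            }
            int slash = current.lastIndexOf('/');
            // Move up one directory level; stop once no parent is left to try.
            current = slash > 0 ? current.substring(0, slash) : null;
        }
        return null;
    }
}
```

With an entry registered for /warehouse/t/ds=1, a lookup for the file path /warehouse/t/ds=1/000000_0 misses on the first attempt and resolves through the parent directory.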

doGetWorksFromPath

public static List<Operator<? extends OperatorDesc>> doGetWorksFromPath(Map<String,ArrayList<String>> pathToAliases,
                                                                        Map<String,Operator<? extends OperatorDesc>> aliasToWork,
                                                                        org.apache.hadoop.fs.Path dir)
Get the list of operators from the operator tree that are needed for the path

Parameters:
pathToAliases - mapping from path to aliases
aliasToWork - The operator tree to be invoked for a given alias
dir - The path to look for

doGetAliasesFromPath

public static List<String> doGetAliasesFromPath(Map<String,ArrayList<String>> pathToAliases,
                                                org.apache.hadoop.fs.Path dir)
Get the list of aliases from the operator tree that are needed for the path

Parameters:
pathToAliases - mapping from path to aliases
dir - The path to look for
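
Going by the parameter descriptions, doGetAliasesFromPath and doGetWorksFromPath amount to a two-step lookup: path to aliases, then alias to operator. The following is a minimal sketch of that composition, not Hive's implementation. PathToWorkLookup, aliasesFor, and worksFor are hypothetical names, the generic W stands in for Operator<? extends OperatorDesc>, and exact string matching on the path is an assumption (the real methods may normalize paths or match parent directories).

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

final class PathToWorkLookup {
    private PathToWorkLookup() {}

    // Step 1: the aliases registered for this path, or an empty list if none.
    static List<String> aliasesFor(Map<String, ? extends List<String>> pathToAliases, String dir) {
        List<String> aliases = pathToAliases.get(dir);
        return aliases == null ? Collections.<String>emptyList() : aliases;
    }

    // Step 2: resolve each alias to its work item, skipping aliases with no entry.
    static <W> List<W> worksFor(Map<String, ? extends List<String>> pathToAliases,
                                Map<String, W> aliasToWork,
                                String dir) {
        List<W> works = new ArrayList<>();
        for (String alias : aliasesFor(pathToAliases, dir)) {
            W work = aliasToWork.get(alias);
            if (work != null) {
                works.add(work);
            }
        }
        return works;
    }
}
```

Keeping the alias lookup as its own step mirrors the split between the two documented methods: callers that only need alias names can stop after the first step.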


Copyright © 2014 The Apache Software Foundation. All rights reserved.