org.apache.hadoop.hive.ql.exec
Class SkewJoinHandler

java.lang.Object
  extended by org.apache.hadoop.hive.ql.exec.SkewJoinHandler

public class SkewJoinHandler
extends Object

At runtime, during a join, the big keys of one table are written to one corresponding directory, and the rows with the same keys in the other tables are written to separate directories (one for each table). The directories will look like:
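
As a rough illustration of such a layout, the following sketch enumerates one group of directories per table that holds big keys. The directory names and the `dirsFor` helper are hypothetical, chosen only for this example; they are not the paths Hive actually creates.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only: names and structure are assumptions, not Hive code.
class SkewDirLayoutSketch {

    /**
     * Returns the directory names for the case where table `bigTable`
     * (1-based) holds the big keys in a join over `numTables` tables.
     * The big-key table gets a "-bigkeys" directory; every other table
     * gets a "-keys" directory holding its rows for those same keys.
     */
    static List<String> dirsFor(int bigTable, int numTables) {
        List<String> dirs = new ArrayList<>();
        String group = "skew-big-T" + bigTable + "/";
        for (int t = 1; t <= numTables; t++) {
            String suffix = (t == bigTable) ? "-bigkeys" : "-keys";
            dirs.add(group + "T" + t + suffix);
        }
        return dirs;
    }

    public static void main(String[] args) {
        int numTables = 3;
        // One group of directories for each table that may hold big keys.
        for (int big = 1; big <= numTables; big++) {
            System.out.println(dirsFor(big, numTables));
        }
    }
}
```

Each group contains exactly one big-key directory plus one directory for every other table in the join.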

For each skew key, all values are first written to a local temporary file. When the current group ends, the local temporary file is uploaded to HDFS. Currently, one file is used per skew key.
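
The write-locally-then-upload pattern described above can be sketched as follows. This is a minimal illustration under stated assumptions: the class `SkewKeySpillSketch` and its methods are hypothetical, and a plain local directory stands in for HDFS, so `Files.move` takes the place of the real upload.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Illustrative sketch only: not the SkewJoinHandler implementation.
class SkewKeySpillSketch {
    private final Path targetDir;   // stands in for the HDFS directory (assumption)
    private Path tmpFile;           // local temporary file for the current skew key
    private BufferedWriter writer;
    private int fileCount = 0;

    SkewKeySpillSketch(Path targetDir) {
        this.targetDir = targetDir;
    }

    /** Appends one value of the current skew key to the local temporary file. */
    void addValue(String value) throws IOException {
        if (writer == null) {
            tmpFile = Files.createTempFile("skewkey-", ".tmp");
            writer = Files.newBufferedWriter(tmpFile);
        }
        writer.write(value);
        writer.newLine();
    }

    /**
     * Ends the current group: closes the temporary file and moves it to the
     * target directory, one file per skew key. Returns the destination path,
     * or null if no value was written for this group.
     */
    Path endGroup() throws IOException {
        if (writer == null) {
            return null;
        }
        writer.close();
        writer = null;
        Files.createDirectories(targetDir);
        Path dest = targetDir.resolve("skewkey-" + (fileCount++));
        Files.move(tmpFile, dest, StandardCopyOption.REPLACE_EXISTING);
        return dest;
    }

    public static void main(String[] args) throws IOException {
        Path target = Files.createTempDirectory("skew-target");
        SkewKeySpillSketch spill = new SkewKeySpillSketch(target);
        spill.addValue("row-1");
        spill.addValue("row-2");
        System.out.println("uploaded to " + spill.endGroup());
    }
}
```

Buffering locally and moving the file once per group keeps the number of remote writes at one per skew key, which matches the one-file-per-key behavior noted above.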

For more info, please see https://issues.apache.org/jira/browse/HIVE-964.


Field Summary
 int currBigKeyTag
           
 
Constructor Summary
SkewJoinHandler(CommonJoinOperator<? extends OperatorDesc> joinOp)
           
 
Method Summary
 void close(boolean abort)
           
 void handleSkew(int tag)
           
 void initiliaze(org.apache.hadoop.conf.Configuration hconf)
           
 void setSkewJoinJobCounter(org.apache.hadoop.io.LongWritable skewjoinFollowupJobs)
           
 void updateSkewJoinJobCounter(int tag)
           
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

currBigKeyTag

public int currBigKeyTag

Constructor Detail

SkewJoinHandler

public SkewJoinHandler(CommonJoinOperator<? extends OperatorDesc> joinOp)

Method Detail

initiliaze

public void initiliaze(org.apache.hadoop.conf.Configuration hconf)

handleSkew

public void handleSkew(int tag)
                throws HiveException
Throws:
HiveException

close

public void close(boolean abort)
           throws HiveException
Throws:
HiveException

setSkewJoinJobCounter

public void setSkewJoinJobCounter(org.apache.hadoop.io.LongWritable skewjoinFollowupJobs)

updateSkewJoinJobCounter

public void updateSkewJoinJobCounter(int tag)


Copyright © 2014 The Apache Software Foundation. All rights reserved.