org.apache.hadoop.hive.ql.optimizer.listbucketingpruner
Class ListBucketingPruner

java.lang.Object
  extended by org.apache.hadoop.hive.ql.optimizer.listbucketingpruner.ListBucketingPruner
All Implemented Interfaces:
Transform

public class ListBucketingPruner
extends Object
implements Transform

The transformation step that does list bucketing pruning.


Nested Class Summary
static class ListBucketingPruner.DynamicMultiDimensionalCollection
          Note: this class is not intended for general use; it exists only for the list bucketing pruner.
 
Constructor Summary
ListBucketingPruner()
           
 
Method Summary
static org.apache.hadoop.fs.Path[] prune(ParseContext ctx, Partition part, ExprNodeDesc pruner)
          Prunes to the directories that match the skewed keys in the WHERE clause.
 ParseContext transform(ParseContext pctx)
          All transformation steps implement this interface.
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

ListBucketingPruner

public ListBucketingPruner()
Method Detail

transform

public ParseContext transform(ParseContext pctx)
                       throws SemanticException
Description copied from interface: Transform
All transformation steps implement this interface.

Specified by:
transform in interface Transform
Parameters:
pctx - input parse context
Returns:
ParseContext
Throws:
SemanticException

prune

public static org.apache.hadoop.fs.Path[] prune(ParseContext ctx,
                                                Partition part,
                                                ExprNodeDesc pruner)
Prunes to the directories that match the skewed keys in the WHERE clause.

Algorithm
=========
For each possible skewed element combination:
1. walk through the ExprNode tree
2. decide a Boolean value (True/False/unknown(null))

Go through each skewed element combination again:
1. if it is a skewed value, skip the directory only if the value is false; otherwise keep it
2. skip the default directory only if all non-skewed element combinations are false

Example
=======
1. skewed column (list): C1, C2
2. skewed value (list of list): (1,a), (2,b), (1,c)

Unique skewed elements for each skewed column (list of list):
(1,2,other), (a,b,c,other)
Index: (0,1,2) (0,1,2,3)
Output matches the order of the skewed columns. Output can be read as:
C1 has unique element list (1,2,other)
C2 has unique element list (a,b,c,other)

C1\C2 | a     | b     | c     | other
1     | (1,a) | X     | (1,c) | X
2     | X     | (2,b) | X     | X
other | X     | X     | X     | X

Complete dynamic-multi-dimension collection
(0,0) (1,a) *       -> T
(0,1) (1,b)         -> T
(0,2) (1,c) *       -> F
(0,3) (1,other)     -> F
(1,0) (2,a)         -> F
(1,1) (2,b) *       -> T
(1,2) (2,c)         -> F
(1,3) (2,other)     -> F
(2,0) (other,a)     -> T
(2,1) (other,b)     -> T
(2,2) (other,c)     -> T
(2,3) (other,other) -> T
* is skewed value entry

Expression Tree: ((c1=1) and (c2=a)) or ((c1=3) or (c2=b))

              or
            /    \
         and      or
        /   \    /   \
    c1=1  c2=a c1=3  c2=b

For each entry in the dynamic-multi-dimension container:
1. walk through the tree to decide the value (see the map's values above)
2. if it is a skewed value
   2.1 remove the entry from the map
   2.2 add the directory to the path unless the value is false
3. otherwise, add the value to the map

Once that is done, go through the remaining entries in the map to decide on the default directory:
1. we know none of them is a skewed value
2. we skip the default directory only if all values are false

What do we choose at the end?
1. the directory for (1,a), because it is a skewed value and the match returns true
2. the directory for (2,b), because it is a skewed value and the match returns true
3. the default directory, because not all non-skewed values return false
We skip the directory for (1,c), since the match returns false.

Note: unknown is marked in transform(ParseContext)
 newcd = new ExprNodeConstantDesc(cd.getTypeInfo(), null)
 
can be checked via
     (child_nd instanceof ExprNodeConstantDesc
               && ((ExprNodeConstantDesc) child_nd).getValue() == null)
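
The directory selection described above can be illustrated with a small standalone sketch. It is not Hive's implementation and uses no Hive classes: the class name ListBucketingPruneSketch, the helper methods, the directory labels and the hard-coded predicate are all hypothetical. It also assumes that any comparison against the "other" placeholder evaluates to unknown (null), so intermediate values may differ from the listing in the example; only the final choice of directories is meant to match.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration only; none of these names exist in Hive.
class ListBucketingPruneSketch {

    // Three-valued AND: null stands for "unknown".
    static Boolean and(Boolean a, Boolean b) {
        if (Boolean.FALSE.equals(a) || Boolean.FALSE.equals(b)) return false;
        if (a == null || b == null) return null;
        return true;
    }

    // Three-valued OR: null stands for "unknown".
    static Boolean or(Boolean a, Boolean b) {
        if (Boolean.TRUE.equals(a) || Boolean.TRUE.equals(b)) return true;
        if (a == null || b == null) return null;
        return false;
    }

    // column = constant; assumption: the "other" placeholder compares as unknown.
    static Boolean eq(String cell, String constant) {
        return "other".equals(cell) ? null : Boolean.valueOf(cell.equals(constant));
    }

    // The example predicate: ((c1=1) and (c2=a)) or ((c1=3) or (c2=b)).
    static Boolean evaluate(String c1, String c2) {
        return or(and(eq(c1, "1"), eq(c2, "a")), or(eq(c1, "3"), eq(c2, "b")));
    }

    // Returns labels for the directories that survive pruning.
    static List<String> prune(List<List<String>> skewedValues) {
        // Unique elements per skewed column, in first-seen order, plus "other".
        Set<String> col1 = new LinkedHashSet<>();
        Set<String> col2 = new LinkedHashSet<>();
        for (List<String> v : skewedValues) {
            col1.add(v.get(0));
            col2.add(v.get(1));
        }
        col1.add("other");
        col2.add("other");

        List<String> selected = new ArrayList<>();
        boolean allNonSkewedFalse = true;
        for (String c1 : col1) {
            for (String c2 : col2) {
                boolean notFalse = !Boolean.FALSE.equals(evaluate(c1, c2));
                if (skewedValues.contains(Arrays.asList(c1, c2))) {
                    // Skewed value: keep its directory unless the match is false.
                    if (notFalse) selected.add("C1=" + c1 + "/C2=" + c2);
                } else if (notFalse) {
                    // A non-skewed combination may match: default directory is needed.
                    allNonSkewedFalse = false;
                }
            }
        }
        if (!allNonSkewedFalse) selected.add("DEFAULT");
        return selected;
    }

    public static void main(String[] args) {
        List<List<String>> skewed = Arrays.asList(
                Arrays.asList("1", "a"), Arrays.asList("2", "b"), Arrays.asList("1", "c"));
        System.out.println(prune(skewed)); // prints [C1=1/C2=a, C1=2/C2=b, DEFAULT]
    }
}
```

With the skewed values (1,a), (2,b), (1,c), the sketch keeps the directories for (1,a) and (2,b), drops (1,c), and keeps the default directory, which agrees with the outcome of the example.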
 

Parameters:
ctx - parse context
part - partition
pruner - expression node tree
Returns:
the directories that remain after pruning


Copyright © 2014 The Apache Software Foundation. All rights reserved.