org.apache.hadoop.hive.ql.optimizer.listbucketingpruner
Class ListBucketingPruner
java.lang.Object
org.apache.hadoop.hive.ql.optimizer.listbucketingpruner.ListBucketingPruner
- All Implemented Interfaces:
- Transform
public class ListBucketingPruner
- extends Object
- implements Transform
The transformation step that does list bucketing pruning.
ListBucketingPruner
public ListBucketingPruner()
transform
public ParseContext transform(ParseContext pctx)
throws SemanticException
- Description copied from interface:
Transform
- All transformation steps implement this interface.
- Specified by:
transform
in interface Transform
- Parameters:
pctx
- input parse context
- Returns:
- ParseContext
- Throws:
SemanticException
prune
public static org.apache.hadoop.fs.Path[] prune(ParseContext ctx,
Partition part,
ExprNodeDesc pruner)
- Prunes the partition's directories down to those that match the skewed keys in the WHERE clause.
Algorithm
=========
For each possible skewed element combination:
1. walk through the ExprNode tree
2. decide the Boolean result (true/false/unknown (null))
Go through each skewed element combination again:
1. if it is a skewed value, skip its directory only if the result is false; otherwise keep it
2. skip the default directory only if every non-skewed combination evaluates to false.
Example
=======
For example:
1. skewed column (list): C1, C2
2. skewed value (list of list): (1,a), (2,b), (1,c)
Unique skewed elements for each skewed column (list of list):
(1,2,other), (a,b,c,other)
Index: (0,1,2) (0,1,2,3)
The output follows the order of the skewed columns. It can be read as:
C1 has unique element list (1,2,other)
C2 has unique element list (a,b,c,other)
C1\C2 | a     | b     | c     | other
1     | (1,a) | X     | (1,c) | X
2     | X     | (2,b) | X     | X
other | X     | X     | X     | X
Complete dynamic-multi-dimension collection
(0,0) (1,a) *         -> T
(0,1) (1,b)           -> T
(0,2) (1,c) *         -> F
(0,3) (1,other)       -> F
(1,0) (2,a)           -> F
(1,1) (2,b) *         -> T
(1,2) (2,c)           -> F
(1,3) (2,other)       -> F
(2,0) (other,a)       -> T
(2,1) (other,b)       -> T
(2,2) (other,c)       -> T
(2,3) (other,other)   -> T
* marks a skewed value entry
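The collection above can be rebuilt mechanically from the skewed value list. The following is a minimal sketch, not Hive's implementation: the class `SkewedCombinationSketch`, its methods, and the use of plain strings for column values are all assumptions made for illustration. It collects the unique elements per skewed column, appends "other", and enumerates every combination in the same index order as the listing.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of Hive: rebuilds the 12-entry collection.
class SkewedCombinationSketch {
    // Placeholder for "any value that is not a listed skewed element".
    static final String OTHER = "other";

    // For each skewed column, collect its distinct skewed elements in
    // first-seen order and append the catch-all "other" element.
    static List<List<String>> uniqueElements(List<List<String>> skewedValues,
                                             int columnCount) {
        List<List<String>> result = new ArrayList<>();
        for (int col = 0; col < columnCount; col++) {
            Set<String> seen = new LinkedHashSet<>();
            for (List<String> value : skewedValues) {
                seen.add(value.get(col));
            }
            List<String> elements = new ArrayList<>(seen);
            elements.add(OTHER);
            result.add(elements);
        }
        return result;
    }

    // Cartesian product with the last column varying fastest, which matches
    // the index order (0,0), (0,1), ... used in the listing.
    static List<List<String>> combinations(List<List<String>> uniqueElements) {
        List<List<String>> result = new ArrayList<>();
        result.add(new ArrayList<>());
        for (List<String> columnElements : uniqueElements) {
            List<List<String>> next = new ArrayList<>();
            for (List<String> prefix : result) {
                for (String element : columnElements) {
                    List<String> combo = new ArrayList<>(prefix);
                    combo.add(element);
                    next.add(combo);
                }
            }
            result = next;
        }
        return result;
    }

    public static void main(String[] args) {
        List<List<String>> skewed = Arrays.asList(
            Arrays.asList("1", "a"),
            Arrays.asList("2", "b"),
            Arrays.asList("1", "c"));
        List<List<String>> unique = uniqueElements(skewed, 2);
        for (List<String> combo : combinations(unique)) {
            // Mark the combinations that are actual skewed values.
            System.out.println(combo + (skewed.contains(combo) ? " *" : ""));
        }
    }
}
```

With the skewed values (1,a), (2,b), (1,c), the sketch yields 3 x 4 = 12 combinations, three of which are marked as skewed value entries.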
Expression Tree : ((c1=1) and (c2=a)) or ((c1=3) or (c2=b))
               or
             /    \
          and      or
         /  \     /  \
     c1=1  c2=a c1=3  c2=b
For each entry in the dynamic-multi-dimension container:
1. walk through the tree to decide its value (see the values in the collection above)
2. if it is a skewed value
2.1 remove the entry from the map
2.2 add its directory to the path list unless the value is false
3. otherwise, add the value to the map
Once that is done, go through the remaining entries in the map to decide on the default directory:
1. we know none of them is a skewed value
2. we skip the default directory only if all of their values are false
What do we choose at the end?
1. the directory for (1,a), because it is a skewed value and the match returns true
2. the directory for (2,b), because it is a skewed value and the match returns true
3. the default directory, because not every non-skewed value returns false
We skip the directory for (1,c) since its match returns false.
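The selection rules can be sketched in a few lines of Java. This is an illustrative sketch under stated assumptions, not the code behind `prune`: the class `DirectorySelectionSketch`, the bracketed directory labels, and the "default" label are hypothetical, and the per-combination results are copied from the example rather than computed from the expression tree.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch, not Hive code: picks directories from per-combination results.
class DirectorySelectionSketch {
    // Placeholder label for the default (non-skewed) directory.
    static final String DEFAULT_DIR = "default";

    static List<String> select(Map<List<String>, Boolean> results,
                               Set<List<String>> skewedValues) {
        List<String> selected = new ArrayList<>();
        boolean keepDefault = false;
        for (Map.Entry<List<String>, Boolean> entry : results.entrySet()) {
            // Only a definite false prunes; true and unknown (null) both keep.
            boolean keep = !Boolean.FALSE.equals(entry.getValue());
            if (skewedValues.contains(entry.getKey())) {
                if (keep) {
                    selected.add("(" + String.join(",", entry.getKey()) + ")");
                }
            } else if (keep) {
                // One non-false, non-skewed combination keeps the default directory.
                keepDefault = true;
            }
        }
        if (keepDefault) {
            selected.add(DEFAULT_DIR);
        }
        return selected;
    }

    // Skewed values from the example: (1,a), (2,b), (1,c).
    static Set<List<String>> exampleSkewedValues() {
        return new HashSet<>(Arrays.asList(
            Arrays.asList("1", "a"),
            Arrays.asList("2", "b"),
            Arrays.asList("1", "c")));
    }

    // Results copied from the example listing, in index order (0,0) .. (2,3).
    static Map<List<String>, Boolean> exampleResults() {
        String[] c1 = {"1", "2", "other"};
        String[] c2 = {"a", "b", "c", "other"};
        boolean[][] values = {
            {true, true, false, false},
            {false, true, false, false},
            {true, true, true, true}};
        Map<List<String>, Boolean> results = new LinkedHashMap<>();
        for (int i = 0; i < c1.length; i++) {
            for (int j = 0; j < c2.length; j++) {
                results.put(Arrays.asList(c1[i], c2[j]), values[i][j]);
            }
        }
        return results;
    }

    public static void main(String[] args) {
        System.out.println(select(exampleResults(), exampleSkewedValues()));
    }
}
```

Run against the example data, the sketch selects (1,a), (2,b) and the default directory, and drops (1,c), which agrees with the choices listed above.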
Note: unknown is marked in
transform(ParseContext)
with
newcd = new ExprNodeConstantDesc(cd.getTypeInfo(), null)
and can be checked via
child_nd instanceof ExprNodeConstantDesc
&& ((ExprNodeConstantDesc) child_nd).getValue() == null
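Because an unknown result is represented as null, combining sub-expression results needs three-valued logic. The sketch below shows one common way to do that (Kleene-style AND/OR/NOT over a nullable `Boolean`). It is an assumption about how such values could be combined, not a description of Hive's evaluator, and the class `ThreeValuedLogicSketch` is hypothetical. It deliberately avoids `ExprNodeConstantDesc` so that it stays self-contained.

```java
// Hypothetical sketch: three-valued logic where null stands for "unknown".
class ThreeValuedLogicSketch {
    // AND: a definite false on either side decides the result.
    static Boolean and(Boolean left, Boolean right) {
        if (Boolean.FALSE.equals(left) || Boolean.FALSE.equals(right)) {
            return Boolean.FALSE;
        }
        if (left == null || right == null) {
            return null;
        }
        return Boolean.TRUE;
    }

    // OR: a definite true on either side decides the result.
    static Boolean or(Boolean left, Boolean right) {
        if (Boolean.TRUE.equals(left) || Boolean.TRUE.equals(right)) {
            return Boolean.TRUE;
        }
        if (left == null || right == null) {
            return null;
        }
        return Boolean.FALSE;
    }

    // NOT: unknown stays unknown.
    static Boolean not(Boolean value) {
        if (value == null) {
            return null;
        }
        return value ? Boolean.FALSE : Boolean.TRUE;
    }

    public static void main(String[] args) {
        System.out.println(and(null, Boolean.FALSE)); // unknown AND false
        System.out.println(or(null, Boolean.TRUE));   // unknown OR true
        System.out.println(and(null, Boolean.TRUE));  // unknown AND true
    }
}
```

Under these rules an unknown operand only propagates when the other operand cannot decide the result on its own, which fits the pruning rule that a directory is skipped only on a definite false.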
- Parameters:
ctx
- parse context
part
- partition
pruner
- expression node tree
- Returns:
Copyright © 2014 The Apache Software Foundation. All rights reserved.