org.apache.hadoop.hive.ql.optimizer.listbucketingpruner
Class ListBucketingPruner.DynamicMultiDimensionalCollection

java.lang.Object
  extended by org.apache.hadoop.hive.ql.optimizer.listbucketingpruner.ListBucketingPruner.DynamicMultiDimensionalCollection
Enclosing class:
ListBucketingPruner

public static class ListBucketingPruner.DynamicMultiDimensionalCollection
extends Object

Note: this class is not designed for general use; it exists for the list bucketing pruner only.

The structure addresses the following requirements:
  1. It is a multi-dimensional collection.
  2. The length of each dimension is dynamic and is decided at runtime.

The first user is the list bucketing pruner, which uses it in the pruning phase:
  1. Each skewed column has a batch of skewed elements.
  2. One skewed column represents one dimension.
  3. The length of a dimension is the number of skewed elements in that column.
  4. The number of skewed columns and the length of each dimension are dynamic and configured by the user.

Use case:
========

Use case #1: the multi-dimensional collection records whether to select the directory represented by each cell.

  skewed column: C1, C2, C3
  skewed value: (1,a,x), (2,b,x), (1,c,x), (2,a,y)
  Other: represents any value of the column that is not part of a skewed value.

  C3 = x
  C1\C2 | a              | b              | c              | Other
  1     | Boolean(1,a,x) | X              | Boolean(1,c,x) | X
  2     | X              | Boolean(2,b,x) | X              | X
  other | X              | X              | X              | X

  C3 = y
  C1\C2 | a              | b              | c              | Other
  1     | X              | X              | X              | X
  2     | Boolean(2,a,y) | X              | X              | X
  other | X              | X              | X              | X

  Boolean is the cell type, which can be False, True, or Null (Unknown). The tuple in parentheses, such as (1,a,x), is informational only and shows which skewed value the cell represents.
  1. The value of Boolean(1,a,x) represents whether we select that directory for list bucketing.
  2. The value of Boolean(2,b,x) represents whether we select that directory for list bucketing.
  ...
  3. All the remaining cells, marked "X", decide whether to pick up the default directory.
  4. This covers not only the "other" columns and rows but every cell that does not represent a skewed value.

  For a cell representing a skewed value:
  1. False: skip the directory.
  2. True or Unknown: select the directory.

  For the cells representing the default directory:
  1. Only if all cells are False: skip the directory.
  2. In all other cases: select the directory.

Use case #2: the multi-dimensional collection represents the skewed elements so that the tree can be walked one element at a time. Each cell is a List holding the skewed value that its index path maps to.

  skewed column: C1, C2, C3
  skewed value: (1,a,x), (2,b,x), (1,c,x), (2,a,y)
  Other: represents any value of the column that is not part of a skewed value.

  C3 = x
  C1\C2 | a       | b       | c       | Other
  1     | (1,a,x) | X       | (1,c,x) | X
  2     | X       | (2,b,x) | X       | X
  other | X       | X       | X       | X

  C3 = y
  C1\C2 | a       | b       | c       | Other
  1     | X       | X       | X       | X
  2     | (2,a,y) | X       | X       | X
  other | X       | X       | X       | X

Implementation:
==============

Please see another example in ListBucketingPruner.prune(org.apache.hadoop.hive.ql.parse.ParseContext, org.apache.hadoop.hive.ql.metadata.Partition, org.apache.hadoop.hive.ql.plan.ExprNodeDesc).

We use a HashMap to represent the dynamic multi-dimensional collection:
  1. The key is a List representing the index path to the cell.
  2. The value represents the cell (Boolean for use case #1, List for use case #2).

For example:
  1. skewed column (list): C1, C2, C3
  2. skewed value (list of list): (1,a,x), (2,b,x), (1,c,x), (2,a,y)

From the skewed values, we calculate the unique skewed elements for each skewed column:
  C1: (1,2)
  C2: (a,b,c)
  C3: (x,y)

We store them in a list of lists. We do not need to store the skewed column names because we match by order:
  1. Skewed column (list): C1, C2, C3
  2. Unique skewed elements for each skewed column (list of list): (1,2,other), (a,b,c,other), (x,y,other)
  3. Index: (0,1,2) (0,1,2,3) (0,1,2)

We use the indexes, starting at 0, to construct the HashMap representing the dynamic multi-dimensional collection:

  key (what skewed value the key represents) -> value (Boolean for use case #1, List for use case #2)
  (0,0,0) (1,a,x)
  (0,0,1) (1,a,y)
  (0,1,0) (1,b,x)
  (0,1,1) (1,b,y)
  (0,2,0) (1,c,x)
  (0,2,1) (1,c,y)
  (1,0,0) (2,a,x)
  (1,0,1) (2,a,y)
  (1,1,0) (2,b,x)
  (1,1,1) (2,b,y)
  (1,2,0) (2,c,x)
  (1,2,1) (2,c,y)
  ...
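The index-path map and the selection rules described above can be sketched roughly as follows. This is an illustrative sketch, not Hive's implementation: the class IndexPathMapSketch and its methods build, selectSkewedDirectory, and selectDefaultDirectory are hypothetical names, and a null Boolean is assumed to stand for Unknown.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Illustrative sketch only; all names here are hypothetical, not Hive's. */
final class IndexPathMapSketch {

    /** Maps every index path, e.g. (0,0,0), to the elements it points at, e.g. (1,a,x). */
    static Map<List<Integer>, List<String>> build(List<List<String>> dims) {
        Map<List<Integer>, List<String>> cells = new LinkedHashMap<List<Integer>, List<String>>();
        walk(dims, 0, new ArrayList<Integer>(), new ArrayList<String>(), cells);
        return cells;
    }

    private static void walk(List<List<String>> dims, int depth, List<Integer> path,
                             List<String> elems, Map<List<Integer>, List<String>> cells) {
        if (depth == dims.size()) {
            // Copy both lists: the working lists are mutated while backtracking.
            cells.put(new ArrayList<Integer>(path), new ArrayList<String>(elems));
            return;
        }
        List<String> dim = dims.get(depth);
        for (int i = 0; i < dim.size(); i++) {
            path.add(i);
            elems.add(dim.get(i));
            walk(dims, depth + 1, path, elems, cells);
            path.remove(path.size() - 1);   // remove(int index), not remove(Object)
            elems.remove(elems.size() - 1);
        }
    }

    /** Use case #1, skewed-value cell: only FALSE skips; TRUE and null (Unknown) select. */
    static boolean selectSkewedDirectory(Boolean cell) {
        return !Boolean.FALSE.equals(cell);
    }

    /** Use case #1, default directory: skipped only if every remaining cell is FALSE. */
    static boolean selectDefaultDirectory(Collection<Boolean> otherCells) {
        for (Boolean cell : otherCells) {
            if (!Boolean.FALSE.equals(cell)) {
                return true;
            }
        }
        return false;
    }
}
```

With the dimensions (1,2,other), (a,b,c,other), (x,y,other), build produces 3 x 4 x 3 = 36 entries, so the map also contains the "other" cells that the listing above elides.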


Constructor Summary
ListBucketingPruner.DynamicMultiDimensionalCollection()
           
 
Method Summary
static List<List<String>> flat(List<List<String>> uniqSkewedElements)
          Flattens a dynamic multi-dimensional collection.
static List<List<String>> generateCollection(List<List<String>> values)
          Finds the complete skewed-element collection.
static List<List<String>> uniqueElementsList(List<List<String>> values, String defaultDirName)
          Convert value to unique element list.
static List<List<String>> uniqueSkewedValueList(List<List<String>> values)
          Convert value to unique skewed value list.
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

ListBucketingPruner.DynamicMultiDimensionalCollection

public ListBucketingPruner.DynamicMultiDimensionalCollection()
Method Detail

generateCollection

public static List<List<String>> generateCollection(List<List<String>> values)
                                             throws SemanticException
Finds the complete skewed-element collection.

For example:
  1. skewed column (list): C1, C2
  2. skewed value (list of list): (1,a), (2,b), (1,c)

It returns the complete collection:
  (1,a), (1,b), (1,c), (1,other),
  (2,a), (2,b), (2,c), (2,other),
  (other,a), (other,b), (other,c), (other,other)

Throws:
SemanticException
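
As a quick sanity check on the example above, the size of the complete collection is the product, over the skewed columns, of the number of distinct skewed elements plus one for "other". A minimal sketch of that arithmetic follows; CollectionSizeSketch and completeSize are hypothetical names and not part of Hive.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Illustrative sketch only; the names here are hypothetical, not Hive's. */
final class CollectionSizeSketch {

    /** Product over columns of (distinct skewed elements + 1 for "other"). */
    static long completeSize(List<List<String>> skewedValues) {
        if (skewedValues.isEmpty()) {
            return 0;
        }
        // Assumes every skewed value has one element per skewed column.
        int columns = skewedValues.get(0).size();
        long size = 1;
        for (int col = 0; col < columns; col++) {
            Set<String> distinct = new HashSet<String>();
            for (List<String> value : skewedValues) {
                distinct.add(value.get(col));
            }
            size *= distinct.size() + 1;
        }
        return size;
    }
}
```

For the skewed values (1,a), (2,b), (1,c) this gives (2 + 1) x (3 + 1) = 12, which matches the twelve entries listed in the example.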

uniqueElementsList

public static List<List<String>> uniqueElementsList(List<List<String>> values,
                                                    String defaultDirName)
Convert value to unique element list. This is specific to the skewed-value use case.

For example:
  1. skewed column (list): C1, C2, C3
  2. skewed value (list of list): (1,a,x), (2,b,x), (1,c,x), (2,a,y)

Input: skewed value (list of list): (1,a,x), (2,b,x), (1,c,x), (2,a,y)
Output: unique skewed elements for each skewed column (list of list): (1,2,other), (a,b,c,other), (x,y,other)

The output matches the order of the skewed columns and can be read as:
  C1 has unique element list (1,2,other)
  C2 has unique element list (a,b,c,other)
  C3 has unique element list (x,y,other)

Other represents any value which is not part of a skewed-value combination.

Parameters:
values - skewed value list
Returns:
a list of unique element lists
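
The conversion described above can be sketched as a per-column de-duplication that preserves first-seen order and then appends the default entry. This is a minimal sketch under assumptions, not Hive's code: UniqueElementsSketch and uniqueElements are hypothetical names, and the sketch assumes the defaultDirName argument is what appears as "other" in the example.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Illustrative sketch only; the names here are hypothetical, not Hive's. */
final class UniqueElementsSketch {

    static List<List<String>> uniqueElements(List<List<String>> skewedValues,
                                             String defaultDirName) {
        // One insertion-ordered set per skewed column.
        List<Set<String>> perColumn = new ArrayList<Set<String>>();
        for (List<String> value : skewedValues) {
            for (int col = 0; col < value.size(); col++) {
                if (perColumn.size() <= col) {
                    perColumn.add(new LinkedHashSet<String>());
                }
                perColumn.get(col).add(value.get(col));
            }
        }
        List<List<String>> result = new ArrayList<List<String>>();
        for (Set<String> column : perColumn) {
            List<String> elements = new ArrayList<String>(column);
            // Assumption: the default entry ("other" in the example) goes last.
            elements.add(defaultDirName);
            result.add(elements);
        }
        return result;
    }
}
```

Leaving out the final append would give the shape that uniqueSkewedValueList documents, namely (1,2), (a,b,c), (x,y).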

uniqueSkewedValueList

public static List<List<String>> uniqueSkewedValueList(List<List<String>> values)
Convert value to unique skewed value list. It is used in ListBucketingPrunerUtils.evaluateExprOnCell(java.util.List, java.util.List, org.apache.hadoop.hive.ql.plan.ExprNodeDesc, java.util.List).

For example:
  1. skewed column (list): C1, C2, C3
  2. skewed value (list of list): (1,a,x), (2,b,x), (1,c,x), (2,a,y)

Input: skewed value (list of list): (1,a,x), (2,b,x), (1,c,x), (2,a,y)
Output: unique skewed values for each skewed column (list of list): (1,2), (a,b,c), (x,y)

The output matches the order of the skewed columns and can be read as:
  C1 has unique skewed value list (1,2)
  C2 has unique skewed value list (a,b,c)
  C3 has unique skewed value list (x,y)

Parameters:
values - skewed value list
Returns:
a list of unique skewed value lists

flat

public static List<List<String>> flat(List<List<String>> uniqSkewedElements)
                               throws SemanticException
Flattens a dynamic multi-dimensional collection.

For example:
  1. skewed column (list): C1, C2, C3
  2. skewed value (list of list): (1,a,x), (2,b,x), (1,c,x), (2,a,y)

Unique skewed elements for each skewed column (list of list): (1,2,other), (a,b,c,other)
Index: (0,1,2) (0,1,2,3)

Complete dynamic multi-dimensional collection:
  (0,0) (1,a) *       -> T
  (0,1) (1,b)         -> T
  (0,2) (1,c) *       -> F
  (0,3) (1,other)     -> F
  (1,0) (2,a)         -> F
  (1,1) (2,b) *       -> T
  (1,2) (2,c)         -> F
  (1,3) (2,other)     -> F
  (2,0) (other,a)     -> T
  (2,1) (other,b)     -> T
  (2,2) (other,c)     -> T
  (2,3) (other,other) -> T

  * marks a skewed value entry.

Parameters:
uniqSkewedElements - unique skewed elements for each skewed column (list of list)
Returns:
Throws:
SemanticException
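
The ordering in the example above, (0,0), (0,1), ..., (2,3), is row-major: the last dimension varies fastest. One way to sketch such a flattening is to decode each flat position as a mixed-radix number. This is an illustrative alternative, not necessarily how Hive implements flat: FlatSketch is a hypothetical name, and unlike the documented method the sketch does not throw SemanticException.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Illustrative sketch only; the names here are hypothetical, not Hive's. */
final class FlatSketch {

    /** Lists every combination of elements, with the last dimension varying fastest. */
    static List<List<String>> flat(List<List<String>> dims) {
        List<List<String>> out = new ArrayList<List<String>>();
        if (dims.isEmpty()) {
            return out;
        }
        int total = 1;
        for (List<String> dim : dims) {
            total *= dim.size();
        }
        for (int n = 0; n < total; n++) {
            String[] cell = new String[dims.size()];
            int rest = n;
            // Decode n as a mixed-radix number, least significant digit last.
            for (int d = dims.size() - 1; d >= 0; d--) {
                List<String> dim = dims.get(d);
                cell[d] = dim.get(rest % dim.size());
                rest /= dim.size();
            }
            out.add(Arrays.asList(cell));
        }
        return out;
    }
}
```

For the dimensions (1,2,other) and (a,b,c,other), position 0 decodes to (1,a), position 3 to (1,other), position 4 to (2,a), and position 11 to (other,other), in the same order as the listing in the example.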


Copyright © 2014 The Apache Software Foundation. All rights reserved.