org.apache.hadoop.hive.ql.optimizer
Class LimitPushdownOptimizer
java.lang.Object
org.apache.hadoop.hive.ql.optimizer.LimitPushdownOptimizer
- All Implemented Interfaces:
- Transform
public class LimitPushdownOptimizer
- extends Object
- implements Transform
Make the ReduceSink operator (RS) calculate a top-K selection for the LIMIT clause.
This only works with an RS dedicated to the limit operation, meaning that between the RS and the LIMIT
there must be no other operator that may change the number of rows, such as a FilterOperator;
see Operator.acceptLimitPushdown().
If the RS exists only to limit rows, the RS hash counts rows with the same key separately.
But if the RS feeds a GROUP BY (GBY), the RS hash must forward all rows sharing a key.
Legend : A(a) --> key A, value a, i.e. row A(a)
Suppose the RS in each mapper task forwards rows like this:
MAP1(RS) : 40(a)-10(b)-30(c)-10(d)-70(e)-80(f)
MAP2(RS) : 90(g)-80(h)-60(i)-40(j)-30(k)-20(l)
MAP3(RS) : 40(m)-50(n)-30(o)-30(p)-60(q)-70(r)
then ORDER BY (OBY) or GROUP BY (GBY) produces this result on the reducer:
REDUCER : 10(b,d)-20(l)-30(c,k,o,p)-40(a,j,m)-50(n)-60(i,q)-70(e,r)-80(f,h)-90(g)
LIMIT 3 for GBY: 10(b,d)-20(l)-30(c,k,o,p)
LIMIT 3 for OBY: 10(b,d)-20(l)
With the optimization, each mapper forwards only rows that can still appear in the final top K, reducing the amount of data shuffled while producing an identical result.
For GBY,
MAP1 : 40(a)-10(b)-30(c)-10(d)
MAP2 : 40(j)-30(k)-20(l)
MAP3 : 40(m)-50(n)-30(o)-30(p)
REDUCER : 10(b,d)-20(l)-30(c,k,o,p)-40(a,j,m)-50(n)
LIMIT 3 : 10(b,d)-20(l)-30(c,k,o,p)
For OBY,
MAP1 : 10(b)-30(c)-10(d)
MAP2 : 40(j)-30(k)-20(l)
MAP3 : 40(m)-50(n)-30(o)
REDUCER : 10(b,d)-20(l)-30(c,k,o)-40(j,m)-50(n)
LIMIT 3 : 10(b,d)-20(l)
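The per-mapper filtering above can be sketched in plain Java. This is not Hive's actual implementation (Hive uses an internal TopNHash inside the ReduceSink); it is a minimal sketch of the two retention policies the doc describes: OBY-style limiting counts each row separately, while GBY-style limiting keeps the K smallest distinct keys and forwards every row of a kept key. The class and method names are invented for illustration.

```java
import java.util.*;

/**
 * Minimal sketch (not Hive's actual code) of the top-K selection a
 * ReduceSink performs in the mapper when LIMIT is pushed down.
 */
public class TopKSketch {

    /** OBY-style: keep at most k individual rows with the smallest keys. */
    static List<Map.Entry<Integer, String>> topKRows(int k, int[] keys, String[] values) {
        // Max-heap on key: the root is the current worst row, evicted first.
        PriorityQueue<Map.Entry<Integer, String>> heap = new PriorityQueue<>(
            Comparator.comparingInt((Map.Entry<Integer, String> e) -> e.getKey()).reversed());
        for (int i = 0; i < keys.length; i++) {
            heap.add(new AbstractMap.SimpleEntry<>(keys[i], values[i]));
            if (heap.size() > k) {
                heap.poll(); // drop the row with the largest key; it is never shuffled
            }
        }
        List<Map.Entry<Integer, String>> kept = new ArrayList<>(heap);
        kept.sort(Comparator.comparingInt(Map.Entry::getKey));
        return kept;
    }

    /** GBY-style: keep the k smallest distinct keys, with every row of each kept key. */
    static SortedMap<Integer, List<String>> topKKeys(int k, int[] keys, String[] values) {
        TreeMap<Integer, List<String>> byKey = new TreeMap<>();
        for (int i = 0; i < keys.length; i++) {
            byKey.computeIfAbsent(keys[i], x -> new ArrayList<>()).add(values[i]);
            if (byKey.size() > k) {
                byKey.pollLastEntry(); // evict the largest key and all of its rows
            }
        }
        return byKey;
    }

    public static void main(String[] args) {
        // MAP1's stream from the example above: 40(a)-10(b)-30(c)-10(d)-70(e)-80(f)
        int[] keys = {40, 10, 30, 10, 70, 80};
        String[] vals = {"a", "b", "c", "d", "e", "f"};
        System.out.println("GBY keeps: " + topKKeys(3, keys, vals)); // {10=[b, d], 30=[c], 40=[a]}
        System.out.println("OBY keeps: " + topKRows(3, keys, vals)); // rows b, c, d (keys 10, 10, 30)
    }
}
```

Running this on MAP1's stream reproduces the example: the GBY variant keeps rows a, b, c, d (all rows of keys 10, 30, 40), while the OBY variant keeps only three rows, evicting 40(a) once the second key-10 row arrives.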
LimitPushdownOptimizer
public LimitPushdownOptimizer()
transform
public ParseContext transform(ParseContext pctx)
throws SemanticException
- Description copied from interface: Transform
- All transformation steps implement this interface.
- Specified by:
transform in interface Transform
- Parameters:
pctx - the input parse context
- Returns:
- the transformed ParseContext
- Throws:
SemanticException
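The Transform contract is minimal: each optimizer pass receives a ParseContext and returns a (possibly rewritten) one. Below is a hedged sketch of how a chain of such passes could be driven, using simplified stand-in types; Hive's real Optimizer, Transform, and ParseContext classes are considerably richer, and the names here mirror them only for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified stand-ins for Hive's interfaces, for illustration only.
interface Transform {
    ParseContext transform(ParseContext pctx) throws Exception;
}

class ParseContext { /* operator tree, configuration, ... */ }

class OptimizerSketch {
    private final List<Transform> transforms = new ArrayList<>();

    void add(Transform t) {
        transforms.add(t);
    }

    // Each pass receives the context produced by the previous one,
    // so pass order matters.
    ParseContext optimize(ParseContext pctx) throws Exception {
        for (Transform t : transforms) {
            pctx = t.transform(pctx);
        }
        return pctx;
    }
}
```

Under this contract, LimitPushdownOptimizer is just one registered pass among many; it inspects the operator tree in the context it is handed and rewrites qualifying ReduceSink operators in place.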
Copyright © 2014 The Apache Software Foundation. All rights reserved.