org.apache.hadoop.hive.ql.udf.generic
Class GenericUDAFHistogramNumeric.GenericUDAFHistogramNumericEvaluator

java.lang.Object
  extended by org.apache.hadoop.hive.ql.udf.generic.GenericUDAFEvaluator
      extended by org.apache.hadoop.hive.ql.udf.generic.GenericUDAFHistogramNumeric.GenericUDAFHistogramNumericEvaluator
All Implemented Interfaces:
Closeable
Enclosing class:
GenericUDAFHistogramNumeric

public static class GenericUDAFHistogramNumeric.GenericUDAFHistogramNumericEvaluator
extends GenericUDAFEvaluator

Construct a histogram using an algorithm described by Ben-Haim and Tom-Tov. The algorithm is a heuristic adapted from the following paper: Yael Ben-Haim and Elad Tom-Tov, "A streaming parallel decision tree algorithm", J. Machine Learning Research 11 (2010), pp. 849-872. Although there are no approximation guarantees, it appears to work well with adequate data and a large (e.g., 20-80) number of histogram bins.
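
The streaming update at the heart of this kind of histogram can be sketched as follows. This is a minimal illustration under stated assumptions, not the code used by this evaluator: the class StreamingHistogramSketch, its Bin type, and the maxBins parameter are hypothetical names introduced here. Each new value becomes a bin of count 1, and when the bin budget is exceeded the two bins with the closest centroids are merged into one count-weighted centroid.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical, simplified sketch of a Ben-Haim/Tom-Tov style streaming histogram. */
class StreamingHistogramSketch {
    /** One bin: a centroid x and the count y of the points it represents. */
    static final class Bin {
        double x;
        double y;
        Bin(double x, double y) { this.x = x; this.y = y; }
    }

    private final int maxBins;
    private final List<Bin> bins = new ArrayList<>(); // kept sorted by x

    StreamingHistogramSketch(int maxBins) {
        if (maxBins < 1) throw new IllegalArgumentException("maxBins must be >= 1");
        this.maxBins = maxBins;
    }

    /** Adds one value, then merges the two closest bins if the budget is exceeded. */
    void add(double v) {
        int i = 0;
        while (i < bins.size() && bins.get(i).x < v) i++;
        if (i < bins.size() && bins.get(i).x == v) {
            bins.get(i).y += 1;          // exact match: bump the existing count
            return;
        }
        bins.add(i, new Bin(v, 1));      // new singleton bin at its sorted position
        if (bins.size() > maxBins) mergeClosest();
    }

    /** Replaces the adjacent pair with the smallest gap by one weighted bin. */
    private void mergeClosest() {
        int best = 0;
        double bestGap = Double.POSITIVE_INFINITY;
        for (int i = 0; i + 1 < bins.size(); i++) {
            double gap = bins.get(i + 1).x - bins.get(i).x;
            if (gap < bestGap) { bestGap = gap; best = i; }
        }
        Bin a = bins.get(best);
        Bin b = bins.get(best + 1);
        double total = a.y + b.y;
        a.x = (a.x * a.y + b.x * b.y) / total;  // count-weighted centroid
        a.y = total;
        bins.remove(best + 1);
    }

    List<Bin> bins() { return bins; }
}
```

Because bins are only ever merged, never split, the result depends on input order and carries no approximation guarantee, which matches the caveat above about using a generous number of bins.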


Nested Class Summary
 
Nested classes/interfaces inherited from class org.apache.hadoop.hive.ql.udf.generic.GenericUDAFEvaluator
GenericUDAFEvaluator.AbstractAggregationBuffer, GenericUDAFEvaluator.AggregationBuffer, GenericUDAFEvaluator.AggregationType, GenericUDAFEvaluator.Mode
 
Constructor Summary
GenericUDAFHistogramNumeric.GenericUDAFHistogramNumericEvaluator()
           
 
Method Summary
 GenericUDAFEvaluator.AggregationBuffer getNewAggregationBuffer()
          Get a new aggregation object.
 ObjectInspector init(GenericUDAFEvaluator.Mode m, ObjectInspector[] parameters)
          Initialize the evaluator.
 void iterate(GenericUDAFEvaluator.AggregationBuffer agg, Object[] parameters)
          Iterate through original data.
 void merge(GenericUDAFEvaluator.AggregationBuffer agg, Object partial)
          Merge with partial aggregation result.
 void reset(GenericUDAFEvaluator.AggregationBuffer agg)
          Reset the aggregation.
 Object terminate(GenericUDAFEvaluator.AggregationBuffer agg)
          Get final aggregation result.
 Object terminatePartial(GenericUDAFEvaluator.AggregationBuffer agg)
          Get partial aggregation result.
 
Methods inherited from class org.apache.hadoop.hive.ql.udf.generic.GenericUDAFEvaluator
aggregate, close, configure, evaluate, isEstimable
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

GenericUDAFHistogramNumeric.GenericUDAFHistogramNumericEvaluator

public GenericUDAFHistogramNumeric.GenericUDAFHistogramNumericEvaluator()

Method Detail

init

public ObjectInspector init(GenericUDAFEvaluator.Mode m,
                            ObjectInspector[] parameters)
                     throws HiveException
Description copied from class: GenericUDAFEvaluator
Initialize the evaluator.

Overrides:
init in class GenericUDAFEvaluator
Parameters:
m - The mode of aggregation.
parameters - The ObjectInspectors for the parameters. In PARTIAL1 and COMPLETE mode, the parameters are original data; in PARTIAL2 and FINAL mode, the parameters are partial aggregations (in that case, the array always has a single element).
Returns:
The ObjectInspector for the return value. In PARTIAL1 and PARTIAL2 mode, this is the ObjectInspector for the return value of the terminatePartial() call; in FINAL and COMPLETE mode, it is the ObjectInspector for the return value of the terminate() call. NOTE: We need ObjectInspector[] (in addition to the TypeInfo[] in GenericUDAFResolver) for 2 reasons: 1. ObjectInspector contains more information than TypeInfo; and 2. We call GenericUDAFResolver.getEvaluator at compilation time, and GenericUDAFEvaluator.init at execution time.
Throws:
HiveException
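
The relationship between the four modes and the input and output of init can be summarized in a small sketch. The enum below is a hypothetical stand-in that only mirrors the mode names listed above; it is not the Hive GenericUDAFEvaluator.Mode type, and the helper names are assumptions made for illustration.

```java
/** Hypothetical mirror of the four aggregation modes; not the Hive enum itself. */
class AggregationModeSketch {
    enum Mode { PARTIAL1, PARTIAL2, FINAL, COMPLETE }

    /** True when init() receives inspectors for original data rather than partials. */
    static boolean inputIsOriginalData(Mode m) {
        return m == Mode.PARTIAL1 || m == Mode.COMPLETE;
    }

    /** True when init() should return the inspector for terminatePartial(). */
    static boolean outputIsPartial(Mode m) {
        return m == Mode.PARTIAL1 || m == Mode.PARTIAL2;
    }

    /** Names the input-side and output-side calls expected in the given mode. */
    static String describe(Mode m) {
        String in = inputIsOriginalData(m) ? "iterate" : "merge";
        String out = outputIsPartial(m) ? "terminatePartial" : "terminate";
        return m + ": " + in + " -> " + out;
    }
}
```

For example, describe(Mode.FINAL) pairs merge on the input side with terminate on the output side, so in that mode the parameters array holds a single inspector for the partial result.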

terminatePartial

public Object terminatePartial(GenericUDAFEvaluator.AggregationBuffer agg)
                        throws HiveException
Description copied from class: GenericUDAFEvaluator
Get partial aggregation result.

Specified by:
terminatePartial in class GenericUDAFEvaluator
Returns:
partial aggregation result.
Throws:
HiveException

terminate

public Object terminate(GenericUDAFEvaluator.AggregationBuffer agg)
                 throws HiveException
Description copied from class: GenericUDAFEvaluator
Get final aggregation result.

Specified by:
terminate in class GenericUDAFEvaluator
Returns:
final aggregation result.
Throws:
HiveException

merge

public void merge(GenericUDAFEvaluator.AggregationBuffer agg,
                  Object partial)
           throws HiveException
Description copied from class: GenericUDAFEvaluator
Merge with partial aggregation result. NOTE: null might be passed in case there is no input data.

Specified by:
merge in class GenericUDAFEvaluator
Parameters:
partial - The partial aggregation result.
Throws:
HiveException
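
A merge step that tolerates a null partial might look like the sketch below. It assumes a hypothetical representation in which each bin is a two-element double[] of {x, y} (centroid and count); the actual serialized form of the partial result is not described here, and HistogramMergeSketch and maxBins are illustrative names only.

```java
import java.util.Comparator;
import java.util.List;

/** Hypothetical sketch of merging a partial histogram into an accumulator. */
class HistogramMergeSketch {
    /** Merges partial bins {x, y} into acc, then trims acc to at most maxBins. */
    static void merge(List<double[]> acc, List<double[]> partial, int maxBins) {
        if (partial == null) return;  // null partial: that side saw no input data
        for (double[] bin : partial) {
            acc.add(new double[] { bin[0], bin[1] });  // copy so inputs stay untouched
        }
        acc.sort(Comparator.comparingDouble((double[] b) -> b[0]));
        while (acc.size() > maxBins && acc.size() > 1) {
            // Find the adjacent pair of centroids with the smallest gap.
            int best = 0;
            double bestGap = Double.POSITIVE_INFINITY;
            for (int i = 0; i + 1 < acc.size(); i++) {
                double gap = acc.get(i + 1)[0] - acc.get(i)[0];
                if (gap < bestGap) { bestGap = gap; best = i; }
            }
            double[] a = acc.get(best);
            double[] b = acc.get(best + 1);
            double total = a[1] + b[1];
            a[0] = (a[0] * a[1] + b[0] * b[1]) / total;  // count-weighted centroid
            a[1] = total;
            acc.remove(best + 1);
        }
    }
}
```

The early return on null keeps the accumulator unchanged, which is the behavior the note above calls for when one side of the aggregation has no rows.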

iterate

public void iterate(GenericUDAFEvaluator.AggregationBuffer agg,
                    Object[] parameters)
             throws HiveException
Description copied from class: GenericUDAFEvaluator
Iterate through original data.

Specified by:
iterate in class GenericUDAFEvaluator
Parameters:
parameters - The objects of parameters.
Throws:
HiveException

getNewAggregationBuffer

public GenericUDAFEvaluator.AggregationBuffer getNewAggregationBuffer()
                                                               throws HiveException
Description copied from class: GenericUDAFEvaluator
Get a new aggregation object.

Specified by:
getNewAggregationBuffer in class GenericUDAFEvaluator
Throws:
HiveException

reset

public void reset(GenericUDAFEvaluator.AggregationBuffer agg)
           throws HiveException
Description copied from class: GenericUDAFEvaluator
Reset the aggregation. This is useful if we want to reuse the same aggregation.

Specified by:
reset in class GenericUDAFEvaluator
Throws:
HiveException


Copyright © 2014 The Apache Software Foundation. All rights reserved.