org.apache.hadoop.hive.ql.udf.generic
Class GenericUDAFVariance.GenericUDAFVarianceEvaluator

java.lang.Object
  extended by org.apache.hadoop.hive.ql.udf.generic.GenericUDAFEvaluator
      extended by org.apache.hadoop.hive.ql.udf.generic.GenericUDAFVariance.GenericUDAFVarianceEvaluator
All Implemented Interfaces:
Closeable
Direct Known Subclasses:
GenericUDAFStd.GenericUDAFStdEvaluator, GenericUDAFStdSample.GenericUDAFStdSampleEvaluator, GenericUDAFVarianceSample.GenericUDAFVarianceSampleEvaluator
Enclosing class:
GenericUDAFVariance

public static class GenericUDAFVariance.GenericUDAFVarianceEvaluator
extends GenericUDAFEvaluator

Evaluates the variance using the algorithm described by Chan, Golub, and LeVeque in "Algorithms for computing the sample variance: analysis and recommendations", The American Statistician, 37 (1983), pp. 242-247.

variance = variance1 + variance2 + n/(m*(m+n)) * pow(((m/n)*t1 - t2),2)

where:
- variance is sum[(x-avg)^2] (this is actually n times the variance) and is updated at every step.
- n is the count of elements in chunk1.
- m is the count of elements in chunk2.
- t1 is the sum of elements in chunk1; t2 is the sum of elements in chunk2.

This algorithm was proven to be numerically stable by J.L. Barlow in "Error analysis of a pairwise summation algorithm to compute sample variance", Numer. Math, 58 (1991), pp. 583-590.
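
The merge formula above can be checked with a small standalone sketch. It does not use any Hive classes: VarianceChunk and its members are hypothetical names introduced only for illustration, and the sketch assumes both chunks are non-empty so that the division by n is defined.

```java
/** Hypothetical holder for the running quantities named in the formula (not a Hive class). */
final class VarianceChunk {
    final long count;      // number of elements in the chunk (n or m)
    final double sum;      // sum of the elements (t1 or t2)
    final double variance; // sum of squared deviations from the mean, i.e. n times the variance

    VarianceChunk(long count, double sum, double variance) {
        this.count = count;
        this.sum = sum;
        this.variance = variance;
    }

    /** Builds a chunk from raw values with a plain two-pass computation. */
    static VarianceChunk of(double... xs) {
        double sum = 0.0;
        for (double x : xs) {
            sum += x;
        }
        double mean = xs.length == 0 ? 0.0 : sum / xs.length;
        double m2 = 0.0;
        for (double x : xs) {
            m2 += (x - mean) * (x - mean);
        }
        return new VarianceChunk(xs.length, sum, m2);
    }

    /** Combines this chunk (n, t1) with another chunk (m, t2) using the formula above. */
    VarianceChunk merge(VarianceChunk other) {
        if (count == 0 || other.count == 0) {
            throw new IllegalArgumentException("this sketch assumes non-empty chunks");
        }
        double n = count;
        double m = other.count;
        double t = (m / n) * sum - other.sum;
        double merged = variance + other.variance + (n / (m * (m + n))) * t * t;
        return new VarianceChunk(count + other.count, sum + other.sum, merged);
    }
}
```

For example, merging the chunks {1, 2, 3} and {4, 5, 6, 7} should give the same sum of squared deviations as computing it directly over {1, ..., 7}.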


Nested Class Summary
 
Nested classes/interfaces inherited from class org.apache.hadoop.hive.ql.udf.generic.GenericUDAFEvaluator
GenericUDAFEvaluator.AbstractAggregationBuffer, GenericUDAFEvaluator.AggregationBuffer, GenericUDAFEvaluator.AggregationType, GenericUDAFEvaluator.Mode
 
Constructor Summary
GenericUDAFVariance.GenericUDAFVarianceEvaluator()
           
 
Method Summary
 GenericUDAFEvaluator.AggregationBuffer getNewAggregationBuffer()
          Get a new aggregation object.
 DoubleWritable getResult()
           
 ObjectInspector init(GenericUDAFEvaluator.Mode m, ObjectInspector[] parameters)
          Initialize the evaluator.
 void iterate(GenericUDAFEvaluator.AggregationBuffer agg, Object[] parameters)
          Iterate through original data.
 void merge(GenericUDAFEvaluator.AggregationBuffer agg, Object partial)
          Merge with partial aggregation result.
 void reset(GenericUDAFEvaluator.AggregationBuffer agg)
          Reset the aggregation.
 void setResult(DoubleWritable result)
           
 Object terminate(GenericUDAFEvaluator.AggregationBuffer agg)
          Get final aggregation result.
 Object terminatePartial(GenericUDAFEvaluator.AggregationBuffer agg)
          Get partial aggregation result.
 
Methods inherited from class org.apache.hadoop.hive.ql.udf.generic.GenericUDAFEvaluator
aggregate, close, configure, evaluate, isEstimable
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

GenericUDAFVariance.GenericUDAFVarianceEvaluator

public GenericUDAFVariance.GenericUDAFVarianceEvaluator()
Method Detail

init

public ObjectInspector init(GenericUDAFEvaluator.Mode m,
                            ObjectInspector[] parameters)
                     throws HiveException
Description copied from class: GenericUDAFEvaluator
Initialize the evaluator.

Overrides:
init in class GenericUDAFEvaluator
Parameters:
m - The mode of aggregation.
parameters - The ObjectInspectors for the parameters: in PARTIAL1 and COMPLETE mode, the parameters are original data; in PARTIAL2 and FINAL mode, the parameters are just partial aggregations (in that case, the array will always have a single element).
Returns:
The ObjectInspector for the return value. In PARTIAL1 and PARTIAL2 mode, this is the ObjectInspector for the return value of the terminatePartial() call; in FINAL and COMPLETE mode, it is the ObjectInspector for the return value of the terminate() call. NOTE: We need ObjectInspector[] (in addition to the TypeInfo[] in GenericUDAFResolver) for 2 reasons: 1. ObjectInspector contains more information than TypeInfo; and 2. We call GenericUDAFResolver.getEvaluator at compilation time, and GenericUDAFEvaluator.init at execution time.
Throws:
HiveException
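
The relationship between the four modes, the kind of input they receive, and the kind of output they produce can be summarized in a small sketch. SketchMode is a hypothetical stand-in for GenericUDAFEvaluator.Mode, written without Hive dependencies; it only encodes what the parameter and return descriptions above state.

```java
/** Hypothetical mirror of the four aggregation modes, for illustration only. */
enum SketchMode {
    PARTIAL1(true, false),  // original data in, partial aggregation out
    PARTIAL2(false, false), // partial aggregation in, partial aggregation out
    FINAL(false, true),     // partial aggregation in, final result out
    COMPLETE(true, true);   // original data in, final result out

    final boolean readsOriginalData;
    final boolean producesFinalResult;

    SketchMode(boolean readsOriginalData, boolean producesFinalResult) {
        this.readsOriginalData = readsOriginalData;
        this.producesFinalResult = producesFinalResult;
    }

    /** Name of the method that consumes input in this mode. */
    String inputMethod() {
        return readsOriginalData ? "iterate" : "merge";
    }

    /** Name of the method whose return value the ObjectInspector describes. */
    String outputMethod() {
        return producesFinalResult ? "terminate" : "terminatePartial";
    }
}
```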

getNewAggregationBuffer

public GenericUDAFEvaluator.AggregationBuffer getNewAggregationBuffer()
                                                               throws HiveException
Description copied from class: GenericUDAFEvaluator
Get a new aggregation object.

Specified by:
getNewAggregationBuffer in class GenericUDAFEvaluator
Throws:
HiveException

reset

public void reset(GenericUDAFEvaluator.AggregationBuffer agg)
           throws HiveException
Description copied from class: GenericUDAFEvaluator
Reset the aggregation. This is useful if we want to reuse the same aggregation.

Specified by:
reset in class GenericUDAFEvaluator
Throws:
HiveException

iterate

public void iterate(GenericUDAFEvaluator.AggregationBuffer agg,
                    Object[] parameters)
             throws HiveException
Description copied from class: GenericUDAFEvaluator
Iterate through original data.

Specified by:
iterate in class GenericUDAFEvaluator
Parameters:
parameters - The objects of parameters.
Throws:
HiveException
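
One way to picture a per-row update is to apply the class-level merge formula with chunk2 holding a single value v (m = 1, t2 = v, variance2 = 0), which reduces the correction term to (t1 - n*v)^2 / (n*(n+1)). The sketch below is derived from that formula and is not taken from the Hive source; StreamingVarianceSketch and its fields are hypothetical names.

```java
/** Hypothetical running-variance accumulator; not a Hive class. */
final class StreamingVarianceSketch {
    long count;      // number of values seen so far (n)
    double sum;      // sum of the values seen so far (t1)
    double variance; // sum of squared deviations from the mean (n times the variance)

    /** Folds one value into the running state. */
    void add(double v) {
        if (count > 0) {
            // Merge formula specialized to a single-element second chunk.
            double n = count;
            double t = sum - n * v;
            variance += (t * t) / (n * (n + 1.0));
        }
        count++;
        sum += v;
    }
}
```

The first value contributes nothing to the accumulated sum of squared deviations, which is why the update is skipped while the buffer is still empty.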

terminatePartial

public Object terminatePartial(GenericUDAFEvaluator.AggregationBuffer agg)
                        throws HiveException
Description copied from class: GenericUDAFEvaluator
Get partial aggregation result.

Specified by:
terminatePartial in class GenericUDAFEvaluator
Returns:
partial aggregation result.
Throws:
HiveException

merge

public void merge(GenericUDAFEvaluator.AggregationBuffer agg,
                  Object partial)
           throws HiveException
Description copied from class: GenericUDAFEvaluator
Merge with partial aggregation result. NOTE: null might be passed in case there is no input data.

Specified by:
merge in class GenericUDAFEvaluator
Parameters:
partial - The partial aggregation result.
Throws:
HiveException
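
Because null may be passed when there is no input data, and because the class-level formula divides by the count of the first chunk, a merge step needs guards for the empty cases. The following is a minimal sketch of such guards under stated assumptions: MergeGuardSketch and Buf are hypothetical names, and representing the partial result as a (count, sum, variance) triple is an assumption made for illustration, not something this page specifies.

```java
/** Hypothetical merge with guards for null and empty inputs; not a Hive class. */
final class MergeGuardSketch {
    /** Assumed shape of an aggregation state: count, sum, and n times the variance. */
    static final class Buf {
        long count;
        double sum;
        double variance;
    }

    /** Merges partial into agg in place. */
    static void merge(Buf agg, Buf partial) {
        if (partial == null || partial.count == 0) {
            return; // no input data: nothing to merge
        }
        if (agg.count == 0) {
            // Nothing accumulated yet: adopt the partial state (avoids dividing by n = 0).
            agg.count = partial.count;
            agg.sum = partial.sum;
            agg.variance = partial.variance;
            return;
        }
        double n = agg.count;
        double m = partial.count;
        double t = (m / n) * agg.sum - partial.sum;
        agg.variance += partial.variance + (n / (m * (m + n))) * t * t;
        agg.count += partial.count;
        agg.sum += partial.sum;
    }
}
```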

terminate

public Object terminate(GenericUDAFEvaluator.AggregationBuffer agg)
                 throws HiveException
Description copied from class: GenericUDAFEvaluator
Get final aggregation result.

Specified by:
terminate in class GenericUDAFEvaluator
Returns:
final aggregation result.
Throws:
HiveException
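
Since the accumulated quantity is n times the variance, a final step has to divide by a count. The sketch below shows one plausible way to do that, together with the sample and standard-deviation variants that the subclass names listed at the top of this page suggest. It is an assumption-laden illustration: VarianceFinalizeSketch is a hypothetical class, and returning an empty result for too few rows is a choice made here, not documented behavior.

```java
import java.util.OptionalDouble;

/** Hypothetical finalization helpers; not a Hive class. */
final class VarianceFinalizeSketch {
    /** Population variance: accumulated sum of squared deviations divided by n. */
    static OptionalDouble population(long count, double variance) {
        return count == 0 ? OptionalDouble.empty() : OptionalDouble.of(variance / count);
    }

    /** Sample variance: divides by n - 1, so it needs at least two values. */
    static OptionalDouble sample(long count, double variance) {
        return count <= 1 ? OptionalDouble.empty() : OptionalDouble.of(variance / (count - 1));
    }

    /** Population standard deviation: square root of the population variance. */
    static OptionalDouble populationStd(long count, double variance) {
        OptionalDouble v = population(count, variance);
        return v.isPresent() ? OptionalDouble.of(Math.sqrt(v.getAsDouble())) : v;
    }
}
```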

setResult

public void setResult(DoubleWritable result)

getResult

public DoubleWritable getResult()


Copyright © 2014 The Apache Software Foundation. All rights reserved.