org.apache.hadoop.hive.ql.udf.generic
Class GenericUDAFCovariance.GenericUDAFCovarianceEvaluator

java.lang.Object
  extended by org.apache.hadoop.hive.ql.udf.generic.GenericUDAFEvaluator
      extended by org.apache.hadoop.hive.ql.udf.generic.GenericUDAFCovariance.GenericUDAFCovarianceEvaluator
All Implemented Interfaces:
Closeable
Direct Known Subclasses:
GenericUDAFCovarianceSample.GenericUDAFCovarianceSampleEvaluator
Enclosing class:
GenericUDAFCovariance

public static class GenericUDAFCovariance.GenericUDAFCovarianceEvaluator
extends GenericUDAFEvaluator

Evaluate the covariance using the algorithm described in http://en.wikipedia.org/wiki/Algorithms_for_calculating_variance, presumably by Pébay, Philippe (2008), in "Formulas for Robust, One-Pass Parallel Computation of Covariances and Arbitrary-Order Statistical Moments", Technical Report SAND2008-6212, Sandia National Laboratories, http://infoserve.sandia.gov/sand_doc/2008/086212.pdf

Incremental:
  n : number of (x, y) pairs seen so far
  mx_n = mx_(n-1) + [x_n - mx_(n-1)]/n : running mean of x
  my_n = my_(n-1) + [y_n - my_(n-1)]/n : running mean of y
  c_n = c_(n-1) + (x_n - mx_(n-1))*(y_n - my_n) : running co-moment (n times the population covariance)

Merge (partials A and B into X, with n_X = n_A + n_B):
  c_X = c_A + c_B + (mx_A - mx_B)*(my_A - my_B)*n_A*n_B/n_X

This one-pass algorithm is numerically stable.
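The incremental and merge formulas above can be sketched in plain Java. This is a standalone illustration, not the Hive source: the class name CovarianceSketch, its Buffer type, and the populationCovariance helper are hypothetical, and dividing the co-moment by n for the final result is an assumption about the population variant.

```java
// Standalone sketch of the update and merge formulas; names are hypothetical.
final class CovarianceSketch {

    /** Hypothetical stand-in for the evaluator's aggregation buffer. */
    static final class Buffer {
        long n;     // number of (x, y) pairs seen so far
        double mx;  // running mean of x
        double my;  // running mean of y
        double c;   // co-moment: n times the population covariance
    }

    /** Incremental step: fold one (x, y) pair into the buffer. */
    static void iterate(Buffer b, double x, double y) {
        b.n++;
        double dx = x - b.mx;       // x_n - mx_(n-1): uses the old mean of x
        b.mx += dx / b.n;           // mx_n
        b.my += (y - b.my) / b.n;   // my_n
        b.c += dx * (y - b.my);     // (x_n - mx_(n-1)) * (y_n - my_n): new mean of y
    }

    /** Merge step: fold partial B into partial A, leaving the result in A. */
    static void merge(Buffer a, Buffer b) {
        if (b.n == 0) {
            return;                 // nothing to merge
        }
        long n = a.n + b.n;
        // The co-moment must be updated before the means, because it needs mx_A and my_A.
        a.c += b.c + (a.mx - b.mx) * (a.my - b.my) * ((double) a.n * b.n / n);
        a.mx = (a.n * a.mx + b.n * b.mx) / n;
        a.my = (a.n * a.my + b.n * b.my) / n;
        a.n = n;
    }

    /** Assumed final step for the population variant: co-moment divided by n. */
    static double populationCovariance(Buffer b) {
        return b.c / b.n;           // NaN for an empty buffer
    }

    public static void main(String[] args) {
        double[] xs = {1, 2, 3, 4};
        double[] ys = {2, 4, 6, 9};
        Buffer left = new Buffer();
        Buffer right = new Buffer();
        for (int i = 0; i < 2; i++) iterate(left, xs[i], ys[i]);
        for (int i = 2; i < 4; i++) iterate(right, xs[i], ys[i]);
        merge(left, right);
        System.out.println(populationCovariance(left));
    }
}
```

Splitting the input into two buffers and merging them should give the same co-moment as a single pass over all four pairs, which is the property that lets the aggregation run in parallel.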


Nested Class Summary
 
Nested classes/interfaces inherited from class org.apache.hadoop.hive.ql.udf.generic.GenericUDAFEvaluator
GenericUDAFEvaluator.AbstractAggregationBuffer, GenericUDAFEvaluator.AggregationBuffer, GenericUDAFEvaluator.AggregationType, GenericUDAFEvaluator.Mode
 
Constructor Summary
GenericUDAFCovariance.GenericUDAFCovarianceEvaluator()
           
 
Method Summary
 GenericUDAFEvaluator.AggregationBuffer getNewAggregationBuffer()
          Get a new aggregation object.
 DoubleWritable getResult()
           
 ObjectInspector init(GenericUDAFEvaluator.Mode m, ObjectInspector[] parameters)
          Initialize the evaluator.
 void iterate(GenericUDAFEvaluator.AggregationBuffer agg, Object[] parameters)
          Iterate through original data.
 void merge(GenericUDAFEvaluator.AggregationBuffer agg, Object partial)
          Merge with partial aggregation result.
 void reset(GenericUDAFEvaluator.AggregationBuffer agg)
          Reset the aggregation.
 void setResult(DoubleWritable result)
           
 Object terminate(GenericUDAFEvaluator.AggregationBuffer agg)
          Get final aggregation result.
 Object terminatePartial(GenericUDAFEvaluator.AggregationBuffer agg)
          Get partial aggregation result.
 
Methods inherited from class org.apache.hadoop.hive.ql.udf.generic.GenericUDAFEvaluator
aggregate, close, configure, evaluate, isEstimable
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

GenericUDAFCovariance.GenericUDAFCovarianceEvaluator

public GenericUDAFCovariance.GenericUDAFCovarianceEvaluator()
Method Detail

init

public ObjectInspector init(GenericUDAFEvaluator.Mode m,
                            ObjectInspector[] parameters)
                     throws HiveException
Description copied from class: GenericUDAFEvaluator
Initialize the evaluator.

Overrides:
init in class GenericUDAFEvaluator
Parameters:
m - The mode of aggregation.
parameters - The ObjectInspectors for the parameters: in PARTIAL1 and COMPLETE mode, the parameters are original data; in PARTIAL2 and FINAL mode, the parameters are partial aggregations (in that case, the array will always have a single element).
Returns:
The ObjectInspector for the return value. In PARTIAL1 and PARTIAL2 mode, this is the ObjectInspector for the return value of terminatePartial(); in FINAL and COMPLETE mode, it is the ObjectInspector for the return value of terminate(). NOTE: We need ObjectInspector[] (in addition to the TypeInfo[] in GenericUDAFResolver) for 2 reasons: 1. ObjectInspector contains more information than TypeInfo; and 2. We call GenericUDAFResolver.getEvaluator at compilation time, and GenericUDAFEvaluator.init at execution time.
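The relationship between the four modes and the input and output they imply can be summarized in a small table. The sketch below is a hypothetical mirror of the mode names for illustration only; ModeSketch and its helper methods are not part of Hive, and the real enum is GenericUDAFEvaluator.Mode.

```java
// Hypothetical illustration of which step consumes input and which produces output per mode.
final class ModeSketch {

    enum Mode {
        PARTIAL1(true, true),    // original data in, partial aggregation out
        PARTIAL2(false, true),   // partial aggregation in, partial aggregation out
        FINAL(false, false),     // partial aggregation in, final result out
        COMPLETE(true, false);   // original data in, final result out

        final boolean readsOriginalData;
        final boolean returnsPartial;

        Mode(boolean readsOriginalData, boolean returnsPartial) {
            this.readsOriginalData = readsOriginalData;
            this.returnsPartial = returnsPartial;
        }

        /** Method that consumes the input in this mode. */
        String inputStep() {
            return readsOriginalData ? "iterate" : "merge";
        }

        /** Method whose return value the ObjectInspector from init describes. */
        String outputStep() {
            return returnsPartial ? "terminatePartial" : "terminate";
        }
    }

    public static void main(String[] args) {
        for (Mode m : Mode.values()) {
            System.out.println(m + ": " + m.inputStep() + " -> " + m.outputStep());
        }
    }
}
```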
Throws:
HiveException

getNewAggregationBuffer

public GenericUDAFEvaluator.AggregationBuffer getNewAggregationBuffer()
                                                               throws HiveException
Description copied from class: GenericUDAFEvaluator
Get a new aggregation object.

Specified by:
getNewAggregationBuffer in class GenericUDAFEvaluator
Throws:
HiveException

reset

public void reset(GenericUDAFEvaluator.AggregationBuffer agg)
           throws HiveException
Description copied from class: GenericUDAFEvaluator
Reset the aggregation. This is useful if we want to reuse the same aggregation.

Specified by:
reset in class GenericUDAFEvaluator
Throws:
HiveException

iterate

public void iterate(GenericUDAFEvaluator.AggregationBuffer agg,
                    Object[] parameters)
             throws HiveException
Description copied from class: GenericUDAFEvaluator
Iterate through original data.

Specified by:
iterate in class GenericUDAFEvaluator
Parameters:
parameters - The objects of parameters.
Throws:
HiveException

terminatePartial

public Object terminatePartial(GenericUDAFEvaluator.AggregationBuffer agg)
                        throws HiveException
Description copied from class: GenericUDAFEvaluator
Get partial aggregation result.

Specified by:
terminatePartial in class GenericUDAFEvaluator
Returns:
partial aggregation result.
Throws:
HiveException

merge

public void merge(GenericUDAFEvaluator.AggregationBuffer agg,
                  Object partial)
           throws HiveException
Description copied from class: GenericUDAFEvaluator
Merge with partial aggregation result. NOTE: null might be passed in case there is no input data.

Specified by:
merge in class GenericUDAFEvaluator
Parameters:
partial - The partial aggregation result.
Throws:
HiveException
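
Because null may be passed when a partial saw no input data, a merge implementation has to treat it as a no-op. The following is a minimal sketch of that guard, assuming a hypothetical partial layout of {n, mx, my, c} held in a double[]; the class NullSafeMergeSketch and that layout are illustrative only and do not describe how Hive encodes the partial result.

```java
// Hypothetical sketch of a null-tolerant merge; the double[] layout is an assumption.
final class NullSafeMergeSketch {

    /** Folds partial into agg. Both use the assumed layout {n, mx, my, c}. */
    static void merge(double[] agg, Object partial) {
        if (partial == null) {
            return;                        // no input data behind this partial
        }
        double[] p = (double[]) partial;
        double nA = agg[0];
        double nB = p[0];
        if (nB == 0) {
            return;                        // empty partial, avoids dividing by zero
        }
        double nX = nA + nB;
        // c_X = c_A + c_B + (mx_A - mx_B)*(my_A - my_B)*n_A*n_B/n_X
        agg[3] += p[3] + (agg[1] - p[1]) * (agg[2] - p[2]) * nA * nB / nX;
        agg[1] = (nA * agg[1] + nB * p[1]) / nX;
        agg[2] = (nA * agg[2] + nB * p[2]) / nX;
        agg[0] = nX;
    }

    public static void main(String[] args) {
        double[] agg = {2, 1.5, 3, 1};
        merge(agg, null);                              // leaves agg untouched
        merge(agg, new double[] {2, 3.5, 7.5, 1.5});   // combines two partials
        System.out.println(java.util.Arrays.toString(agg));
    }
}
```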

terminate

public Object terminate(GenericUDAFEvaluator.AggregationBuffer agg)
                 throws HiveException
Description copied from class: GenericUDAFEvaluator
Get final aggregation result.

Specified by:
terminate in class GenericUDAFEvaluator
Returns:
final aggregation result.
Throws:
HiveException

setResult

public void setResult(DoubleWritable result)

getResult

public DoubleWritable getResult()


Copyright © 2014 The Apache Software Foundation. All rights reserved.