org.apache.hadoop.hive.ql.udf.generic
Class GenericUDAFCovariance

java.lang.Object
  extended by org.apache.hadoop.hive.ql.udf.generic.AbstractGenericUDAFResolver
      extended by org.apache.hadoop.hive.ql.udf.generic.GenericUDAFCovariance
All Implemented Interfaces:
GenericUDAFResolver, GenericUDAFResolver2
Direct Known Subclasses:
GenericUDAFCovarianceSample

public class GenericUDAFCovariance
extends AbstractGenericUDAFResolver

Compute the population covariance covar_pop(x, y) using the following one-pass method (ref. "Formulas for Robust, One-Pass Parallel Computation of Covariances and Arbitrary-Order Statistical Moments", Philippe Pébay, Sandia Labs):

Incremental:
  n                                               : (count)
  mx_n = mx_(n-1) + [x_n - mx_(n-1)]/n            : (xavg)
  my_n = my_(n-1) + [y_n - my_(n-1)]/n            : (yavg)
  c_n  = c_(n-1) + (x_n - mx_(n-1))*(y_n - my_n)  : (covariance * n)

Merge:
  c_X = c_A + c_B + (mx_A - mx_B)*(my_A - my_B)*n_A*n_B/n_X
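The incremental and merge formulas above can be sketched in plain Java as follows. This is a minimal illustration, not the Hive implementation: the class CovarianceSketch and its members (add, merge, covarPop) are hypothetical names, and returning NaN for an empty input is an assumption made here for simplicity.

```java
/** Hypothetical helper, not part of Hive: accumulates the running co-moment c_n. */
final class CovarianceSketch {
    long n;     // number of (x, y) pairs seen so far
    double mx;  // running mean of x
    double my;  // running mean of y
    double c;   // co-moment, i.e. covariance * n

    /** Incremental step: fold one (x, y) pair into the state. */
    void add(double x, double y) {
        n++;
        double dx = x - mx;   // x_n - mx_(n-1): uses the previous mean of x
        mx += dx / n;         // mx_n
        my += (y - my) / n;   // my_n
        c += dx * (y - my);   // (x_n - mx_(n-1)) * (y_n - my_n)
    }

    /** Merge step: fold another partial state b into this one. */
    void merge(CovarianceSketch b) {
        if (b.n == 0) {
            return;           // nothing to merge
        }
        if (n == 0) {         // this side is empty: copy b
            n = b.n; mx = b.mx; my = b.my; c = b.c;
            return;
        }
        long nx = n + b.n;    // n_X = n_A + n_B
        double dx = mx - b.mx;
        double dy = my - b.my;
        // c_X = c_A + c_B + (mx_A - mx_B)*(my_A - my_B)*n_A*n_B/n_X
        c += b.c + dx * dy * ((double) n * b.n / nx);
        // The merged means are the count-weighted averages of the partial means.
        mx = (n * mx + b.n * b.mx) / nx;
        my = (n * my + b.n * b.my) / nx;
        n = nx;
    }

    /** Population covariance c_n / n; NaN on empty input (an assumption of this sketch). */
    double covarPop() {
        return n == 0 ? Double.NaN : c / n;
    }
}
```

Keeping the co-moment c rather than the covariance itself means a merge only needs the counts, the means, and c from each partial aggregate, so partial results computed on separate splits can be combined without revisiting the rows.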


Nested Class Summary
static class GenericUDAFCovariance.GenericUDAFCovarianceEvaluator
          Evaluate the covariance using the algorithm described in http://en.wikipedia.org/wiki/Algorithms_for_calculating_variance, presumably by Pébay, Philippe (2008), in "Formulas for Robust, One-Pass Parallel Computation of Covariances and Arbitrary-Order Statistical Moments", Technical Report SAND2008-6212, Sandia National Laboratories, http://infoserve.sandia.gov/sand_doc/2008/086212.pdf

          Incremental:
            n                                               : (count)
            mx_n = mx_(n-1) + [x_n - mx_(n-1)]/n            : (xavg)
            my_n = my_(n-1) + [y_n - my_(n-1)]/n            : (yavg)
            c_n  = c_(n-1) + (x_n - mx_(n-1))*(y_n - my_n)  : (covariance * n)

          Merge:
            c_X = c_A + c_B + (mx_A - mx_B)*(my_A - my_B)*n_A*n_B/n_X

          This one-pass algorithm is numerically stable.
 
Constructor Summary
GenericUDAFCovariance()
           
 
Method Summary
 GenericUDAFEvaluator getEvaluator(TypeInfo[] parameters)
          Get the evaluator for the parameter types.
 
Methods inherited from class org.apache.hadoop.hive.ql.udf.generic.AbstractGenericUDAFResolver
getEvaluator
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

GenericUDAFCovariance

public GenericUDAFCovariance()
Method Detail

getEvaluator

public GenericUDAFEvaluator getEvaluator(TypeInfo[] parameters)
                                  throws SemanticException
Description copied from interface: GenericUDAFResolver
Get the evaluator for the parameter types. This method returns an object rather than a class because the object may need some configuration (which can be serialized). In that case, the class of the object must implement the Serializable interface. At execution time, the object is deserialized from the plan and used to evaluate the aggregations.

If the class of the object does not implement Serializable, a new instance of the class is created at execution time instead.

Specified by:
getEvaluator in interface GenericUDAFResolver
Overrides:
getEvaluator in class AbstractGenericUDAFResolver
Parameters:
parameters - The types of the parameters. We need the type information to know which evaluator class to use.
Throws:
SemanticException


Copyright © 2014 The Apache Software Foundation. All rights reserved.