org.apache.hadoop.hive.ql.udf.generic
Class GenericUDAFCovariance
java.lang.Object
org.apache.hadoop.hive.ql.udf.generic.AbstractGenericUDAFResolver
org.apache.hadoop.hive.ql.udf.generic.GenericUDAFCovariance
- All Implemented Interfaces:
- GenericUDAFResolver, GenericUDAFResolver2
- Direct Known Subclasses:
- GenericUDAFCovarianceSample
public class GenericUDAFCovariance
- extends AbstractGenericUDAFResolver
Computes the population covariance covar_pop(x, y) using the following one-pass method
(ref. "Formulas for Robust, One-Pass Parallel Computation of Covariances and
Arbitrary-Order Statistical Moments", Philippe Pébay, Sandia Labs):
Incremental:
n : number of (x, y) pairs seen so far
mx_n = mx_(n-1) + [x_n - mx_(n-1)]/n : running mean of x
my_n = my_(n-1) + [y_n - my_(n-1)]/n : running mean of y
c_n = c_(n-1) + (x_n - mx_(n-1))*(y_n - my_n) : co-moment, equal to n times the covariance
Merge (partial results A and B combined into X, with n_X = n_A + n_B):
c_X = c_A + c_B + (mx_A - mx_B)*(my_A - my_B)*n_A*n_B/n_X
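The incremental and merge formulas above can be sketched in plain Java as follows. This is a minimal illustration, not the code of GenericUDAFCovarianceEvaluator: the class name CovarianceSketch, its fields, and its methods are hypothetical, the weighted-mean update in merge is the standard one and is not spelled out in the description above, and returning NaN for an empty buffer is a choice made only for this sketch.

```java
// Hypothetical stand-in for a covariance aggregation buffer; not Hive's evaluator.
final class CovarianceSketch {
    long n;        // number of (x, y) pairs seen so far
    double meanX;  // running mean of x (mx_n)
    double meanY;  // running mean of y (my_n)
    double c;      // co-moment c_n, equal to n times the population covariance

    // Incremental step: fold one (x, y) pair into the buffer.
    void add(double x, double y) {
        n++;
        double dx = x - meanX;     // x_n - mx_(n-1): uses the OLD mean of x
        meanX += dx / n;           // mx_n
        meanY += (y - meanY) / n;  // my_n
        c += dx * (y - meanY);     // (x_n - mx_(n-1)) * (y_n - my_n): uses the NEW mean of y
    }

    // Merge step: combine partial buffer b (B) into this buffer (A), giving X.
    void merge(CovarianceSketch b) {
        if (b.n == 0) {
            return;                // nothing to merge
        }
        if (n == 0) {              // this buffer is empty: copy b
            n = b.n;
            meanX = b.meanX;
            meanY = b.meanY;
            c = b.c;
            return;
        }
        long total = n + b.n;      // n_X = n_A + n_B
        double dX = meanX - b.meanX;
        double dY = meanY - b.meanY;
        // c_X = c_A + c_B + (mx_A - mx_B)*(my_A - my_B)*n_A*n_B/n_X
        c += b.c + dX * dY * ((double) n * b.n / total);
        // Weighted means of the combined data (standard formula, assumed here).
        meanX = (n * meanX + b.n * b.meanX) / total;
        meanY = (n * meanY + b.n * b.meanY) / total;
        n = total;
    }

    // Population covariance; NaN for an empty buffer is a sketch-only choice.
    double covarPop() {
        return n == 0 ? Double.NaN : c / n;
    }
}
```

Feeding the pairs (1, 2), (2, 4), (3, 6), (4, 9) through add, or splitting them across two buffers and calling merge, yields the same co-moment of 11.5 and therefore the same covar_pop of 2.875.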
Nested Class Summary
static class GenericUDAFCovariance.GenericUDAFCovarianceEvaluator
Evaluates the covariance using the one-pass algorithm described in
http://en.wikipedia.org/wiki/Algorithms_for_calculating_variance,
presumably due to Pébay, Philippe (2008), "Formulas for Robust,
One-Pass Parallel Computation of Covariances and Arbitrary-Order
Statistical Moments", Technical Report SAND2008-6212,
Sandia National Laboratories,
http://infoserve.sandia.gov/sand_doc/2008/086212.pdf
The incremental and merge formulas are the same as those given in the class description above.
This one-pass algorithm is numerically stable.
GenericUDAFCovariance
public GenericUDAFCovariance()
getEvaluator
public GenericUDAFEvaluator getEvaluator(TypeInfo[] parameters)
throws SemanticException
- Description copied from interface: GenericUDAFResolver
- Get the evaluator for the parameter types.
This method returns an object rather than a class because the object may
need some configuration (that can be serialized). In that case the class of
the object must implement the Serializable interface; at execution time the
object is deserialized from the plan and used to evaluate the aggregations.
If the class of the object does not implement Serializable, a new instance
of the class is created at execution time.
- Specified by: getEvaluator in interface GenericUDAFResolver
- Overrides: getEvaluator in class AbstractGenericUDAFResolver
- Parameters:
parameters - The types of the parameters. The type information is needed to
determine which evaluator class to use.
- Throws:
SemanticException
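The getEvaluator description distinguishes two cases: an evaluator whose class implements Serializable is carried through the plan with its configuration, while any other evaluator is re-created as a fresh instance at execution time. The sketch below imitates that behavior using only standard JDK serialization and reflection. It is an illustration under stated assumptions, not Hive's plan serialization mechanism: EvaluatorShippingSketch, ConfiguredEvaluator, PlainEvaluator, and ship are all hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical illustration only; Hive's real plan handling differs.
final class EvaluatorShippingSketch {

    // Hypothetical evaluator that carries configuration and is Serializable.
    static final class ConfiguredEvaluator implements Serializable {
        private static final long serialVersionUID = 1L;
        final String mode;

        ConfiguredEvaluator(String mode) {
            this.mode = mode;
        }
    }

    // Hypothetical evaluator without configuration; not Serializable.
    static final class PlainEvaluator {
        PlainEvaluator() {
        }
    }

    // Returns the object that would be used "at execution time".
    static Object ship(Object evaluator) {
        try {
            if (evaluator instanceof Serializable) {
                // Serializable: round-trip the object so its configuration survives.
                ByteArrayOutputStream bytes = new ByteArrayOutputStream();
                try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                    out.writeObject(evaluator);
                }
                try (ObjectInputStream in = new ObjectInputStream(
                        new ByteArrayInputStream(bytes.toByteArray()))) {
                    return in.readObject();
                }
            }
            // Not Serializable: create a new instance of the same class.
            return evaluator.getClass().getDeclaredConstructor().newInstance();
        } catch (Exception e) {
            throw new IllegalStateException("could not ship evaluator", e);
        }
    }
}
```

In this sketch, shipping a ConfiguredEvaluator built with a given mode returns a distinct object with the same mode, whereas shipping a PlainEvaluator returns a brand-new instance, so it needs a no-argument constructor.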
Copyright © 2014 The Apache Software Foundation. All rights reserved.