org.apache.hadoop.hive.ql.udf.generic
Class GenericUDAFCorrelation
java.lang.Object
org.apache.hadoop.hive.ql.udf.generic.AbstractGenericUDAFResolver
org.apache.hadoop.hive.ql.udf.generic.GenericUDAFCorrelation
- All Implemented Interfaces:
- GenericUDAFResolver, GenericUDAFResolver2
public class GenericUDAFCorrelation
- extends AbstractGenericUDAFResolver
Computes the Pearson correlation coefficient corr(x, y) using the following
numerically stable one-pass method, based on:
"Formulas for Robust, One-Pass Parallel Computation of Covariances and
Arbitrary-Order Statistical Moments", Philippe Pebay, Sandia Labs,
and "The Art of Computer Programming, Volume 2: Seminumerical Algorithms",
Donald Knuth.
Incremental:
n : count
mx_n = mx_(n-1) + [x_n - mx_(n-1)]/n : mean of x
my_n = my_(n-1) + [y_n - my_(n-1)]/n : mean of y
c_n = c_(n-1) + (x_n - mx_(n-1))*(y_n - my_n) : covariance * n
vx_n = vx_(n-1) + (x_n - mx_n)*(x_n - mx_(n-1)) : variance of x * n
vy_n = vy_(n-1) + (y_n - my_n)*(y_n - my_(n-1)) : variance of y * n
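The incremental recurrences above can be sketched in plain Java as follows. This is a minimal illustration, not the Hive evaluator itself: the class name RunningCorrelation and its methods are hypothetical, and the final step corr = c / sqrt(vx * vy) is the standard Pearson definition applied to the accumulators rather than something stated on this page. Degenerate inputs (fewer than two rows, or zero variance) are not handled specially.

```java
// Hypothetical helper, not part of Hive: one-pass accumulation of the
// quantities n, mx, my, c, vx, vy from the "Incremental" recurrences.
final class RunningCorrelation {
    long n;        // count
    double mx, my; // running means of x and y
    double c;      // co-moment: covariance * n
    double vx, vy; // second moments: variance * n

    void add(double x, double y) {
        n++;
        double dx = x - mx; // x_n - mx_(n-1)
        double dy = y - my; // y_n - my_(n-1)
        mx += dx / n;       // mx_n
        my += dy / n;       // my_n
        c += dx * (y - my);  // (x_n - mx_(n-1)) * (y_n - my_n)
        vx += dx * (x - mx); // (x_n - mx_(n-1)) * (x_n - mx_n)
        vy += dy * (y - my); // (y_n - my_(n-1)) * (y_n - my_n)
    }

    // Assumption: Pearson coefficient computed from the accumulators.
    double corr() {
        return c / Math.sqrt(vx * vy);
    }
}
```

Note that the update for c mixes the old mean of x with the new mean of y, exactly as in the recurrence, so the order of the statements in add matters.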
Merge:
c_(A,B) = c_A + c_B + (mx_A - mx_B)*(my_A - my_B)*n_A*n_B/(n_A+n_B)
vx_(A,B) = vx_A + vx_B + (mx_A - mx_B)*(mx_A - mx_B)*n_A*n_B/(n_A+n_B)
vy_(A,B) = vy_A + vy_B + (my_A - my_B)*(my_A - my_B)*n_A*n_B/(n_A+n_B)
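A sketch of the merge step, assuming two partial aggregates A and B that each carry (n, mx, my, c, vx, vy). The class CorrPartial and its methods are hypothetical names used only for illustration. The merge formulas above do not spell out how the count and the means combine; the sketch assumes the usual sum of counts and count-weighted average of means.

```java
// Hypothetical helper, not part of Hive: combines two partial aggregates
// using the "Merge" formulas for c, vx and vy.
final class CorrPartial {
    final long n;
    final double mx, my, c, vx, vy;

    CorrPartial(long n, double mx, double my, double c, double vx, double vy) {
        this.n = n; this.mx = mx; this.my = my;
        this.c = c; this.vx = vx; this.vy = vy;
    }

    // Builds a partial from raw data with a plain two-pass computation,
    // used here only to produce inputs for merge().
    static CorrPartial of(double[] xs, double[] ys) {
        int n = xs.length;
        if (n == 0) return new CorrPartial(0, 0, 0, 0, 0, 0);
        double sx = 0, sy = 0;
        for (int i = 0; i < n; i++) { sx += xs[i]; sy += ys[i]; }
        double mx = sx / n, my = sy / n;
        double c = 0, vx = 0, vy = 0;
        for (int i = 0; i < n; i++) {
            double dx = xs[i] - mx, dy = ys[i] - my;
            c += dx * dy; vx += dx * dx; vy += dy * dy;
        }
        return new CorrPartial(n, mx, my, c, vx, vy);
    }

    static CorrPartial merge(CorrPartial a, CorrPartial b) {
        if (a.n == 0) return b;
        if (b.n == 0) return a;
        long n = a.n + b.n;
        double dx = a.mx - b.mx;           // mx_A - mx_B
        double dy = a.my - b.my;           // my_A - my_B
        double w = (double) a.n * b.n / n; // n_A*n_B/(n_A+n_B)
        return new CorrPartial(
            n,
            (a.n * a.mx + b.n * b.mx) / n, // assumed: weighted mean of x
            (a.n * a.my + b.n * b.my) / n, // assumed: weighted mean of y
            a.c + b.c + dx * dy * w,
            a.vx + b.vx + dx * dx * w,
            a.vy + b.vy + dy * dy * w);
    }
}
```

Merging the partials of two disjoint row sets should give, up to rounding, the same accumulators as processing all rows in one pass, which is what lets the aggregation run in parallel.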
GenericUDAFCorrelation
public GenericUDAFCorrelation()
getEvaluator
public GenericUDAFEvaluator getEvaluator(TypeInfo[] parameters)
throws SemanticException
- Description copied from interface:
GenericUDAFResolver
- Get the evaluator for the parameter types.
This method returns an object rather than a class because the object may
need some configuration (which can be serialized). In that case, the class
of the object must implement the Serializable interface. At execution time,
the object is deserialized from the plan and used to evaluate the
aggregations.
If the class of the object does not implement Serializable, a new instance
of the class is created at execution time.
- Specified by:
getEvaluator
in interface GenericUDAFResolver
- Overrides:
getEvaluator
in class AbstractGenericUDAFResolver
- Parameters:
parameters
- The types of the parameters. We need the type information to know
which evaluator class to use.
- Throws:
SemanticException
Copyright © 2014 The Apache Software Foundation. All rights reserved.