org.apache.hadoop.hive.ql.udf.generic
Class GenericUDAFEvaluator

java.lang.Object
  extended by org.apache.hadoop.hive.ql.udf.generic.GenericUDAFEvaluator
All Implemented Interfaces:
Closeable
Direct Known Subclasses:
GenericUDAFAverage.AbstractGenericUDAFAverageEvaluator, GenericUDAFBridge.GenericUDAFBridgeEvaluator, GenericUDAFComputeStats.GenericUDAFBinaryStatsEvaluator, GenericUDAFComputeStats.GenericUDAFBooleanStatsEvaluator, GenericUDAFComputeStats.GenericUDAFDecimalStatsEvaluator, GenericUDAFComputeStats.GenericUDAFDoubleStatsEvaluator, GenericUDAFComputeStats.GenericUDAFLongStatsEvaluator, GenericUDAFComputeStats.GenericUDAFStringStatsEvaluator, GenericUDAFContextNGrams.GenericUDAFContextNGramEvaluator, GenericUDAFCorrelation.GenericUDAFCorrelationEvaluator, GenericUDAFCount.GenericUDAFCountEvaluator, GenericUDAFCovariance.GenericUDAFCovarianceEvaluator, GenericUDAFEWAHBitmap.GenericUDAFEWAHBitmapEvaluator, GenericUDAFFirstValue.GenericUDAFFirstValueEvaluator, GenericUDAFHistogramNumeric.GenericUDAFHistogramNumericEvaluator, GenericUDAFLastValue.GenericUDAFLastValueEvaluator, GenericUDAFLeadLag.GenericUDAFLeadLagEvaluator, GenericUDAFMax.GenericUDAFMaxEvaluator, GenericUDAFMin.GenericUDAFMinEvaluator, GenericUDAFMkCollectionEvaluator, GenericUDAFnGrams.GenericUDAFnGramEvaluator, GenericUDAFNTile.GenericUDAFNTileEvaluator, GenericUDAFPercentileApprox.GenericUDAFPercentileApproxEvaluator, GenericUDAFRank.GenericUDAFRankEvaluator, GenericUDAFRowNumber.GenericUDAFRowNumberEvaluator, GenericUDAFSum.GenericUDAFSumDouble, GenericUDAFSum.GenericUDAFSumHiveDecimal, GenericUDAFSum.GenericUDAFSumLong, GenericUDAFVariance.GenericUDAFVarianceEvaluator

public abstract class GenericUDAFEvaluator
extends Object
implements Closeable

A generic user-defined aggregation function (GenericUDAF) evaluator for use with Hive. New GenericUDAF evaluators must extend this class. GenericUDAFs are superior to normal UDAFs in the following ways: 1. They can accept arguments of complex types and return complex types. 2. They can accept a variable number of arguments. 3. They can accept an unlimited number of function signatures; for example, it is easy to write a GenericUDAF that accepts array<int>, array<array<int>>, and so on (arbitrary levels of nesting).


Nested Class Summary
static class GenericUDAFEvaluator.AbstractAggregationBuffer
           
static interface GenericUDAFEvaluator.AggregationBuffer
          Deprecated. use GenericUDAFEvaluator.AbstractAggregationBuffer instead
static interface GenericUDAFEvaluator.AggregationType
           
static class GenericUDAFEvaluator.Mode
          Mode.
 
Constructor Summary
GenericUDAFEvaluator()
          The constructor.
 
Method Summary
 void aggregate(GenericUDAFEvaluator.AggregationBuffer agg, Object[] parameters)
          This function will be called by GroupByOperator when it sees a new input row.
 void close()
          Close GenericUDAFEvaluator.
 void configure(MapredContext mapredContext)
          Additionally set up the GenericUDAFEvaluator with a MapredContext before initialization.
 Object evaluate(GenericUDAFEvaluator.AggregationBuffer agg)
          This function will be called by GroupByOperator when it needs the result of the aggregation.
abstract  GenericUDAFEvaluator.AggregationBuffer getNewAggregationBuffer()
          Get a new aggregation object.
 ObjectInspector init(GenericUDAFEvaluator.Mode m, ObjectInspector[] parameters)
          Initialize the evaluator.
static boolean isEstimable(GenericUDAFEvaluator.AggregationBuffer buffer)
           
abstract  void iterate(GenericUDAFEvaluator.AggregationBuffer agg, Object[] parameters)
          Iterate through original data.
abstract  void merge(GenericUDAFEvaluator.AggregationBuffer agg, Object partial)
          Merge with partial aggregation result.
abstract  void reset(GenericUDAFEvaluator.AggregationBuffer agg)
          Reset the aggregation.
abstract  Object terminate(GenericUDAFEvaluator.AggregationBuffer agg)
          Get final aggregation result.
abstract  Object terminatePartial(GenericUDAFEvaluator.AggregationBuffer agg)
          Get partial aggregation result.
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

GenericUDAFEvaluator

public GenericUDAFEvaluator()
The constructor.

Method Detail

isEstimable

public static boolean isEstimable(GenericUDAFEvaluator.AggregationBuffer buffer)

configure

public void configure(MapredContext mapredContext)
Additionally set up the GenericUDAFEvaluator with a MapredContext before initialization. This is called only at runtime of a MapRedTask.

Parameters:
mapredContext - The MapredContext to set up the evaluator with.

init

public ObjectInspector init(GenericUDAFEvaluator.Mode m,
                            ObjectInspector[] parameters)
                     throws HiveException
Initialize the evaluator.

Parameters:
m - The mode of aggregation.
parameters - The ObjectInspectors for the parameters. In PARTIAL1 and COMPLETE mode, the parameters are original data; in PARTIAL2 and FINAL mode, the parameters are partial aggregations (in that case, the array always has a single element).
Returns:
The ObjectInspector for the return value. In PARTIAL1 and PARTIAL2 mode, this is the ObjectInspector for the return value of terminatePartial(); in FINAL and COMPLETE mode, it is the ObjectInspector for the return value of terminate(). NOTE: We need ObjectInspector[] (in addition to the TypeInfo[] in GenericUDAFResolver) for 2 reasons: 1. ObjectInspector contains more information than TypeInfo; and 2. We call GenericUDAFResolver.getEvaluator at compilation time, and GenericUDAFEvaluator.init at execution time.
Throws:
HiveException
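
The mode rules described for init() can be summarized as a small lookup. The sketch below is a plain-Java model, not Hive code: ModeSketch, its two flags, and expectedInspectorCount are hypothetical names introduced only for illustration, while the four constants mirror the modes named above.

```java
// Hypothetical stand-in for GenericUDAFEvaluator.Mode; not the Hive enum itself.
enum ModeSketch {
    // (reads original data?, emits a partial aggregation?)
    PARTIAL1(true, true),    // original data in, terminatePartial() result out
    PARTIAL2(false, true),   // partial aggregations in, terminatePartial() result out
    FINAL(false, false),     // partial aggregations in, terminate() result out
    COMPLETE(true, false);   // original data in, terminate() result out

    final boolean readsOriginalData;
    final boolean emitsPartialResult;

    ModeSketch(boolean readsOriginalData, boolean emitsPartialResult) {
        this.readsOriginalData = readsOriginalData;
        this.emitsPartialResult = emitsPartialResult;
    }

    /** Length of the inspector array init() would receive for a UDAF with the given argument count. */
    int expectedInspectorCount(int argumentCount) {
        // In PARTIAL2 and FINAL the array always has a single element.
        return readsOriginalData ? argumentCount : 1;
    }
}
```

For a two-argument aggregation, ModeSketch.PARTIAL1.expectedInspectorCount(2) yields 2, whereas ModeSketch.FINAL.expectedInspectorCount(2) yields 1, because the reduce side sees only the single partial aggregation.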

getNewAggregationBuffer

public abstract GenericUDAFEvaluator.AggregationBuffer getNewAggregationBuffer()
                                                                        throws HiveException
Get a new aggregation object.

Throws:
HiveException

reset

public abstract void reset(GenericUDAFEvaluator.AggregationBuffer agg)
                    throws HiveException
Reset the aggregation. This is useful if we want to reuse the same aggregation buffer.

Throws:
HiveException

close

public void close()
           throws IOException
Close GenericUDAFEvaluator. This is called only at runtime of a MapRedTask.

Specified by:
close in interface Closeable
Throws:
IOException

aggregate

public void aggregate(GenericUDAFEvaluator.AggregationBuffer agg,
                      Object[] parameters)
               throws HiveException
This function will be called by GroupByOperator when it sees a new input row.

Parameters:
agg - The object to store the aggregation result.
parameters - The row; it can be inspected using the ObjectInspectors passed to init().
Throws:
HiveException

evaluate

public Object evaluate(GenericUDAFEvaluator.AggregationBuffer agg)
                throws HiveException
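This function will be called by GroupByOperator when it needs the result of the aggregation: the partial result in PARTIAL1 and PARTIAL2 mode, the final result in FINAL and COMPLETE mode.

Parameters:
agg - The object that stores the aggregation result.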
Throws:
HiveException
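
One way to picture how the two non-abstract entry points, aggregate and evaluate, relate to the abstract methods is a dispatch on the mode. The sketch below is an assumption that is consistent with the init() documentation, not a copy of Hive's implementation; EvaluatorSketch and CountSketch are hypothetical stand-ins that use no Hive types.

```java
// Simplified model of mode-based dispatch; nothing below is Hive code.
abstract class EvaluatorSketch<B> {
    enum Mode { PARTIAL1, PARTIAL2, FINAL, COMPLETE }

    private final Mode mode;

    EvaluatorSketch(Mode mode) { this.mode = mode; }

    abstract void iterate(B agg, Object[] parameters);
    abstract void merge(B agg, Object partial);
    abstract Object terminatePartial(B agg);
    abstract Object terminate(B agg);

    /** Called once per input row. */
    void aggregate(B agg, Object[] parameters) {
        if (mode == Mode.PARTIAL1 || mode == Mode.COMPLETE) {
            iterate(agg, parameters);      // the row holds original data
        } else {
            merge(agg, parameters[0]);     // the row holds one partial aggregation
        }
    }

    /** Called when the result for the current group is needed. */
    Object evaluate(B agg) {
        if (mode == Mode.PARTIAL1 || mode == Mode.PARTIAL2) {
            return terminatePartial(agg);
        }
        return terminate(agg);
    }
}

// Toy subclass that counts rows; the buffer is a one-element long array.
class CountSketch extends EvaluatorSketch<long[]> {
    CountSketch(Mode mode) { super(mode); }

    @Override void iterate(long[] agg, Object[] parameters) { agg[0]++; }

    @Override void merge(long[] agg, Object partial) {
        if (partial != null) { agg[0] += (Long) partial; }
    }

    @Override Object terminatePartial(long[] agg) { return agg[0]; }

    @Override Object terminate(long[] agg) { return agg[0]; }
}
```

With this shape, a caller such as GroupByOperator would not need to know the mode: it calls aggregate for each row and evaluate once per group, and the evaluator routes to the matching abstract method.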

iterate

public abstract void iterate(GenericUDAFEvaluator.AggregationBuffer agg,
                             Object[] parameters)
                      throws HiveException
Iterate through original data.

Parameters:
parameters - The parameter objects from the original data row.
Throws:
HiveException

terminatePartial

public abstract Object terminatePartial(GenericUDAFEvaluator.AggregationBuffer agg)
                                 throws HiveException
Get partial aggregation result.

Returns:
partial aggregation result.
Throws:
HiveException

merge

public abstract void merge(GenericUDAFEvaluator.AggregationBuffer agg,
                           Object partial)
                    throws HiveException
Merge with partial aggregation result. NOTE: null may be passed if there is no input data.

Parameters:
partial - The partial aggregation result.
Throws:
HiveException

terminate

public abstract Object terminate(GenericUDAFEvaluator.AggregationBuffer agg)
                          throws HiveException
Get final aggregation result.

Returns:
final aggregation result.
Throws:
HiveException
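
To show how getNewAggregationBuffer, reset, iterate, terminatePartial, merge, and terminate could fit together, here is a minimal plain-Java model of an average aggregation. It is a sketch under stated assumptions: AverageSketch, its Buffer class, and the averageOf driver are hypothetical, the signatures are simplified (typed arguments instead of Object and ObjectInspector), and no Hive classes are used.

```java
import java.util.Arrays;
import java.util.List;

// Plain-Java model of an average aggregation; every name is hypothetical.
final class AverageSketch {
    /** Aggregation buffer: running sum and row count. */
    static final class Buffer {
        double sum;
        long count;
    }

    Buffer getNewAggregationBuffer() { return new Buffer(); }

    void reset(Buffer agg) { agg.sum = 0; agg.count = 0; }

    /** Consume one original value; null values are skipped. */
    void iterate(Buffer agg, Double value) {
        if (value != null) { agg.sum += value; agg.count++; }
    }

    /** Partial result encoded as {sum, count}. */
    double[] terminatePartial(Buffer agg) {
        return new double[] { agg.sum, agg.count };
    }

    /** Fold a partial result into the buffer; a null partial means no input data. */
    void merge(Buffer agg, double[] partial) {
        if (partial == null) { return; }
        agg.sum += partial[0];
        agg.count += (long) partial[1];
    }

    /** Final result, or null when no rows were aggregated. */
    Double terminate(Buffer agg) {
        return agg.count == 0 ? null : agg.sum / agg.count;
    }

    /** Runs a map-side pass per split, then merges every partial on the reduce side. */
    static Double averageOf(List<List<Double>> splits) {
        AverageSketch udaf = new AverageSketch();
        Buffer reduceSide = udaf.getNewAggregationBuffer();
        Buffer mapSide = udaf.getNewAggregationBuffer();
        for (List<Double> split : splits) {
            udaf.reset(mapSide);                 // reuse one buffer across splits
            for (Double v : split) { udaf.iterate(mapSide, v); }
            udaf.merge(reduceSide, udaf.terminatePartial(mapSide));
        }
        return udaf.terminate(reduceSide);
    }

    public static void main(String[] args) {
        List<List<Double>> splits = Arrays.asList(
            Arrays.asList(1.0, 2.0), Arrays.asList(3.0, null, 6.0));
        System.out.println(averageOf(splits));
    }
}
```

The partial result carries both sum and count rather than a per-split average, so merging stays exact regardless of how rows are distributed across splits. The null check in merge reflects the note that null may be passed when there is no input data.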


Copyright © 2014 The Apache Software Foundation. All rights reserved.