java.lang.Object
  org.apache.hadoop.hive.ql.udf.generic.NumericHistogram
public class NumericHistogram
A generic, reusable histogram class that supports partial aggregations. The algorithm is a heuristic adapted from the following paper: Yael Ben-Haim and Elad Tom-Tov, "A streaming parallel decision tree algorithm", J. Machine Learning Research 11 (2010), pp. 849-872. Although it offers no approximation guarantees, it appears to work well with adequate data and a large number of histogram bins (e.g., 20-80).
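The update step from the cited paper can be illustrated with a small standalone sketch. This is not the Hive source: the class `StreamingHistogramSketch`, its `Bin` type, and the methods `usedBins()`, `bin(int)`, and `totalCount()` are hypothetical names chosen for illustration, and ties between equally close bins are broken by taking the leftmost pair, which is an assumption. Each new point becomes its own bin; once the bin budget is exceeded, the two closest bins are replaced by their count-weighted average.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative sketch only; hypothetical names, not the Hive NumericHistogram source. */
class StreamingHistogramSketch {
    /** One bin: x is the bin center, y is the number of points it represents. */
    static final class Bin {
        double x;
        double y;
        Bin(double x, double y) { this.x = x; this.y = y; }
    }

    private final int maxBins;
    private final List<Bin> bins = new ArrayList<>(); // always kept sorted by x

    StreamingHistogramSketch(int maxBins) {
        if (maxBins < 1) {
            throw new IllegalArgumentException("maxBins must be >= 1");
        }
        this.maxBins = maxBins;
    }

    /** Adds one data point, then trims back to maxBins if needed. */
    void add(double v) {
        int i = 0;
        while (i < bins.size() && bins.get(i).x < v) {
            i++; // linear scan for the sorted insertion position
        }
        if (i < bins.size() && bins.get(i).x == v) {
            bins.get(i).y += 1; // exact match: just bump the count
            return;
        }
        bins.add(i, new Bin(v, 1));
        if (bins.size() > maxBins) {
            mergeClosestPair();
        }
    }

    /** Replaces the two adjacent bins with the smallest gap by their weighted average. */
    private void mergeClosestPair() {
        int best = 0;
        double bestGap = Double.POSITIVE_INFINITY;
        for (int i = 0; i + 1 < bins.size(); i++) {
            double gap = bins.get(i + 1).x - bins.get(i).x;
            if (gap < bestGap) { // assumption: leftmost pair wins ties
                bestGap = gap;
                best = i;
            }
        }
        Bin left = bins.get(best);
        Bin right = bins.remove(best + 1);
        double total = left.y + right.y;
        left.x = (left.x * left.y + right.x * right.y) / total;
        left.y = total;
    }

    int usedBins() { return bins.size(); }

    Bin bin(int i) { return bins.get(i); }

    double totalCount() {
        double sum = 0;
        for (Bin b : bins) {
            sum += b.y;
        }
        return sum;
    }

    public static void main(String[] args) {
        StreamingHistogramSketch h = new StreamingHistogramSketch(3);
        for (double v : new double[] {1, 2, 10, 11, 20}) {
            h.add(v);
        }
        for (int i = 0; i < h.usedBins(); i++) {
            System.out.println(h.bin(i).x + " -> " + h.bin(i).y);
        }
    }
}
```

Because bins are merged by proximity rather than laid out on a fixed grid, bin widths end up non-uniform and adapt to where the data is dense.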
Constructor Summary | |
---|---|
NumericHistogram() | Creates a new histogram object. |
Method Summary | | |
---|---|---|
void | add(double v) | Adds a new data point to the histogram approximation. |
void | allocate(int num_bins) | Sets the number of histogram bins to use for approximating data. |
org.apache.hadoop.hive.ql.udf.generic.NumericHistogram.Coord | getBin(int b) | Returns a particular histogram bin. |
int | getNumBins() | |
int | getUsedBins() | Returns the number of bins currently being used by the histogram. |
boolean | isReady() | Returns true if this histogram object has been initialized by calling merge() or allocate(). |
void | merge(List<DoubleWritable> other) | Takes a serialized histogram created by the serialize() method and merges it with the current histogram object. |
double | quantile(double q) | Gets an approximate quantile value from the current histogram. |
void | reset() | Resets a histogram object to its initial state. |
ArrayList<DoubleWritable> | serialize() | In preparation for a Hive merge() call, serializes the current histogram object into an ArrayList of DoubleWritable objects. |
Methods inherited from class java.lang.Object |
---|
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Constructor Detail |
---|
public NumericHistogram()
Method Detail |
---|
public void reset()
public int getUsedBins()
public boolean isReady()
public org.apache.hadoop.hive.ql.udf.generic.NumericHistogram.Coord getBin(int b)
public void allocate(int num_bins)
num_bins
- Number of non-uniform-width histogram bins to use
public void merge(List<DoubleWritable> other)
other
- A serialized histogram created by the serialize() method
See also: merge(java.util.List)
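To make the idea of combining partial aggregations concrete, here is a minimal sketch of the merge procedure described in the cited paper: take the union of both bin lists, sort by bin center, and repeatedly fuse the closest adjacent pair until the bin budget is met. It is written under stated assumptions and does not reproduce Hive's code: bins are plain `double[]{center, count}` pairs rather than a serialized `List<DoubleWritable>`, and the class name `HistogramMergeSketch` and the `maxBins` parameter are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Illustrative sketch only; hypothetical names, not the Hive NumericHistogram source. */
class HistogramMergeSketch {
    /**
     * Merges two histograms given as lists of {center, count} pairs and
     * trims the result to at most maxBins bins. Inputs are not modified.
     */
    static List<double[]> merge(List<double[]> a, List<double[]> b, int maxBins) {
        if (maxBins < 1) {
            throw new IllegalArgumentException("maxBins must be >= 1");
        }
        // Union of both bin sets, copied so the inputs stay untouched.
        List<double[]> out = new ArrayList<>();
        for (double[] bin : a) {
            out.add(bin.clone());
        }
        for (double[] bin : b) {
            out.add(bin.clone());
        }
        out.sort(Comparator.comparingDouble((double[] bin) -> bin[0]));

        // Fuse the closest adjacent pair until the budget is met.
        while (out.size() > maxBins) {
            int best = 0;
            double bestGap = Double.POSITIVE_INFINITY;
            for (int i = 0; i + 1 < out.size(); i++) {
                double gap = out.get(i + 1)[0] - out.get(i)[0];
                if (gap < bestGap) { // assumption: leftmost pair wins ties
                    bestGap = gap;
                    best = i;
                }
            }
            double[] left = out.get(best);
            double[] right = out.remove(best + 1);
            double total = left[1] + right[1];
            left[0] = (left[0] * left[1] + right[0] * right[1]) / total;
            left[1] = total;
        }
        return out;
    }
}
```

The total count is preserved by every fusion, so the merged histogram still represents all points seen by both inputs.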
public void add(double v)
v
- The data point to add to the histogram approximation.
public double quantile(double q)
q
- The requested quantile; must lie strictly within the open interval (0, 1).
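One simple way to read an approximate quantile off a set of (center, count) bins is to walk the cumulative counts and interpolate linearly between neighboring bin centers. The sketch below shows that idea only; whether Hive uses exactly this interpolation is an assumption, and the class `QuantileSketch` and its array-based signature are hypothetical.

```java
/** Illustrative sketch only; hypothetical names, not the Hive NumericHistogram source. */
class QuantileSketch {
    /**
     * Estimates the q-quantile from bins sorted by center.
     * centers[i] is the i-th bin center and counts[i] its point count.
     */
    static double quantile(double[] centers, double[] counts, double q) {
        if (!(q > 0.0 && q < 1.0)) {
            throw new IllegalArgumentException("q must be strictly within (0, 1)");
        }
        if (centers.length == 0 || centers.length != counts.length) {
            throw new IllegalArgumentException("need matching, non-empty arrays");
        }
        double total = 0;
        for (double c : counts) {
            total += c;
        }
        double target = q * total; // number of points that should lie below the answer
        double cum = 0;            // points in bins strictly before bin b
        for (int b = 0; b < centers.length; b++) {
            if (cum + counts[b] >= target) {
                if (b == 0) {
                    return centers[0]; // nothing to interpolate against
                }
                // Fraction of bin b's mass needed to reach the target.
                double frac = (target - cum) / counts[b];
                return centers[b - 1] + frac * (centers[b] - centers[b - 1]);
            }
            cum += counts[b];
        }
        return centers[centers.length - 1]; // guards against rounding at the top end
    }
}
```

For example, `QuantileSketch.quantile(new double[] {0, 10, 20}, new double[] {1, 2, 1}, 0.5)` interpolates halfway between the first two centers. The estimate is only as good as the bins, which is consistent with the class description's note that there are no approximation guarantees.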
public ArrayList<DoubleWritable> serialize()
See also: merge(java.util.List)
public int getNumBins()