java.lang.Object
  org.apache.hadoop.hive.ql.udf.generic.NGramEstimator
public class NGramEstimator
A generic, reusable n-gram estimation class that supports partial aggregations. The algorithm is based on the heuristic from the following paper: Yael Ben-Haim and Elad Tom-Tov, "A streaming parallel decision tree algorithm", J. Machine Learning Research 11 (2010), pp. 849-872. In particular, frequencies are guaranteed to be under-counted. With large data and a reasonable precision factor, this under-counting appears to be on the order of 5%.
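To make the under-counting concrete, here is a minimal sketch of a bounded top-k counter. It is not the Hive implementation: the class name `TopKNGramSketch`, the `countOf` helper, and the rule of trimming to `k * pf` entries whenever the buffer exceeds `2 * k * pf` are assumptions chosen for illustration. An n-gram that is evicted and later reappears restarts from zero, which is one way a reported frequency can fall below the true one.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for illustration; not the Hive NGramEstimator source. */
final class TopKNGramSketch {
    private final int k;   // number of n-grams to report
    private final int pf;  // precision factor: scales the size of the candidate buffer
    private final Map<List<String>, Double> counts = new HashMap<>();

    TopKNGramSketch(int k, int pf) {
        if (k <= 0 || pf <= 0) {
            throw new IllegalArgumentException("k and pf must be positive");
        }
        this.k = k;
        this.pf = pf;
    }

    /** Counts one n-gram, then trims if the buffer exceeds 2 * k * pf (assumed rule). */
    void add(List<String> ngram) {
        counts.merge(new ArrayList<>(ngram), 1.0, Double::sum);
        if (counts.size() > 2 * k * pf) {
            trimTo(k * pf);
        }
    }

    /** Returns up to k n-grams with their estimated counts, most frequent first. */
    Map<List<String>, Double> topK() {
        Map<List<String>, Double> result = new LinkedHashMap<>();
        for (Map.Entry<List<String>, Double> e : sortedEntries()) {
            if (result.size() == k) {
                break;
            }
            result.put(e.getKey(), e.getValue());
        }
        return result;
    }

    /** Estimated count for one n-gram, or 0.0 if it is not currently buffered. */
    double countOf(List<String> ngram) {
        return counts.getOrDefault(ngram, 0.0);
    }

    int size() {
        return counts.size();
    }

    // Snapshot of the buffer sorted by descending count.
    private List<Map.Entry<List<String>, Double>> sortedEntries() {
        List<Map.Entry<List<String>, Double>> entries = new ArrayList<>(counts.entrySet());
        entries.sort(Map.Entry.<List<String>, Double>comparingByValue(Comparator.reverseOrder()));
        return entries;
    }

    // Drops everything except the 'limit' most frequent entries.
    private void trimTo(int limit) {
        List<Map.Entry<List<String>, Double>> sorted = sortedEntries();
        for (Map.Entry<List<String>, Double> e : sorted.subList(limit, sorted.size())) {
            counts.remove(e.getKey());
        }
    }
}
```

With `k = 1` and `pf = 1`, feeding the unigrams a, a, a, b, c, b, b leaves a at 3 but b at 2 (its true count is 3), because b was evicted once when c arrived.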
Constructor Summary | |
---|---|
NGramEstimator() | Creates a new n-gram estimator object. |
Method Summary | |
---|---|
void | add(ArrayList<String> ng) Adds a new n-gram to the estimation. |
ArrayList<Object[]> | getNGrams() Returns the final top-k n-grams in a format suitable for returning to Hive. |
void | initialize(int pk, int ppf, int pn) Sets the 'k', 'pf', and 'n' parameters. |
boolean | isInitialized() Returns true if the 'k' and 'pf' parameters have been set. |
void | merge(List<org.apache.hadoop.io.Text> other) Takes a serialized n-gram estimator object created by the serialize() method and merges it with the current n-gram object. |
void | reset() Resets an n-gram estimator object to its initial state. |
ArrayList<org.apache.hadoop.io.Text> | serialize() In preparation for a Hive merge() call, serializes the current n-gram estimator object into an ArrayList of Text objects. |
int | size() Returns the number of n-grams in the buffer. |
Methods inherited from class java.lang.Object |
---|
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Constructor Detail |
---|
public NGramEstimator()
Method Detail |
---|
public boolean isInitialized()
public void initialize(int pk, int ppf, int pn) throws HiveException
Throws:
HiveException
public void reset()
public ArrayList<Object[]> getNGrams() throws HiveException
Throws:
HiveException
public int size()
public void add(ArrayList<String> ng) throws HiveException
Parameters:
ng - The n-gram to add to the estimation
Throws:
HiveException
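Since add() takes one n-gram at a time as a list of strings, the caller presumably has to cut a token sequence into n-grams first. A minimal sketch of that step follows, using a sliding window; the helper class `NGramWindows` and its `of` method are hypothetical names and not part of the Hive API.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper for illustration; not part of the Hive API. */
final class NGramWindows {
    private NGramWindows() {
    }

    /** Returns every run of n consecutive tokens, in order of appearance. */
    static List<ArrayList<String>> of(List<String> tokens, int n) {
        if (n <= 0) {
            throw new IllegalArgumentException("n must be positive");
        }
        List<ArrayList<String>> result = new ArrayList<>();
        // The window [start, start + n) slides one token at a time.
        for (int start = 0; start + n <= tokens.size(); start++) {
            result.add(new ArrayList<>(tokens.subList(start, start + n)));
        }
        return result;
    }
}
```

Each returned list has the `ArrayList<String>` shape that add(ng) accepts. A sequence shorter than n yields no n-grams.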
public void merge(List<org.apache.hadoop.io.Text> other) throws HiveException
Parameters:
other - A serialized n-gram object created by the serialize() method
Throws:
HiveException
public ArrayList<org.apache.hadoop.io.Text> serialize() throws HiveException
Throws:
HiveException
See Also:
merge(java.util.List)
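The serialize() and merge() pair is what makes partial aggregation possible: one partial result is flattened to a list, shipped elsewhere, and folded into another. The sketch below illustrates that round trip under stated assumptions. It uses plain `String` in place of `org.apache.hadoop.io.Text` so that it runs without Hadoop on the classpath, and the class `PartialNGramCounts` and its flat layout (n first, then each n-gram's tokens followed by its count) are hypothetical, not the documented wire format.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for illustration; the flat layout is an assumption. */
final class PartialNGramCounts {
    private final int n; // tokens per n-gram
    private final Map<List<String>, Double> counts = new HashMap<>();

    PartialNGramCounts(int n) {
        if (n <= 0) {
            throw new IllegalArgumentException("n must be positive");
        }
        this.n = n;
    }

    void add(List<String> ngram) {
        if (ngram.size() != n) {
            throw new IllegalArgumentException("expected an n-gram of length " + n);
        }
        counts.merge(new ArrayList<>(ngram), 1.0, Double::sum);
    }

    /** Flattens the state to [n, tok_1..tok_n, count, tok_1..tok_n, count, ...]. */
    ArrayList<String> serialize() {
        ArrayList<String> out = new ArrayList<>();
        out.add(Integer.toString(n));
        for (Map.Entry<List<String>, Double> e : counts.entrySet()) {
            out.addAll(e.getKey());
            out.add(e.getValue().toString());
        }
        return out;
    }

    /** Adds the counts from another instance's serialize() output to this one. */
    void merge(List<String> other) {
        if (other == null || other.isEmpty()) {
            return; // nothing to merge
        }
        if (Integer.parseInt(other.get(0)) != n) {
            throw new IllegalArgumentException("n-gram length mismatch");
        }
        if ((other.size() - 1) % (n + 1) != 0) {
            throw new IllegalArgumentException("truncated input");
        }
        // Each record is n tokens followed by one count.
        for (int i = 1; i < other.size(); i += n + 1) {
            List<String> key = new ArrayList<>(other.subList(i, i + n));
            double count = Double.parseDouble(other.get(i + n));
            counts.merge(key, count, Double::sum);
        }
    }

    double countOf(List<String> ngram) {
        return counts.getOrDefault(ngram, 0.0);
    }

    int size() {
        return counts.size();
    }
}
```

Merging sums the counts of n-grams present on both sides and carries over the rest unchanged, so the order in which partial results arrive does not affect the totals in this sketch.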