Installing DataFu
DataFu is a collection of Apache Pig UDFs (user-defined functions) for statistical evaluation. It was developed by LinkedIn and is open source under the Apache 2.0 license.
To use DataFu:
- Install the DataFu package:
  Operating system       Install command
  Red-Hat-compatible     sudo yum install pig-udf-datafu
  SLES                   sudo zypper install pig-udf-datafu
  Debian or Ubuntu       sudo apt-get install pig-udf-datafu
This installs the DataFu JAR file (for example, datafu-0.0.4-cdh5.0.0.jar) in /usr/lib/pig.
- Register the JAR. Replace the <DataFu_version> and <CDH_version> placeholders with the DataFu and CDH version numbers of the installed JAR.
REGISTER /usr/lib/pig/datafu-<DataFu_version>-cdh<CDH_version>.jar
For example:
REGISTER /usr/lib/pig/datafu-0.0.4-cdh5.0.0.jar
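If you are not sure which version numbers to substitute, one option is to read them from the name of the installed JAR file. The following is a minimal sketch, assuming the package placed the JAR in /usr/lib/pig as described above. The function name datafu_register_line is hypothetical and is not part of DataFu or CDH.

```shell
# Hypothetical helper: print a REGISTER statement for each DataFu JAR
# found in the given directory (defaults to /usr/lib/pig).
datafu_register_line() {
  dir="${1:-/usr/lib/pig}"
  for jar in "$dir"/datafu-*.jar; do
    # Skip the unexpanded glob pattern when no JAR matches.
    [ -e "$jar" ] || continue
    printf 'REGISTER %s\n' "$jar"
  done
}

datafu_register_line /usr/lib/pig
```

The function prints nothing if no matching JAR is found, which suggests the package is not installed or was installed to a different directory.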
Usage examples and other information are available at https://github.com/linkedin/datafu.