Building and deploying UDFs
This section explains the steps to compile Impala UDFs from C++ source code, and deploy the resulting libraries for use in Impala queries.
The Impala UDF development package ships with a sample build environment for UDFs that you can study, experiment with, and adapt for your own use.
The cmake configuration command reads the file CMakeLists.txt and generates a Makefile customized for your particular directory paths. Then the make command runs the actual build steps based on the rules in the Makefile.
Impala loads the shared library from an HDFS location. After building a shared library containing one or more UDFs, use hdfs dfs or hadoop fs commands to copy the binary file to an HDFS location readable by Impala.
The final step in deployment is to issue a CREATE FUNCTION statement in the impala-shell interpreter to make Impala aware of the new function. Because each function is associated with a particular database, always issue a USE statement for the appropriate database before creating a function, or specify a fully qualified name, that is, CREATE FUNCTION db_name.function_name.
As you update the UDF code and redeploy updated versions of a shared library, use DROP FUNCTION and CREATE FUNCTION to let Impala pick up the latest version of the code.
- Install the packages using the appropriate package installation command for your Linux distribution.
    sudo yum install gcc-c++ cmake boost-devel
    sudo yum install impala-udf-devel # The package name on Ubuntu and Debian is impala-udf-dev.
- Download the UDF sample code:
    git clone https://github.com/cloudera/impala-udf-samples
    cd impala-udf-samples && cmake . && make
- Unpack the sample code in udf_samples.tar.gz and use that as a template to set up your build environment.
To build the original samples:
# Process CMakeLists.txt and set up appropriate Makefiles.
cmake .
# Generate shared libraries from UDF and UDAF sample code,
# udf_samples/libudfsample.so and udf_samples/libudasample.so
make
The sample code to examine, experiment with, and adapt is in these files:
- udf-sample.h: Header file that declares the signature for a scalar UDF (AddUDF).
- udf-sample.cc: Sample source for a simple UDF that adds two integers. Because Impala can reference multiple function entry points from the same shared library, you could add other UDF functions in this file and add their signatures to the corresponding header file.
- udf-sample-test.cc: Basic unit tests for the sample UDF.
- uda-sample.h: Header file that declares the signature for sample aggregate functions. The SQL functions will be called COUNT, AVG, and STRINGCONCAT. Because aggregate functions require more elaborate coding to handle the processing for multiple phases, there are several underlying C++ functions such as CountInit, AvgUpdate, and StringConcatFinalize.
- uda-sample.cc: Sample source for simple UDAFs that demonstrate how to manage the state transitions as the underlying functions are called during the different phases of query processing.
  - The UDAF that imitates the COUNT function keeps track of a single incrementing number; the merge functions combine the intermediate count values from each Impala node, and the combined number is returned verbatim by the finalize function.
  - The UDAF that imitates the AVG function keeps track of two numbers, a count of rows processed and the sum of values for a column. These numbers are updated and merged as with COUNT, then the finalize function divides them to produce and return the final average value.
  - The UDAF that concatenates string values into a comma-separated list demonstrates how to manage storage for a string that increases in length as the function is called for multiple rows.
- uda-sample-test.cc: Basic unit tests for the sample UDAFs.
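To make the shape of these sample functions concrete, the following is a minimal, self-contained sketch of a scalar UDF that adds two integers and of an AVG-style aggregate split into init, update, merge, and finalize phases. It is not the shipped sample code. The FunctionContext, IntVal, and DoubleVal types here are simplified stand-ins defined in the sketch itself so that it compiles without the Impala UDF development headers, the AvgState struct is hypothetical, and the names AvgInit, AvgMerge, and AvgFinalize are assumed by analogy with the CountInit, AvgUpdate, and StringConcatFinalize names mentioned above.

```cpp
#include <cstdint>

// Stand-in types (assumptions for this sketch only). Real UDFs would use
// the definitions from the Impala UDF development package instead.
struct FunctionContext {};

struct IntVal {
  bool is_null;
  int32_t val;
  IntVal() : is_null(false), val(0) {}
  explicit IntVal(int32_t v) : is_null(false), val(v) {}
  static IntVal null() { IntVal v; v.is_null = true; return v; }
};

struct DoubleVal {
  bool is_null;
  double val;
  DoubleVal() : is_null(false), val(0.0) {}
  explicit DoubleVal(double v) : is_null(false), val(v) {}
  static DoubleVal null() { DoubleVal v; v.is_null = true; return v; }
};

// Scalar UDF: adds two integers. A NULL argument yields a NULL result.
IntVal AddUDF(FunctionContext* /*context*/, const IntVal& a, const IntVal& b) {
  if (a.is_null || b.is_null) return IntVal::null();
  return IntVal(a.val + b.val);
}

// Hypothetical intermediate state for the AVG-style aggregate:
// the two numbers described above, a running sum and a row count.
struct AvgState {
  double sum;
  int64_t count;
};

// Init phase: start from an empty state.
void AvgInit(FunctionContext* /*context*/, AvgState* state) {
  state->sum = 0.0;
  state->count = 0;
}

// Update phase: called once per input row; NULL inputs are skipped.
void AvgUpdate(FunctionContext* /*context*/, const DoubleVal& input,
               AvgState* state) {
  if (input.is_null) return;
  state->sum += input.val;
  ++state->count;
}

// Merge phase: combine the partial state produced on another node.
void AvgMerge(FunctionContext* /*context*/, const AvgState& src,
              AvgState* dst) {
  dst->sum += src.sum;
  dst->count += src.count;
}

// Finalize phase: divide sum by count; no rows means a NULL result.
DoubleVal AvgFinalize(FunctionContext* /*context*/, const AvgState& state) {
  if (state.count == 0) return DoubleVal::null();
  return DoubleVal(state.sum / static_cast<double>(state.count));
}
```

Keeping the sum and count separate until the finalize phase is what lets partial results from different nodes be merged by simple addition. Averaging per node and then averaging the averages would weight nodes incorrectly when they process different numbers of rows.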
