Managing User-Defined Functions (UDFs) with HiveServer2

Hive's query language (HiveQL) can be extended with Java-based user-defined functions (UDFs). See the Apache Hive Language Manual UDF page for information about Hive built-in UDFs. To create customized UDFs, see the Apache Hive wiki. After creating a new Java class to extend the com.example.hive.udf package, you must compile your code into a Java archive file (JAR), and add it to the Hive classpath with the ADD JAR command. The ADD JAR command does not work with HiveServer2 and the Beeline client when Beeline runs on a different host. As an alternative to ADD JAR, Hive's auxiliary paths functionality should be used.

Perform one of the following procedures depending on whether you want to create permanent or temporary functions.

Blacklist for Built-in UDFs

HiveServer2 maintains a blacklist for built-in UDFs to secure itself against attacks in a multiuser scenario where the hive user's credentials can be used to execute any Java code.
hive.server2.builtin.udf.blacklist A comma separated list of built-in UDFs that are not allowed to be executed. A UDF that is included in the list will return an error if invoked from a query.

Default value: Empty

User-Defined Functions (UDFs) with HiveServer2 Using Cloudera Manager

Minimum Required Role: Configurator (also provided by Cluster Administrator, Full Administrator)

Creating Permanent Functions

  1. Copy the JAR file to HDFS and make sure the hive user can access this JAR file.
  2. Copy the JAR file to the host on which HiveServer2 is running. Save the JARs to any directory you choose, give the hive user read, write, and execute access to this directory, and make a note of the path (for example, /opt/local/hive/lib/).
  3. In the Cloudera Manager Admin Console, go to the Hive service.
  4. Click the Configuration tab.
  5. Expand the Hive (Service-Wide) scope.
  6. Click the Advanced category.
  7. Configure the Hive Auxiliary JARs Directory property with the HiveServer2 host path and the Hive Metastore host path from Step 2, for example /opt/local/hive/lib/. Setting this property overwrites hive.aux.jars.path, even if this variable has been previously set in the HiveServer2 advanced configuration snippet.
  8. Click Save Changes. The JARs are added to HIVE_AUX_JARS_PATH environment variable.
  9. Redeploy the Hive client configuration.
    1. In the Cloudera Manager Admin Console, go to the Hive service.
    2. From the Actions menu at the top right of the service page, select Deploy Client Configuration.
    3. Click Deploy Client Configuration.
  10. Restart the Hive service.
  11. With Sentry enabled - Grant privileges on the JAR files to the roles that require access. Log in to Beeline as user hive and use the Hive SQL GRANT statement to do so. For example:
    GRANT ALL ON URI 'file:///opt/local/hive/lib/my.jar' TO ROLE EXAMPLE_ROLE
    You must also grant privilege to the JAR on HDFS:
    GRANT ALL ON URI 'hdfs:///path/to/jar' TO ROLE EXAMPLE_ROLE
  12. Run the CREATE FUNCTION command to create the UDF from the JAR file and point to the JAR file location in HDFS. For example:
    CREATE FUNCTION addfunc AS 'com.example.hiveserver2.udf.add' USING JAR 'hdfs:///path/to/jar'

Creating Temporary Functions

  1. Copy the JAR file to the host on which HiveServer2 is running. Save the JARs to any directory you choose, give the hive user read, write, and execute access to this directory, and make a note of the path (for example, /opt/local/hive/lib/).
  2. In the Cloudera Manager Admin Console, go to the Hive service.
  3. Click the Configuration tab.
  4. Expand the Hive (Service-Wide) scope.
  5. Click the Advanced category.
  6. Configure the Hive Auxiliary JARs Directory property with the HiveServer2 host path and the Hive Metastore host path from Step 1, for example /opt/local/hive/lib/. Setting this property overwrites hive.aux.jars.path, even if this variable has been previously set in the HiveServer2 advanced configuration snippet.
  7. Click Save Changes. The JARs are added to HIVE_AUX_JARS_PATH environment variable.
  8. Redeploy the Hive client configuration.
    1. In the Cloudera Manager Admin Console, go to the Hive service.
    2. From the Actions menu at the top right of the service page, select Deploy Client Configuration.
    3. Click Deploy Client Configuration.
  9. Restart the Hive service.
  10. With Sentry enabled - Grant privileges on the JAR files to the roles that require access. Log in to Beeline as user hive and use the Hive SQL GRANT statement to do so. For example:
    GRANT ALL ON URI 'file:///opt/local/hive/lib/my.jar' TO ROLE EXAMPLE_ROLE
    You must also grant privilege to the JAR on HDFS:
    GRANT ALL ON URI 'hdfs:///path/to/jar' TO ROLE EXAMPLE_ROLE
  11. Run the CREATE TEMPORARY FUNCTION command. For example:
    CREATE TEMPORARY FUNCTION addfunc AS 'com.example.hiveserver2.udf.add'

User-Defined Functions (UDFs) with HiveServer2 Using the Command Line

The following sections describe how to create permanent and temporary functions using the command line.

Creating Permanent Functions

  1. Copy the JAR file to HDFS and make sure the hive user can access this jar file.
  2. On the Beeline client machine, in /etc/hive/conf/hive-site.xml, set the hive.aux.jars.path property to a comma-separated list of the fully qualified paths to the JAR file and any dependent libraries.
    hive.aux.jars.path=file:///opt/local/hive/lib/my.jar
  3. Copy the JAR file (and its dependent libraries) to the host running HiveServer2/Impala. Make sure the hive user has read, write, and execute access to these files on the HiveServer2/Impala host.
  4. On the HiveServer2/Impala host, open /etc/default/hive-server2 and set the AUX_CLASSPATH variable to a comma-separated list of the fully qualified paths to the JAR file and any dependent libraries.
    AUX_CLASSPATH=/opt/local/hive/lib/my.jar
  5. Restart HiveServer2.
  6. If Sentry is enabled - Grant privileges on the JAR files to the roles that require access. Login to Beeline as user hive and use the Hive SQL GRANT statement to do so. For example:
    GRANT ALL ON URI 'file:///opt/local/hive/lib/my.jar' TO ROLE EXAMPLE_ROLE
    You must also grant privilege to the JAR on HDFS:
    GRANT ALL ON URI 'hdfs:///path/to/jar' TO ROLE EXAMPLE_ROLE
    If you are using Sentry policy files, you can grant the URI privilege as follows:
    udf_r = server=server1->uri=file:///opt/local/hive/lib
    udf_r = server=server1->uri=hdfs:///path/to/jar
  7. Run the CREATE FUNCTION command and point to the JAR from Hive:
    CREATE FUNCTION addfunc AS 'com.example.hiveserver2.udf.add' USING JAR 'hdfs:///path/to/jar'

Creating Temporary Functions

  1. On the Beeline client machine, in /etc/hive/conf/hive-site.xml, set the hive.aux.jars.path property to a comma-separated list of the fully qualified paths to the JAR file and any dependent libraries.
    hive.aux.jars.path=file:///opt/local/hive/lib/my.jar
  2. Copy the JAR file (and its dependent libraries) to the host running HiveServer2/Impala. Make sure the hive user has read, write, and execute access to these files on the HiveServer2/Impala host.
  3. On the HiveServer2/Impala host, open /etc/default/hive-server2 and set the AUX_CLASSPATH variable to a comma-separated list of the fully qualified paths to the JAR file and any dependent libraries.
    AUX_CLASSPATH=/opt/local/hive/lib/my.jar
  4. If Sentry is enabled - Grant privileges on the local JAR files to the roles that require access. Login to Beeline as user hive and use the Hive SQL GRANT statement to do so. For example:
    GRANT ALL ON URI 'file:///opt/local/hive/lib/my.jar' TO ROLE EXAMPLE_ROLE
    If you are using Sentry policy files, you can grant the URI privilege as follows:
    udf_r = server=server1->uri=file:///opt/local/hive/lib
  5. Restart HiveServer2.
  6. Run the CREATE TEMPORARY FUNCTION command and point to the JAR from Hive:
    CREATE TEMPORARY FUNCTION addfunc AS 'com.example.hiveserver2.udf.add'