Configuring UDF JAR caching in Hive Virtual Warehouse

After you write and compile your User Defined Function (UDF) code into a Java Archive (JAR) file, you can configure a Hive Virtual Warehouse in Cloudera Data Warehouse to cache the UDF JAR in HiveServer2 (HS2).

UDFs enable you to create custom functions that process individual records or groups of records. Hive provides a comprehensive library of built-in functions, and UDFs fill the gaps where no built-in function meets your needs.

In this task, you configure the Hive Virtual Warehouse to cache the UDF JAR file so that the Virtual Warehouse can access it quickly. After caching is enabled, the UDF JAR is downloaded from the object store the first time the UDF is called, and is then cached. Subsequent calls to the UDF load the JAR from the cache.
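
The download-once, then-cache behavior can be pictured with the following minimal Java sketch. It is a conceptual illustration under stated assumptions, not HiveServer2's actual implementation: the `UdfJarCache` class, the fetcher function that stands in for the object-store download, and the `s3a://example-bucket/udfs/my-udf.jar` path are all hypothetical.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

/**
 * Hypothetical sketch of "download on first call, then serve from cache".
 * Not HiveServer2's implementation; the fetcher stands in for an object-store download.
 */
public class UdfJarCache {
    // Maps a JAR URI to the JAR contents that were already downloaded.
    private final Map<String, byte[]> cache = new ConcurrentHashMap<>();
    private final Function<String, byte[]> fetcher;

    public UdfJarCache(Function<String, byte[]> fetcher) {
        this.fetcher = fetcher;
    }

    /** Returns the JAR bytes, invoking the fetcher only on the first request for this URI. */
    public byte[] get(String jarUri) {
        // ConcurrentHashMap.computeIfAbsent applies the function at most once per key,
        // so concurrent first calls for the same JAR still trigger a single download.
        return cache.computeIfAbsent(jarUri, fetcher);
    }

    public static void main(String[] args) {
        AtomicInteger downloads = new AtomicInteger();
        UdfJarCache cache = new UdfJarCache(uri -> {
            downloads.incrementAndGet();      // count each simulated object-store download
            return new byte[] {0x50, 0x4B};   // placeholder bytes instead of a real JAR
        });
        String uri = "s3a://example-bucket/udfs/my-udf.jar"; // hypothetical location
        for (int query = 0; query < 5; query++) {
            cache.get(uri);                   // each iteration represents one query calling the UDF
        }
        System.out.println("downloads = " + downloads.get()); // prints "downloads = 1"
    }
}
```

Only the first of the five simulated queries triggers a download; the other four are served from the in-memory map. Without the cache, every query would pay the download cost again.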

Caching significantly improves performance for queries that use the UDF. Without caching, loading a very large UDF JAR of several hundred MB can take up to several minutes for each query.

Before you begin, create a user-defined function:
- Write, compile, and export your UDF code to a JAR file.
- Upload the UDF JAR file to a bucket on AWS or a container on Azure.
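
For reference, the following is a minimal sketch of UDF logic in Java. The `MaskUdf` class and its masking behavior are hypothetical examples. In a deployable UDF, the class would extend Hive's `org.apache.hadoop.hive.ql.exec.UDF` base class from the `hive-exec` library (or you would implement `GenericUDF` for more complex signatures); the base class is commented out here only so that the sketch compiles with the JDK alone.

```java
/**
 * Hypothetical UDF logic: masks all but the last four characters of a string.
 * For deployment, uncomment the base class and compile against hive-exec.
 */
public class MaskUdf /* extends org.apache.hadoop.hive.ql.exec.UDF */ {

    /** Hive locates evaluate() methods by reflection; null input yields null output. */
    public String evaluate(String input) {
        if (input == null) {
            return null;
        }
        int keep = Math.min(4, input.length());   // number of trailing characters left visible
        int masked = input.length() - keep;       // number of leading characters to hide
        return "*".repeat(masked) + input.substring(masked);
    }

    public static void main(String[] args) {
        MaskUdf udf = new MaskUdf();
        System.out.println(udf.evaluate("4111222233334444")); // prints "************4444"
    }
}
```

After compiling the class against the Hive libraries and packaging it into a JAR file, upload the JAR to your bucket or container so that the Virtual Warehouse can load and cache it.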

Log in to the Cloudera Data Warehouse service as DWAdmin.
Go to the Virtual Warehouses tab, locate the Hive Virtual Warehouse that uses the bucket or container where you placed the UDF JAR file, and click the options menu > Details.
The Virtual Warehouse Details page is displayed.
Go to Configurations > HiveServer2.
Select hive-site from the Configuration files drop-down menu, and click Add Custom Configuration.
The Custom Configuration modal is displayed.
Enter the following configuration information, and then click Add:
- Configuration Key: hive.server2.udf.cache.enabled
- Configuration Value: true
Click Apply Changes.
Verify that the property was added by searching for hive.server2.udf.cache.enabled in the search box. If the property was added, its name is displayed in the KEY column of the table.