Configuring Hive Virtual Warehouses to cache UDF JARs

After you write and compile your user-defined function (UDF) code into a Java Archive (JAR) file, you can configure a Hive Virtual Warehouse in Cloudera Data Warehouse (CDW) to cache the UDF JAR in HiveServer2 (HS2).

UDFs enable you to create custom functions to process records or groups of records. Although Hive provides a comprehensive library of functions, there are gaps for which UDFs are a good solution.

In this task, you configure the Hive Virtual Warehouse to cache the UDF JAR file for quick access. After you configure caching, the UDF JAR is downloaded from the object store the first time it is called and is then cached. Subsequent calls are served from the cache.

Caching significantly improves performance for queries that use the UDF. Without caching, loading a very large UDF JAR of several hundred MB can take up to several minutes for each query.
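
The download-once behavior described above can be illustrated with a small, self-contained sketch. It is a conceptual model only and not how HiveServer2 implements its cache: the JarCache class, the fake_download function, and the example URI are all hypothetical names introduced for illustration.

```python
from typing import Callable, Dict


class JarCache:
    """Toy model of a download-once cache keyed by object-store URI."""

    def __init__(self, download: Callable[[str], bytes]) -> None:
        self._download = download           # hypothetical fetch function
        self._cache: Dict[str, bytes] = {}  # URI -> cached JAR bytes
        self.downloads = 0                  # number of object-store reads

    def get(self, uri: str) -> bytes:
        # First call for a URI: fetch from the object store and keep a copy.
        if uri not in self._cache:
            self._cache[uri] = self._download(uri)
            self.downloads += 1
        # Later calls: return the cached copy without another download.
        return self._cache[uri]


def fake_download(uri: str) -> bytes:
    # Stand-in for a slow object-store read; returns placeholder bytes.
    return ("contents of " + uri).encode("utf-8")


cache = JarCache(fake_download)
jar_uri = "s3a://example-bucket/udfs/my-udf.jar"  # hypothetical location
for _ in range(3):
    cache.get(jar_uri)  # simulates three queries that call the UDF
print(cache.downloads)  # prints 1
```

In this model, three simulated queries trigger a single download. Without the cache, each query would pay the full download cost, which is where the per-query delay for large JAR files comes from.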

Required role: DWAdmin

  • Create a user-defined function
    • Write, compile, and export your UDF code to a JAR file.
    • Upload the UDF JAR file to a bucket on AWS or a container on Azure.
  1. In the CDW UI on the Overview page, locate the Hive Virtual Warehouse that uses the bucket or container where you placed the UDF JAR file, and click Edit.
  2. In Details, click CONFIGURATIONS > HiveServer2.
  3. Select hive-site from the drop-down list, and click +.
  4. In Add Custom Configurations, add the following configuration, and then click ADD:
    hive.server2.udf.cache.enabled = true
  5. Click APPLY.
  6. To verify that the property and its setting have been added, search for hive.server2.udf.cache.enabled in the search box. If the property has been added, its name is displayed in the KEY column of the table.