Using Apache Hive

Building the project and uploading the JAR

You compile the UDF code into a JAR and add the JAR to the classpath on the cluster. You choose one of several methods of configuring the cluster so Hive can find the JAR.

Cloudera Base on premises
Use one of these methods to configure the cluster to find the JAR:
  • Direct reference

    Straightforward, but recommended for development only.

  • Hive aux library directory

    Recommended for tested, stable UDFs because it prevents accidental overwriting of files or functions.

  • Reloadable aux JAR

    Avoids HiveServer restarts. Recommended if you anticipate making frequent changes to the UDF logic.

Cloudera on cloud
Use the Direct reference method only.
  1. Build the IntelliJ project.
    ...
    [INFO] Building jar: /Users/max/IdeaProjects/hiveudf/target/TypeOf-1.0-SNAPSHOT.jar
    [INFO] ------------------------------------------------------------------------
    [INFO] BUILD SUCCESS
    [INFO] ------------------------------------------------------------------------
    [INFO] Total time: 14.820 s
    [INFO] Finished at: 2019-04-03T16:53:04-07:00
    [INFO] Final Memory: 26M/397M
    [INFO] ------------------------------------------------------------------------
                        
    Process finished with exit code 0
  2. In IntelliJ, navigate to the JAR in the /target directory of the project.
  3. Configure the cluster so Hive can find the JAR using one of the following methods.
    • Direct reference
      1. Upload the JAR to HDFS (Cloudera Base on premises) or S3 (Public Cloud).
      2. Move the JAR into the Hive warehouse. For example, in Cloudera Base on premises:
        $ hdfs dfs -put TypeOf-1.0-SNAPSHOT.jar /warehouse/tablespace/managed/hiveudf-1.0-SNAPSHOT.jar
    • Hive aux library directory (Cloudera Base on premises only)
      1. In Cloudera Base on premises, click Clusters and select the Hive service, for example, HIVE. Click Configuration and search for Hive Auxiliary JARs Directory.
      2. If the Hive Auxiliary JARs Directory property is not set, specify a directory value. Otherwise, make a note of the existing path.
      3. Upload the JAR to the specified directory on all HiveServer instances (and all Metastore instances, if separate).
    • Reloadable aux JAR (Cloudera Base on premises only)
      1. Upload the JAR to the /hadoop/hive-udf-dyn directory on all HiveServer instances (and all Metastore instances, if separate). An HDFS location is not supported.
      2. In hive-site.xml, set the following property: hive.reloadable.aux.jars.path=/hadoop/hive-udf-dyn.
  4. In IntelliJ, click Save.
  5. Click Actions > Deploy Client Configuration.
  6. Restart the Hive service. For example, restart HIVE.
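
The Direct reference upload on Cloudera Base on premises can be sketched as a small shell script. This is a minimal sketch, not an official procedure: the JAR and DEST values are taken from the example above, while the run helper and the DRY_RUN switch are hypothetical names introduced here for illustration. By default the script only prints the commands it would run. Set DRY_RUN=0 on a host where the hdfs client is configured to execute them.

```shell
#!/usr/bin/env bash
# Sketch: stage the UDF JAR in HDFS for the Direct reference method.
# JAR and DEST reuse the example paths from this page; adjust them for your cluster.
set -euo pipefail

JAR="${JAR:-target/TypeOf-1.0-SNAPSHOT.jar}"
DEST="${DEST:-/warehouse/tablespace/managed/hiveudf-1.0-SNAPSHOT.jar}"
DRY_RUN="${DRY_RUN:-1}"   # hypothetical switch: 1 = print commands only, 0 = execute

# Hypothetical helper: print the command in dry-run mode, otherwise run it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Copy the locally built JAR into the Hive warehouse location.
run hdfs dfs -put "$JAR" "$DEST"

# List the destination to confirm that the JAR is in place.
run hdfs dfs -ls "$DEST"
```

Running the script in the default dry-run mode lets you review the source and destination paths before anything is written to HDFS.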