Set up the development environment
You can develop a Hive UDF in an IDE such as IntelliJ and build it against the Hive and Hadoop JARs that you download from your HDP 3.x cluster.
-
On your cluster, locate the hadoop-common-<version>.jar and hive-exec-<version>.jar files.
For example:
# ls /usr/hdp/current/hadoop-client/hadoop-common-* |grep -v test
/usr/hdp/current/hadoop-client/hadoop-common-3.1.1.3.1.0.0-78.jar
# ls /usr/hdp/current/hive-server2/lib/hive-exec-*
/usr/hdp/current/hive-server2/lib/hive-exec-3.1.0.3.1.0.0-78.jar
- Download the JARs to your development computer to add to your IntelliJ project later.
- Open IntelliJ and create a new Maven-based project. Click Create New Project, select Maven, and select Java version 1.8 as the Project SDK. Click Next.
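As a quick sanity check that the project really runs on the 1.8 SDK selected above, a small class like the following could be run from the new project. This is a minimal sketch; the class name JavaVersionCheck and the method isJava8 are hypothetical and not part of Hive or HDP. It relies only on the standard java.specification.version system property, which reports 1.8 on a Java 8 runtime.

```java
public class JavaVersionCheck {

    /** Returns true if the given specification version string denotes Java 8. */
    public static boolean isJava8(String specVersion) {
        // Java 8 reports "1.8"; later releases report "9", "11", and so on.
        return "1.8".equals(specVersion);
    }

    public static void main(String[] args) {
        String version = System.getProperty("java.specification.version");
        System.out.println("java.specification.version = " + version
                + (isJava8(version) ? " (matches Project SDK 1.8)" : " (not 1.8)"));
    }
}
```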
-
Add archetype information.
For example:
- GroupId: com.mycompany.hiveudf
- ArtifactId: hiveudf
-
Click Next and Finish.
The generated pom.xml appears in sample-hiveudf.
-
In the pom.xml, add properties that hold the Hadoop and Hive versions, so that the dependency definitions can reference them in one place.
For example:
<properties>
  <hadoop.version>3.1.1.3.1.0.0-78</hadoop.version>
  <hive.version>3.1.0.3.1.0.0-78</hive.version>
</properties>
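In the example, the property values match the version suffix of the JAR file names found on the cluster. The following sketch shows one way to extract that suffix from a file name, so that the value can be copied into the pom.xml without retyping it. The class JarVersion and its method versionOf are hypothetical helpers, and the pattern assumes file names of the form <artifact>-<version>.jar, where the version starts with a digit.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class JarVersion {

    // Assumption: names look like "<artifact>-<version>.jar" and the version
    // begins with a digit, as in hadoop-common-3.1.1.3.1.0.0-78.jar.
    private static final Pattern JAR_NAME = Pattern.compile("^(.+?)-(\\d.*)\\.jar$");

    /** Returns the version part of a JAR path or file name, or null if it does not match. */
    public static String versionOf(String path) {
        // Drop any leading directories, keeping only the file name.
        String fileName = path.substring(path.lastIndexOf('/') + 1);
        Matcher m = JAR_NAME.matcher(fileName);
        return m.matches() ? m.group(2) : null;
    }

    public static void main(String[] args) {
        String[] jars = {
            "/usr/hdp/current/hadoop-client/hadoop-common-3.1.1.3.1.0.0-78.jar",
            "/usr/hdp/current/hive-server2/lib/hive-exec-3.1.0.3.1.0.0-78.jar"
        };
        for (String jar : jars) {
            System.out.println(jar + " -> " + versionOf(jar));
        }
    }
}
```

For the two paths listed earlier, this yields 3.1.1.3.1.0.0-78 for hadoop-common and 3.1.0.3.1.0.0-78 for hive-exec, which are the values used in the properties.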
-
In the pom.xml, define the repositories.
Use internal repositories if you do not have internet access.
<repositories>
  <repository>
    <releases>
      <enabled>true</enabled>
      <updatePolicy>always</updatePolicy>
      <checksumPolicy>warn</checksumPolicy>
    </releases>
    <snapshots>
      <enabled>false</enabled>
      <updatePolicy>never</updatePolicy>
      <checksumPolicy>fail</checksumPolicy>
    </snapshots>
    <id>HDPReleases</id>
    <name>HDP Releases</name>
    <url>http://repo.hortonworks.com/content/repositories/releases/</url>
    <layout>default</layout>
  </repository>
  <repository>
    <id>public.repo.hortonworks.com</id>
    <name>Public Hortonworks Maven Repo</name>
    <url>http://repo.hortonworks.com/content/groups/public/</url>
    <snapshots>
      <enabled>false</enabled>
    </snapshots>
  </repository>
</repositories>
-
Define dependencies.
For example:
<dependencies>
  <dependency>
    <groupId>org.apache.hive</groupId>
    <artifactId>hive-exec</artifactId>
    <version>${hive.version}</version>
  </dependency>
  <dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-common</artifactId>
    <version>${hadoop.version}</version>
  </dependency>
</dependencies>
- Select File > Project Structure. Click Modules. On the Dependencies tab, click + to add JARs or directories. Browse to and select the hadoop-common and hive-exec JARs that you downloaded from your cluster.
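Once the JARs are attached, one way to confirm that the module can see them is to try loading a well-known class from each. The sketch below is an illustration only: the class ClasspathCheck and the method isAvailable are hypothetical names. The two probed classes, org.apache.hadoop.hive.ql.exec.UDF (from hive-exec) and org.apache.hadoop.conf.Configuration (from hadoop-common), are chosen as representative examples.

```java
public class ClasspathCheck {

    /** Returns true if the named class can be located on the current classpath. */
    public static boolean isAvailable(String className) {
        try {
            // initialize=false: only locate the class, do not run its static initializers.
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String[] required = {
            "org.apache.hadoop.hive.ql.exec.UDF",    // expected in hive-exec
            "org.apache.hadoop.conf.Configuration"   // expected in hadoop-common
        };
        for (String name : required) {
            System.out.println(name + " -> " + (isAvailable(name) ? "found" : "MISSING"));
        }
    }
}
```

If either class is reported as MISSING when you run this from the project, the corresponding JAR is probably not on the module's dependency list.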