Managing Hadoop API Dependencies in CDH 5
In CDH 3, all of the Hadoop API implementations were confined to a single JAR file (hadoop-core) plus a few of its dependencies. It was relatively straightforward to make sure that classes from these JAR files were available at runtime.
CDH 4 and CDH 5 are more complex: they bundle both MRv1 and MRv2 (YARN). To simplify things, CDH 4 and CDH 5 provide a Maven-based way of managing client-side Hadoop API dependencies that saves you from having to figure out the exact names and locations of all the JAR files needed to provide Hadoop APIs.
In CDH 5, Cloudera recommends that you use a hadoop-client artifact for all clients, instead of managing JAR-file-based dependencies manually.
Flavors of the hadoop-client Artifact
There are two different flavors of the hadoop-client artifact: a Maven-based Project Object Model (POM) artifact and a Linux package, hadoop-client. The former lets you manage Hadoop API dependencies at both compile and run time for your Maven- or Ivy-based projects; the latter provides a familiar interface in the form of a collection of JAR files that can be added to your classpath directly.
Versions of the hadoop-client Artifact
CDH Version | MRv1 Version String | YARN Version String |
---|---|---|
5.0.0 - 5.1.x | 2.3.0-mr1-cdh5.x.x | 2.3.0-cdh5.x.x |
5.2.0 - 5.3.x | 2.5.0-mr1-cdh5.x.x | 2.5.0-cdh5.x.x |
5.4.0 and higher | 2.6.0-mr1-cdh5.x.x | 2.6.0-cdh5.x.x |
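The table can also be read as a simple lookup: the CDH 5 minor release selects the Hadoop base version, the API flavor decides whether the -mr1 infix is present, and the full CDH release number replaces 5.x.x. The sketch below illustrates that reading. The function name hadoop_client_version and its arguments are hypothetical and not part of CDH, and it assumes a release number of the form 5.minor.patch.

```shell
# Hypothetical helper (not shipped with CDH): prints the hadoop-client
# version string for a CDH 5 release, following the table above.
# Usage: hadoop_client_version <cdh-release> <mr1|yarn>
hadoop_client_version() {
  cdh="$1"
  flavor="$2"

  # Accept only release numbers shaped like 5.minor.patch
  case "$cdh" in
    5.*.*) ;;
    *) echo "expected a CDH 5 release such as 5.3.2" >&2; return 1 ;;
  esac

  minor="${cdh#5.}"     # drop the leading "5."
  minor="${minor%%.*}"  # keep only the minor release number
  case "$minor" in
    ''|*[!0-9]*) echo "minor release must be numeric" >&2; return 1 ;;
  esac

  # Map the minor release to the Hadoop base version from the table
  if [ "$minor" -le 1 ]; then
    base=2.3.0
  elif [ "$minor" -le 3 ]; then
    base=2.5.0
  else
    base=2.6.0
  fi

  # MRv1 strings carry an extra -mr1 infix; YARN strings do not
  case "$flavor" in
    mr1)  echo "${base}-mr1-cdh${cdh}" ;;
    yarn) echo "${base}-cdh${cdh}" ;;
    *) echo "flavor must be mr1 or yarn" >&2; return 1 ;;
  esac
}
```

For example, hadoop_client_version 5.3.2 yarn prints 2.5.0-cdh5.3.2, and hadoop_client_version 5.4.0 mr1 prints 2.6.0-mr1-cdh5.4.0.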
Using hadoop-client for Maven-based Java Projects
Make sure you add the following dependency specification to your pom.xml file:
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-client</artifactId>
  <version>VERSION</version>
  <scope>provided</scope>
</dependency>
See CDH Version Strings for Maven for the VERSION string.
Using hadoop-client for Ivy-based Java Projects
Make sure you add the following dependency specification to your ivy.xml file:
<dependency org="org.apache.hadoop" name="hadoop-client" rev="VERSION" conf="default->provided"/>
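where VERSION is the version string for your CDH release and API flavor, as listed in Versions of the hadoop-client Artifact. For example, on CDH 5.4.0 and higher it is 2.6.0-cdh5.x.x for YARN APIs or 2.6.0-mr1-cdh5.x.x for MRv1 APIs, substituting your CDH release number for x.x.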
Using JAR Files Provided in the hadoop-client Package
Make sure you add to your project all of the JAR files provided under /usr/lib/hadoop/client-0.20 (for MRv1 APIs) or /usr/lib/hadoop/client (for YARN).
For example, you can add this location to the JVM classpath:
$ export CLASSPATH=/usr/lib/hadoop/client-0.20/\*
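The same pattern applies to the YARN directory. As a minimal sketch, the helper below chooses between the two locations named above and emits a JVM classpath wildcard. The function name hadoop_client_classpath is hypothetical, and the sketch assumes the hadoop-client package is installed under the default /usr/lib/hadoop prefix.

```shell
# Hypothetical helper (not shipped with CDH): prints a JVM classpath
# wildcard for the JAR directory that matches the chosen API flavor.
# Usage: hadoop_client_classpath <mr1|yarn>
hadoop_client_classpath() {
  case "$1" in
    mr1)  dir=/usr/lib/hadoop/client-0.20 ;;
    yarn) dir=/usr/lib/hadoop/client ;;
    *) echo "flavor must be mr1 or yarn" >&2; return 1 ;;
  esac
  # The * is inside a quoted format string, so the shell passes it through
  # unchanged and the JVM expands it to every JAR file in the directory.
  printf '%s/*\n' "$dir"
}

# Example: put the YARN client JARs on the classpath
CLASSPATH="$(hadoop_client_classpath yarn)"
export CLASSPATH
```

The helper prints the wildcard instead of listing individual JAR files so that the classpath stays valid when the package is upgraded and file names change.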