Managing Hadoop API Dependencies in CDH 6

CDH 5 bundled both MRv1 and MRv2 (YARN). To simplify dependency management, CDH 5 and higher provide a Maven-based way of managing client-side Hadoop API dependencies, so you do not have to work out the exact names and locations of all the JAR files that provide the Hadoop APIs.

In CDH 6, the client dependencies are simplified because MRv1 is no longer supported. Cloudera recommends that you use a hadoop-client artifact for all clients, instead of managing JAR-file-based dependencies manually.

Flavors of the hadoop-client Artifact

The hadoop-client artifact comes in two flavors: a Maven-based Project Object Model (POM) artifact and a Linux package, hadoop-client. The former lets you manage Hadoop API dependencies at both compile and run time for your Maven- or Ivy-based projects; the latter provides a familiar interface in the form of a collection of JAR files that you can add to your classpath directly.

Versions of the hadoop-client Artifact

If you're using the Maven-based POM hadoop-client artifact, use the following version strings:

CDH Version Strings for Maven

  CDH Version    Version String
  6.0.x          3.0.0-cdh6.0.x

Replace 6.0.x with the number of the CDH release you are using. For example, for CDH 6.0.1 the version string is 3.0.0-cdh6.0.1.
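If you build against more than one CDH release, the substitution can be scripted. The following is a minimal sketch, not something shipped with CDH: the function name cdh_maven_version is hypothetical, and it covers only the 6.0.x mapping listed in the table above.

```shell
# Hypothetical helper: map a CDH 6.0.x release number to the Maven version
# string from the table above (3.0.0-cdh<release>).
cdh_maven_version() {
  case "$1" in
    6.0.[0-9]*)
      printf '3.0.0-cdh%s\n' "$1"
      ;;
    *)
      # Releases outside 6.0.x are not covered by the table; refuse to guess.
      echo "unsupported CDH version: $1" >&2
      return 1
      ;;
  esac
}
```

Calling cdh_maven_version 6.0.1 prints 3.0.0-cdh6.0.1. Any release outside 6.0.x returns a nonzero status instead of producing a guessed string.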

Using hadoop-client for Maven-based Java Projects

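Add the following dependency specification to your pom.xml file. The provided scope puts the Hadoop JAR files on the compile classpath without packaging them into your application, on the expectation that the runtime environment supplies them: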

  <dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-client</artifactId>
    <version><version_string></version>
    <scope>provided</scope>
  </dependency>

See the table CDH Version Strings for Maven for the value of <version_string>.

Using hadoop-client for Ivy-based Java Projects

Make sure you add the following dependency specification to your ivy.xml file:

  <dependency org="org.apache.hadoop" name="hadoop-client" rev="<version_string>" conf="default->provided"/>

See the table CDH Version Strings for Maven for the value of <version_string>.

Using JAR Files Provided in the hadoop-client Package

Add all of the JAR files provided under /usr/lib/hadoop/client to your project.

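For example, you can add this location to the JVM classpath as follows. The backslash keeps the shell from expanding the asterisk, so the JVM receives the wildcard and expands it to every JAR file in the directory: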

export CLASSPATH=/usr/lib/hadoop/client/\*
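
The export above overwrites any existing CLASSPATH and does not check that the package is installed. A slightly more defensive variant might look like the following sketch. The function name hadoop_client_classpath is hypothetical, and the default directory is the one given above.

```shell
# Hypothetical helper: print a classpath that puts the hadoop-client JAR
# directory first and keeps any existing entries after it.
#   $1  directory holding the client JARs (default: /usr/lib/hadoop/client)
#   $2  existing classpath to preserve (optional)
hadoop_client_classpath() {
  client_dir="${1:-/usr/lib/hadoop/client}"
  existing="${2:-}"
  if [ ! -d "$client_dir" ]; then
    echo "hadoop-client directory not found: $client_dir" >&2
    return 1
  fi
  # Keep the asterisk literal: the JVM, not the shell, expands the wildcard.
  printf '%s/*%s\n' "$client_dir" "${existing:+:$existing}"
}
```

To use it, capture the printed value, for instance with cp_value="$(hadoop_client_classpath /usr/lib/hadoop/client "$CLASSPATH")", and export CLASSPATH="$cp_value" only if the function returned successfully.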