Accessing Table Data with MapReduce
You can download an example MapReduce program that reads from the groups table (consisting of data from /etc/group), extracts the first and third columns (the group name and the numeric group ID), and inserts them into the groupids table; a sketch of such a program follows the steps below. Proceed as follows.
- Clone the example repository from https://github.com/cloudera/hcatalog-examples.git.
- Build the example JAR file:
$ cd hcatalog-examples
$ mvn package
- Load data from the local file system into the groups table:
$ hive -e "load data local inpath '/etc/group' overwrite into table groups"
- Set up the environment needed to copy the required JAR files to HDFS, for example:
$ export HCAT_HOME=/usr/lib/hive-hcatalog
$ export HIVE_HOME=/usr/lib/hive
$ HIVE_VERSION=0.11.0-cdh5.0.0
$ HCATJAR=$HCAT_HOME/share/hcatalog/hcatalog-core-$HIVE_VERSION.jar
$ HCATPIGJAR=$HCAT_HOME/share/hcatalog/hcatalog-pig-adapter-$HIVE_VERSION.jar
$ export HADOOP_CLASSPATH=$HCATJAR:$HCATPIGJAR:$HIVE_HOME/lib/hive-exec-$HIVE_VERSION.jar\
:$HIVE_HOME/lib/hive-metastore-$HIVE_VERSION.jar:$HIVE_HOME/lib/jdo-api-*.jar:$HIVE_HOME/lib/libfb303-*.jar\
:$HIVE_HOME/lib/libthrift-*.jar:$HIVE_HOME/lib/slf4j-api-*.jar:$HIVE_HOME/conf:/etc/hadoop/conf
$ LIBJARS=`echo $HADOOP_CLASSPATH | sed -e 's/:/,/g'`
$ export LIBJARS=$LIBJARS,$HIVE_HOME/lib/antlr-runtime-*.jar
Note: You can find current version numbers for CDH dependencies in CDH's root pom.xml file for the current release, for example cdh-root-5.0.0.pom.
- Run the job:
$ hadoop jar target/UseHCat-1.0.jar com.cloudera.test.UseHCat -files $HCATJAR -libjars $LIBJARS groups groupids
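For reference, the following is a condensed sketch of what such a program can look like, built on the HCatalog MapReduce API (HCatInputFormat and HCatOutputFormat). It is illustrative rather than a copy of the repository's com.cloudera.test.UseHCat class: the package names follow the Hive 0.11-era org.apache.hcatalog layout (later Hive releases moved these classes under org.apache.hive.hcatalog), and it assumes the groups table stores the group name as a string and the group ID as an int.

import java.io.IOException;
import java.util.Iterator;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.WritableComparable;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.util.GenericOptionsParser;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;
import org.apache.hcatalog.data.DefaultHCatRecord;
import org.apache.hcatalog.data.HCatRecord;
import org.apache.hcatalog.data.schema.HCatSchema;
import org.apache.hcatalog.mapreduce.HCatInputFormat;
import org.apache.hcatalog.mapreduce.HCatOutputFormat;
import org.apache.hcatalog.mapreduce.InputJobInfo;
import org.apache.hcatalog.mapreduce.OutputJobInfo;

public class UseHCat extends Configured implements Tool {

    // Each input record is one row of the groups table; /etc/group rows are
    // name:password:gid:members, so field 0 is the group name and field 2 the ID.
    public static class Map
            extends Mapper<WritableComparable, HCatRecord, Text, IntWritable> {
        @Override
        protected void map(WritableComparable key, HCatRecord value, Context context)
                throws IOException, InterruptedException {
            String name = (String) value.get(0);   // first column: group name
            int id = (Integer) value.get(2);       // third column: group ID (assumed int)
            context.write(new Text(name), new IntWritable(id));
        }
    }

    // Emit one two-column HCatRecord (name, id) per group into the output table.
    public static class Reduce
            extends Reducer<Text, IntWritable, WritableComparable, HCatRecord> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            Iterator<IntWritable> iter = values.iterator();
            HCatRecord record = new DefaultHCatRecord(2);
            record.set(0, key.toString());
            record.set(1, iter.next().get());
            context.write(null, record);
        }
    }

    public int run(String[] args) throws Exception {
        Configuration conf = getConf();
        args = new GenericOptionsParser(conf, args).getRemainingArgs();
        String inputTable = args[0];   // e.g. groups
        String outputTable = args[1];  // e.g. groupids

        Job job = new Job(conf, "UseHCat");
        // Read from the input table in the default database (dbName == null).
        HCatInputFormat.setInput(job, InputJobInfo.create(null, inputTable, null));
        job.setJarByClass(UseHCat.class);
        job.setMapperClass(Map.class);
        job.setReducerClass(Reduce.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(IntWritable.class);
        job.setOutputKeyClass(WritableComparable.class);
        job.setOutputValueClass(DefaultHCatRecord.class);
        job.setInputFormatClass(HCatInputFormat.class);
        job.setOutputFormatClass(HCatOutputFormat.class);

        // Write into the output table, reusing its schema from the metastore.
        HCatOutputFormat.setOutput(job, OutputJobInfo.create(null, outputTable, null));
        HCatSchema schema = HCatOutputFormat.getTableSchema(job);
        HCatOutputFormat.setSchema(job, schema);
        return job.waitForCompletion(true) ? 0 : 1;
    }

    public static void main(String[] args) throws Exception {
        System.exit(ToolRunner.run(new UseHCat(), args));
    }
}

Because the output schema is read back from the metastore, the groupids table must already exist before the job runs; the input and output table names are taken from the two trailing command-line arguments shown in the hadoop jar command above.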