
Accessing Table Data with MapReduce

You can download an example MapReduce program that reads from the groups table (populated with data from /etc/group), extracts the first and third columns (the group name and its numeric GID), and inserts them into the groupids table. Proceed as follows; a sketch of the program's core logic appears after the steps.
  1. Clone the example program repository:
    $ git clone https://github.com/cloudera/hcatalog-examples.git
  2. Build the example JAR file:
    $ cd hcatalog-examples
    $ mvn package
  3. Load data from the local file system into the groups table:
    $ hive -e "load data local inpath '/etc/group' overwrite into table groups"
  4. Set up the environment needed to copy the required JAR files to HDFS. Note that -libjars expects a comma-separated list, which is why the colon-separated HADOOP_CLASSPATH is converted with sed below. For example:
    $ export HCAT_HOME=/usr/lib/hive-hcatalog
    $ export HIVE_HOME=/usr/lib/hive
    $ HIVE_VERSION=0.11.0-cdh5.0.0
    $ HCATJAR=$HCAT_HOME/share/hcatalog/hcatalog-core-$HIVE_VERSION.jar
    $ HCATPIGJAR=$HCAT_HOME/share/hcatalog/hcatalog-pig-adapter-$HIVE_VERSION.jar
    $ export HADOOP_CLASSPATH=$HCATJAR:$HCATPIGJAR:$HIVE_HOME/lib/hive-exec-$HIVE_VERSION.jar\
    :$HIVE_HOME/lib/hive-metastore-$HIVE_VERSION.jar:$HIVE_HOME/lib/jdo-api-*.jar:$HIVE_HOME/lib/libfb303-*.jar\
    :$HIVE_HOME/lib/libthrift-*.jar:$HIVE_HOME/lib/slf4j-api-*.jar:$HIVE_HOME/conf:/etc/hadoop/conf
    $ LIBJARS=`echo $HADOOP_CLASSPATH | sed -e 's/:/,/g'`
    $ export LIBJARS=$LIBJARS,$HIVE_HOME/lib/antlr-runtime-*.jar
      Note: You can find the version numbers for CDH dependencies in CDH's root pom.xml file for your release, for example cdh-root-5.0.0.pom.
  5. Run the job:
    $ hadoop jar target/UseHCat-1.0.jar com.cloudera.test.UseHCat -files $HCATJAR -libjars $LIBJARS groups groupids
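
For reference, the following is a minimal sketch of what the example program's core logic looks like. It is a hypothetical illustration, not the actual source of com.cloudera.test.UseHCat (see the hcatalog-examples repository for that); it assumes the pre-Hive-0.13 org.apache.hcatalog package, which matches the hcatalog-core-0.11.0 JAR referenced above, and the class and variable names are invented. The mapper receives each row of the input table as an HCatRecord, keeps fields 0 (group name) and 2 (GID), and writes a two-field record to the output table through HCatOutputFormat:

    // Hypothetical sketch; the real example is com.cloudera.test.UseHCat
    // in the hcatalog-examples repository.
    import java.io.IOException;

    import org.apache.hadoop.conf.Configured;
    import org.apache.hadoop.io.NullWritable;
    import org.apache.hadoop.io.WritableComparable;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.util.Tool;
    import org.apache.hadoop.util.ToolRunner;
    import org.apache.hcatalog.data.DefaultHCatRecord;
    import org.apache.hcatalog.data.HCatRecord;
    import org.apache.hcatalog.mapreduce.HCatInputFormat;
    import org.apache.hcatalog.mapreduce.HCatOutputFormat;
    import org.apache.hcatalog.mapreduce.InputJobInfo;
    import org.apache.hcatalog.mapreduce.OutputJobInfo;

    public class UseHCatSketch extends Configured implements Tool {

      // Each row of the input table arrives as an HCatRecord. /etc/group
      // lines look like "name:password:gid:members", so field 0 is the
      // group name and field 2 is the numeric GID.
      public static class GroupMapper
          extends Mapper<WritableComparable, HCatRecord, NullWritable, DefaultHCatRecord> {
        @Override
        protected void map(WritableComparable key, HCatRecord value, Context context)
            throws IOException, InterruptedException {
          DefaultHCatRecord out = new DefaultHCatRecord(2);
          out.set(0, value.get(0)); // first column: group name
          out.set(1, value.get(2)); // third column: GID
          context.write(NullWritable.get(), out);
        }
      }

      @Override
      public int run(String[] args) throws Exception {
        // args[0] = input table (groups), args[1] = output table (groupids)
        Job job = new Job(getConf(), "UseHCatSketch");
        job.setJarByClass(UseHCatSketch.class);

        HCatInputFormat.setInput(job, InputJobInfo.create("default", args[0], null));
        job.setInputFormatClass(HCatInputFormat.class);
        job.setMapperClass(GroupMapper.class);
        job.setNumReduceTasks(0); // map-only: each input row yields one output row
        job.setOutputKeyClass(NullWritable.class);
        job.setOutputValueClass(DefaultHCatRecord.class);

        HCatOutputFormat.setOutput(job, OutputJobInfo.create("default", args[1], null));
        // Write using the schema of the existing output table.
        HCatOutputFormat.setSchema(job, HCatOutputFormat.getTableSchema(job));
        job.setOutputFormatClass(HCatOutputFormat.class);

        return job.waitForCompletion(true) ? 0 : 1;
      }

      public static void main(String[] args) throws Exception {
        System.exit(ToolRunner.run(new UseHCatSketch(), args));
      }
    }

When the job completes, you can verify the result with a query such as hive -e "select * from groupids".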