Accessing Table Data with MapReduce
You can download an example MapReduce program that reads from the groups table (consisting of data from /etc/group), extracts the first and third columns (the group name and the numeric group ID), and inserts them into the groupids table; a sketch of such a program follows the steps below. Proceed as follows.
- Clone the example repository from https://github.com/cloudera/hcatalog-examples.git.
- Build the example JAR file:
$ cd hcatalog-examples
$ mvn package
- Load data from the local file system into the groups table:
$ hive -e "load data local inpath '/etc/group' overwrite into table groups"
- Set up the environment needed to copy the required JAR files to HDFS, for example:
$ export HCAT_HOME=/usr/lib/hive-hcatalog
$ export HIVE_HOME=/usr/lib/hive
$ HIVE_VERSION=0.11.0-cdh5.0.0
$ HCATJAR=$HCAT_HOME/share/hcatalog/hcatalog-core-$HIVE_VERSION.jar
$ HCATPIGJAR=$HCAT_HOME/share/hcatalog/hcatalog-pig-adapter-$HIVE_VERSION.jar
$ export HADOOP_CLASSPATH=$HCATJAR:$HCATPIGJAR:$HIVE_HOME/lib/hive-exec-$HIVE_VERSION.jar\
:$HIVE_HOME/lib/hive-metastore-$HIVE_VERSION.jar:$HIVE_HOME/lib/jdo-api-*.jar:$HIVE_HOME/lib/libfb303-*.jar\
:$HIVE_HOME/lib/libthrift-*.jar:$HIVE_HOME/lib/slf4j-api-*.jar:$HIVE_HOME/conf:/etc/hadoop/conf
$ LIBJARS=`echo $HADOOP_CLASSPATH | sed -e 's/:/,/g'`
$ export LIBJARS=$LIBJARS,$HIVE_HOME/lib/antlr-runtime-*.jar
Note: You can find current version numbers for CDH dependencies in CDH's root pom.xml file for the current release, for example cdh-root-5.0.0.pom.
- Run the job:
$ hadoop jar target/UseHCat-1.0.jar com.cloudera.test.UseHCat -files $HCATJAR -libjars $LIBJARS groups groupids
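For reference, the following is a condensed sketch of what such a program can look like, built on the HCatalog MapReduce API (HCatInputFormat and HCatOutputFormat). It is illustrative rather than a copy of the repository's com.cloudera.test.UseHCat class: the package names follow the Hive 0.11-era org.apache.hcatalog layout (later Hive releases moved these classes under org.apache.hive.hcatalog), and it assumes the groups table stores the group name as a string and the group ID as an int.

import java.io.IOException;
import java.util.Iterator;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.WritableComparable;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.util.GenericOptionsParser;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;
import org.apache.hcatalog.data.DefaultHCatRecord;
import org.apache.hcatalog.data.HCatRecord;
import org.apache.hcatalog.data.schema.HCatSchema;
import org.apache.hcatalog.mapreduce.HCatInputFormat;
import org.apache.hcatalog.mapreduce.HCatOutputFormat;
import org.apache.hcatalog.mapreduce.InputJobInfo;
import org.apache.hcatalog.mapreduce.OutputJobInfo;

public class UseHCat extends Configured implements Tool {

    // Each input record is one row of the groups table; /etc/group rows are
    // name:password:gid:members, so field 0 is the group name and field 2 the ID.
    public static class Map
            extends Mapper<WritableComparable, HCatRecord, Text, IntWritable> {
        @Override
        protected void map(WritableComparable key, HCatRecord value, Context context)
                throws IOException, InterruptedException {
            String name = (String) value.get(0);   // first column: group name
            int id = (Integer) value.get(2);       // third column: group ID (assumed int)
            context.write(new Text(name), new IntWritable(id));
        }
    }

    // Emit one two-column HCatRecord (name, id) per group into the output table.
    public static class Reduce
            extends Reducer<Text, IntWritable, WritableComparable, HCatRecord> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            Iterator<IntWritable> iter = values.iterator();
            HCatRecord record = new DefaultHCatRecord(2);
            record.set(0, key.toString());
            record.set(1, iter.next().get());
            context.write(null, record);
        }
    }

    public int run(String[] args) throws Exception {
        Configuration conf = getConf();
        args = new GenericOptionsParser(conf, args).getRemainingArgs();
        String inputTable = args[0];   // e.g. groups
        String outputTable = args[1];  // e.g. groupids

        Job job = new Job(conf, "UseHCat");
        // Read from the input table in the default database (dbName == null).
        HCatInputFormat.setInput(job, InputJobInfo.create(null, inputTable, null));
        job.setJarByClass(UseHCat.class);
        job.setMapperClass(Map.class);
        job.setReducerClass(Reduce.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(IntWritable.class);
        job.setOutputKeyClass(WritableComparable.class);
        job.setOutputValueClass(DefaultHCatRecord.class);
        job.setInputFormatClass(HCatInputFormat.class);
        job.setOutputFormatClass(HCatOutputFormat.class);

        // Write into the output table, reusing its schema from the metastore.
        HCatOutputFormat.setOutput(job, OutputJobInfo.create(null, outputTable, null));
        HCatSchema schema = HCatOutputFormat.getTableSchema(job);
        HCatOutputFormat.setSchema(job, schema);
        return job.waitForCompletion(true) ? 0 : 1;
    }

    public static void main(String[] args) throws Exception {
        System.exit(ToolRunner.run(new UseHCat(), args));
    }
}

Because the output schema is read back from the metastore, the groupids table must already exist before the job runs; the input and output table names are taken from the two trailing command-line arguments shown in the hadoop jar command above.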