Accessing Table Data with MapReduce
You can download an example MapReduce program that reads from the groups table (which contains data from /etc/group), extracts the first and third columns, and inserts them into the groupids table. To build and run the example, proceed as follows.
- Download the program from https://github.com/cloudera/hcatalog-examples.git.
- Build the example JAR file:
$ cd hcatalog-examples
$ mvn package
- Load data from the local file system into the groups table:
$ hive -e "load data local inpath '/etc/group' overwrite into table groups"
- Set up the environment that is needed for copying the required JAR files to HDFS, for example:
$ export HCAT_HOME=/usr/lib/hive-hcatalog
$ export HIVE_HOME=/usr/lib/hive
$ HIVE_VERSION=0.11.0-cdh5.0.0
$ HCATJAR=$HCAT_HOME/share/hcatalog/hcatalog-core-$HIVE_VERSION.jar
$ HCATPIGJAR=$HCAT_HOME/share/hcatalog/hcatalog-pig-adapter-$HIVE_VERSION.jar
$ export HADOOP_CLASSPATH=$HCATJAR:$HCATPIGJAR:$HIVE_HOME/lib/hive-exec-$HIVE_VERSION.jar\
:$HIVE_HOME/lib/hive-metastore-$HIVE_VERSION.jar:$HIVE_HOME/lib/jdo-api-*.jar:$HIVE_HOME/lib/libfb303-*.jar\
:$HIVE_HOME/lib/libthrift-*.jar:$HIVE_HOME/lib/slf4j-api-*.jar:$HIVE_HOME/conf:/etc/hadoop/conf
$ LIBJARS=`echo $HADOOP_CLASSPATH | sed -e 's/:/,/g'`
$ export LIBJARS=$LIBJARS,$HIVE_HOME/lib/antlr-runtime-*.jar
- Run the job:
$ hadoop jar target/UseHCat-1.0.jar com.cloudera.test.UseHCat -files $HCATJAR -libjars $LIBJARS groups groupids
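
Two of the steps above can be tried locally without a cluster. The sketch below is illustrative only and is not part of the example repository: the helper names to_libjars and preview_group_ids are hypothetical, and the paths are placeholders. The first helper mirrors the sed substitution used to build LIBJARS, since the -libjars option takes a comma-separated list where HADOOP_CLASSPATH uses colons. The second shows what "first and third columns" means for /etc/group-style input, assuming the columns of the groups table correspond to the colon-separated fields of that file.

```shell
#!/bin/sh
# Hypothetical helper: convert a colon-separated classpath into the
# comma-separated form expected by -libjars (same sed expression as above).
to_libjars() {
    printf '%s\n' "$1" | sed -e 's/:/,/g'
}

# Hypothetical helper: print the first and third colon-separated fields
# (group name and group ID) of /etc/group-style lines read from stdin.
# Assumption: these fields match the first and third table columns.
preview_group_ids() {
    cut -d: -f1,3
}

# Placeholder paths, not real installation paths.
to_libjars '/opt/example/a.jar:/opt/example/b.jar:/etc/example/conf'

# Two sample lines in /etc/group format.
printf 'root:x:0:\nadm:x:4:syslog\n' | preview_group_ids
```

Run as written, the script prints the comma-separated path list followed by root:0 and adm:4. It only previews the shape of the data; the actual job reads and writes the tables through HCatalog.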