Installing Pig
To install Pig on RHEL-compatible systems:
$ sudo yum install pig
To install Pig on SLES systems:
$ sudo zypper install pig
To install Pig on Ubuntu and other Debian systems:
$ sudo apt-get install pig
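The three install commands above differ only in the package manager. As a minimal sketch, assuming you want a script to pick the right one, the following prints (but does not run) the matching command. The function names pig_install_cmd and detect_pkg_manager are hypothetical helpers for illustration, not part of Pig or CDH.

```shell
#!/bin/sh
# Hypothetical helper: print the Pig install command for a given package manager.
pig_install_cmd() {
    case "$1" in
        yum)     echo "sudo yum install pig" ;;
        zypper)  echo "sudo zypper install pig" ;;
        apt-get) echo "sudo apt-get install pig" ;;
        *)       echo "unsupported package manager: $1" >&2; return 1 ;;
    esac
}

# Hypothetical helper: report the first supported package manager found on PATH.
detect_pkg_manager() {
    for pm in yum zypper apt-get; do
        if command -v "$pm" >/dev/null 2>&1; then
            echo "$pm"
            return 0
        fi
    done
    return 1
}

# Only print the suggestion; nothing is installed by this sketch.
if pm=$(detect_pkg_manager); then
    echo "Run: $(pig_install_cmd "$pm")"
else
    echo "No supported package manager found"
fi
```

The script only echoes the command, so it is safe to run on any host before deciding to install.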
To start Pig in interactive mode (MRv1)
Use the following command:
$ pig
You should see output similar to the following:
2012-02-08 23:39:41,819 [main] INFO org.apache.pig.Main - Logging error messages to: /home/user/pig-0.11.0-cdh5b1/bin/pig_1328773181817.log
2012-02-08 23:39:41,994 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: hdfs://hostname:8020
...
grunt>
Examples
If you don't already have sample data, create a file and load it to HDFS. For example:
- Create the file hostlist and enter the following data:
daily03.acme.com,123221991
daily04.acme.com,120222101
daily05.acme.com,119220077
fixed01.best.com,218880024
daily03.best.com,234320024
- Copy hostlist to a user directory in HDFS, in this case the directory of the user cloudera:
$ hadoop fs -copyFromLocal hostlist /user/cloudera
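The two steps above can be sketched as one script. This is an illustration under stated assumptions: the file is written to the current working directory, and the copy to HDFS runs only when a hadoop client is found on the PATH. The /user/cloudera target is the one used in this example.

```shell
#!/bin/sh
# Write the five sample records, one per line, to a local file named hostlist.
cat > hostlist <<'EOF'
daily03.acme.com,123221991
daily04.acme.com,120222101
daily05.acme.com,119220077
fixed01.best.com,218880024
daily03.best.com,234320024
EOF

# Copy the file to HDFS only when the hadoop client is available.
if command -v hadoop >/dev/null 2>&1; then
    hadoop fs -copyFromLocal hostlist /user/cloudera
else
    echo "hadoop not found; hostlist was created locally only" >&2
fi
```

Each record is a host name and an integer separated by a comma, which matters later when the file is loaded into Pig.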
At the Grunt shell, list the HDFS directory:
grunt> ls hdfs://hostname:8020/user/cloudera
hdfs://hostname:8020/user/cloudera/hostlist
To run a grep-style example job in Pig, load the comma-delimited file and filter it with a regular expression:
grunt> A = LOAD 'hostlist' USING PigStorage(',') AS (host:chararray, capacity:int);
grunt> DUMP A;
(daily03.acme.com,123221991)
(daily04.acme.com,120222101)
(daily05.acme.com,119220077)
(fixed01.best.com,218880024)
(daily03.best.com,234320024)
grunt> B = FILTER A BY $0 MATCHES '.*best.*';
grunt> DUMP B;
(fixed01.best.com,218880024)
(daily03.best.com,234320024)
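The same job can also be run outside the Grunt shell as a batch script. The following is a minimal sketch, not a required step: the script name grep_hosts.pig is hypothetical, the loader assumes hostlist is comma-delimited, and the job runs in local mode (pig -x local) against a hostlist file in the current directory rather than against HDFS.

```shell
#!/bin/sh
# Save the Pig Latin statements to a script file (grep_hosts.pig is a hypothetical name).
cat > grep_hosts.pig <<'EOF'
-- Assumes hostlist is comma-delimited, hence PigStorage(',').
A = LOAD 'hostlist' USING PigStorage(',') AS (host:chararray, capacity:int);
B = FILTER A BY host MATCHES '.*best.*';
DUMP B;
EOF

# Run in local mode only if pig is installed and a local hostlist file exists.
if command -v pig >/dev/null 2>&1 && [ -f hostlist ]; then
    pig -x local -f grep_hosts.pig
else
    echo "pig or hostlist not available; wrote grep_hosts.pig only" >&2
fi
```

The script refers to the first field by its schema name, host, instead of the positional $0. Both forms select the same column.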