Installing Pig
To install Pig on Red Hat-compatible systems:
$ sudo yum install pig
To install Pig on SLES systems:
$ sudo zypper install pig
To install Pig on Ubuntu and other Debian systems:
$ sudo apt-get install pig
Note:
Pig automatically uses the active Hadoop configuration (whether standalone, pseudo-distributed mode, or distributed). After installing the Pig package, you can start Pig.
To start Pig in interactive mode (YARN)
Important:
- For each user who will be submitting MapReduce jobs using MapReduce v2 (YARN), or running Pig, Hive, or Sqoop in a YARN installation, make sure that the HADOOP_MAPRED_HOME environment variable is set correctly, as follows:
$ export HADOOP_MAPRED_HOME=/usr/lib/hadoop-mapreduce
- For each user who will be submitting MapReduce jobs using MapReduce v1 (MRv1), or running Pig, Hive, or Sqoop in an MRv1 installation, set the HADOOP_MAPRED_HOME environment variable as follows:
$ export HADOOP_MAPRED_HOME=/usr/lib/hadoop-0.20-mapreduce
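Since an export set at the prompt lasts only for the current session, a sketch of setting and verifying the variable, and persisting it, is shown below (the ~/.bashrc location is an assumption; adjust for the user's actual shell startup file):

```shell
# Set the variable for the current session (YARN path shown; use
# /usr/lib/hadoop-0.20-mapreduce instead for MRv1 installations).
export HADOOP_MAPRED_HOME=/usr/lib/hadoop-mapreduce

# Verify the value before starting Pig.
echo "$HADOOP_MAPRED_HOME"

# To make the setting persist across logins, each user can append the
# export line to a shell startup file, for example:
#   echo 'export HADOOP_MAPRED_HOME=/usr/lib/hadoop-mapreduce' >> ~/.bashrc
```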
To start Pig, use the following command:
$ pig
To start Pig in interactive mode (MRv1)
Use the following command:
$ pig
You should see output similar to the following:
2012-02-08 23:39:41,819 [main] INFO org.apache.pig.Main - Logging error messages to: /home/arvind/pig-0.11.0-cdh5b1/bin/pig_1328773181817.log
2012-02-08 23:39:41,994 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: hdfs://localhost/
...
grunt>
Examples
To verify that the input and output directories from the YARN or MRv1 example grep job exist, list an HDFS directory from the Grunt shell:
grunt> ls
hdfs://localhost/user/joe/input <dir>
hdfs://localhost/user/joe/output <dir>
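Beyond listing directories, the Grunt shell's cat command can print the contents of a result file once a job has finished. A sketch follows; the part-file name is illustrative and varies with the MapReduce version that produced it:

```
grunt> cat hdfs://localhost/user/joe/output/part-00000
```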
To run a grep example job using Pig for grep inputs:
grunt> A = LOAD 'input';
grunt> B = FILTER A BY $0 MATCHES '.*dfs[a-z.]+.*';
grunt> DUMP B;
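The same three statements can also be run non-interactively by saving them to a Pig Latin script and passing the script to the pig command. A minimal sketch follows; the /tmp/grep.pig filename is illustrative:

```shell
# Save the grep pipeline as a Pig Latin script (filename is illustrative).
cat > /tmp/grep.pig <<'EOF'
A = LOAD 'input';
B = FILTER A BY $0 MATCHES '.*dfs[a-z.]+.*';
DUMP B;
EOF

# Run it in batch mode instead of the Grunt shell. This requires Pig to be
# installed and the Hadoop configuration active, so it is not executed here:
#   pig /tmp/grep.pig
```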
Note:
To check the status of your job while it is running, look at the ResourceManager web console (YARN) or JobTracker web console (MRv1).