Setting up Pig


Before setting up Pig and running Pig scripts, make sure the following prerequisite is met.

  • The latest stable build of Hadoop is installed and running. To install Hadoop, see my previous blog article on Hadoop Setup.

1. Download a stable release of Pig from the Apache download mirrors. This tutorial uses pig-0.11.1, which works with Hadoop 0.20.X, 1.X, 0.23.X and 2.X.


2. Copy the Pig tarball into the /usr/local/pig directory, creating the directory first if it does not exist.

sudo mkdir -p /usr/local/pig
sudo cp pig-0.11.1.tar.gz /usr/local/pig/

3. Change to the /usr/local/pig directory:

cd /usr/local/pig

4. Unpack the Pig tarball in /usr/local/pig:

sudo tar xvzf pig-0.11.1.tar.gz


5. Set PIG_HOME in $HOME/.bashrc so that it is set every time you log in. Add a line of the following form:

export PIG_HOME=<path_to_pig_home_directory>

For the directory layout used in this tutorial, that is:

export PIG_HOME='/usr/local/pig/pig-0.11.1'


6. Set the JAVA_HOME environment variable to point to your Java installation directory, which Pig needs in order to run.

export JAVA_HOME=<path_to_java_installation_directory>
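
Once steps 5 and 6 are in place, a quick check can confirm that both variables point to real directories. The following is a minimal sketch, not part of Pig: check_dir_var is a hypothetical helper name, and the last check assumes the Pig launcher script sits at $PIG_HOME/bin/pig, as it does in the extracted pig-0.11.1 release.

```shell
# check_dir_var is a hypothetical helper (not part of Pig): it reports whether
# the named environment variable is set and points to an existing directory.
check_dir_var() {
  name="$1"
  # Look up the variable whose name is stored in $name.
  eval "value=\${$name:-}"
  if [ -z "$value" ]; then
    echo "$name is not set"
    return 1
  fi
  if [ ! -d "$value" ]; then
    echo "$name points to a missing directory: $value"
    return 1
  fi
  echo "$name OK: $value"
}

# Check both variables; "|| true" keeps going so both results are reported.
check_dir_var PIG_HOME || true
check_dir_var JAVA_HOME || true

# Assumption: the pig launcher script lives at $PIG_HOME/bin/pig.
if [ -x "${PIG_HOME:-}/bin/pig" ]; then
  echo "pig launcher found: $PIG_HOME/bin/pig"
else
  echo "pig launcher not found under PIG_HOME/bin"
fi
```

If the launcher is found but typing pig still fails, it may be that $PIG_HOME/bin is not on your PATH; adding it in $HOME/.bashrc is one way to fix that.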

Execution Modes

Pig has two modes of execution: local mode and MapReduce mode.

Local Mode

Local mode is usually used to verify and debug Pig queries and scripts on smaller datasets that a single machine can handle. It runs in a single JVM and accesses the local filesystem.

To run in local mode, use the following command:

$ pig -x local 
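
To see local mode run a script rather than open the interactive shell, the sketch below writes a small comma-separated file and a three-line Pig Latin script into a temporary directory, then runs the script with pig -x local. The file names, sample rows, and field names are made up for illustration, and the run is skipped if pig is not on your PATH.

```shell
# Work in a scratch directory so nothing else on disk is touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical sample data in the form name,score
cat > scores.csv <<'EOF'
alice,82
bob,47
carol,65
EOF

# A small Pig Latin script: load the file, keep scores above 50, print them.
cat > filter.pig <<'EOF'
records = LOAD 'scores.csv' USING PigStorage(',') AS (name:chararray, score:int);
passed = FILTER records BY score > 50;
DUMP passed;
EOF

# Run in local mode only if the pig launcher is available.
if command -v pig >/dev/null 2>&1; then
  pig -x local filter.pig
else
  echo "pig not found on PATH; skipping the run"
fi
```

Because the script runs in local mode, the relative path scores.csv is resolved against the local filesystem and the current directory, not HDFS.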


MapReduce Mode

This is the default mode. Pig translates the queries into MapReduce jobs, so it requires access to a Hadoop cluster.

$ pig

2013-10-28 11:39:44,767 [main] INFO  org.apache.pig.Main - Apache Pig version 0.11.1 (r1459641) compiled Mar 22 2013, 02:13:53
2013-10-28 11:39:44,767 [main] INFO  org.apache.pig.Main - Logging error messages to: /home/hduser/pig_1382985584762.log
2013-10-28 11:39:44,797 [main] INFO  org.apache.pig.impl.util.Utils - Default bootup file /home/hduser/.pigbootup not found
2013-10-28 11:39:45,094 [main] INFO  org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: hdfs://Hadoopmaster:54310
2013-10-28 11:39:45,592 [main] INFO  org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to map-reduce job tracker at: Hadoopmaster:54311



The log output shows the filesystem and the JobTracker that Pig connected to. After startup, Pig opens Grunt, its interactive shell for running Pig queries.
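
Because MapReduce mode is the default, plain pig and pig -x mapreduce start the same mode. If you switch between the two modes often, a small wrapper can make the choice explicit. The sketch below is illustrative only: pig_mode_flag and run_pig are hypothetical helper names, not part of Pig.

```shell
# Map a mode name to the value passed to pig's -x option.
# With no argument (or an empty one) it falls back to mapreduce, Pig's default.
pig_mode_flag() {
  case "${1:-mapreduce}" in
    local)     echo "local" ;;
    mapreduce) echo "mapreduce" ;;
    *)         echo "unknown mode: $1" >&2; return 1 ;;
  esac
}

# Start pig in the requested mode, passing any remaining arguments through.
run_pig() {
  mode=$(pig_mode_flag "${1:-}") || return 1
  if [ "$#" -gt 0 ]; then
    shift
  fi
  pig -x "$mode" "$@"
}
```

With these helpers, run_pig local myscript.pig would run a script (here a placeholder name) in local mode, while run_pig with no arguments would open Grunt in MapReduce mode.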


