Setting up Pig

Prerequisites

The following are the prerequisites for setting up Pig and running Pig scripts.

  • You should have the latest stable build of Hadoop up and running. To install Hadoop, see my previous blog article on Hadoop Setup.
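Before continuing, it can help to confirm that the required tools are actually on your PATH. The snippet below is a minimal sketch; missing_commands is a hypothetical helper name, and it only checks that a command can be found, not that the Hadoop daemons are running.

```shell
# Print the name of every command in the argument list that is not on PATH.
# missing_commands is a hypothetical helper, not part of Hadoop or Pig.
missing_commands() {
  for cmd in "$@"; do
    command -v "$cmd" >/dev/null 2>&1 || printf '%s\n' "$cmd"
  done
}

# Example usage: prints nothing when both tools are installed.
# missing_commands java hadoop
```

If the helper prints a name, install or fix that tool first; Pig in MapReduce mode will not start without a working Hadoop client.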

Setting up Pig

Procedure

  1. Download a stable release of Pig from one of the Apache download mirrors. This tutorial uses pig-0.11.1, which works with Hadoop 0.20.X, 1.X, 0.23.X and 2.X.

wget http://apache.mirrors.hoobly.com/pig/pig-0.11.1/pig-0.11.1.tar.gz

2. Copy the downloaded Pig archive into the /usr/local/pig directory, creating the directory first if it does not exist.

sudo mkdir -p /usr/local/pig
sudo cp pig-0.11.1.tar.gz /usr/local/pig/

3. Change to the /usr/local/pig directory:

cd /usr/local/pig

4. Unpack the Pig archive in the /usr/local/pig directory:

sudo tar xvzf pig-0.11.1.tar.gz

5. Set PIG_HOME in $HOME/.bashrc so that it is set every time you log in. Add the following line to the file:

export PIG_HOME=<path_to_pig_home_directory>

e.g.
export PIG_HOME='/usr/local/pig/pig-0.11.1'
export PATH=$HADOOP_HOME/bin:$PIG_HOME/bin:$JAVA_HOME/bin:$PATH

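6. Set the JAVA_HOME environment variable to point to the Java installation directory, which Pig uses internally. Because the PATH line in step 5 refers to $JAVA_HOME, place this export above that line in $HOME/.bashrc.

export JAVA_HOME=<path_to_java_installation_directory>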
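The exports from steps 5 and 6 can also be added with a small helper that avoids duplicate lines if you run the setup more than once. This is only a sketch: add_line_once is a hypothetical helper name, and the Java path is left as a placeholder because it depends on your machine.

```shell
# Append a line to a file only if that exact line is not already present.
# add_line_once is a hypothetical helper name, not part of Pig or Hadoop.
add_line_once() {
  line=$1
  file=$2
  touch "$file"
  # -x matches whole lines only, -F treats the line as a fixed string.
  grep -qxF -e "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# Example usage (commented out so that running this block changes nothing):
# add_line_once "export JAVA_HOME=<path_to_java_installation_directory>" "$HOME/.bashrc"
# add_line_once "export PIG_HOME='/usr/local/pig/pig-0.11.1'" "$HOME/.bashrc"
# add_line_once 'export PATH=$HADOOP_HOME/bin:$PIG_HOME/bin:$JAVA_HOME/bin:$PATH' "$HOME/.bashrc"
```

After reloading the file with source $HOME/.bashrc, running pig -version should print the installed Pig version if PATH is set correctly.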

Execution Modes

Pig has two modes of execution – local mode and MapReduce mode.

Local Mode

Local mode is usually used to verify and debug Pig queries and scripts on small datasets that a single machine can handle. It runs in a single JVM and accesses the local filesystem.

To run in local mode, use the following command:

$ pig -x local 
grunt>
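Local mode also accepts a script file instead of opening the Grunt shell, which is handy for a quick smoke test. The sketch below writes a tiny word-count script and a sample input file; write_wordcount_demo, the directory, and the file names are all hypothetical and chosen only for illustration.

```shell
# Write a small word-count Pig script and a sample input file into a directory.
# write_wordcount_demo and the file names are hypothetical, for illustration only.
write_wordcount_demo() {
  dir=$1
  mkdir -p "$dir"
  printf '%s\n' 'pig hadoop pig' 'grunt pig' > "$dir/input.txt"
  cat > "$dir/wordcount.pig" <<'EOF'
-- Count how often each word occurs in input.txt
lines   = LOAD 'input.txt' AS (line:chararray);
words   = FOREACH lines GENERATE FLATTEN(TOKENIZE(line)) AS word;
grouped = GROUP words BY word;
counts  = FOREACH grouped GENERATE group, COUNT(words);
DUMP counts;
EOF
}

# Example usage (requires Pig on PATH, so it is left commented out):
# write_wordcount_demo /tmp/pig-demo
# cd /tmp/pig-demo && pig -x local wordcount.pig
```

The script uses a relative input path, so it should be run from the directory that contains input.txt. If the installation is working, the run should end by printing each word together with its count.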

MapReduce Mode

This is the default mode. Pig translates the queries into MapReduce jobs, which requires access to a Hadoop cluster.

$ pig
2013-10-28 11:39:44,767 [main] INFO  org.apache.pig.Main - Apache Pig version 0.11.1 (r1459641) compiled Mar 22 2013, 02:13:53
2013-10-28 11:39:44,767 [main] INFO  org.apache.pig.Main - Logging error messages to: /home/hduser/pig_1382985584762.log
2013-10-28 11:39:44,797 [main] INFO  org.apache.pig.impl.util.Utils - Default bootup file /home/hduser/.pigbootup not found
2013-10-28 11:39:45,094 [main] INFO  org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: hdfs://Hadoopmaster:54310
2013-10-28 11:39:45,592 [main] INFO  org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to map-reduce job tracker at: Hadoopmaster:54311
grunt>

The log output shows the filesystem and JobTracker that Pig connected to. Grunt is the interactive shell in which you enter Pig queries.
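If you only want to confirm which cluster Pig is talking to, the two "Connecting to" lines can be pulled out of the startup output. This is a rough sketch that assumes the message wording shown in the log above; pig_endpoints is a hypothetical helper name.

```shell
# Read Pig startup output on stdin and print the HDFS and JobTracker addresses.
# pig_endpoints is a hypothetical helper; it relies on the log wording shown above.
pig_endpoints() {
  sed -n \
    -e 's/.*Connecting to hadoop file system at: *//p' \
    -e 's/.*Connecting to map-reduce job tracker at: *//p'
}

# Example usage (requires Pig and a reachable cluster, so it is left commented out):
# pig -e 'fs -ls /' 2>&1 | pig_endpoints
```

With the log shown above as input, the helper would print hdfs://Hadoopmaster:54310 followed by Hadoopmaster:54311.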
