How To Run Hadoop MapReduce Application from Eclipse

Today I learnt how to run a Hadoop MapReduce application from Eclipse. This is very useful for those who are not comfortable with the CLI.

There are two methods for running the MapReduce Application from Eclipse:

1) By using the MapReduce plugin.

Note that the plugin is not available for Hadoop 2.4.1, but it is available for older versions.

Step 1: Installation

Download the plugin's jar file and copy it into the ‘plugins’ directory of your IDE.

Step 2: Configuration

 

  1. Go to Window –> Open Perspective –> Other and select the ‘Map/Reduce’ perspective.

  2. Click ‘New Hadoop location…’ (the blue elephant icon) and define the Hadoop location used to run MapReduce applications. Click the ‘Finish’ button.


Map/Reduce (V2) Master: the address of the MapReduce master node (the JobTracker).
DFS Master: the address of the distributed file system master node (the NameNode).
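These two fields mirror values from the cluster's own configuration files. A minimal sketch of the corresponding entries, assuming a single-node cluster on localhost with commonly used default ports (adjust host names and ports to match your setup):

```xml
<!-- core-site.xml: the DFS Master address (NameNode) -->
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://localhost:9000</value>
</property>

<!-- mapred-site.xml (pre-YARN layout, as used by the plugin's
     older Hadoop versions): the Map/Reduce Master (JobTracker) -->
<property>
  <name>mapred.job.tracker</name>
  <value>localhost:9001</value>
</property>
```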

Now we can browse the Hadoop file system and perform file and folder operations entirely through the GUI.


Also, we can easily create a Map/Reduce project, Mapper, Reducer, and MapReduce driver using the wizard (File –> New –> Other… –> Map/Reduce) and jump straight into Hadoop programming.


2) By using the Hadoop jar files directly.

Step 1: Create New Java Project

new_project

Step 2: Add Dependencies JARs

Right-click the project, select Properties, and open the Java Build Path page.

Add all jars from $HADOOP_HOME/lib and $HADOOP_HOME.

hadoop_lib

hadoop_lib2

Step 3: Create Mapper

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class MaxTemperatureMapper extends
        Mapper<LongWritable, Text, Text, IntWritable> {

    private static final int MISSING = 9999;

    @Override
    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String line = value.toString();
        String year = line.substring(15, 19);
        int airTemperature;
        if (line.charAt(87) == '+') { // parseInt doesn't like leading plus signs
            airTemperature = Integer.parseInt(line.substring(88, 92));
        } else {
            airTemperature = Integer.parseInt(line.substring(87, 92));
        }
        String quality = line.substring(92, 93);
        if (airTemperature != MISSING && quality.matches("[01459]")) {
            context.write(new Text(year), new IntWritable(airTemperature));
        }
    }
}
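The fixed-width offsets used by the mapper can be sanity-checked outside Hadoop. The snippet below is a standalone sketch: the record is synthetic, and only the columns the mapper actually reads (year, temperature, quality code) are filled in.

```java
// Standalone check of the fixed-width parsing used by MaxTemperatureMapper.
public class ParsePreview {

    static String year(String line) {
        return line.substring(15, 19);
    }

    static int airTemperature(String line) {
        // parseInt doesn't accept a leading '+', so skip it when present.
        return line.charAt(87) == '+'
                ? Integer.parseInt(line.substring(88, 92))
                : Integer.parseInt(line.substring(87, 92));
    }

    public static void main(String[] args) {
        // Synthetic 93-character record: year at columns 15-19, signed
        // temperature at 87-92 (tenths of a degree), quality code at 92.
        StringBuilder record = new StringBuilder("0".repeat(93));
        record.replace(15, 19, "1949");
        record.replace(87, 92, "+0111");
        record.replace(92, 93, "1");
        String line = record.toString();
        System.out.println(year(line) + "\t" + airTemperature(line)); // 1949	111
    }
}
```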

Step 4: Create Reducer

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class MaxTemperatureReducer
        extends Reducer<Text, IntWritable, Text, IntWritable> {

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int maxValue = Integer.MIN_VALUE;
        for (IntWritable value : values) {
            maxValue = Math.max(maxValue, value.get());
        }
        context.write(key, new IntWritable(maxValue));
    }
}
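In plain Java terms, the reducer simply folds the values grouped under one key down to their maximum. A minimal sketch of that logic, with an illustrative list standing in for the values the shuffle phase would deliver for one year:

```java
import java.util.Arrays;
import java.util.List;

// Plain-Java sketch of what MaxTemperatureReducer does for one key's values.
public class ReducePreview {

    static int maxTemperature(List<Integer> values) {
        int maxValue = Integer.MIN_VALUE;
        for (int v : values) {
            maxValue = Math.max(maxValue, v);
        }
        return maxValue;
    }

    public static void main(String[] args) {
        // Temperatures (tenths of a degree) grouped under a key such as "1949".
        System.out.println(maxTemperature(Arrays.asList(78, 111, -11))); // 111
    }
}
```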

Step 5: Create Driver for MapReduce Job

import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

/* This class is responsible for running the MapReduce job. */
public class MaxTemperatureDriver extends Configured implements Tool {

    public int run(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println("Usage: MaxTemperatureDriver <input path> <output path>");
            return -1;
        }

        // Use the configuration supplied by ToolRunner; new Job() is deprecated.
        Job job = Job.getInstance(getConf());
        job.setJarByClass(MaxTemperatureDriver.class);
        job.setJobName("Max Temperature");

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        job.setMapperClass(MaxTemperatureMapper.class);
        job.setReducerClass(MaxTemperatureReducer.class);

        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        // Wait for the job once and report the result through the Tool contract
        // (calling waitForCompletion twice, or exiting here, would break it).
        boolean success = job.waitForCompletion(true);
        return success ? 0 : 1;
    }

    public static void main(String[] args) throws Exception {
        int exitCode = ToolRunner.run(new MaxTemperatureDriver(), args);
        System.exit(exitCode);
    }
}

Step 6: Supply Input and Output

We need to supply an input file that will be read during the map phase; the final output will be written to the output directory by the reduce task. Edit the Run Configuration and supply the input and output paths as command-line arguments. Here, sample.txt resides in the project root. Your project explorer should contain the following:

project_explorer

input_ourput

Step 7: Map Reduce Job Execution

mapred_output

Step 8: Final Output

If you managed to come this far, then once the job is complete it will create the output directory containing a _SUCCESS marker and a part-r-00000 file; double-click the latter to view it in the Eclipse editor. We supplied five rows of weather data (downloaded from the NCDC weather archive) and wanted to find the maximum temperature for each year in the input file, so the output contains two rows, one maximum temperature (in tenths of a degree Celsius) per year:

1949 111 (11.1 C)
1950 22 (2.2 C)
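The raw values are stored in tenths of a degree Celsius, which is where the figures in parentheses come from; a quick conversion check:

```java
public class TempConvert {
    public static void main(String[] args) {
        // NCDC temperatures are stored as tenths of a degree Celsius.
        int[] tenths = {111, 22};
        for (int t : tenths) {
            System.out.println(t + " -> " + (t / 10.0) + " C"); // 11.1 C, 2.2 C
        }
    }
}
```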

output
