What is Hadoop?

Imagine the following scenario:

You have 1GB of data that you need to process.
The data are stored in a relational database on your desktop computer, and that machine
has no problem handling the load.

Then your data grows to 10GB.
And then 100GB.
And you start to reach the limits of your current desktop computer.
So you scale up by investing in a larger computer, and you are fine for a few more months.

Then your data grows to 10TB, and then 100TB,
and you are fast approaching the limits of even that larger computer.

Moreover, you are now asked to feed your application with unstructured data coming from sources
such as Facebook, Twitter, and sensors.
You want to derive information from both the relational data and the unstructured data,
and you want that information as soon as possible.
What should you do? Hadoop may be the answer!

Hadoop is an open-source project of the Apache Software Foundation.
It is a framework written in Java, originally developed by Doug Cutting, who named it after his
son’s toy elephant.

Hadoop uses Google’s MapReduce and Google File System (GFS) technologies as its foundation.
It is optimized to handle massive quantities of data, which can be structured, unstructured, or semi-structured.
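The MapReduce idea itself can be illustrated without Hadoop. The sketch below is plain Python, not the Hadoop Java API: it counts words by running explicit map, shuffle, and reduce phases, which are the steps the framework would distribute across many machines.

```python
from collections import defaultdict

def map_phase(records):
    """Map: emit a (word, 1) pair for every word in every input record."""
    for record in records:
        for word in record.split():
            yield (word.lower(), 1)

def shuffle_phase(pairs):
    """Shuffle: group all emitted values by key, as the framework does between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    """Reduce: combine the grouped values; here, sum the counts per word."""
    return {word: sum(counts) for word, counts in grouped.items()}

records = ["the elephant", "the toy elephant"]
counts = reduce_phase(shuffle_phase(map_phase(records)))
print(counts)  # {'the': 2, 'elephant': 2, 'toy': 1}
```

In real Hadoop the map and reduce functions run in parallel on different nodes, and the shuffle moves data between them over the network; the programming model, however, is exactly this.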

Hadoop replicates its data across multiple computers, so that if one goes down, the data can be
processed on one of the machines holding a replica.
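In HDFS, the replication factor is configurable. As a sketch, the `dfs.replication` property in `hdfs-site.xml` controls how many copies of each block are kept (3 is the usual default):

```xml
<configuration>
  <property>
    <!-- Number of copies HDFS keeps of each data block -->
    <name>dfs.replication</name>
    <value>3</value>
  </property>
</configuration>
```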

