Hadoop: Hadoop software is a framework that permits for the distributed processing of huge data sets across clusters of computers using simple programming models. In simple terms, Hadoop is a framework for processing ‘Big Data’. Hadoop was created by Doug Cutting.it was also created by Mike Cafarella. It is designed to divide from single servers to thousands of machines, each having local computation and storage. Hadoop is an open-source software. The core of Apache Hadoop consists of a storage part, known as the Hadoop Distributed File System(HDFS), and a processing part which may be a Map-Reduce programming model. Hadoop splits files into large blocks and distributes them across nodes during a cluster. It then transfers packaged code into nodes to process the info in parallel.
Mapreduce: MapReduce is a programming model that is used for processing and generating large data sets on clusters of computers. It was introduced by Google. Mapreduce is a concept or a method for large scale parallelization.It is inspired by functional programming’s map()
and reduce()
functions.
MapReduce program is executed in three stages they are:
- Mapping: Mapper’s job is to process input data.Each node applies the map function to the local data.
- Shuffle: Here nodes are redistributed where data is based on the output keys.(output keys are produced by map function).
- Reduce: Nodes are now processed into each group of output data, per key in parallel.
Below is a table of differences between Hadoop and MapReduce:
Based on | Hadoop | MapReduce |
---|---|---|
Definition | The Apache Hadoop is a software that allows all the distributed processing of large data sets across clusters of computers using simple programming | MapReduce is a programming model which is an implementation for processing and generating big data sets with distributed algorithm on a cluster. |
Meaning | The name “Hadoop” was the named after Doug cutting’s son’s toy elephant. He named this project as “Hadoop” as it was easy to pronounce it. | The “MapReduce” name came into existence as per the functionality itself of mapping and reducing in key-value pairs. |
Framework | Hadoop not only has storage framework which stores the data but creating name node’s and data node’s it also has other frameworks which include MapReduce itself. | MapReduce is a programming framework which uses a key, value mappings to sort/process the data |
Invention | Hadoop was created by Doug Cutting and Mike Cafarella. | Mapreduce is invented by Google. |
Features |
|
|
Concept | The Apache Hadoop is an eco-system which provides an environment which is reliable, scalable and ready for distributed computing. | MapReduce is a submodule of this project which is a programming model and is used to process huge datasets which sits on HDFS (Hadoop distributed file system). |
Language | Hadoop is a collection of all modules and hence may include other programming/scripting languages too | MapReduce is basically written in Java programming language |
Pre-requisites | Hadoop runs on HDFS (Hadoop Distributed File System) | MapReduce can run on HDFS/GFS/NDFS or any other distributed system for example MapR-FS |