How does MapReduce work in Hadoop?

  • 1. The client submits the MapReduce job to the JobTracker.
  • 2. The JobTracker accepts the job and schedules map and reduce tasks on TaskTrackers to perform the work.
  • 3. Each map task processes its input split as key-value pairs and produces intermediate key-value pairs as output for the reducers.
  • 4. Each reduce task collects the intermediate key-value pairs belonging to it from the multiple map tasks and combines all intermediate values for each particular key. The output of the Reducer is a set of merged output values, which is written back to HDFS.
  • // Mapper: takes an input key-value pair, emits intermediate pairs
    map(key1, value1)              // map input
        // process the map function
        emit(key2, value2)         // intermediate output
    // Reducer: takes a key and the list of all its intermediate values
    reduce(key2, list(value2))     // reduce input
        for each value in list(value2)
            // process the reduce function
        end for
        emit(key2, value3)         // reduce output
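The flow above can be sketched outside Hadoop as a small simulation. This is an illustrative word-count sketch, not Hadoop API code: the `map_task`, `shuffle`, and `reduce_task` names are assumptions standing in for the Mapper, the framework's shuffle/sort phase, and the Reducer.

```python
from collections import defaultdict

def map_task(offset, line):
    # Map phase: emit an intermediate (word, 1) pair for each word.
    return [(word, 1) for word in line.split()]

def shuffle(intermediate):
    # Shuffle phase (done by the framework between map and reduce):
    # group all intermediate values by their key.
    grouped = defaultdict(list)
    for key, value in intermediate:
        grouped[key].append(value)
    return grouped

def reduce_task(key, values):
    # Reduce phase: combine all intermediate values for one key.
    return (key, sum(values))

lines = ["to be or not to be"]
intermediate = [pair for i, line in enumerate(lines)
                for pair in map_task(i, line)]
result = dict(reduce_task(k, v) for k, v in shuffle(intermediate).items())
print(result)  # {'to': 2, 'be': 2, 'or': 1, 'not': 1}
```

In real Hadoop, the map and reduce tasks run in parallel on TaskTrackers and the shuffle moves data across the network; this sketch only shows the data flow between the phases.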