How does MapReduce work in Hadoop?


  • Description:
    MapReduce is a programming model and processing framework designed to process large-scale data across distributed systems like Hadoop. It divides a task into smaller sub-tasks, processes them in parallel, and aggregates the results.
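The model above can be illustrated with a minimal in-memory sketch. This is not the Hadoop API; it only mimics the three phases (map, shuffle/sort, reduce) for the word-count example, and the function names are illustrative:

```python
from collections import defaultdict

def map_phase(line):
    # Emit a (word, 1) pair for each word, like a word-count Mapper.
    return [(word, 1) for word in line.split()]

def shuffle_phase(mapped_pairs):
    # Group intermediate values by key, like Hadoop's shuffle and sort.
    grouped = defaultdict(list)
    for key, value in mapped_pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    # Sum the values for each key, like a word-count Reducer.
    return {key: sum(values) for key, values in grouped.items()}

lines = ["Hadoop MapReduce", "MapReduce Processes Big Data"]
mapped = [pair for line in lines for pair in map_phase(line)]
result = reduce_phase(shuffle_phase(mapped))
print(result)  # {'Hadoop': 1, 'MapReduce': 2, 'Processes': 1, 'Big': 1, 'Data': 1}
```

In real Hadoop the three phases run on different machines and the intermediate data moves over the network; here they are chained in one process purely to show the data flow.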
Steps in MapReduce
  • Objective:
    Count the frequency of words in a dataset.
  • Input:
    Two lines of text:
    Line 1: Hadoop MapReduce
    Line 2: MapReduce Processes Big Data
  • Steps:
    • Step 1: Input Splitting
      Input data is split into chunks based on the InputFormat (default: TextInputFormat, which splits on line boundaries). Each split is processed independently by a mapper.
      Input splits:
      Split 1: "Hadoop MapReduce"
      Split 2: "MapReduce Processes Big Data"
    • Step 2: Mapping
      Each input split is processed by the Mapper class, which emits intermediate key-value pairs that are written to the local disk.
      Mapper output:
      Split 1: ("Hadoop", 1), ("MapReduce", 1)
      Split 2: ("MapReduce", 1), ("Processes", 1), ("Big", 1), ("Data", 1)
    • Step 3: Shuffling and Sorting
      Intermediate pairs are transferred from the mappers to the reducers (the shuffle), then grouped and sorted by key.
      Grouped data:
      Hadoop: [1]
      MapReduce: [1, 1]
      Processes: [1]
      Big: [1]
      Data: [1]
    • Step 4: Reducing
      Reducers aggregate the values for each key (here, summing the counts) to produce the final output.
    • Step 5: Output Storage
      The final output is written to HDFS using the configured OutputFormat (default: TextOutputFormat).
      Output:
      ("Hadoop", 1), ("MapReduce", 2), ("Processes", 1), ("Big", 1), ("Data", 1)
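The steps above can also be written in the style of Hadoop Streaming, which pipes records through mapper and reducer executables as tab-separated lines on stdin/stdout. The sketch below is a hedged illustration, not production code: the function names are assumptions, and `sorted()` stands in for the cluster-side shuffle. The key property it shows is that a Streaming reducer sees its input sorted by key, so it can sum each word's counts in a single pass:

```python
def mapper(lines):
    # Streaming mapper (Step 2): emit "word<TAB>1" for every word.
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"

def reducer(sorted_lines):
    # Streaming reducer (Steps 3-4): input arrives sorted by key,
    # so all counts for one word are adjacent and can be summed
    # in a single pass without holding the whole dataset in memory.
    current_word, current_count = None, 0
    for line in sorted_lines:
        word, count = line.rsplit("\t", 1)
        if word == current_word:
            current_count += int(count)
        else:
            if current_word is not None:
                yield f"{current_word}\t{current_count}"
            current_word, current_count = word, int(count)
    if current_word is not None:
        yield f"{current_word}\t{current_count}"

if __name__ == "__main__":
    # On a cluster each script would read sys.stdin; here the two are
    # chained locally, with sorted() simulating the shuffle phase.
    lines = ["Hadoop MapReduce", "MapReduce Processes Big Data"]
    for out in reducer(sorted(mapper(lines))):
        print(out)  # e.g. "MapReduce\t2"
```

On an actual cluster, such scripts are submitted via the hadoop-streaming jar with `-mapper` and `-reducer` options, and Hadoop itself performs the splitting, shuffling, and HDFS output described in Steps 1, 3, and 5.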