Apache Hadoop MapReduce - Hadoop Open Source Code

Apache Hadoop MapReduce

  • It is a software framework for writing applications that process large amounts of data, both structured and unstructured.

  • It is suited to batch processing of terabytes or petabytes of data stored in a Hadoop cluster.

Map and Reduce

  • The Hadoop framework partitions the input dataset into independent chunks, which the map tasks process in parallel. The framework then sorts the map outputs and passes them as input to the reduce tasks. Both the input and the output of the job are stored in a file system. The framework itself is responsible for task scheduling, monitoring, and re-execution of failed tasks.
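
The flow described above can be sketched in a few lines of plain Python. This is a single-process illustration only, not Hadoop's API: the function names (split_input, map_task, shuffle_and_sort, reduce_task, run_job), the chunk size, and the word-count problem are all assumptions chosen for the example. In a real cluster the map tasks would run in parallel on different nodes.

```python
from collections import defaultdict


def split_input(lines, chunk_size):
    """Partition the input into independent chunks (a stand-in for input splits)."""
    return [lines[i:i + chunk_size] for i in range(0, len(lines), chunk_size)]


def map_task(chunk):
    """Map: emit an intermediate (word, 1) pair for every word in the chunk."""
    return [(word.lower(), 1) for line in chunk for word in line.split()]


def shuffle_and_sort(map_outputs):
    """Group intermediate values by key and return the groups in key order."""
    groups = defaultdict(list)
    for output in map_outputs:
        for key, value in output:
            groups[key].append(value)
    return sorted(groups.items())


def reduce_task(key, values):
    """Reduce: combine all values collected for one key into a single result."""
    return key, sum(values)


def run_job(lines, chunk_size=2):
    """Run the whole pipeline sequentially: split, map, sort, reduce."""
    chunks = split_input(lines, chunk_size)
    # Each chunk is mapped independently; Hadoop would do this in parallel.
    map_outputs = [map_task(chunk) for chunk in chunks]
    return [reduce_task(key, values) for key, values in shuffle_and_sort(map_outputs)]


if __name__ == "__main__":
    data = ["the quick brown fox", "the lazy dog", "the fox"]
    print(run_job(data))
```

Because each chunk is mapped without reference to any other chunk, the map step can be spread across machines freely. Only the sort-and-group step needs to see the outputs of all the maps.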

Common Terms Used in the MapReduce Framework

  • PayLoad: Applications implement the Map and Reduce functions, which together form the core of the job

  • Mapper: It maps the input key/value pairs to a set of intermediate key/value pairs

  • NameNode: It is the node that manages the Hadoop Distributed File System (HDFS)

  • DataNode: It is the node where the data resides before any processing takes place

  • MasterNode: It is the node where the JobTracker runs and which receives job requests from clients

  • SlaveNode: It is the node where the Map and the Reduce tasks run

  • JobTracker: It schedules jobs, assigns their tasks to the TaskTrackers, and reports on job progress

  • TaskTracker: It runs and tracks the tasks assigned to it and reports their status to the JobTracker

  • Task: It is a single execution of the Mapper or the Reducer on a slice of the data
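
The relationship between JobTracker, TaskTracker, and Task, including the re-execution of failed tasks, can be pictured with a toy model. This is a minimal sketch under stated assumptions, not Hadoop code: the classes below only borrow the names of the roles, and the fail_on parameter, the round-robin assignment, and the limit of three attempts are hypothetical choices made for the illustration.

```python
class TaskTracker:
    """Hypothetical worker on a slave node: runs a task and reports its status."""

    def __init__(self, name, fail_on=()):
        self.name = name
        # Task ids on which this worker pretends to fail (demo only).
        self.fail_on = set(fail_on)

    def run(self, task_id, func, data):
        """Execute func on one slice of data and return a status report."""
        if task_id in self.fail_on:
            return {"task": task_id, "tracker": self.name, "ok": False, "result": None}
        return {"task": task_id, "tracker": self.name, "ok": True, "result": func(data)}


class JobTracker:
    """Hypothetical scheduler on the master node: assigns tasks and retries failures."""

    def __init__(self, trackers, max_attempts=3):
        self.trackers = trackers
        self.max_attempts = max_attempts  # illustrative limit, not a Hadoop default
        self.log = []  # (task_id, tracker_name, succeeded) for every attempt

    def run_tasks(self, func, chunks):
        """Run one task per chunk, moving to the next tracker after a failure."""
        results = []
        for task_id, chunk in enumerate(chunks):
            for attempt in range(self.max_attempts):
                tracker = self.trackers[(task_id + attempt) % len(self.trackers)]
                report = tracker.run(task_id, func, chunk)
                self.log.append((task_id, tracker.name, report["ok"]))
                if report["ok"]:
                    results.append(report["result"])
                    break
            else:
                # Reached only if no attempt succeeded.
                raise RuntimeError(f"task {task_id} failed after {self.max_attempts} attempts")
        return results


if __name__ == "__main__":
    trackers = [TaskTracker("slave-1", fail_on={0}), TaskTracker("slave-2")]
    job_tracker = JobTracker(trackers)
    print(job_tracker.run_tasks(len, [["a", "b"], ["c"]]))
    print(job_tracker.log)
```

In this run, task 0 fails on slave-1 and is re-executed on slave-2, so the job still completes. The log kept by the scheduler is the kind of status reporting the definitions above attribute to the TaskTracker and JobTracker.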

Features

  • Simplicity

  • Scalability

  • Speed

  • Recovery

  • Minimal data motion