MapReduce is a software framework for writing applications that process large amounts of data (structured and unstructured).
It is used for batch processing of terabytes or petabytes of data stored in Apache Hadoop.
The Hadoop framework partitions the input dataset into independent chunks, which the map tasks process in parallel. The framework sorts the outputs of the maps and then passes them as input to the reduce tasks. Both the input and the output of a job are stored in the file system. The framework itself takes care of scheduling tasks, monitoring them, and re-executing any tasks that fail.
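The flow above is easiest to see in the classic WordCount example. The sketch below is a minimal, illustrative version written against the org.apache.hadoop.mapreduce API; the class names TokenizerMapper and IntSumReducer follow the conventional Hadoop tutorial naming and are not mandated by the framework.

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

public class WordCount {

  // Map phase: emits an intermediate (word, 1) pair for every word in a line.
  public static class TokenizerMapper
      extends Mapper<Object, Text, Text, IntWritable> {

    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, ONE);  // intermediate key/value pair
      }
    }
  }

  // Reduce phase: receives the sorted intermediate pairs, grouped by word,
  // and sums the counts for each word.
  public static class IntSumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {

    private final IntWritable result = new IntWritable();

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);  // final (word, count) pair
    }
  }
}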
PayLoad: the applications that implement the Map and Reduce functions; these two functions form the core of a MapReduce job
Mapper: maps input key/value pairs to a set of intermediate key/value pairs (the TokenizerMapper in the sketch above is an example)
NameNode: the master node of HDFS; it manages the file-system namespace and the metadata for all files and directories
DataNode: the HDFS node that stores the actual data blocks; the data resides here before any processing takes place
MasterNode: the node that receives job requests from clients; the JobTracker runs on the master node
SlaveNode: the node on which both the Map and the Reduce tasks run
JobTracker: schedules jobs, assigns tasks to TaskTrackers, and tracks job progress (see the job-submission sketch after this list)
TaskTracker: runs the tasks assigned to it and reports task status back to the JobTracker
Task: refers to the execution of the Mapper or the Reducer on a set of data
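To tie several of these terms together, here is a minimal, illustrative job driver that configures and submits a MapReduce job using the WordCount classes from the earlier sketch. The class name WordCountDriver and the command-line argument handling are assumptions made for this example; once the job is submitted, scheduling, monitoring, and re-execution of failed tasks are handled by the framework.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCountDriver {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "word count");
    job.setJarByClass(WordCountDriver.class);
    job.setMapperClass(WordCount.TokenizerMapper.class);
    job.setCombinerClass(WordCount.IntSumReducer.class);  // local pre-aggregation
    job.setReducerClass(WordCount.IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));   // input in HDFS
    FileOutputFormat.setOutputPath(job, new Path(args[1])); // output must not exist yet
    // Submit the job and wait; the framework takes over from here.
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

Such a driver is typically packaged into a jar and launched with the hadoop jar command; the input path must exist in HDFS, and the output path must not already exist.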
Benefits of MapReduce:
Simplicity: developers only write a map function and a reduce function, in Java or, via Hadoop Streaming, in other languages
Scalability: the same job can scale from a single node to thousands of nodes processing petabytes of data
Speed: because the chunks are processed in parallel, jobs that would take days on a single machine can finish in hours or minutes
Recovery: the framework automatically re-executes failed tasks, and HDFS replication means a node failure does not lose data
Minimal data motion: the computation is moved to the nodes where the data is stored, rather than moving the data across the network