Hadoop is an open source, scalable, reliable and distributed computing platform for MapReduce implementation developed by apache foundation. This framework consists of the Hadoop Distributed File System (HDFS) and MapReduce implementation. Hadoop has been the one stop solution for big data analytics. Hadoop framework is a preferred tool for major players including IBM, Google and Yahoo in storing, analyzing and managing big data. Hadoop has the ability to process large amounts of data, regardless of its structure. To solve the problem of creating web search indexes at Google, the MapReduce framework is the powerhouse behind most of today’s big data processing. The important innovation of MapReduce is the ability to take a query over a dataset, divide it, and run it in parallel over multiple nodes. Distributing the computation solves the issue of data too large to fit onto a single machine. Combine this technique with commodity Linux servers and you have a cost-effective alternative to massive computing arrays.