What is Apache Spark?

  • Apache Spark is a fast, general-purpose framework for parallel data processing. It is built around the Resilient Distributed Dataset (RDD), an abstraction that lets Spark keep working data in cluster memory. Unlike the MapReduce framework, which writes intermediate results to HDFS between every map and reduce stage, Spark can hold intermediate data as objects in distributed memory, which greatly reduces processing time for voluminous data, especially iterative and streaming workloads. Spark can also run independently of Hadoop: it ships with its own standalone cluster manager, or it can run under an external manager such as YARN or Mesos.
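
  The core RDD ideas above — lazy transformations plus an optional in-memory cache that avoids recomputation — can be sketched in plain Python. This is a toy illustration, not Spark's actual API; the class and function names (`ToyRDD`, `parallelize`) are invented for the example.

  ```python
  # Toy sketch of the RDD idea: transformations are lazy, and an
  # explicit cache() keeps the computed result in memory so later
  # actions reuse it instead of recomputing the whole lineage.

  class ToyRDD:
      def __init__(self, compute):
          self._compute = compute   # thunk that produces the data on demand
          self._cache = None        # filled only after cache() + first action
          self._cached = False

      def map(self, f):
          # Lazy: build a new ToyRDD that remembers the work, do nothing yet.
          return ToyRDD(lambda: [f(x) for x in self.collect()])

      def filter(self, pred):
          return ToyRDD(lambda: [x for x in self.collect() if pred(x)])

      def cache(self):
          self._cached = True
          return self

      def collect(self):
          # Action: triggers evaluation; cached data is served from memory.
          if self._cached and self._cache is not None:
              return self._cache
          data = self._compute()
          if self._cached:
              self._cache = data
          return data


  def parallelize(seq):
      return ToyRDD(lambda: list(seq))


  squares = parallelize(range(10)).map(lambda x: x * x).cache()
  evens = squares.filter(lambda x: x % 2 == 0)
  print(evens.collect())  # → [0, 4, 16, 36, 64]
  ```

  In real Spark the same shape appears as `sc.parallelize(...)`, transformations like `map`/`filter`, `cache()` (or `persist()`), and actions like `collect()`; the difference is that Spark partitions the data across a cluster and recomputes lost partitions from the lineage on failure.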