Research Area:  Big Data
As new data and updates are constantly arriving, the results of data mining applications become stale and obsolete over time. Incremental processing is a promising approach to refreshing mining results. It utilizes previously saved states to avoid the expense of re-computation from scratch. In this paper, we propose i²MapReduce, a novel incremental processing extension to MapReduce. Compared with the state-of-the-art work on Incoop, i²MapReduce (i) performs key-value pair level incremental processing rather than task level re-computation, (ii) supports not only one-step computation but also more sophisticated iterative computation, and (iii) incorporates a set of novel techniques to reduce I/O overhead for accessing preserved fine-grain computation states. Experimental results on Amazon EC2 show significant performance improvements of i²MapReduce compared to both plain and iterative MapReduce performing re-computation.
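The idea of refreshing a result at the key-value pair level from preserved state can be illustrated with a minimal Python sketch. It is not the authors' implementation and does not use the MapReduce API: the word-count workload, the function names (`map_words`, `full_run`, `incremental_run`), and the signed-delta input format are all assumptions chosen for illustration. The sketch keeps the per-key reduce output as preserved state and, when the input changes, applies the map function only to the inserted and deleted records.

```python
from collections import Counter


def map_words(line):
    """Hypothetical map function: emit (word, 1) for every word in a line."""
    return [(word, 1) for word in line.split()]


def full_run(lines):
    """Initial run: compute counts from scratch and keep them as preserved state."""
    state = Counter()
    for line in lines:
        for word, count in map_words(line):
            state[word] += count
    return state


def incremental_run(state, delta):
    """Refresh the result using only the changed records.

    `delta` is a list of (line, sign) pairs (an assumed format):
    sign is +1 for an inserted line and -1 for a deleted one.
    Only the keys emitted by the changed records are touched.
    """
    new_state = Counter(state)
    for line, sign in delta:
        for word, count in map_words(line):
            new_state[word] += sign * count
            if new_state[word] == 0:
                # Drop keys whose contributions have all been deleted.
                del new_state[word]
    return new_state


# Initial computation over the full input.
state = full_run(["big data mining", "big graphs"])

# One line is deleted and one is inserted; only these two lines are re-mapped.
delta = [("big graphs", -1), ("fresh data", +1)]
refreshed = incremental_run(state, delta)
```

In this sketch the refreshed counts match what a from-scratch run over the updated input would produce, while the unchanged line "big data mining" is never re-processed. This shortcut works because counting is a simple additive aggregate; more general reduce functions would need finer-grained preserved state than a single value per key.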
Keywords:  
Author(s) Name:  Yanfeng Zhang; Shimin Chen; Qiang Wang; Ge Yu
Journal name:  
Conference name:  IEEE 32nd International Conference on Data Engineering (ICDE)
Publisher name:  IEEE
DOI:  10.1109/ICDE.2016.7498385
Volume Information:  
Paper Link:   https://ieeexplore.ieee.org/document/7498385