Research Area:  Big Data
This work introduces a new approach to big data forecasting based on the k-weighted nearest neighbours algorithm. The algorithm has been developed for distributed computing under the Apache Spark framework. Every phase of the algorithm is explained, along with how the optimal values of its input parameters are obtained. The algorithm is tested on a big data time series of Spanish energy consumption. The accuracy of the predictions is assessed, showing remarkable results. Additionally, the optimal configuration of a Spark cluster is discussed. Finally, a scalability analysis leads to the conclusion that the proposed algorithm is highly suitable for big data environments.
Keywords:  
Author(s) Name:  R. Talavera-Llames, R. Pérez-Chacón, A. Troncoso and F. Martínez-Álvarez
Journal name:  Knowledge-Based Systems
Conference name:  
Publisher name:  ELSEVIER
DOI:  10.1016/j.knosys.2018.07.026
Volume Information:  Volume 161, 1 December 2018, Pages 12-25
Paper Link:   https://www.sciencedirect.com/science/article/abs/pii/S0950705118303770
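
Illustrative sketch:  The abstract describes forecasting with a k-weighted nearest neighbours algorithm, but this record does not give the algorithm's details. The single-machine Python sketch below shows only the general idea of weighted nearest-neighbour forecasting on a time series; it is not the authors' Spark implementation. The function name knn_forecast, the parameters w (pattern length), h (forecast horizon), k (number of neighbours) and eps, the Euclidean distance, and the inverse squared distance weighting are all assumptions made for illustration.

```python
import math


def knn_forecast(series, w, h, k, eps=1e-12):
    """Forecast the next h values of `series` from its last w values.

    Hypothetical helper: distance metric and weighting scheme are
    assumptions, not taken from the paper.
    """
    n = len(series)
    if w < 1 or h < 1 or k < 1:
        raise ValueError("w, h and k must be positive")
    if n < w + h:
        raise ValueError("series too short for the chosen w and h")

    # The pattern to match: the most recent w observations.
    query = series[n - w:]

    # Every historical window of length w that is followed by h known values.
    candidates = []
    for i in range(n - w - h + 1):
        distance = math.dist(series[i:i + w], query)
        candidates.append((distance, i))

    # Keep the k windows closest to the query.
    neighbours = sorted(candidates)[:k]

    # Closer neighbours get larger weights (assumed: inverse squared
    # distance); eps avoids division by zero on exact matches.
    weights = [1.0 / (d * d + eps) for d, _ in neighbours]
    total = sum(weights)

    # Weighted average of the h values that followed each neighbour.
    forecast = []
    for step in range(h):
        value = sum(
            wt * series[i + w + step]
            for wt, (_, i) in zip(weights, neighbours)
        )
        forecast.append(value / total)
    return forecast
```

For example, knn_forecast([1, 2, 3, 4] * 5, w=4, h=2, k=3) matches the last four values against earlier repetitions of the same pattern and returns a forecast of the two values that followed them. In a distributed setting such as the one the abstract mentions, the distance computation over candidate windows is the part that would naturally be spread across workers; how the paper actually does this is not stated in this record.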