Research Area:  Big Data
Recent studies have reported that big data analytics clusters, such as Hadoop, can create substantial power peaks, bringing instability and inflexibility issues to the power grid. Substantial power peaks also lead to high penalty charges from electric utility companies, accounting for more than 30 percent of the electricity bill for a cluster operator according to empirical studies. To this end, in this paper, we present a framework that schedules computing jobs in large-scale data analytics clusters to mitigate power peaks. The scheduling model captures important properties of modern distributed data analytics clusters, including bundled resource provisioning and job-to-task decomposition with distributed processing. The scheduling problem is formulated as a nonlinear integer program. Its solution is derived by decomposing it into two classes of sub-problems and solving each class with an exact and efficient solution method. As a direct application, we detail the implementation of our proposed scheduling framework on a Hadoop cluster, and demonstrate its efficacy by extensive trace-driven simulations based on the CloudSim simulator.
Keywords:  
Author(s) Name:  Yuxuan Jiang, Zhe Huang and Danny H.K. Tsang
Journal name:  IEEE Transactions on Big Data
Conference name:  
Publisher name:  IEEE
DOI:  10.1109/TBDATA.2018.2874663
Volume Information:  June 2020, pp. 412-426, vol. 6
Paper Link:   https://www.computer.org/csdl/journal/bd/2020/02/08485786/14dcDYfu4Qv