Research Area:  Big Data
Distributed stream big data analytics platforms have emerged to process continuously generated data streams. In stream big data analytics, the data processing workflow is abstracted as a directed graph referred to as a topology. Data are read from storage and processed tuple by tuple, and the processing results are updated dynamically. The performance of a topology is evaluated by its throughput. This paper proposes an efficient resource allocation scheme for a heterogeneous stream big data analytics cluster shared by multiple topologies, with the goal of achieving max-min fairness in the utilities of the throughput across all topologies. We first formulate a novel resource allocation problem as a mixed 0-1 integer program and rigorously prove that it is NP-hard. To tackle the problem, we transform the non-convex constraint into several linear constraints using linearization and reformulation techniques. Based on an analysis of the problem-specific structure and characteristics, we propose an approach that iteratively solves the continuous problem optimally for a fixed set of discrete variables, and then updates the discrete variables heuristically. Simulations show that the proposed resource allocation scheme remarkably improves the max-min fairness in utilities of the topology throughput and has low computational complexity.
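The alternating pattern described in the abstract (solve the continuous part optimally for fixed discrete variables, then update the discrete variables heuristically) can be illustrated with a toy sketch. This is not the authors' formulation or algorithm. It assumes a deliberately simplified, hypothetical model: each topology is placed on exactly one machine, a machine's capacity is split among the topologies placed on it, and throughput equals the capacity share multiplied by a per-machine speed. The names `min_throughput`, `local_search`, `speed`, and `cap` are all illustrative.

```python
def min_throughput(assign, speed, cap):
    """Continuous step of the toy model (hypothetical, not from the paper).

    For a fixed placement, the best achievable minimum throughput on a
    machine is reached by equalizing throughput among its topologies:
    level_m = cap[m] / sum(1 / speed[t][m] for t placed on m).
    The overall max-min value is the smallest level over used machines.
    """
    load = [0.0] * len(cap)
    for t, m in enumerate(assign):
        load[m] += 1.0 / speed[t][m]
    return min(cap[m] / load[m] for m in range(len(cap)) if load[m] > 0)


def local_search(assign, speed, cap):
    """Discrete step: move one topology at a time, keeping a move only
    if it strictly raises the minimum throughput (a simple heuristic)."""
    assign = list(assign)
    best = min_throughput(assign, speed, cap)
    improved = True
    while improved:
        improved = False
        for t in range(len(assign)):
            for m in range(len(cap)):
                if m == assign[t]:
                    continue
                old = assign[t]
                assign[t] = m  # tentatively move topology t to machine m
                value = min_throughput(assign, speed, cap)
                if value > best + 1e-12:
                    best = value
                    improved = True
                else:
                    assign[t] = old  # revert a non-improving move
    return assign, best


# Toy instance: 3 topologies, 2 machines with unit capacity.
# speed[t][m] is the assumed throughput of topology t per unit of machine m.
speed = [[1.0, 2.0], [1.0, 2.0], [1.0, 1.0]]
cap = [1.0, 1.0]
placement, fairness = local_search([0, 0, 0], speed, cap)
print(placement, fairness)
```

The search stops at a local optimum of the toy model, since each accepted move strictly increases the minimum throughput and the number of placements is finite. The mixed 0-1 integer program in the paper is richer than this and is handled with linearization and reformulation rather than a closed-form continuous step.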
Keywords:  
Author(s) Name:  Yuxuan Jiang, Zhe Huang and Danny H.K. Tsang
Journal name:  IEEE Transactions on Big Data
Conference name:  
Publisher name:  IEEE
DOI:  10.1109/TBDATA.2016.2638860
Volume Information:  March 2018, pp. 130-137, vol. 4
Paper Link:   https://www.computer.org/csdl/journal/bd/2018/01/07782311/13rRUwciPeF