Research Area:  Cloud Computing
Among fault-tolerance strategies, resubmission and replication are fundamental and widely recognized in distributed computing systems. Many algorithms based on replication or resubmission have been proposed in recent years, but few consider the two techniques together, especially in Cloud systems. In this paper, we propose a Hybrid Fault-Tolerant Scheduling Algorithm (HFTSA) for independent tasks with deadlines that integrates both techniques in virtualized Cloud systems. During task scheduling, HFTSA selects a fault-tolerant strategy, either resubmission or replication, for each accepted task based on the characteristics of both the task and the Cloud resources, and then reserves suitable resources. During task execution, HFTSA adjusts the fault-tolerant strategies of some tasks online when necessary and provides an online scheduling scheme for handling faults. Moreover, an elastic resource provisioning mechanism is incorporated into HFTSA to dynamically adjust the provisioned resources and improve resource utilization. Experiments on a real cloud platform and a simulated platform are conducted to verify the effectiveness of HFTSA. The results demonstrate that HFTSA provides an efficient fault-tolerant scheduling strategy for deadline-constrained tasks with high resource utilization and outperforms the algorithms it is compared against.
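The abstract does not state the rule HFTSA uses to pick between resubmission and replication, so the following is only a minimal sketch of one plausible slack-based heuristic, not the authors' method. The names `Task`, `Strategy`, `choose_strategy`, and the `vm_speed` parameter are all hypothetical. The assumption is that a task with enough slack to run twice before its deadline can rely on resubmission (cheaper, since no extra copy is reserved), while a task that fits only once needs a concurrent replica.

```python
from dataclasses import dataclass
from enum import Enum


class Strategy(Enum):
    RESUBMISSION = "resubmission"  # rerun the task after a fault is detected
    REPLICATION = "replication"    # run a backup copy concurrently
    REJECT = "reject"              # deadline cannot be met even without faults


@dataclass
class Task:
    length: float    # amount of work, in hypothetical work units
    deadline: float  # absolute deadline, in seconds


def choose_strategy(task: Task, vm_speed: float, now: float = 0.0) -> Strategy:
    """Illustrative rule only (an assumption, not taken from the paper)."""
    exec_time = task.length / vm_speed  # estimated run time on this VM
    slack = task.deadline - now         # time left before the deadline
    if exec_time > slack:
        return Strategy.REJECT
    # Enough time for a full second attempt: resubmission is sufficient.
    if 2 * exec_time <= slack:
        return Strategy.RESUBMISSION
    # Only one attempt fits: tolerate a fault with a concurrent replica.
    return Strategy.REPLICATION
```

Under these assumptions, a task of length 100 on a VM of speed 10 takes 10 seconds; with a deadline of 30 seconds it would be assigned resubmission, and with a deadline of 15 seconds it would be assigned replication.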
Author(s) Name:  Guangshun Yao; Qian Ren; Xiaoping Li; Shenghui Zhao; Rubén Ruiz
Journal name:  IEEE Transactions on Services Computing
Publisher name:  IEEE
Volume Information:  Page(s): 1 - 1
Paper Link:   https://ieeexplore.ieee.org/document/9089356