Research Area:  Cloud Computing
Reliability is one of the most important quality-of-service (QoS) requirements and must be satisfied for a workflow to run reliably in cloud computing. Primary-backup replication is an important software fault-tolerance technique used to satisfy a reliability requirement. Recent works studied quantitative fault-tolerant scheduling to reduce execution cost by minimizing the number of replicas while satisfying the reliability requirement of a workflow on heterogeneous infrastructure as a service (IaaS) clouds. However, a minimum number of replicas does not necessarily lead to the minimum execution cost or the shortest schedule length in a heterogeneous IaaS cloud. In this study, we propose the quantitative fault-tolerant scheduling algorithms QFEC and QFEC+, which target minimum execution cost, and QFSL and QFSL+, which target shortest schedule length, while satisfying the reliability requirements of workflows. Extensive experimental results show that (1) compared with the state-of-the-art algorithms, the proposed algorithms achieve lower execution cost and shorter schedule length, although the number of replicas is not minimal; (2) QFEC and QFEC+ are designed to reduce execution cost, and QFEC+ is better than QFEC for all low-parallelism and high-parallelism workflows; and (3) QFSL and QFSL+ are designed to decrease schedule length, and QFSL+ is better than QFSL for all low-parallelism and high-parallelism workflows.
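The abstract's central observation, that the fewest replicas need not be the cheapest, can be illustrated with a minimal sketch. This is not QFEC or any algorithm from the paper: it is a brute-force search over replica sets for a single task, assuming independent replica failures (so a task succeeds if at least one replica succeeds). The VM names, reliabilities, and costs are hypothetical values chosen for illustration.

```python
from itertools import combinations
from math import prod

# Hypothetical VM options for one task: name -> (reliability, execution cost).
VMS = {"A": (0.96, 10.0), "B": (0.80, 2.0), "C": (0.80, 2.0)}


def replica_reliability(names):
    """Probability that at least one replica succeeds (independence assumed)."""
    return 1.0 - prod(1.0 - VMS[n][0] for n in names)


def replica_cost(names):
    """Total execution cost of running the task on every chosen VM."""
    return sum(VMS[n][1] for n in names)


def feasible_sets(target):
    """Yield every non-empty VM subset whose reliability meets the target."""
    names = sorted(VMS)
    for k in range(1, len(names) + 1):
        for combo in combinations(names, k):
            if replica_reliability(combo) >= target:
                yield combo


def fewest_replicas(target):
    """Feasible set with the fewest replicas (ties broken by cost)."""
    return min(feasible_sets(target), key=lambda s: (len(s), replica_cost(s)))


def cheapest_replicas(target):
    """Feasible set with the lowest total cost (ties broken by replica count)."""
    return min(feasible_sets(target), key=lambda s: (replica_cost(s), len(s)))


if __name__ == "__main__":
    target = 0.95
    for label, chooser in (("fewest", fewest_replicas), ("cheapest", cheapest_replicas)):
        chosen = chooser(target)
        print(label, chosen, replica_cost(chosen), round(replica_reliability(chosen), 4))
```

With these hypothetical numbers and a target of 0.95, the single reliable but expensive VM is the minimum-replica choice, while two cheaper and less reliable VMs together meet the same target at lower total cost.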
Author(s) Name:  Guoqi Xie; Gang Zeng; Renfa Li; Keqin Li
Journal name:   IEEE Transactions on Cloud Computing
Publisher name:  IEEE
Volume Information:   Volume: 8, Issue: 4, Oct.-Dec. 1 2020, Page(s): 1223-1236
Paper Link:   https://ieeexplore.ieee.org/document/8166813