Research Area:  Cloud Computing
Distributed storage systems, e.g., the Hadoop Distributed File System (HDFS), are widely used in datacenters to handle large amounts of data because of their fault tolerance, reliability and scalability. However, these storage systems usually adopt a uniform replication and storage strategy to guarantee data availability, i.e., they create the same number of replicas for all data sets and store them randomly across data nodes. Such strategies do not fully consider that different data sets have different data availability requirements. As a result, more servers than necessary are used to store replicas of rarely used data, which increases energy consumption. To address this issue, we propose an energy-efficient storage strategy for cloud datacenters based on a novel hypergraph coverage model. According to users' data availability requirements in different applications, our proposed algorithm selectively determines the corresponding minimum hyperedge coverage, which represents the minimum set of data nodes required in the datacenter. The remaining data nodes can then be turned off to save energy. We have also implemented our proposed algorithm as a dynamic runtime strategy in an HDFS-based prototype datacenter for performance evaluation. Experimental results show that the variable hypergraph coverage-based strategy not only reduces energy consumption but also improves network performance in the datacenter.
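The coverage idea in the abstract can be illustrated with a small sketch. This is not the paper's algorithm: it is a plain greedy multi-cover heuristic, and the modelling choices are assumptions made for illustration (each data node is treated as a hyperedge containing the blocks it stores, and each block carries its own required number of live replicas). The function name `greedy_active_nodes` and the node and block names are hypothetical.

```python
def greedy_active_nodes(replicas, required):
    """Pick a small set of data nodes that keeps enough replicas online.

    replicas: dict mapping node name -> set of block ids stored on that node
              (assumption: one hyperedge per data node).
    required: dict mapping block id -> number of live replicas that block needs.
    Returns the list of nodes to keep powered on, in the order they were chosen.
    Greedy heuristic only; the result is not guaranteed to be minimum.
    """
    remaining = dict(required)          # unmet replica demand per block
    candidates = set(replicas)          # nodes not yet selected
    active = []

    def gain(node):
        # Number of still-unmet block demands this node would reduce by one.
        return sum(1 for b in replicas[node] if remaining.get(b, 0) > 0)

    while any(need > 0 for need in remaining.values()):
        # sorted() makes tie-breaking deterministic (alphabetical order).
        best = max(sorted(candidates), key=gain, default=None)
        if best is None or gain(best) == 0:
            raise ValueError("availability requirements cannot be met")
        for b in replicas[best]:
            if remaining.get(b, 0) > 0:
                remaining[b] -= 1
        candidates.remove(best)
        active.append(best)
    return active


# Hypothetical placement: block "a" is frequently used and needs two live replicas.
replicas = {
    "dn1": {"a", "b"},
    "dn2": {"a", "c"},
    "dn3": {"b", "c"},
    "dn4": {"a", "b", "c"},
}
required = {"a": 2, "b": 1, "c": 1}

active = greedy_active_nodes(replicas, required)
standby = sorted(set(replicas) - set(active))   # candidates for powering down
```

In this toy placement, two of the four nodes are enough to satisfy every block's requirement, so the other two could be put on standby. Raising the requirement of a block forces more nodes to stay active, which is the trade-off between availability and energy that the abstract describes.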
Author(s) Name:  Ting Yang; Haibo Pen; Wei Li; Dong Yuan; Albert Y. Zomaya
Journal name:   IEEE Transactions on Parallel and Distributed Systems
Publisher name:  IEEE
Volume Information:  Volume: 28, Issue: 12, Dec. 1 2017, Page(s): 3344 - 3355
Paper Link:   https://ieeexplore.ieee.org/document/7968334