Research Area:  Big Data
The increased use of cyber-enabled systems and the Internet-of-Things (IoT) has led to a massive amount of data with different structures. Most big data solutions are built on top of the Hadoop eco-system or use its distributed file system (HDFS). However, studies have shown that such systems are inefficient when dealing with today's data. Some research has overcome these problems for specific types of graph data, but today's data span more than one type. Such efficiency issues may lead to large-scale problems, including larger space requirements in data centers and wasted resources (such as power consumption), which, according to prior studies, in turn lead to environmental problems (such as increased carbon emissions) [1]. We propose a data-aware module for the Hadoop eco-system. We also propose a distributed encoding technique for genetic algorithms to enable efficient data processing. Our framework allows Hadoop to manage the distribution and placement of data based on cluster analysis of the data itself. It handles a broad range of data types while optimizing query time and resource usage. We performed experiments on multiple datasets generated via LUBM (Lehigh University Benchmark) and report the results along with a performance analysis.
Keywords:  
Author(s) Name:  Mustafa Hajeer and Dipankar Dasgupta
Journal name:  IEEE Transactions on Big Data
Conference name:  
Publisher name:  IEEE
DOI:  10.1109/TBDATA.2017.2782785
Volume Information:  June 2019, pp. 134-147, vol. 5
Paper Link:   https://www.computer.org/csdl/journal/bd/2019/02/08197362/13rRUILtJt4