Handling Big Data Using a Data-Aware HDFS - Python Projects

Research Area: Big Data

Abstract:

The increased use of cyber-enabled systems and the Internet of Things (IoT) has led to a massive amount of data with different structures. Most big data solutions are built on top of the Hadoop eco-system or use its distributed file system (HDFS). However, studies have shown that such systems are inefficient when dealing with today's data. Some research has overcome these problems for specific types of graph data, but today's data span more than one type. Such efficiency issues may lead to large-scale problems, including larger space requirements in data centers and wasted resources (such as power), which, according to scholars, in turn lead to environmental problems (such as higher carbon emissions) [1]. We propose a data-aware module for the Hadoop eco-system. We also propose a distributed encoding technique for genetic algorithms to support efficient data processing. Our framework allows Hadoop to manage the distribution and placement of data based on cluster analysis of the data itself. We are able to handle a broad range of data types as well as optimize query time and resource usage. We performed experiments on multiple datasets generated via LUBM (Lehigh University Benchmark) and report the results along with a performance analysis.

Author(s) Name: Mustafa Hajeer and Dipankar Dasgupta

Journal name: IEEE Transactions on Big Data

Publisher name: IEEE

DOI: 10.1109/TBDATA.2017.2782785

Volume Information: June 2019, pp. 134-147, vol. 5

Paper Link: https://www.computer.org/csdl/journal/bd/2019/02/08197362/13rRUILtJt4

Handling Big Data Using a Data-Aware HDFS and Evolutionary Clustering Technique - 2018
