Research proposal in Ensemble Machine Learning for Big Data Stream Processing

   In recent years, advances in sensor and communication technology have enabled the rapid generation of continuous, potentially unbounded data, referred to as data streams. Such streams pose a severe challenge for data mining and machine learning methods because their volume and arrival rate keep evolving. The machine learning and computational intelligence community has therefore turned its attention to ensemble methods, in which a set of learning algorithms, also called an ensemble system, is applied to data stream analysis. Ensemble methods aim to improve the accuracy of a prediction model by training multiple learning algorithms, or weak learners, and combining their outputs, which reduces the variance across the individual learners and improves the performance of automated decision-making systems.
   Ensemble methods have been widely used for feature selection, incremental learning, confidence estimation, error correction, handling missing features and imbalanced data, and learning under concept drift. An ensemble reaches a decision by weighing the differing opinions of its members and merging them to accurately predict previously unseen data records. Ensemble learning currently has several real-time applications, including image recognition, data mining, scene segmentation and analysis, object identification and tracking, information retrieval, characterization of computer security issues, bankruptcy prediction, credit card fraud detection, species distribution prediction, and many more.
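   The weighing and merging of member opinions described above can be illustrated with a weighted majority vote. This is a minimal sketch, not a method taken from the proposal: the function name `weighted_vote`, the example labels, and the weights are all assumptions made for illustration.

```python
from collections import defaultdict


def weighted_vote(predictions, weights):
    """Merge class labels from several learners by weighted majority vote.

    predictions: one predicted label per ensemble member (hypothetical input)
    weights:     one non-negative trust weight per member (hypothetical input)
    """
    if len(predictions) != len(weights):
        raise ValueError("need exactly one weight per prediction")
    if not predictions:
        raise ValueError("need at least one prediction")

    # Sum the weight behind each candidate label.
    scores = defaultdict(float)
    for label, weight in zip(predictions, weights):
        scores[label] += weight

    # The label with the largest total weight wins.
    return max(scores, key=scores.get)


if __name__ == "__main__":
    # Three illustrative members vote on one unseen record.
    member_labels = ["fraud", "legit", "legit"]
    member_weights = [0.9, 0.3, 0.4]
    print(weighted_vote(member_labels, member_weights))
```

   With equal weights the vote reduces to a plain majority vote; unequal weights let a single trusted member outvote several weaker ones.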
   Despite several developments in data stream mining, many research issues and challenges remain unresolved and need to be addressed properly. Notably, in a dynamically evolving data stream, the relationship between the attributes and the target values is most likely valid only locally, which complicates the mining of the stream. Another open problem that requires additional attention is the tuning of streaming ensemble parameters. Most streaming ensembles attempt to handle only a single stream. Nevertheless, several applications involve many parallel streams, for instance, studies on censored data and internet messages, in which similar data events occur at different moments in time and may have differing descriptions, which poses further challenges.
   In the stream data context, ensemble-based processing of asynchronously arriving data and delayed information remains complex. Work on stream data mining often suffers from concept drift, class imbalance, missing values, limited labeled instances, temporal dependencies, overfitting, novel classes, and insufficient resources. However, ensemble learning methods are capable of handling large-scale stream data under concept drift. Since the detailed characteristics of drifts have still not been studied systematically, developing ensembles that handle the various categories of drift is a non-trivial task. Also, unexpected changes in stream data complicate the multi-label classification task.
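   To make the idea of an ensemble coping with concept drift concrete, the sketch below applies a multiplicative weight update in the style of the classic Weighted Majority algorithm to a toy stream whose labeling rule flips halfway through. It is only an illustration under stated assumptions: the class `DriftAwareEnsemble`, the two fixed rules used as members, the weight floor, and the synthetic stream are all hypothetical, and a realistic ensemble would use trained incremental learners instead of fixed rules.

```python
class DriftAwareEnsemble:
    """Weighted-majority style ensemble (illustrative sketch only)."""

    def __init__(self, members, beta=0.5, floor=0.01):
        self.members = list(members)   # callables: x -> label
        self.beta = beta               # penalty factor for a wrong member
        self.floor = floor             # assumed lower bound so members can recover
        self.weights = [1.0] * len(self.members)

    def predict(self, x):
        # Weighted vote over the members' current predictions.
        scores = {}
        for member, weight in zip(self.members, self.weights):
            label = member(x)
            scores[label] = scores.get(label, 0.0) + weight
        return max(scores, key=scores.get)

    def update(self, x, y):
        # Shrink the weight of every member that got this example wrong.
        for i, member in enumerate(self.members):
            if member(x) != y:
                self.weights[i] = max(self.weights[i] * self.beta, self.floor)
        # Rescale so the best member has weight 1.0.
        top = max(self.weights)
        self.weights = [w / top for w in self.weights]


def rule_high(x):
    """Hypothetical member: predicts class 1 for large inputs."""
    return int(x > 0.5)


def rule_low(x):
    """Hypothetical member: predicts class 1 for small inputs."""
    return int(x <= 0.5)


def make_stream():
    """Synthetic stream: the true concept flips after 20 examples."""
    xs = [0.1, 0.9, 0.3, 0.7] * 5
    before_drift = [(x, rule_high(x)) for x in xs]
    after_drift = [(x, rule_low(x)) for x in xs]
    return before_drift + after_drift


def prequential_mistakes(ensemble, stream):
    """Test-then-train evaluation: predict first, then learn from the label."""
    mistakes = 0
    for x, y in stream:
        if ensemble.predict(x) != y:
            mistakes += 1
        ensemble.update(x, y)
    return mistakes


if __name__ == "__main__":
    ensemble = DriftAwareEnsemble([rule_high, rule_low])
    print("mistakes:", prequential_mistakes(ensemble, make_stream()))
    print("final weights:", ensemble.weights)
```

   The weight floor is a design choice in this sketch: without it, a member that was wrong for a long stable period would need an equally long period to regain influence after the drift.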