Research Area:  Machine Learning
Multi-Label Classification (MLC) is an extension of standard single-label classification in which each data instance is associated with several labels simultaneously. MLC has gained much importance in recent years due to its wide range of application domains. However, the class imbalance problem has become an inherent characteristic of many multi-label datasets, where the samples and their corresponding labels are non-uniformly distributed over the data space. The imbalance problem in MLC poses challenges for multi-label data analytics, which can be viewed from three perspectives: imbalance within labels, among labels, and among label-sets. In this paper, we provide a review of the approaches for handling the imbalance problem in multi-label data by collecting the existing research work. As the first systematic study of approaches addressing the imbalance problem in MLC, this paper provides a comprehensive survey of the state-of-the-art methods for imbalanced MLC, including the characteristics of imbalanced multi-label datasets, evaluation measures, and a comparative analysis of the proposed methods. The study also discusses important results reported so far in the literature and highlights some of their strengths and limitations to guide future research.
Keywords:  
Author(s) Name:  Adane Nega Tarekegn, Mario Giacobini, Krzysztof Michalak
Journal name:  Pattern Recognition
Conference name:  
Publisher name:  ELSEVIER
DOI:  https://doi.org/10.1016/j.patcog.2021.107965
Volume Information:  Volume 118, October 2021, 107965
Paper Link:   https://www.sciencedirect.com/science/article/abs/pii/S0031320321001527