Research Area:  Machine Learning
Arabic text categorization is an important task in text mining, particularly given the rapidly growing quantity of Arabic online data. Deep neural network models have shown promising performance and strong data modeling capacity on large datasets. This article investigates convolutional neural networks (CNNs), long short-term memory (LSTM) networks, and their combination for Arabic text categorization. The work also handles the morphological variety of Arabic words by exploring a word embedding model that uses position weights and subword information. To ensure that related words receive nearby vector representations, the article adopts a strategy for refining Arabic vector space representations using semantic information embedded in lexical resources. Several experiments with different architectures were conducted on the OSAC dataset. The results show the effectiveness of CNN-LSTM, both with and without retrofitting, for Arabic text categorization in comparison with major competing methods.
Author(s) Name:  Fatima-Zahra El-Alami, Said Ouatik El Alaoui, Noureddine Ennahnahi
Journal name:  International Journal of Intelligent Information Technologies (IJIIT)
Publisher name:  IGI Global
Volume Information:  Volume 16, Issue 2