Indian news headlines classification using word embedding

Indian news headlines classification using word embedding techniques and LSTM model - 2023

Research Paper on Indian news headlines classification using word embedding techniques and LSTM model

Research Area: Machine Learning

Abstract:

Newspapers introduce us to the latest happenings around the world. Going paperless creates more opportunities for newspapers, such as broadcasting news coverage and presenting breaking news conveniently. News headlines fall under the short-text category and are an active subject of research. Creating a dense vector from short texts has become a challenging and essential task in many applications, such as recommender systems, context analysis, decision making, and text classification. This work targets not only a classification model for short text but also the categorization of headlines in the ‘unknown’ category. Our work uses Bidirectional Encoder Representations from Transformers (BERT), a cosine similarity index, word embedding, and a Long Short-Term Memory (LSTM) network to classify news headlines into multiple categories. Our proposed method performs well at labeling the unlabeled data with the help of a BERT sentence encoder. The system uses the LSTM to learn from headlines represented as input vectors and to classify the headline text. At the end of this experiment, the designed pipeline achieves remarkable precision at the class level.
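
The labeling step mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy 3-dimensional vectors stand in for BERT sentence embeddings, and the function names, the per-category reference vectors, and the 0.5 threshold are all hypothetical choices made for illustration. The sketch only shows how a cosine similarity index could assign a category to a headline whose label is 'unknown'.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_a * norm_b)


def assign_label(headline_vec, category_vecs, threshold=0.5):
    """Return the most similar category, or 'unknown' below the threshold.

    category_vecs maps a category name to a reference embedding.
    Both the mapping and the threshold are assumptions of this sketch.
    """
    best_label, best_score = "unknown", -1.0
    for label, ref_vec in category_vecs.items():
        score = cosine_similarity(headline_vec, ref_vec)
        if score > best_score:
            best_label, best_score = label, score
    if best_score < threshold:
        return "unknown"
    return best_label


# Toy stand-ins for sentence embeddings (real ones would come from an encoder).
category_vecs = {
    "sports": [1.0, 0.0, 0.0],
    "business": [0.0, 1.0, 0.0],
}
label_a = assign_label([0.9, 0.1, 0.0], category_vecs)  # close to "sports"
label_b = assign_label([0.0, 0.0, 1.0], category_vecs)  # close to neither
```

In a full pipeline the labeled headlines would then be used to train the LSTM classifier; that step is omitted here because the abstract gives no architectural details.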

Keywords:

Author(s) Name: Madhusmita Khuntia, Deepa Gupta

Journal name: Procedia Computer Science

Conference name:

Publisher name: Elsevier (ScienceDirect)

DOI: 10.1016/j.procs.2023.01.070

Volume Information: Volume 218, Pages 899-907, (2023)

Paper Link: https://www.sciencedirect.com/science/article/pii/S1877050923000704
