Research Area:  Machine Learning
Newspapers keep us informed about the latest happenings around the world. Going paperless creates new opportunities for newspapers, such as broadcasting news coverage and presenting breaking news conveniently. News headlines fall under the short-text category and are an active research topic. Creating a dense vector representation of short texts has become a challenging and essential task in many applications, such as recommender systems, context analysis, decision making, and text classification. This work not only builds a classification model for short texts but also categorizes headlines that fall under the ‘unknown’ category. Our work uses Bidirectional Encoder Representations from Transformers (BERT), a cosine similarity index, word embeddings, and a Long Short-Term Memory (LSTM) network to classify news headlines into multiple categories. Our proposed method performs well at labeling the unlabeled data with the help of a BERT sentence encoder. The system feeds the headlines to the LSTM as input vectors, and the classifier then assigns each headline a category. At the end of the experiment, the designed pipeline achieves high precision at the class level.
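The abstract does not describe how cosine similarity is used to label the ‘unknown’ headlines, so the following is only a minimal sketch under stated assumptions. It assumes a nearest-centroid rule: each labeled category is summarized by the mean of its sentence embeddings, and each unknown headline is assigned to the category whose centroid has the highest cosine similarity. The function names (`cosine_similarity`, `centroid`, `label_unknown`) and the 3-dimensional vectors are hypothetical stand-ins for real BERT sentence-encoder outputs, not the authors' implementation.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def centroid(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def label_unknown(labeled, unknown):
    """Assign each unknown embedding to the most similar category centroid.

    Assumption (not from the paper): a nearest-centroid rule over embeddings.
    labeled: dict mapping category name -> list of embeddings (hypothetical
             stand-ins for BERT sentence-encoder outputs).
    unknown: list of embeddings for headlines in the 'unknown' category.
    Returns a list of (category, similarity) pairs, one per unknown headline.
    """
    centroids = {cat: centroid(vecs) for cat, vecs in labeled.items()}
    results = []
    for vec in unknown:
        best_cat, best_sim = max(
            ((cat, cosine_similarity(vec, c)) for cat, c in centroids.items()),
            key=lambda pair: pair[1],
        )
        results.append((best_cat, best_sim))
    return results


# Toy, hypothetical embeddings for two labeled categories and two unknown headlines.
labeled = {
    "sports": [[0.9, 0.1, 0.0], [0.8, 0.2, 0.1]],
    "politics": [[0.1, 0.9, 0.2], [0.0, 0.8, 0.3]],
}
unknown = [[0.85, 0.15, 0.05], [0.05, 0.7, 0.25]]

for category, similarity in label_unknown(labeled, unknown):
    print(category, round(similarity, 3))
```

With these toy vectors, the first unknown headline is assigned to "sports" and the second to "politics". A nearest-centroid rule is chosen here only because it is the simplest way to turn pairwise cosine similarities into a label; comparing against every labeled headline (nearest neighbor) or adding a similarity threshold below which a headline stays ‘unknown’ are equally plausible readings of the abstract.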
Keywords:  
Author(s) Name:  Madhusmita Khuntia, Deepa Gupta
Journal name:  Procedia Computer Science
Conference name:  
Publisher name:  Elsevier (ScienceDirect)
DOI:  10.1016/j.procs.2023.01.070
Volume Information:  Volume 218, Pages 899-907, (2023)
Paper Link:   https://www.sciencedirect.com/science/article/pii/S1877050923000704