A deep learning-based bilingual Hindi and Punjabi named entity

Research Area: Machine Learning

Abstract:

The increasing availability of information on the web makes the task of named entity recognition (NER) more challenging. Named entity recognition is an important pre-processor tool that is concerned with the extraction of entities of our interest such as person, location, organization, gene, protein, number, measurement, etc. The success of earlier named entity recognition systems is highly dependent on rule-based techniques or traditional machine learning algorithms exploiting several linguistic and non-linguistic features. In this article, we propose a novel named entity recognition (NER) system that involves the use of deep learning strategies as well as an enhanced version of word embeddings. We develop a Bidirectional Gated Recurrent Unit (Bi-GRU) and Convolutional Neural Networks (CNN) based bilingual named entity recognition system which is built upon enhanced word embeddings (EWE). Enhanced word embeddings (EWE) are generated by concatenation of FastText word embeddings along with minimal feature embeddings, namely part of speech embeddings, word prefix embeddings, word suffix embeddings, and word length embeddings which improve the computational power of deep learning methods. We perform several experiments using corpora in two different languages. One is IJCNLP-08 NERSSEAL shared task corpora containing annotated dataset in Hindi language and the other is manually annotated dataset in Punjabi language. We also make several experiments on bilingual Hindi and Punjabi dataset. The results of the experiments performed in this work reveal that the Bidirectional GRU and CNN based model along with enhanced word embeddings (EWE) has excelled with Precision, Recall, and F-score value of 92.60%, 90.70%, 91.64% respectively for Hindi, 93.87%, 93.33%, 93.60% respectively for Punjabi and 93.78%, 92.66%, 93.22% respectively for bilingual Hindi and Punjabi named entity recognition. Enhanced word embeddings accelerate the performance of a Bi-GRU and CNN based named entity recognition system without using a large set of features and any sort of gazetteers.

Keywords:

Author(s) Name: Archana Goyal, Vishal Gupta, Manish Kumar

Journal name: Knowledge-Based Systems

Conferrence name:

Publisher name: Elsevier

DOI: 10.1016/j.knosys.2021.107601

Volume Information: Volume 234, 25 December 2021, 107601

Paper Link: https://www.sciencedirect.com/science/article/abs/pii/S0950705121008637

Office Address

Social List