
Glove-Ing Attention: A Multi-Modal Neural Learning Approach to Image Captioning - 2023

Research Area:  Machine Learning

Abstract:

Describing images in natural language is a complex task in computer vision. Image captioning generates textual descriptions of images, typically with deep learning frameworks that combine convolutional neural networks (CNNs) and recurrent neural networks (RNNs). However, conventional RNNs suffer from exploding and vanishing gradients, which leads to poor results and non-evocative sentences. In this paper, we propose an encoder-decoder deep neural network for image captioning that uses the state-of-the-art backbone architecture EfficientNet as the encoder network. The decoder uses multimodal gated recurrent units (GRUs), which incorporate GloVe word embeddings for the text data and visual attention for the image data. The network is trained on three different datasets, Indiana Chest X-ray, COCO, and WIT, and the results are evaluated with the standard performance metrics BLEU and METEOR. The quantitative results show that the network achieves promising results compared to the state-of-the-art models.
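
To make the decoder described in the abstract more concrete, the following is a minimal NumPy sketch of a single decoding step: attention weights are computed over image region features, the resulting context vector is concatenated with a word embedding, and the pair is fed to a GRU cell. This is an illustration under stated assumptions, not the authors' implementation. The abstract does not specify the form of attention, so additive (Bahdanau-style) attention is assumed here; random arrays stand in for EfficientNet features and GloVe vectors; and all names and dimensions (`attend`, `gru_step`, `decoder_step`, `demo`, R, D, E, H, A) are hypothetical.

```python
import numpy as np


def softmax(x):
    # Numerically stable softmax over a 1-D array.
    e = np.exp(x - x.max())
    return e / e.sum()


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def attend(features, h, W_f, W_h, v):
    """Additive attention (an assumption; the abstract only says 'visual attention').

    features: (R, D) image region features, h: (H,) decoder hidden state.
    Returns the context vector (D,) and the attention weights (R,).
    """
    scores = np.tanh(features @ W_f + h @ W_h) @ v  # (R,)
    alpha = softmax(scores)
    context = alpha @ features                      # weighted sum of regions
    return context, alpha


def gru_step(x, h, p):
    """One GRU update using the convention h_new = (1 - z) * h + z * h_tilde."""
    z = sigmoid(x @ p["Wz"] + h @ p["Uz"])              # update gate
    r = sigmoid(x @ p["Wr"] + h @ p["Ur"])              # reset gate
    h_tilde = np.tanh(x @ p["Wh"] + (r * h) @ p["Uh"])  # candidate state
    return (1.0 - z) * h + z * h_tilde


def decoder_step(word_vec, features, h, att, gru):
    # Fuse the two modalities: word embedding (text) and attended context (image).
    context, alpha = attend(features, h, att["W_f"], att["W_h"], att["v"])
    x = np.concatenate([word_vec, context])
    return gru_step(x, h, gru), alpha


def demo(seed=0, R=49, D=128, E=50, H=64, A=32):
    # Hypothetical sizes: R regions, D feature dim, E embedding dim,
    # H hidden dim, A attention dim. Random values replace real inputs.
    rng = np.random.default_rng(seed)
    features = rng.normal(size=(R, D))   # stand-in for encoder output
    word_vec = rng.normal(size=E)        # stand-in for a GloVe vector
    h = np.zeros(H)
    att = {
        "W_f": rng.normal(scale=0.1, size=(D, A)),
        "W_h": rng.normal(scale=0.1, size=(H, A)),
        "v": rng.normal(scale=0.1, size=A),
    }
    gru = {k: rng.normal(scale=0.1, size=(E + D, H)) for k in ("Wz", "Wr", "Wh")}
    gru.update({k: rng.normal(scale=0.1, size=(H, H)) for k in ("Uz", "Ur", "Uh")})
    return decoder_step(word_vec, features, h, att, gru)


if __name__ == "__main__":
    h_new, alpha = demo()
    print(h_new.shape, alpha.shape, float(alpha.sum()))
```

In a full captioning model, the new hidden state would be projected onto the vocabulary to predict the next word, and the step would repeat until an end-of-sentence token is produced; those parts are omitted from this sketch.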

Keywords:  
natural language
computer vision
convolutional neural networks
recurrent neural networks
encoder-decoder
gated recurrent units

Author(s) Name:  Lars Halvor Anundskås, Hina Afridi, Adane Nega Tarekegn, Muhammad Mudassar Yamin, Mohib U

Journal name:  

Conference name:  2023 IEEE International Conference on Acoustics, Speech, and Signal Processing Workshops (ICASSPW)

Publisher name:  IEEE

DOI:  10.1109/ICASSPW59220.2023.10193011

Volume Information:  -