Reference-based model using multimodal gated recurrent units for image captioning - 2020



Research Area:  Machine Learning

Abstract:

Describing images in natural language is a challenging task in computer vision. Image captioning consists of generating image descriptions, which can be accomplished with deep learning architectures that combine convolutional neural networks (CNNs) and recurrent neural networks (RNNs). However, traditional RNNs suffer from exploding and vanishing gradients, and they perform poorly, generating non-descriptive sentences. To address these issues, we propose an encoder–decoder model that uses CNNs to extract image features and multimodal gated recurrent units (GRUs) to generate the descriptions. The model uses part-of-speech (PoS) information and a likelihood function for weight generation in the GRU. The method performs knowledge transfer during a validation phase using the k-nearest neighbors (kNN) technique. Experimental results on the Flickr30k and MSCOCO datasets show that the proposed PoS-based model achieves competitive scores in comparison with state-of-the-art models. The system predicts more descriptive captions and closely approximates the expected captions, both in the predicted captions and in the kNN-selected captions.
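
The kNN step mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the use of cosine similarity, and the toy feature vectors are all assumptions made for illustration. Given a feature vector for a query image, it picks the captions of the k most similar reference images.

```python
import math


def cosine(a, b):
    """Cosine similarity between two non-zero feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest_reference_captions(query_feat, ref_feats, ref_captions, k=3):
    """Return the captions of the k reference images most similar to the query.

    Hypothetical helper: the similarity measure and the interface are
    assumptions, not taken from the paper.
    """
    scored = [(cosine(query_feat, feat), cap)
              for feat, cap in zip(ref_feats, ref_captions)]
    # Sort by similarity, most similar first, and keep the top k captions.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [cap for _, cap in scored[:k]]


# Toy stand-ins for CNN image features and their reference captions.
ref_feats = [
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
ref_captions = [
    "a dog runs on grass",
    "a dog plays in a park",
    "a man rides a bike",
    "a plate of food",
]

result = nearest_reference_captions([1.0, 0.0, 0.0], ref_feats, ref_captions, k=2)
print(result)
```

In practice the feature vectors would come from the CNN encoder, and the selected captions would serve as references alongside the caption predicted by the decoder.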

Keywords:  
natural language
image captioning
convolutional neural network
recurrent neural networks
captions

Author(s) Name:  Tiago do Carmo Nogueira, Cassio Dener Noronha Vinhal, Gelson da Cruz Junior, Matheus Rudolfo Diedrich Ullmann

Journal name:  Multimedia Tools and Applications

Conference name:  

Publisher name:  Springer

DOI:  10.1007/s11042-020-09539-5

Volume Information:  Volume 79