
A Reference-Based Model Using Deep Learning for Image Captioning - 2022

Research Area:  Machine Learning

Abstract:

Describing images in natural language is a challenging task in computer vision. Image captioning is the task of generating such descriptions automatically. Deep learning architectures that combine convolutional neural networks (CNNs) and recurrent neural networks (RNNs) are well suited to this task. However, traditional RNNs suffer from problems including exploding gradients, vanishing gradients, and non-descriptive sentences. To address these problems, we propose a model based on the encoder-decoder structure, using CNNs to extract features from reference images and gated recurrent units (GRUs) to generate the descriptions. Our model applies part-of-speech (PoS) analysis and the likelihood function to generate the weights in the GRU. The method also performs knowledge transfer during the validation phase using the k-nearest neighbors (kNN) technique. Experimental results on the Flickr30k and MS-COCO datasets indicate that the proposed PoS-based model yields scores competitive with those of state-of-the-art models. The system predicts more descriptive captions that closely approximate the expected captions, for both the predicted and the kNN-selected captions.
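The paper's PoS-based weighting and likelihood function are specific to the authors' model and are not reproduced here, but the general encoder-decoder skeleton the abstract describes can be sketched in PyTorch. Everything below is a minimal, hypothetical illustration: the ResNet-50 backbone, the layer sizes, and all class and variable names are assumptions, not the authors' implementation.

    # Minimal sketch of a CNN-encoder / GRU-decoder captioning model.
    # Assumptions: ResNet-50 features, a single-layer GRU, names chosen for illustration.
    import torch
    import torch.nn as nn
    import torchvision.models as models

    class CNNEncoder(nn.Module):
        """Extracts a fixed-length feature vector from an image with a pretrained CNN."""
        def __init__(self, embed_dim: int):
            super().__init__()
            backbone = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
            self.cnn = nn.Sequential(*list(backbone.children())[:-1])  # drop classifier head
            self.fc = nn.Linear(backbone.fc.in_features, embed_dim)

        def forward(self, images):                  # images: (B, 3, H, W)
            with torch.no_grad():                   # keep the backbone frozen for simplicity
                feats = self.cnn(images).flatten(1) # (B, 2048)
            return self.fc(feats)                   # (B, embed_dim)

    class GRUDecoder(nn.Module):
        """Generates a caption token by token, conditioned on the image feature."""
        def __init__(self, vocab_size: int, embed_dim: int, hidden_dim: int):
            super().__init__()
            self.embed = nn.Embedding(vocab_size, embed_dim)
            self.gru = nn.GRU(embed_dim, hidden_dim, batch_first=True)
            self.out = nn.Linear(hidden_dim, vocab_size)

        def forward(self, image_feats, captions):   # captions: (B, T) token ids
            tokens = self.embed(captions)           # (B, T, embed_dim)
            # Prepend the image feature as the first "token" of the input sequence.
            inputs = torch.cat([image_feats.unsqueeze(1), tokens], dim=1)
            hidden, _ = self.gru(inputs)            # (B, T+1, hidden_dim)
            return self.out(hidden)                 # logits over the vocabulary

During training, the logits at step t would be compared against token t of the ground-truth caption; at inference, the decoder would be fed its own previous prediction. The kNN reference step the abstract mentions can likewise be sketched as retrieving the captions of the training images whose features lie closest to a validation image; the cosine metric and the feature/caption stores below are assumptions.

    # Hedged sketch of kNN-based caption retrieval for the validation phase.
    import numpy as np
    from sklearn.neighbors import NearestNeighbors

    def knn_reference_captions(val_feat, train_feats, train_captions, k=5):
        """Return the captions of the k nearest training images in feature space."""
        index = NearestNeighbors(n_neighbors=k, metric="cosine").fit(train_feats)
        _, idx = index.kneighbors(np.asarray(val_feat).reshape(1, -1))
        return [train_captions[i] for i in idx[0]]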

Keywords:  
Deep Learning
Image Captioning
Convolutional Neural Networks (CNNs)
Recurrent Neural Networks (RNNs)

Author(s) Name:  Tiago do Carmo Nogueira, Cássio Dener Noronha Vinhal, Gélson da Cruz Júnior, Matheus Rudolfo Diedrich Ullmann & Thyago Carvalho Marques

Journal name:  Multimedia Systems (2022)

Conference name:  

Publisher name:  Springer

DOI:  10.1007/s00530-022-00937-3

Volume Information: