Research Area:  Machine Learning
We present an improved image caption generation model that incorporates a multimodal attention mechanism. We use ResNet-101 to extract image features, augmented with channel attention and spatial attention mechanisms. We use Faster R-CNN for object detection, together with a multi-head attention structure consisting of spatial attention and self-attention. This improves the model's capability to learn and exploit the internal grammatical features of natural sentences. Moreover, we use GPU parallel computing to accelerate the entire model training. We apply our model and algorithm to an early education scenario: show and tell for kids. We compare our algorithm with state-of-the-art deep learning algorithms. Our experimental results show that our model improves captioning accuracy in terms of standard automatic evaluation metrics.
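The abstract's channel and spatial attention over backbone features can be illustrated with a minimal NumPy sketch; the function names, the softmax-pooling formulation, and the toy feature-map shape below are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def channel_attention(feat):
    # feat: (C, H, W) feature map, e.g. from a ResNet-101 backbone.
    # Pool spatial dims, then reweight each channel by a softmax score.
    scores = softmax(feat.mean(axis=(1, 2)))            # shape (C,)
    return feat * scores[:, None, None]

def spatial_attention(feat):
    # Pool channels, then reweight each of the H*W spatial locations.
    C, H, W = feat.shape
    scores = softmax(feat.mean(axis=0).reshape(-1)).reshape(H, W)
    return feat * scores[None, :, :]

# Toy example: a 4-channel 3x3 feature map passed through both branches.
feat = np.random.rand(4, 3, 3)
out = spatial_attention(channel_attention(feat))
```

In practice such attention weights would be produced by small learned layers rather than parameter-free pooling, but the reweighting pattern is the same.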
Keywords:  
Automatic Image Caption Generation
Deep Learning
Multimodal Attention
Object Detection
Author(s) Name:  Jin Dai, Xinyu Zhang
Journal name:  Computer Animation and Virtual Worlds
Conference name:  
Publisher name:  Wiley
DOI:  10.1002/cav.2072
Volume Information:  Volume 33, Issue 3-4
Paper Link:   https://onlinelibrary.wiley.com/doi/abs/10.1002/cav.2072