Transformer with sparse self-attention mechanism for image captioning - 2020



Research Area:  Machine Learning

Abstract:

Recently, the transformer has been applied to image captioning models, in which a convolutional neural network and the transformer encoder act as the image encoder, and the transformer decoder acts as the decoder. However, because its self-attention mechanism is dense, the transformer may suffer from interference by non-critical objects in a scene and have difficulty fully capturing image information. In this Letter, to address this issue, the authors propose a novel transformer model with decreasing attention gates and an attention fusion module. Specifically, they first use an attention gate to force the transformer to overcome the interference of non-critical objects and capture object information more efficiently by truncating all attention weights that are smaller than the gate threshold. Second, by inheriting the attention matrix from the previous layer, the attention fusion module enables each network layer to consider other objects without losing the most critical ones. Their method is evaluated on the benchmark Microsoft COCO dataset and achieves better performance than state-of-the-art methods.
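
The two ideas described in the abstract, gating small attention weights and fusing attention with the previous layer, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the renormalization after truncation, the linear blend controlled by `alpha`, and the example threshold values are all assumptions made here for illustration. The abstract states only that weights below a gate threshold are truncated and that each layer inherits the attention matrix of the previous layer.

```python
import math

def softmax(row):
    """Numerically stable softmax over one row of attention scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def gate_attention(weights, threshold):
    """Zero out weights below `threshold`, then renormalize each row.

    Renormalizing is an assumption of this sketch; the abstract only says
    that weights smaller than the gate threshold are truncated.
    """
    gated = []
    for row in weights:
        kept = [w if w >= threshold else 0.0 for w in row]
        total = sum(kept)
        # If every weight fell below the threshold, keep the original row.
        gated.append([w / total for w in kept] if total > 0 else list(row))
    return gated

def fuse_attention(current, previous, alpha=0.5):
    """Blend this layer's attention with the previous layer's.

    A linear mix is assumed here; the actual fusion rule is not given
    in the abstract.
    """
    return [[alpha * c + (1 - alpha) * p for c, p in zip(cur_row, prev_row)]
            for cur_row, prev_row in zip(current, previous)]

# Toy example: one query attending over four image regions.
scores = [[2.0, 1.0, 0.1, -1.0]]
dense = [softmax(row) for row in scores]

# Hypothetical thresholds that decrease from one layer to the next.
layer1 = gate_attention(dense, threshold=0.2)
layer2 = fuse_attention(gate_attention(dense, threshold=0.05), layer1)
```

In this toy example the first layer keeps only the two strongest regions, while the second layer, with a lower threshold, lets a third region back in and still carries over the emphasis from the first layer. The weakest region stays at zero in both layers.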

Keywords:  
transformer
convolutional neural network
decoder
image information
gate threshold
Microsoft
COCO dataset

Author(s) Name:  Duofeng Wang, Haifeng Hu, Dihu Chen

Journal name:  Electronics Letters

Conference name:  

Publisher name:  Wiley

DOI:  https://doi.org/10.1049/el.2020.0635

Volume Information:  Volume 56