Research Area:  Machine Learning
In recent years, multimodal machine translation has become a popular research topic. In this paper, a machine translation model based on the self-attention mechanism is extended for multimodal machine translation. In the model, an image-text attention layer is added at the end of the encoder to capture the relevant semantic information between the image and the text words. With this attention layer, the model can assign different weights to words that are relevant to the image or appear in it, and obtain a better text representation that fuses these weights, so that it can be better used by the decoder. Experiments are carried out on the original English-German sentence pairs of the multimodal machine translation dataset Multi30k, and on Indonesian-Chinese sentence pairs that were manually annotated. The results show that our model performs better than the text-only Transformer-based machine translation model and is comparable to most existing work, which proves the effectiveness of our model.
Keywords:  
Multimodal Machine Translation
Image-text attention
Transformer-based
Self-attention
Machine Learning
Deep Learning
Author(s) Name:   Junteng Ma; Shihao Qin; Lan Su; Xia Li; Lixian Xiao
Journal name:  
Conference name:  2019 International Conference on Asian Language Processing (IALP)
Publisher name:  IEEE
DOI:  10.1109/IALP48816.2019.9037732
Volume Information:  
Paper Link:   https://ieeexplore.ieee.org/abstract/document/9037732
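
The image-text attention described in the abstract can be illustrated with a minimal sketch: text hidden states act as queries over image region features via scaled dot-product attention, and the attended image context is fused back into the text representation. This is a hypothetical, simplified illustration in NumPy, not the authors' implementation; the function name, dimensions, and additive fusion are assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax over the given axis
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def image_text_attention(text_states, image_feats):
    """Sketch of an image-text attention layer (assumed form):
    text hidden states (T, d) attend over image region features (R, d)
    with scaled dot-product attention, and the attended image context
    is added to the text states to give a fused representation."""
    d_k = text_states.shape[-1]
    scores = text_states @ image_feats.T / np.sqrt(d_k)  # (T, R) relevance scores
    weights = softmax(scores, axis=-1)                   # attention over image regions
    context = weights @ image_feats                      # (T, d) image context per word
    return text_states + context                         # fused text representation

# toy example: 4 text tokens, 3 image regions, model dimension 8
rng = np.random.default_rng(0)
text = rng.normal(size=(4, 8))
img = rng.normal(size=(3, 8))
fused = image_text_attention(text, img)
print(fused.shape)  # (4, 8)
```

In the paper's model this fused representation would replace the plain encoder output fed to the decoder; here the residual addition is just one plausible fusion choice.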