Research Area:  Machine Learning
Because information from different modalities complements each other when describing the same content, multimodal information can be used to obtain better feature representations. Thus, how to represent and fuse the relevant information has become an active research topic. Most existing feature fusion methods consider features at different levels of representation, but they ignore the significant relevance between local regions, especially in the high-level semantic representation. In this paper, a general multimodal fusion method based on the co-attention mechanism is proposed, with a structure similar to the transformer. We discuss two main issues: (1) improving the applicability and generality of the transformer to different modal data; (2) making the model more robust by capturing and transmitting the relevant information between local features before fusion. We evaluate our model on a multimodal classification task, and the experiments demonstrate that our model can learn fused feature representations effectively.
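The abstract does not detail the fusion mechanism itself, but the co-attention building block it refers to is typically scaled dot-product cross-attention, where local features of one modality attend over those of another before fusion. A minimal plain-Python sketch (toy vectors and function names are illustrative assumptions, not the authors' implementation) might look like:

```python
import math

def softmax(xs):
    # numerically stable softmax over a list of scores
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross_attention(queries, keys, values):
    # each local feature of modality A (query) attends over
    # the local features of modality B (keys/values)
    d = len(keys[0])
    out = []
    for q in queries:
        scores = softmax([dot(q, k) / math.sqrt(d) for k in keys])
        fused = [sum(w * v[i] for w, v in zip(scores, values))
                 for i in range(len(values[0]))]
        out.append(fused)
    return out

# toy example: 2 local regions of modality A attend over 3 regions of modality B
A_feats = [[1.0, 0.0], [0.0, 1.0]]
B_keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
B_vals = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
fused = cross_attention(A_feats, B_keys, B_vals)
```

Co-attention applies this in both directions (A attends to B and B attends to A) so each modality's local features are enriched by the other's before the final fusion.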
Keywords:  
Multimodal feature fusion
Co-attention mechanism
Transformer
Deep neural network
Machine Learning
Author(s) Name:  Pei Li; Xinde Li
Journal name:  
Conference name:  2020 IEEE 23rd International Conference on Information Fusion (FUSION)
Publisher name:  IEEE
DOI:  10.23919/FUSION45008.2020.9190483
Volume Information:  
Paper Link:   https://ieeexplore.ieee.org/abstract/document/9190483