Visual question answering via Attention-based syntactic structure tree-LSTM - 2019

Research Area: Machine Learning

Abstract:

Because images exhibit varied patterns and questions are posed in free-form language, the performance of Visual Question Answering (VQA) remains unsatisfactory. Existing approaches mainly infer answers from low-level features and sequential question words, neglecting the syntactic structure of the question sentence and its correlation with the spatial structure of the image. To address these problems, we propose a novel VQA model, the Attention-based Syntactic Structure Tree-LSTM (ASST-LSTM). Specifically, a tree-structured LSTM encodes the syntactic structure of the question sentence. A spatial-semantic attention model is proposed to learn the visual-textual correlation and the alignment between image regions and question words. Within the attention model, a Siamese network is employed to explore the alignment between visual and textual content. The tree-structured LSTM and the spatial-semantic attention model are then integrated into a joint deep model, which is trained with multi-task learning for answer inference. Experiments conducted on three widely used VQA benchmark datasets demonstrate the superiority of the proposed model compared with state-of-the-art approaches.
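
To make the two main ingredients of the abstract concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes the Child-Sum Tree-LSTM variant (the abstract does not say which tree-structured LSTM is used) and substitutes a generic dot-product soft attention for the paper's spatial-semantic attention and Siamese network, whose details are not given here. The class name, function names, dimensions, example parse tree, and random weights are all hypothetical.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


class ChildSumTreeLSTM:
    """Child-Sum Tree-LSTM cell (assumed variant; weights are random, untrained)."""

    def __init__(self, input_dim, hidden_dim, seed=0):
        rng = np.random.default_rng(seed)
        self.hidden_dim = hidden_dim
        # One (W, U, b) triple per gate: input, forget, output, update.
        self.params = {
            gate: (
                rng.normal(0.0, 0.1, (hidden_dim, input_dim)),
                rng.normal(0.0, 0.1, (hidden_dim, hidden_dim)),
                np.zeros(hidden_dim),
            )
            for gate in ("i", "f", "o", "u")
        }

    def _gate(self, name, x, h, activation):
        W, U, b = self.params[name]
        return activation(W @ x + U @ h + b)

    def node(self, x, children):
        """Combine a word vector x with the (h, c) states of its children."""
        h_sum = sum((h for h, _ in children), np.zeros(self.hidden_dim))
        i = self._gate("i", x, h_sum, sigmoid)
        o = self._gate("o", x, h_sum, sigmoid)
        u = self._gate("u", x, h_sum, np.tanh)
        c = i * u
        # Each child gets its own forget gate, computed from that child's h.
        for h_k, c_k in children:
            c = c + self._gate("f", x, h_k, sigmoid) * c_k
        return o * np.tanh(c), c

    def encode(self, tree, embeddings):
        """tree is (word, [subtrees]); returns the (h, c) state of the root."""
        word, subtrees = tree
        children = [self.encode(t, embeddings) for t in subtrees]
        return self.node(embeddings[word], children)


def attend(question_vec, region_feats):
    """Generic dot-product soft attention over regions (a stand-in, not the paper's model)."""
    scores = region_feats @ question_vec
    scores = scores - scores.max()  # subtract the max for numerical stability
    weights = np.exp(scores) / np.exp(scores).sum()
    return weights, weights @ region_feats


# Hypothetical dependency-style tree for "what color is the cat", rooted at "is".
tree = ("is", [("color", [("what", [])]), ("cat", [("the", [])])])
rng = np.random.default_rng(1)
embeddings = {w: rng.normal(size=8) for w in ["what", "color", "is", "the", "cat"]}

cell = ChildSumTreeLSTM(input_dim=8, hidden_dim=16)
h_root, c_root = cell.encode(tree, embeddings)

regions = rng.normal(size=(5, 16))  # 5 made-up image region features
weights, attended = attend(h_root, regions)
```

Here the root state `h_root` summarizes the question along its parse tree rather than word by word, and `weights` gives one non-negative score per image region, summing to one. In a trained model the region features would come from an image encoder and all weights would be learned jointly.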

Keywords:
Visual question answering
Attention model
Siamese network
Attention-based Syntactic Structure Tree-LSTM
Deep Learning

Author(s) Name: Yun Liu, Xiaoming Zhang, Feiran Huang, Xianghong Tang, Zhoujun Li

Journal name: Applied Soft Computing

Conference name:

Publisher name: Elsevier

DOI: 10.1016/j.asoc.2019.105584

Volume Information: Volume 82, September 2019, 105584

Paper Link: https://www.sciencedirect.com/science/article/abs/pii/S1568494619303643
