A Comprehensive Survey on Visual Question Answering Debias

Survey paper on Visual Question Answering Debias

Research Area: Machine Learning

Abstract:

With the rise of multi-modal computing, visual question answering (VQA) has attracted wide attention. Given an image and a question as input, a VQA system answers the question according to the image. However, most models suffer from the language prior problem: because of inherent bias in the training data, they rely excessively on superficial linguistic correlations between questions and answers without considering the image. For example, for a certain type of question (e.g., "How many apples are there on the table?"), the system tends to return an answer that appears frequently in the answer space (e.g., "Two") rather than the answer supported by the image (e.g., "Five"). VQA debiasing is therefore essential for answering questions correctly. Many methods have been proposed to address this problem, and we summarize them into three categories: 1) data augmentation, 2) weakening language information, and 3) enhancing image information. The first category tackles the problem from the data perspective and the other two from the information perspective; all of them aim to achieve higher accuracy and make VQA systems more robust.
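To make the language prior concrete, here is a minimal sketch on hypothetical toy data. It builds a question-only baseline that always returns the most frequent training answer for each question type and never looks at the image. The names `fit_prior` and `answer_blind`, the question types, and the answers are illustrative assumptions, not a method from the survey.

```python
from collections import Counter, defaultdict

# Hypothetical toy training set of (question_type, answer) pairs. Images are
# omitted on purpose: a prior-only model never uses them.
train = [
    ("how many", "two"), ("how many", "two"), ("how many", "three"),
    ("how many", "two"), ("what color", "white"), ("what color", "red"),
    ("what color", "white"),
]

def fit_prior(pairs):
    """Return the most frequent answer observed for each question type."""
    counts = defaultdict(Counter)
    for qtype, answer in pairs:
        counts[qtype][answer] += 1
    return {qtype: c.most_common(1)[0][0] for qtype, c in counts.items()}

prior = fit_prior(train)

def answer_blind(question_type, image=None):
    # The image argument is ignored: this is the language-prior shortcut.
    return prior.get(question_type, "unknown")

# A hypothetical test split whose answer distribution differs from training.
test = [("how many", "five"), ("how many", "two"), ("what color", "red")]
accuracy = sum(answer_blind(q) == a for q, a in test) / len(test)

print(answer_blind("how many", image="five_apples.jpg"))  # -> two
print(round(accuracy, 3))  # -> 0.333
```

Even when the image shows five apples, the baseline answers "two", and its accuracy drops once the test answers stop matching the training frequencies. Debiasing methods of the three kinds listed above try to keep a real VQA model from collapsing onto this behavior.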

Keywords:
Multi-modal computing
Visual question answering
Data debias
System robustness
Data augmentation
Machine learning

Author(s) Name: Zaiwei Lu

Conference name: 2022 IEEE International Conference on Advances in Electrical Engineering and Computer Applications (AEECA)

Publisher name: IEEE

DOI: 10.1109/AEECA55500.2022.9918952

Paper Link: https://ieeexplore.ieee.org/abstract/document/9918952