
A survey of methods, datasets and evaluation metrics for visual question answering - 2021

Survey paper on methods, datasets and evaluation metrics for visual question answering

Research Area: Machine Learning

Abstract:

Visual Question Answering (VQA) is a multidisciplinary research problem that has drawn the attention of both computer vision and natural language processing researchers. A VQA system takes as input an image and a natural-language question about that image, and must produce a natural-language answer as output. Producing the right answer may require common-sense reasoning over the information contained in the image together with world knowledge. In this paper, we discuss some of the core concepts used in VQA systems and present a comprehensive survey of past efforts to address this problem. Apart from traditional VQA models, we also discuss VQA models that must read text present in images and that are evaluated on recently developed datasets such as TextVQA, ST-VQA, and OCR-VQA. In addition to the standard datasets covered in previous surveys, we discuss newer datasets developed in 2019 and 2020, such as GQA, OK-VQA, TextVQA, ST-VQA, and OCR-VQA. Newer evaluation metrics such as BLEU, MPT, METEOR, Average Normalized Levenshtein Similarity (ANLS), Validity, Plausibility, Distribution, Consistency, Grounding, and F1-Score are explained together with the evaluation metrics discussed in previous surveys. We conclude with a discussion of open issues in each phase of the VQA task and present some promising future directions.

Keywords:
Visual question answering
Average Normalized Levenshtein Similarity (ANLS)
Machine Learning
Deep Learning

Author(s) Name: Himanshu Sharma, Anand Singh Jalal

Journal name: Image and Vision Computing

Conference name:

Publisher name: Elsevier

DOI: 10.1016/j.imavis.2021.104327

Volume Information: Volume 116, December 2021, 104327

Paper Link: https://www.sciencedirect.com/science/article/abs/pii/S0262885621002328
