A survey of methods, datasets and evaluation metrics for visual question answering - 2021

Research Area:  Machine Learning

Abstract:

Visual Question Answering (VQA) is a multidisciplinary research problem that has attracted attention from both computer vision and natural language processing researchers. In VQA, a system receives an image and a natural-language question about that image as input, and must produce a natural-language answer as output. A VQA algorithm may need common-sense reasoning over the information in the image, together with world knowledge, to produce the right answer. In this paper, we discuss some of the core concepts used in VQA systems and present a comprehensive survey of past efforts to address this problem. Apart from traditional VQA models, we also discuss VQA models that require reading text present in images and that are evaluated on recently developed datasets such as TextVQA, ST-VQA, and OCR-VQA. Beyond the standard datasets covered in previous surveys, we discuss new datasets developed in 2019 and 2020, such as GQA, OK-VQA, TextVQA, ST-VQA, and OCR-VQA. New evaluation metrics such as BLEU, MPT, METEOR, Average Normalized Levenshtein Similarity (ANLS), Validity, Plausibility, Distribution, Consistency, Grounding, and F1-Score are explained together with the evaluation metrics discussed in previous surveys. We conclude the survey with a discussion of open issues in each phase of the VQA task and present some promising future directions.
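
The abstract names Average Normalized Levenshtein Similarity (ANLS) without defining it. As a rough illustration, the sketch below follows the definition commonly used for text-reading VQA benchmarks such as ST-VQA: for each question, take the best similarity between the prediction and any accepted answer, zero it out when the normalized edit distance reaches a threshold, and average over questions. The function names, the 0.5 threshold default, and the lowercase/strip normalization are assumptions made for this example, not details taken from the paper.

```python
from typing import List, Sequence


def levenshtein(a: str, b: str) -> int:
    """Edit distance (insert, delete, substitute) via row-by-row dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def anls(predictions: Sequence[str],
         references: Sequence[List[str]],
         threshold: float = 0.5) -> float:
    """Hypothetical ANLS helper; threshold=0.5 is an assumed default."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not predictions:
        return 0.0
    total = 0.0
    for pred, answers in zip(predictions, references):
        p = pred.strip().lower()  # assumed normalization
        best = 0.0
        for ans in answers:
            a = ans.strip().lower()
            longest = max(len(p), len(a))
            # Normalized distance in [0, 1]; two empty strings count as identical.
            nl = levenshtein(p, a) / longest if longest else 0.0
            # Similarity counts only when the strings are close enough.
            score = 1.0 - nl if nl < threshold else 0.0
            best = max(best, score)
        total += best
    return total / len(predictions)
```

Under these assumptions, a prediction that differs from a four-letter answer by one character scores 0.75, while an unrelated string scores 0, so small OCR-style slips are penalized softly instead of counting as complete misses.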

Keywords:  
Visual question answering
Average Normalized Levenshtein Similarity (ANLS)
Machine Learning
Deep Learning

Author(s) Name:  Himanshu Sharma, Anand Singh Jalal

Journal name:  Image and Vision Computing

Conference name:  

Publisher name:  Elsevier

DOI:  10.1016/j.imavis.2021.104327

Volume Information:  Volume 116, December 2021, 104327