Research Area:  Machine Learning
Visual question answering (VQA) is among the most researched problems in computer vision, pattern recognition, and natural language processing. VQA extends the challenges of computer vision and pushes toward basic reasoning about visual scenes in order to answer questions about specific elements, actions, and relationships between objects in an image. Reasoning about images has long been popular among computer vision and natural language processing researchers, and it depends directly on the expressivity of the representations learned from the datasets. In the past decade, with advances in computing machinery and neural networks and the introduction of highly optimized and efficient software, a substantial amount of research has aimed at solving VQA efficiently. In this survey, we present an in-depth examination of representation learning in the state-of-the-art VQA methods proposed in the literature and compare them to discuss future directions in the field.
Keywords:  
Computer vision
Visual Question Answering
Natural language processing
Representation learning
Author(s) Name:  Manish Sahani, Priyadarshan Singh, Sachin Jangpangi & Shailender Kumar
Journal name:  
Conference name:  International Conference on Machine Learning and Big Data Analytics
Publisher name:  Springer
DOI:  10.1007/978-3-030-82469-3_29
Volume Information:  
Paper Link:   https://link.springer.com/chapter/10.1007/978-3-030-82469-3_29