Research Area:  Machine Learning
In recent years, we have witnessed profound changes in the way people satisfy their information needs. For instance, with the ubiquitous 24/7 availability of mobile devices, the number of search engine queries on mobile devices has reportedly overtaken that of queries on regular personal computers. In this paper, we consider the task of multimodal question answering over structured data, in which a user supplies not just a natural language query but also an image. Our system addresses this by optimizing a non-convex objective function capturing multimodal constraints. Our experiments show that this enables it to answer even very challenging ambiguous entity queries with high accuracy.
Keywords:  
information
multimodal question answering
objective function
ambiguous entity queries
high accuracy
Author(s) Name:  Huadong Li, Yafang Wang, Gerard de Melo, Changhe Tu, Baoquan Chen
Journal name:  
Conference name:  Proceedings of the 26th International Conference on World Wide Web Companion
Publisher name:  ACM
DOI:  https://doi.org/10.1145/3041021.3054135
Volume Information:  -
Paper Link:  https://dl.acm.org/doi/abs/10.1145/3041021.3054135