Research Area:  Machine Learning
Recently, there has been growing interest in multimodal dialogue systems. Unlike traditional unimodal dialogue systems, a multimodal system must understand context spanning multiple modalities before responding to users’ utterances. In this paper, we present a detailed survey of recent advances in multimodal dialogue systems and discuss possible research directions. In particular, we categorize these systems by two basic tasks: outputting textual responses and outputting visual responses. Both tasks face two main challenges, namely the heterogeneity gap and the semantic gap. We then analyze the key techniques used to address these challenges. Moreover, we comprehensively review benchmark datasets and popular evaluation metrics. Finally, we outline some promising directions for future work.
Keywords:  
multimodal dialogue systems
multimodal representation
multimodal alignment
feature fusion
graph neural networks
Author(s) Name:  Guangya Liu; Shiqi Wang; Jianxing Yu; Jian Yin
Journal name:  
Conference name:  2022 5th International Conference on Advanced Electronic Materials, Computers and Software Engineering (AEMCSE)
Publisher name:  IEEE
DOI:  10.1109/AEMCSE55572.2022.00170
Volume Information:  
Paper Link:   https://ieeexplore.ieee.org/abstract/document/9948337