Multimodal Dialogue Systems via Capturing Context-aware Dependencies of Semantic Elements - 2020

Research Area: Machine Learning

Abstract:

Recently, multimodal dialogue systems have attracted increasing attention in several domains such as retail and travel. Despite the promising performance of pioneering work, existing studies usually focus on utterance-level semantic representations with hierarchical structures, which ignore the context-aware dependencies of multimodal semantic elements, i.e., words and images. Moreover, when integrating visual content, they consider only the images of the current turn, leaving out those of previous turns as well as their ordinal information. To address these issues, we propose a Multimodal diAlogue system with semanTic Elements, MATE for short. Specifically, we unfold the multimodal inputs and devise a Multimodal Element-level Encoder to obtain semantic representations at the element level. In addition, we take into consideration all images that might be relevant to the current turn and inject the sequential characteristics of images through position encoding. Finally, we conduct comprehensive experiments on a public multimodal dialogue dataset in the retail domain, and improve the BLEU-4 score by 9.49 and the NIST score by 1.8469 compared with state-of-the-art methods.
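
The abstract does not say which position encoding is used to inject the order of images. As a minimal sketch only, assuming the sinusoidal scheme commonly used with Transformer models, the idea of adding an order signal to image features could look like the following. The function names, the 4-dimensional feature size, and the zero-valued features are hypothetical and are not taken from the paper.

```python
import math


def sinusoidal_position(pos, dim):
    """Return a sinusoidal position vector of length `dim` for index `pos`."""
    vec = []
    for i in range(0, dim, 2):
        # Each sine/cosine pair shares one frequency.
        angle = pos / (10000 ** (i / dim))
        vec.append(math.sin(angle))
        vec.append(math.cos(angle))
    return vec[:dim]


def add_image_positions(image_features):
    """Add a position vector to each image feature, following dialogue order."""
    encoded = []
    for pos, feat in enumerate(image_features):
        pe = sinusoidal_position(pos, len(feat))
        encoded.append([f + p for f, p in zip(feat, pe)])
    return encoded


# Hypothetical features for three images that appeared in successive turns.
images = [[0.0, 0.0, 0.0, 0.0] for _ in range(3)]
encoded = add_image_positions(images)
```

With identical input features, the encoded vectors still differ by position, which is what lets a downstream encoder tell an earlier image from a later one.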

Keywords:
Multimodal Dialogue Systems
Capturing
Context-aware
Dependencies
Semantic Elements

Author(s) Name: Weidong He, Zhi Li, Dongcai Lu, Enhong Chen, Tong Xu, Baoxing Huai, Jing Yuan

Journal name:

Conference name: MM '20: Proceedings of the 28th ACM International Conference on Multimedia

Publisher name: ACM

DOI: 10.1145/3394171.3413679

Volume Information: Pages 2755–2764

Paper Link: https://dl.acm.org/doi/abs/10.1145/3394171.3413679
