Multimodal Emotion Recognition With Temporal and Semantic Consistency - 2021


Research Area:  Machine Learning

Abstract:

Automated multimodal emotion recognition has become an emerging but challenging research topic in affective learning and sentiment analysis. Existing works mainly focus on developing multimodal fusion strategies that combine different emotion-related features. However, they fail to explore the inherent contextual consistency needed to reconcile emotional information across modalities. In this paper, we propose a novel Time and Semantic Interaction Network (TSIN), which incorporates both temporal and semantic consistency into the multimodal emotion recognition task. Specifically, a Speech and Text Embedding (STE) module formulates the initial embedding spaces by building separate modality-specific representations of speech and text. Instead of learning the acoustic and textual features separately or fusing them directly, a Time and Semantic Interaction (TSI) module performs emotional parsing and sentiment refinement through fine-grained temporal alignment and cross-modal semantic interaction. Benefiting from the temporal and semantic consistency constraints, the speech and text embeddings are optimized and fine-tuned interactively during learning. In this way, the learned acoustic and textual features jointly and efficiently predict the final emotional state. Extensive experiments on the IEMOCAP dataset demonstrate the superiority of the TSIN framework over state-of-the-art baselines.
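The abstract does not describe the internals of the TSI module, so the following is only a generic illustration of the idea of letting text tokens interact with speech frames. It assumes plain scaled dot-product cross-attention (a common technique, not necessarily what TSIN uses): each text token acts as a query over the speech frames, the attention weights serve as a soft token-to-frame temporal alignment, and the pooled text and aligned speech are concatenated into one utterance-level feature. All function names and the toy vectors are hypothetical.

```python
import math
from typing import List, Tuple

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_modal_attention(
    text: List[Vector], speech: List[Vector]
) -> Tuple[List[Vector], List[Vector]]:
    """Hypothetical sketch: each text token (query) attends over all speech
    frames (keys and values). Returns, per token, the attention-weighted speech
    summary and the weight row, which acts as a soft token-to-frame alignment.
    """
    d = len(text[0])
    scale = 1.0 / math.sqrt(d)  # standard scaled dot-product factor
    aligned, weights = [], []
    for token in text:
        w = softmax([dot(token, frame) * scale for frame in speech])
        summary = [
            sum(w[j] * speech[j][k] for j in range(len(speech))) for k in range(d)
        ]
        aligned.append(summary)
        weights.append(w)
    return aligned, weights


def mean_pool(vectors: List[Vector]) -> Vector:
    """Average a sequence of vectors into one vector."""
    n = len(vectors)
    return [sum(v[k] for v in vectors) / n for k in range(len(vectors[0]))]


def fuse(text: List[Vector], aligned_speech: List[Vector]) -> Vector:
    """Utterance-level feature: pooled text concatenated with pooled aligned speech."""
    return mean_pool(text) + mean_pool(aligned_speech)


# Toy usage: 2 text tokens and 3 speech frames, embedding size d = 2.
text = [[1.0, 0.0], [0.0, 1.0]]
speech = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
aligned, w = cross_modal_attention(text, speech)
feature = fuse(text, aligned)  # 4 values: pooled text (2) + pooled aligned speech (2)
```

In a real model the queries, keys, and values would first pass through learned projections, and `feature` would feed an emotion classifier; those parts are omitted here to keep the alignment step visible.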

Keywords:  
automated
multimodal
emotion recognition
semantic interaction network
acoustics
textual features

Author(s) Name:  Bingzhi Chen, Qi Cao, Mixiao Hou, Zheng Zhang, Guangming Lu, David Zhang

Journal name:  IEEE/ACM Transactions on Audio, Speech, and Language Processing

Conference name:  

Publisher name:  IEEE

DOI:  https://doi.org/10.1109/TASLP.2021.3129331

Volume Information:  Volume 29