Multi-Modal Depression Detection Based on Emotional Audio and Evaluation Text

Research Area: Machine Learning


Early detection of depression is important for patient treatment. Because current screening methods for depression are inefficient, depression identification is a complex research problem with practical value. This study proposes a new experimental method for depression detection based on audio and text and investigates 160 Chinese subjects. In particular, we propose a text-reading experiment designed to make subjects' emotions change rapidly, hereafter called the Segmental Emotional Speech Experiment (SESE). We extract 384-dimensional low-level audio features to examine the differences between the emotional changes elicited in SESE. We also propose a multi-modal fusion method that uses deep learning to detect depression from DeepSpectrum features and word vector features. Our experiments show that SESE improves depression recognition accuracy, and they reveal differences in low-level audio features; for verification, subjects were grouped by case versus control, gender, and age. The multi-modal fusion model achieves an accuracy of 0.912 and an F1 score of 0.906. Our contribution is twofold. First, we propose and validate SESE, which offers a new experimental paradigm for future researchers. Second, we propose a new, efficient multi-modal depression recognition model.
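To make the fusion and evaluation terms concrete, here is a minimal sketch of feature-level (early) fusion plus the two reported metrics. It is not the authors' deep-learning model: the abstract does not say how the audio and text branches are combined, so plain concatenation of an audio feature vector with a mean-pooled text vector is an assumption, and the helper names `mean_pool`, `early_fusion`, and `accuracy_and_f1` are hypothetical. In practice the audio vector would come from DeepSpectrum or the 384-dimensional low-level features, and the word vectors from a pretrained embedding; the toy numbers below are placeholders, not data from the study.

```python
from typing import List, Sequence, Tuple


def mean_pool(word_vectors: Sequence[Sequence[float]]) -> List[float]:
    """Average equal-length word vectors into a single text vector (assumed pooling)."""
    if not word_vectors:
        raise ValueError("need at least one word vector")
    dim = len(word_vectors[0])
    if any(len(v) != dim for v in word_vectors):
        raise ValueError("all word vectors must share one dimension")
    n = len(word_vectors)
    return [sum(v[i] for v in word_vectors) / n for i in range(dim)]


def early_fusion(audio_features: Sequence[float],
                 word_vectors: Sequence[Sequence[float]]) -> List[float]:
    """Hypothetical feature-level fusion: audio vector followed by pooled text vector."""
    return list(audio_features) + mean_pool(word_vectors)


def accuracy_and_f1(y_true: Sequence[int],
                    y_pred: Sequence[int]) -> Tuple[float, float]:
    """Accuracy and positive-class F1 (1 = depressed, 0 = control)."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("labels must be non-empty and equal in length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    # F1 = 2PR / (P + R), written as 2TP / (2TP + FP + FN) to avoid dividing by zero twice.
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 0.0
    return accuracy, f1


if __name__ == "__main__":
    fused = early_fusion([0.5, -1.0, 2.0], [[1.0, 0.0], [0.0, 1.0]])
    print(fused)  # [0.5, -1.0, 2.0, 0.5, 0.5]
    print(accuracy_and_f1([1, 1, 1, 0, 0], [1, 1, 0, 0, 0]))
```

Concatenation keeps each modality's features intact and lets a downstream classifier learn cross-modal weights; a learned fusion layer, as in a deep model, would replace the fixed concatenation.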

Multi-Modal Depression Detection
Emotional Audio
Evaluation Text
Deep Learning
Machine Learning

Author(s) Name: Jiayu Ye, Yanhong Yu, Qingxiang Wang, Wentao Li, Hu Liang, Yunshao Zheng, Gang Fu

Journal name: Journal of Affective Disorders

Publisher name: Elsevier

DOI: 10.1016/j.jad.2021.08.090

Volume Information: Volume 295, 1 December 2021, Pages 904-913