Research Area:  Machine Learning
We propose a multi-modal method with a hierarchical recurrent neural structure that integrates vision, audio and text features for depression detection. The method uses two hierarchies of bidirectional long short-term memory (BiLSTM) networks to fuse the multi-modal features and predict the severity of depression. An adaptive sample weighting mechanism is introduced to account for the diversity of training samples. Experiments on the test set of a depression detection challenge demonstrate the effectiveness of the proposed method.
Author(s) Name:  Shi Yin, Cong Liang, Heyan Ding, Shangfei Wang
Conference name:  AVEC '19: Proceedings of the 9th International on Audio/Visual Emotion Challenge and Workshop
Publisher name:  ACM
Paper Link:   https://dl.acm.org/doi/abs/10.1145/3347320.3357696
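
The abstract mentions an adaptive sample weighting mechanism but does not state the weighting rule. As a rough illustration of the general idea only, the sketch below computes a sample-weighted mean squared error for severity regression. The function names (adaptive_weights, weighted_mse), the temperature parameter, and the rule itself (a softmax over absolute errors, rescaled so the weights average to 1) are all assumptions made for this example and are not taken from the paper.

```python
import math


def adaptive_weights(errors, temperature=1.0):
    """Hypothetical weighting rule (an assumption, not the paper's method).

    Applies a softmax to the absolute errors and rescales the result so
    the weights average to 1. Samples with larger errors get more weight.
    """
    if not errors:
        raise ValueError("errors must be non-empty")
    scaled = [abs(e) / temperature for e in errors]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    n = len(errors)
    return [n * x / total for x in exps]


def weighted_mse(predictions, targets, temperature=1.0):
    """Mean squared error in which each sample is scaled by its weight."""
    errors = [p - t for p, t in zip(predictions, targets)]
    weights = adaptive_weights(errors, temperature)
    return sum(w * e * e for w, e in zip(weights, errors)) / len(errors)
```

When every sample has the same error, all weights equal 1 and the result matches the ordinary mean squared error. When errors differ, the harder samples contribute more to the loss, which is one plausible way to respond to diversity among training samples.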