Research Area:  Machine Learning
Key features of mental illnesses are reflected in speech. Our research focuses on designing a multimodal deep learning architecture that automatically extracts salient features from recorded speech samples to predict mental disorders including depression, bipolar disorder, and schizophrenia. We adopt a variety of pre-trained models to extract embeddings from both audio and text segments: several state-of-the-art embedding techniques, including BERT, FastText, and Doc2VecC, for text representation learning, and WaveNet and VGGish models for audio encoding. We also leverage large auxiliary emotion-labeled text and audio corpora to train emotion-specific embeddings, applying transfer learning to address the scarcity of annotated multimodal data. All of these embeddings are then combined into a joint representation in a multimodal fusion layer, and finally a recurrent neural network predicts the mental disorder. Our results show that mental disorders can be predicted with acceptable accuracy through multimodal analysis of clinical interviews.
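The fusion-then-RNN stage described in the abstract can be sketched compactly. The following is a minimal, hypothetical PyTorch illustration, not the authors' implementation: the framework, embedding dimensions, GRU choice, and all names (e.g. MultimodalFusionRNN) are assumptions for illustration; it shows precomputed per-segment text and audio embeddings being concatenated, projected into a joint representation, and fed to a recurrent classifier.

```python
import torch
import torch.nn as nn

class MultimodalFusionRNN(nn.Module):
    """Illustrative sketch of the pipeline in the abstract: per-segment
    text and audio embeddings are fused into a joint representation and
    a recurrent network predicts the disorder. Dimensions and the GRU
    are assumptions, not the paper's exact configuration."""

    def __init__(self, text_dim=768, audio_dim=128, fused_dim=256,
                 hidden_dim=128, num_classes=3):
        super().__init__()
        # Multimodal fusion layer: project the concatenated text and
        # audio embeddings of each segment into a joint representation.
        self.fusion = nn.Sequential(
            nn.Linear(text_dim + audio_dim, fused_dim),
            nn.ReLU(),
        )
        # Recurrent network over the sequence of interview segments.
        self.rnn = nn.GRU(fused_dim, hidden_dim, batch_first=True)
        # Classifier over the final hidden state (e.g. depression /
        # bipolar disorder / schizophrenia).
        self.classifier = nn.Linear(hidden_dim, num_classes)

    def forward(self, text_emb, audio_emb):
        # text_emb:  (batch, segments, text_dim),  e.g. BERT outputs
        # audio_emb: (batch, segments, audio_dim), e.g. VGGish outputs
        fused = self.fusion(torch.cat([text_emb, audio_emb], dim=-1))
        _, h_n = self.rnn(fused)          # h_n: (1, batch, hidden_dim)
        return self.classifier(h_n[-1])   # (batch, num_classes)


# Toy usage: 4 interviews, 10 segments each, with random stand-ins for
# the precomputed embeddings.
model = MultimodalFusionRNN()
logits = model(torch.randn(4, 10, 768), torch.randn(4, 10, 128))
print(logits.shape)  # torch.Size([4, 3])
```

Running the recurrence over segment-level fused embeddings, rather than pooling them, lets the classifier exploit the temporal structure of the clinical interview, which is the motivation the abstract gives for the recurrent layer.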
Keywords:  
Author(s) Name:  Habibeh Naderi, Behrouz Haji Soleimani, Stan Matwin
Journal name:  Computer Science
Conference name:  
Publisher name:  arXiv
DOI:  10.48550/arXiv.1909.01067
Volume Information:  
Paper Link:  https://arxiv.org/abs/1909.01067