Research Area:  Machine Learning
Depression is viewed as the largest contributor to global disability and a major cause of suicide. It affects language use, and these effects are reflected in written text. The key objective of our study is to examine Reddit users' posts to detect factors that may reveal depressive attitudes among the relevant online users. For this purpose, we employ Natural Language Processing (NLP) techniques and machine learning approaches to train classifiers on the data and evaluate the efficiency of our proposed method. We identify a lexicon of terms that are more common among depressed accounts. The results show that our proposed method can significantly improve classification accuracy. The best single feature is bigrams with the Support Vector Machine (SVM) classifier, which detects depression with 80% accuracy and an F1 score of 0.80. The strength and effectiveness of the combined features (LIWC+LDA+bigram, i.e. Linguistic Inquiry and Word Count features, Latent Dirichlet Allocation topics, and bigrams) are best demonstrated with the Multilayer Perceptron (MLP) classifier, which gives the top performance for depression detection, reaching 91% accuracy and an F1 score of 0.93. According to our study, better performance can be achieved through proper feature selection and the combination of multiple features.
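The abstract does not describe implementation details, so the following is only a minimal pure-Python sketch of the single-feature setup it names (bigram counts fed to a linear SVM), not the authors' code. Assumptions: lower-case whitespace tokenisation, raw bigram counts, no bias term, and the Pegasos stochastic sub-gradient solver standing in for whatever SVM implementation the paper used. The posts and labels are invented toy data, not the paper's Reddit dataset, and the helper names (`bigrams`, `train_linear_svm`, `accuracy_f1`, etc.) are hypothetical.

```python
import random
from collections import Counter

# Assumption: lower-case whitespace tokenisation; the paper's preprocessing may differ.
def bigrams(text):
    tokens = text.lower().split()
    return [f"{a} {b}" for a, b in zip(tokens, tokens[1:])]

def build_vocab(docs):
    """Map every bigram seen in the training posts to a column index."""
    vocab = {}
    for doc in docs:
        for bg in bigrams(doc):
            vocab.setdefault(bg, len(vocab))
    return vocab

def vectorize(doc, vocab):
    """Sparse bigram-count vector {column: count}; unseen bigrams are ignored."""
    counts = Counter(bg for bg in bigrams(doc) if bg in vocab)
    return {vocab[bg]: c for bg, c in counts.items()}

def train_linear_svm(X, y, dim, lam=0.01, epochs=50, seed=0):
    """Pegasos: stochastic sub-gradient descent on the L2-regularised hinge loss.
    Labels are +1 (depressed) / -1 (control); no bias term, for brevity."""
    rng = random.Random(seed)
    w = [0.0] * dim
    order = list(range(len(X)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)
            margin = y[i] * sum(w[j] * v for j, v in X[i].items())
            w = [wj * (1.0 - eta * lam) for wj in w]  # regularisation shrink
            if margin < 1:  # hinge loss active: step toward this example
                for j, v in X[i].items():
                    w[j] += eta * y[i] * v
    return w

def predict(w, x):
    score = sum(w[j] * v for j, v in x.items())
    return 1 if score > 0 else -1

def accuracy_f1(y_true, y_pred, positive=1):
    """Accuracy, and F1 of the positive (depressed) class."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    acc = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    f1 = 2 * tp / (2 * tp + fp + fn) if tp else 0.0
    return acc, f1

# Invented toy posts, NOT from the paper's dataset.
posts = [
    "i feel so alone and i can not sleep",
    "i feel hopeless every single day",
    "nothing matters and i feel empty",
    "great game last night with friends",
    "just finished a fun hike today",
    "new recipe turned out great tonight",
]
labels = [1, 1, 1, -1, -1, -1]

vocab = build_vocab(posts)
X = [vectorize(p, vocab) for p in posts]
w = train_linear_svm(X, labels, dim=len(vocab))
train_pred = [predict(w, x) for x in X]
print(accuracy_f1(labels, train_pred))  # training-set score on toy data: (1.0, 1.0)
```

The perfect score above is training accuracy on six separable toy posts and says nothing about the 80% / 0.80 figures reported in the paper, which come from held-out evaluation on real data. The combined LIWC+LDA+bigram setting would, presumably, concatenate psycholinguistic and topic vectors onto these bigram counts before training an MLP; that step is not sketched here.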
Keywords:  
Detection of Depression
Reddit Social Media
Natural Language Processing
Machine Learning
Support Vector Machine
Multilayer Perceptron
Author(s) Name:  Michael M. Tadesse; Hongfei Lin; Bo Xu; Liang Yang
Journal name:  IEEE Access
Conference name:  
Publisher name:  IEEE
DOI:  10.1109/ACCESS.2019.2909180
Volume Information:  Volume 7, Page(s): 44883-44893
Paper Link:   https://ieeexplore.ieee.org/abstract/document/8681445