Detecting Unipolar and Bipolar Depressive Disorders from Elicited Speech Responses Using Latent Affective Structure Model

Mood disorders, including unipolar depression (UD) and bipolar disorder (BD) <xref ref-type="bibr" rid="ref1">[1] , are reported to be one of the most common mental illnesses in recent years. In diagnostic evaluation on the outpatients with mood disorder, a large portion of...

Full description

Saved in:
Bibliographic Details
Published inIEEE transactions on affective computing Vol. 11; no. 3; pp. 393 - 404
Main Authors Huang, Kun-Yi, Wu, Chung-Hsien, Su, Ming-Hsiang, Kuo, Yu-Ting
Format Journal Article
LanguageEnglish
Published Piscataway IEEE 01.07.2020
The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Subjects
Online AccessGet full text
ISSN1949-3045
1949-3045
DOI10.1109/TAFFC.2018.2803178

Cover

More Information
Summary:Mood disorders, including unipolar depression (UD) and bipolar disorder (BD) <xref ref-type="bibr" rid="ref1">[1] , are reported to be one of the most common mental illnesses in recent years. In diagnostic evaluation on the outpatients with mood disorder, a large portion of BD patients are initially misdiagnosed as having UD <xref ref-type="bibr" rid="ref2">[2] . As most previous research focused on long-term monitoring of mood disorders, short-term detection which could be used in early detection and intervention is thus desirable. This work proposes an approach to short-term detection of mood disorder based on the patterns in emotion of elicited speech responses. To the best of our knowledge, there is no database for short-term detection on the discrimination between BD and UD currently. This work collected two databases containing an emotional database (MHMC-EM) collected by the Multimedia Human Machine Communication (MHMC) lab and a mood disorder database (CHI-MEI) collected by the CHI-MEI Medical Center, Taiwan. As the collected CHI-MEI mood disorder database is quite small and emotion annotation is difficult, the MHMC-EM emotional database is selected as a reference database for data adaptation. For the CHI-MEI mood disorder data collection, six eliciting emotional videos are selected and used to elicit the participants' emotions. After watching each of the six eliciting emotional video clips, the participants answer the questions raised by the clinician. The speech responses are then used to construct the CHI-MEI mood disorder database. Hierarchical spectral clustering is used to adapt the collected MHMC-EM emotional database to fit the CHI-MEI mood disorder database for dealing with the data bias problem. The adapted MHMC-EM emotional data are then fed to a denoising autoencoder for bottleneck feature extraction. The bottleneck features are used to construct a long short term memory (LSTM)-based emotion detector for generation of emotion profiles from each speech response. The emotion profiles are then clustered into emotion codewords using the K-means algorithm. Finally, a class-specific latent affective structure model (LASM) is proposed to model the structural relationships among the emotion codewords with respect to six emotional videos for mood disorder detection. Leave-one-group-out cross validation scheme was employed for the evaluation of the proposed class-specific LASM-based approaches. Experimental results show that the proposed class-specific LASM-based method achieved an accuracy of 73.33 percent for mood disorder detection, outperforming the classifiers based on SVM and LSTM.
Bibliography:ObjectType-Article-1
SourceType-Scholarly Journals-1
ObjectType-Feature-2
content type line 14
ISSN:1949-3045
1949-3045
DOI:10.1109/TAFFC.2018.2803178