Fusing features of speech for depression classification based on higher-order spectral analysis
| Published in | Speech communication Vol. 143; pp. 46 - 56 |
|---|---|
| Format | Journal Article |
| Language | English |
| Published | Amsterdam: Elsevier B.V, 01.09.2022; Elsevier Science Ltd |
| ISSN | 0167-6393, 1872-7182 |
| DOI | 10.1016/j.specom.2022.07.006 |
| Summary: | • Propose a feature fusion method based on the higher-order spectral analysis (HOSA) of bi-spectral features (BSFs) and non-linear bi-coherent features (BCFs). • The HOSA of speech-related features not only has a high accuracy in speech emotion recognition but is also accurate for depression recognition. • Fused features were better than the speech-related features obtained from HOSA in terms of accuracy, and the latter were better than the COVAREP-extracted speech-related features.
Approximately 300 million people worldwide suffer from depression, and more than 60% of psychiatric patients do not have access to mental health services because of the shortage of psychiatrists and the high costs of clinical diagnosis and treatment. Correct and efficient diagnosis of depression can help overcome these difficulties, and automatic detection of depressive symptoms can help improve the accuracy and availability of diagnosis. This paper proposes a fusion feature that combines bispectral features (BSFs) and bicoherent features (BCFs) obtained through higher-order spectral analysis (HOSA). Experiments were performed on the Depression Sub-Challenge dataset of the Audio/Visual Emotion Challenge 2017. The fusion feature fuses the higher-order spectral features with traditional speech features with classification weights greater than 100, extracted using A Collaborative Voice Analysis Repository (COVAREP). The support vector machine (SVM) and k-nearest neighbor (KNN) classification algorithms were used as the traditional machine learning models, and a convolutional neural network (CNN) was used as the deep learning model to verify the proposed features. The experimental results show that under the SVM, the accuracies obtained with the COVAREP-extracted speech features, the HOSA features, and their fusion were 63.15%, 68.42%, and 73.68%, respectively. Under the KNN model, the corresponding accuracies were 68.18%, 72.73%, and 77.27%, respectively. For the CNN model, the corresponding accuracies were 70%, 77%, and 85%, respectively. The results demonstrate that the fusion feature achieves high recognition accuracy and can be employed to improve the accuracy of depression identification with both traditional machine learning and deep learning models. |
|---|---|
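The summary refers to bispectral and bicoherent features without stating how they are computed. As a rough illustration only, the following is a minimal sketch of a textbook direct (FFT-based) estimator of the bispectrum and squared bicoherence, averaged over overlapping frames. It is not the authors' implementation: the function name `bispectrum_bicoherence`, the frame length `nfft`, the hop size `hop`, and the Hann window are all assumptions made for this example.

```python
import numpy as np


def bispectrum_bicoherence(x, nfft=64, hop=32, eps=1e-12):
    """Direct estimate of the bispectrum and squared bicoherence.

    Hypothetical helper for illustration; all parameters are assumptions.
    Returns two (nfft//2, nfft//2) arrays indexed by frequency bins (f1, f2).
    """
    x = np.asarray(x, dtype=float)
    half = nfft // 2
    window = np.hanning(nfft)
    f1, f2 = np.meshgrid(np.arange(half), np.arange(half), indexing="ij")

    num = np.zeros((half, half), dtype=complex)  # sum of triple products
    den_a = np.zeros((half, half))               # sum of |X(f1) X(f2)|^2
    den_b = np.zeros((half, half))               # sum of |X(f1 + f2)|^2
    n_frames = 0

    for start in range(0, len(x) - nfft + 1, hop):
        frame = x[start:start + nfft]
        spec = np.fft.fft((frame - frame.mean()) * window)
        pair = spec[f1] * spec[f2]
        summed = spec[f1 + f2]  # f1 + f2 <= nfft - 2, so the index is valid
        num += pair * np.conj(summed)
        den_a += np.abs(pair) ** 2
        den_b += np.abs(summed) ** 2
        n_frames += 1

    if n_frames == 0:
        raise ValueError("signal is shorter than one frame")

    bispectrum = num / n_frames
    # Squared bicoherence: bounded by 1 through the Cauchy-Schwarz inequality.
    bicoherence = np.abs(num) ** 2 / (den_a * den_b + eps)
    return bispectrum, bicoherence
```

In this sketch, a bicoherence value near 1 at bins `(f1, f2)` indicates that the components at `f1`, `f2`, and `f1 + f2` are phase-coupled across frames, while values near 0 indicate no consistent coupling. How the paper summarizes such maps into features and fuses them with the COVAREP features is not described in this record.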