Pathological voice detection using optimized deep residual neural network and explainable artificial intelligence

Bibliographic Details
Published in	Multimedia Tools and Applications, Vol. 84, no. 19, pp. 21863-21889
Main Authors	Jegan, Roohum, Jayagowri, R.
Format	Journal Article
Language	English
Published	New York: Springer US, 01.06.2025; Springer Nature B.V.
Subjects	1239: Emerging Trends and Applications of Deep Learning for Biomedical Data Analysis; Accuracy; Algorithms; Artificial intelligence; Artificial neural networks; Classification; Computer Communication Networks; Computer Science; Data Structures and Information Theory; Deep learning; Disorders; Effectiveness; Explainable artificial intelligence; Machine learning; Multimedia Information Systems; Neural networks; Optimization; Pathology; Special Purpose and Application-Based Systems; Speech disorders; Speech therapy; Time-frequency analysis; Visualization; Voice; Speech pathology detection; Snake optimization; Voice disorder detection; Voice pathology detection; Optimized deep learning; Deep residual network
ISSN	1380-7501; 1573-7721
DOI	10.1007/s11042-024-20348-y

Summary:	Voice disorders impair vocal quality and communication, posing significant challenges for both patients and healthcare providers. Accurate and timely detection of voice disorders is crucial for early intervention and effective treatment. This study proposes a new noninvasive approach for voice disorder detection based on an optimized deep residual neural network. Input speech samples are transformed into mel-spectrogram time-frequency images and used to train a ResNet-50 transfer learning model. The spectrogram time-frequency representation captures intricate local and global patterns that may indicate the presence of voice disorders. Four hyperparameters of the ResNet-50 model are optimized using the snake optimization algorithm, yielding an optimized deep transfer learning (DTL) model with an improved voice pathology detection rate. The proposed snake-optimized ResNet-50 model is evaluated on four popular voice pathology datasets: AVPD, SVD, PdA and VOICED. The results show that the optimized ResNet-50 framework classifies healthy and pathological voice samples with 98.13% accuracy. Comparisons with recent machine learning and deep learning models indicate that the proposed approach performs better in terms of F1-score, sensitivity, specificity and accuracy. Finally, gradient-weighted class activation mapping (Grad-CAM), an explainable artificial intelligence (XAI) technique, is used to visualize and interpret the model's decision-making process.
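
The summary reports results in terms of accuracy, sensitivity, specificity and F1-score for a healthy-versus-pathological classification. As a minimal sketch of how these four metrics relate to a binary confusion matrix, the following Python snippet computes them from raw counts. The function name `binary_metrics` and the counts used are hypothetical and chosen only for illustration; they are not taken from the paper, and "pathological" is assumed to be the positive class.

```python
def binary_metrics(tp, fp, tn, fn):
    """Compute standard binary-classification metrics from confusion-matrix counts.

    Assumption: the positive class is "pathological", so
    tp = pathological voices correctly flagged,
    fn = pathological voices missed,
    tn = healthy voices correctly passed,
    fp = healthy voices wrongly flagged.
    """
    def ratio(num, den):
        # Return 0.0 rather than raising when a denominator is empty.
        return num / den if den else 0.0

    accuracy = ratio(tp + tn, tp + fp + tn + fn)
    sensitivity = ratio(tp, tp + fn)   # also called recall or true-positive rate
    specificity = ratio(tn, tn + fp)   # true-negative rate
    precision = ratio(tp, tp + fp)
    f1 = ratio(2 * precision * sensitivity, precision + sensitivity)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": f1,
    }


# Hypothetical counts for illustration only (not results from the paper).
metrics = binary_metrics(tp=90, fp=5, tn=95, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Reporting sensitivity and specificity alongside accuracy matters in screening settings because accuracy alone can look high on an imbalanced dataset even when many pathological samples are missed.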