The Initial Screening of Laryngeal Tumors via Voice Acoustic Analysis Based on Siamese Network Under Small Samples

Bibliographic Details
Published in: Journal of Voice
Main Authors: You, Zhenzhen; Sun, Delong; Shi, Zhenghao; Du, Shuangli; Hei, Xinhong; Kong, Demin; Du, Xiaoying; Yan, Jing; Ren, Xiaoyong; Hou, Jin
Format: Journal Article
Language: English
Published: United States, Elsevier Inc, 02.05.2025
ISSN: 0892-1997, 1873-4588
DOI: 10.1016/j.jvoice.2025.03.043

Summary:
Objective: Initial screening of laryngeal tumors via voice acoustic analysis relies on the clinician’s experience, which is subjective. This article introduces a Siamese network with an auxiliary gender classifier for automated, accurate, and objective initial screening of laryngeal tumors based on voice signals.
Methods: The study involved 71 tumor patients and 293 non-tumor subjects who speak Mandarin Chinese. The dataset was divided into a training set and a test set at a ratio of 4:1. We applied nine data augmentation techniques to enlarge the voice training set and extracted the corresponding mel-frequency cepstral coefficient (MFCC) maps. The MFCC maps were randomly paired and fed into the proposed Siamese network to perform multitask classification: tumor versus non-tumor, and woman versus man. The performance of the proposed model was compared with that of one machine learning method and six classical deep learning models, with and without the auxiliary gender classifier.
Results: Experiments demonstrate the superiority of the proposed network over the reference models. The proposed model achieved an overall accuracy of 0.9437, an F score of 0.8462, a precision of 0.9167, a sensitivity of 0.7857, and a specificity of 0.9825.
Conclusion: The proposed network can assist in the initial screening of laryngeal tumors through voice acoustic analysis. Screening based solely on voice acoustic analysis can help individuals seek medical assistance outside hospitals and can also reduce the burden on doctors.
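The five reported metrics all derive from one binary confusion matrix, so their relationship can be illustrated with a short sketch. The counts below (11 true positives, 3 false negatives, 1 false positive, 56 true negatives) are an assumption: the summary does not give the confusion matrix. They were chosen only because they are consistent with a test set of roughly one fifth of the 364 subjects and reproduce the reported values to four decimal places. The function name `screening_metrics` is likewise hypothetical.

```python
def screening_metrics(tp, fn, fp, tn):
    """Compute standard binary-classification metrics from confusion-matrix counts.

    The positive class is "tumor"; the negative class is "non-tumor".
    """
    precision = tp / (tp + fp)       # predicted tumors that are real tumors
    sensitivity = tp / (tp + fn)     # real tumors that were detected (recall)
    specificity = tn / (tn + fp)     # non-tumor subjects correctly cleared
    accuracy = (tp + tn) / (tp + fn + fp + tn)
    # F score: harmonic mean of precision and sensitivity
    f_score = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "accuracy": accuracy,
        "f_score": f_score,
        "precision": precision,
        "sensitivity": sensitivity,
        "specificity": specificity,
    }


# ASSUMED counts, not taken from the article: one confusion matrix that is
# consistent with the metrics quoted in the summary.
metrics = screening_metrics(tp=11, fn=3, fp=1, tn=56)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Under these assumed counts, sensitivity is the weakest figure (3 of 14 tumor cases missed), which is the number to watch in a screening setting where missed tumors cost more than false alarms.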