Acoustic COVID-19 Detection Using Multiple Instance Learning

Bibliographic Details
Published in	IEEE Journal of Biomedical and Health Informatics, Vol. 29, no. 1, pp. 620-630
Main Authors	Reiter, Michael; Pernkopf, Franz
Format	Journal Article
Language	English
Published	United States: IEEE, 01.01.2025
Subjects	Algorithms; Annotations; Antigens; audio-based infection prediction; Bioinformatics; Biological system modeling; Costs; coswara; COVID-19; COVID-19 - diagnosis; COVID-19 - physiopathology; crowdsourced datasets; DiCOVA; Feature extraction; Humans; Labeling; Machine Learning; multiple instance learning; Multiple-Instance Learning Algorithms; Pandemics; Predictive models; SARS-CoV-2; Signal Processing, Computer-Assisted
ISSN	2168-2194, 2168-2208
DOI	10.1109/JBHI.2024.3474975

Summary:	In the COVID-19 pandemic, a rigorous testing scheme was crucial. However, tests can be time-consuming and expensive. A machine learning-based diagnostic tool for audio recordings could enable widespread testing at low cost. To make such algorithms comparable, the DiCOVA challenge was created. It is based on the Coswara dataset, which offers the recording categories cough, speech, breath and vowel phonation. Recording durations vary greatly, ranging from one second to over a minute. A base model is pre-trained on random, short time intervals. Subsequently, a Multiple Instance Learning (MIL) model based on self-attention is incorporated to make collective predictions for multiple time segments within each audio recording, taking advantage of longer durations. To compete in the fusion category of the DiCOVA challenge, we use a linear regression approach, among other fusion methods, to combine predictions from the most successful models associated with each sound modality. The application of the MIL approach significantly improves generalizability, leading to an AUC-ROC score of 86.6% in the fusion category. By incorporating previously unused data, including the sound modality 'sustained vowel phonation' and patient metadata, we were able to significantly improve our previous results, reaching a score of 92.2%.
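The idea of making one collective prediction from several time segments of a recording can be illustrated with a minimal sketch. This is not the authors' model: the record does not give the architecture, so the sketch substitutes a simple attention pooling (a learned vector scores each segment, and a softmax turns the scores into weights) for the self-attention MIL model. The 16 kHz sample rate, segment length, hop size, embedding dimension, and all weight values are hypothetical, and random numbers stand in for the embeddings a pre-trained base model would produce.

```python
import math
import random


def split_into_segments(num_samples, segment_len, hop):
    """Return (start, end) sample indices of fixed-length segments (the MIL instances)."""
    if num_samples <= segment_len:
        return [(0, num_samples)]  # short recording: the whole clip is one instance
    starts = range(0, num_samples - segment_len + 1, hop)
    return [(s, s + segment_len) for s in starts]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_mil_pool(instance_embeddings, attention_vector):
    """Pool per-segment embeddings into one bag embedding using attention weights."""
    # Relevance score per segment: dot product with a (hypothetical) learned vector.
    scores = [sum(x * w for x, w in zip(emb, attention_vector))
              for emb in instance_embeddings]
    weights = softmax(scores)
    dim = len(instance_embeddings[0])
    bag = [sum(wt * emb[d] for wt, emb in zip(weights, instance_embeddings))
           for d in range(dim)]
    return bag, weights


def bag_probability(bag_embedding, classifier_weights, bias):
    """Logistic output for the whole recording (the bag-level prediction)."""
    logit = sum(b * w for b, w in zip(bag_embedding, classifier_weights)) + bias
    return 1.0 / (1.0 + math.exp(-logit))


# Hypothetical 10 s recording at 16 kHz, 2 s segments with 1 s hop.
segments = split_into_segments(num_samples=160000, segment_len=32000, hop=16000)

# Stand-in embeddings; a real system would take these from the base model.
rng = random.Random(0)
embeddings = [[rng.gauss(0, 1) for _ in range(4)] for _ in segments]

bag, weights = attention_mil_pool(embeddings, attention_vector=[0.5, -0.2, 0.1, 0.3])
prob = bag_probability(bag, classifier_weights=[0.4, 0.1, -0.3, 0.2], bias=0.0)
```

Because the attention weights sum to one, a recording of any length yields a bag embedding of the same size, so recordings from one second to over a minute can share a single classifier.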