Machine Learning Early Detection of SARS‐CoV‐2 High‐Risk Variants
The severe acute respiratory syndrome coronavirus 2 (SARS‐CoV‐2) has evolved many high‐risk variants, resulting in repeated COVID‐19 waves over the past years. Therefore, accurate early warning of high‐risk variants is vital for epidemic prevention and control. However, detecting high‐risk variants...
| Published in | Advanced Science, Vol. 11, no. 45, e2405058 |
|---|---|
| Format | Journal Article | 
| Language | English | 
| Published | Germany: John Wiley & Sons, Inc, 01.12.2024 |
| ISSN | 2198-3844 |
| DOI | 10.1002/advs.202405058 | 
| Summary | The severe acute respiratory syndrome coronavirus 2 (SARS‐CoV‐2) has evolved many high‐risk variants, resulting in repeated COVID‐19 waves over the past years. Accurate early warning of high‐risk variants is therefore vital for epidemic prevention and control. However, detecting high‐risk variants through experimental and epidemiological research is time‐consuming and often lags behind the emergence and spread of these variants. In this study, HiRisk‐Detector, a machine learning algorithm based on haplotype networks, is developed for the early computational detection of high‐risk SARS‐CoV‐2 variants. Using over 7.6 million high‐quality, complete SARS‐CoV‐2 genomes and their metadata, the study validates the effectiveness, robustness, and generalizability of HiRisk‐Detector. First, HiRisk‐Detector is evaluated on actual empirical data, successfully detecting all 13 high‐risk variants and preceding World Health Organization announcements by 27 days on average. Second, its robustness is tested by reducing sequencing intensity to one‐fourth, which introduces only a minimal delay of 3.8 days. Third, HiRisk‐Detector is applied to detect risks among SARS‐CoV‐2 Omicron variant sub‐lineages, confirming its broad applicability with high ROC‐AUC and PR‐AUC performance. Overall, HiRisk‐Detector offers a powerful capacity for early detection of high‐risk variants and could be useful in public emergencies caused by infectious diseases or viruses. This study first validates a correlation between haplotype network features and the risk levels of SARS‐CoV‐2 variants. Building on this, HiRisk‐Detector, a machine learning algorithm, is developed for the early detection of high‐risk variants. The effectiveness, robustness, and generalizability of HiRisk‐Detector are confirmed using over 7.6 million SARS‐CoV‐2 genomes. |
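
The summary reports ROC‐AUC and PR‐AUC but does not say how they were computed. As a reading aid only, the sketch below shows one common way to obtain both metrics from binary risk labels and classifier scores. The function names `roc_auc` and `average_precision`, the toy labels, and the toy scores are all illustrative assumptions and are not taken from the study. PR‐AUC is approximated here by average precision, and tied scores are assumed absent in that function.

```python
def roc_auc(labels, scores):
    """ROC-AUC as the probability that a random positive outscores a
    random negative (ties count as half a win)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Average precision: mean of the precision values at the rank of each
    positive, a step-wise estimate of the area under the PR curve."""
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("need at least one positive label")
    # Rank samples from highest to lowest score.
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank  # precision at this recall step
    return total / n_pos


# Synthetic toy data (NOT from the study): 1 = high-risk, 0 = not high-risk.
labels = [1, 0, 1, 0, 0, 1]
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2]

print(f"ROC-AUC: {roc_auc(labels, scores):.3f}")
print(f"PR-AUC (average precision): {average_precision(labels, scores):.3f}")
```

PR‐AUC is usually the more informative of the two when positives are rare, because it does not reward a model for correctly ranking the large pool of negatives.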